Documentation of the read functions¶
Kojak¶
CroCo contains modules for reading data from the direct output of the Kojak search engine (http://www.kojak-ms.org) as well as FDR-filtered results obtained with Percolator. A detailed workflow describing how to obtain the FDR-filtered table can be found at http://www.kojak-ms.org/docs/validation.html.
Kojak only¶
- Load file(s): e.g. FILENAME.kojak.txt
- OptionsWindow: Rawfile title (e.g. FILENAME.raw)
Kojak does not store the raw file name in its output, but the xTable requires it, so it has to be supplied separately.
Functions to read and process data generated with the Kojak cross-link search engine.
croco.Kojak.Read(kojak_files, rawfile=None, decoy_string='decoy', col_order=None, compact=False)
Read Kojak results file(s), calculate and process the missing values required for the xTable, and return the xTable.
Parameters: - kojak_files (list) – path or paths to Kojak results file(s)
- rawfile (str) – name of the corresponding rawfile
- decoy_string (str, optional) – string used by Kojak to label decoys
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns: xTable data table
Return type: pandas.DataFrame
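The interplay of col_order and compact can be illustrated with a small sketch. The helper order_columns below is hypothetical and not part of CroCo; it assumes that columns listed in col_order are moved to the front, that names missing from the table are skipped, and that compact=True drops every other column. CroCo's actual handling, for example of missing columns, may differ.

```python
from typing import List, Optional, Sequence


def order_columns(columns: Sequence[str],
                  col_order: Optional[Sequence[str]] = None,
                  compact: bool = False) -> List[str]:
    """Hypothetical helper: final column order of an xTable (assumed rules)."""
    if col_order is None:
        # Nothing to sort by: keep the table's own order.
        return list(columns)
    # Columns named in col_order come first, in that order; unknown names are skipped.
    ordered = [col for col in col_order if col in columns]
    if compact:
        # Compact: keep only the columns listed in col_order.
        return ordered
    # Otherwise append the remaining columns in their original order.
    return ordered + [col for col in columns if col not in col_order]
```

With pandas, such an order would be applied by indexing, e.g. `xtable[order_columns(list(xtable.columns), col_order, compact)]`.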
Kojak and Percolator¶
For this script to work, the unpercolated Kojak file (e.g. FILENAME.kojak.txt) has to be in the same directory as the percolated file.
- Load file(s): e.g. FILENAME.validated.txt
- OptionsWindow: Rawfile title (e.g. FILENAME.raw)
Functions to read Percolator processed Kojak data.
croco.KojakPercolator.Read(perc_files, rawfile=None, validated_string='.validated', percolator_string='.perc', decoy_string='decoy', compact=False, col_order=None)
Collects unprocessed and percolated results and returns an xTable data table.
Parameters: - perc_files (str or list) – path or list of paths to percolated Kojak file(s)
- validated_string (str) – user-defined string appended to the percolated filenames
- percolator_string (str) – user-defined string appended to the file prepared for percolating
- decoy_string (str, optional) – string used by Kojak to label decoys
- rawfile (str) – name of the corresponding rawfile
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns: xTable data table
Return type: pandas.DataFrame
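Because croco.KojakPercolator.Read needs the unpercolated Kojak file next to the percolated one, it has to derive one filename from the other. A minimal sketch of that lookup, assuming the naming scheme of the examples above (FILENAME.validated.txt next to FILENAME.kojak.txt); find_kojak_file is a hypothetical helper, not CroCo's implementation.

```python
import os


def find_kojak_file(perc_file: str, validated_string: str = ".validated") -> str:
    """Derive the unpercolated Kojak file path from a percolated file path.

    Hypothetical helper: assumes FILENAME<validated_string>.txt lies in the
    same directory as FILENAME.kojak.txt.
    """
    directory, name = os.path.split(perc_file)
    stem, ext = os.path.splitext(name)  # e.g. "FILENAME.validated", ".txt"
    if not stem.endswith(validated_string):
        raise ValueError(f"{name!r} does not end with {validated_string!r}")
    base = stem[: len(stem) - len(validated_string)]  # e.g. "FILENAME"
    # Same directory, Kojak's ".kojak" infix instead of the validated suffix.
    return os.path.join(directory, base + ".kojak" + ext)
```

The sketch only builds the path; checking it with `os.path.isfile` before reading gives a clearer error when the unpercolated file is missing.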
Kojak Helper Functions¶
Functions that are collectively used by croco.Kojak and croco.KojakPercolator.
croco.KojakFunctions.assign_ID_and_type(xtable)
Determine whether a cross-link is of inter or loop type, refine the inter type into inter, intra or homomultimeric, and generate an ID for each cross-link.
Parameters: xtable (pandas.DataFrame) – table data structure with “prot”, “pos” and “pepseq” columns
Returns: xTable with type and ID
Return type: pandas.DataFrame
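The classification performed by assign_ID_and_type can be sketched as follows. Both functions are hypothetical and use assumed rules: a link with no second peptide is a loop-link, links between different proteins are inter, and links within one protein count as homomultimeric when the two peptides overlap (a single protein copy cannot supply the same residues twice) and as intra otherwise. Positions are assumed to be 1-based, and the ID format is illustrative only.

```python
from typing import Optional


def classify_link(prot1: str, pos1: int, pepseq1: str,
                  prot2: Optional[str], pos2: Optional[int],
                  pepseq2: Optional[str]) -> str:
    """Classify a cross-link (hypothetical rules, see lead-in)."""
    if pepseq2 is None:
        # Only one peptide: both linked residues lie on the same peptide.
        return "loop"
    if prot1 != prot2:
        return "inter"
    # Same protein: do the two peptides cover overlapping residues?
    end1 = pos1 + len(pepseq1) - 1
    end2 = pos2 + len(pepseq2) - 1
    if pos1 <= end2 and pos2 <= end1:
        # One copy of the protein cannot provide the same residues twice.
        return "homomultimeric"
    return "intra"


def link_id(prot1: str, xlink1: int, prot2: str, xlink2: int) -> str:
    """Order-independent ID from protein names and absolute link positions."""
    # Sorting makes A-B and B-A produce the same ID.
    site1, site2 = sorted([(prot1, xlink1), (prot2, xlink2)])
    return f"{site1[0]}:{site1[1]}-{site2[0]}:{site2[1]}"
```

Sorting the two sites before building the ID is what lets the same cross-link, reported with its peptides in either order, collapse to a single identifier.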
croco.KojakFunctions.extract_peptide(xtable)
Extract peptide sequence, modification mass and position from the Peptide #1 and Peptide #2 entries.
Parameters: xtable (pandas.DataFrame) – xTable data structure with “Peptide #1” and “Peptide #2” columns
Returns: xTable with modmass, modpos, pepseq and mod
Return type: pandas.DataFrame
croco.KojakFunctions.extract_protein(xtable)
Extract protein name and relative cross-link position from the Protein # entries.
Parameters: xtable (pandas.DataFrame) – xTable data structure with “Protein #1”, “Protein #2”, xpos1, xlink1, and xlink2 columns
Returns: xTable with prot and xpos
Return type: pandas.DataFrame
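A sketch of parsing the Protein # entries. It assumes that each entry has the form NAME(POSITION); with the position of the linked residue in the protein, and that ambiguous assignments are concatenated. parse_protein_entry and peptide_start are hypothetical helpers, and the 1-based arithmetic in peptide_start is an assumption about how an absolute and a relative link position relate.

```python
import re
from typing import List, Tuple

# Assumed entry format: "NAME(POSITION);", repeated for ambiguous proteins.
_PROTEIN_ENTRY = re.compile(r"([^;()]+)\((\d+)\);?")


def parse_protein_entry(entry: str) -> List[Tuple[str, int]]:
    """Split a Protein # entry into (protein name, link position) pairs."""
    return [(name.strip(), int(pos)) for name, pos in _PROTEIN_ENTRY.findall(entry)]


def peptide_start(protein_link_pos: int, link_in_peptide: int) -> int:
    """Peptide start in the protein from absolute and relative link positions (1-based)."""
    # A link at residue 160 that is the 3rd residue of its peptide implies a start at 158.
    return protein_link_pos - link_in_peptide + 1
```
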
croco.KojakFunctions.process_kojak_peptide(peptide_string)
Return modifications, their localisation and the peptide sequence from a Kojak sequence string such as M[15.99]TDSKYFTTNK.
If modifications are found, the function returns two lists (modification masses and modification positions) together with the raw peptide sequence. If no modifications are found within a peptide string, the function returns np.nan, np.nan and the sequence.
Parameters: peptide_string (str) – a Kojak peptide string
Returns: - list of float or np.nan – list of modification masses
- list of int or np.nan – list of modification positions within the peptide
- str – peptide sequence without modifications
Return type: tuple
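The parsing described for process_kojak_peptide can be sketched with a regular expression. This is an illustrative reimplementation, not CroCo's code: it assumes a bracketed mass follows the residue it modifies, reports 1-based positions, and uses math.nan in place of np.nan to stay free of NumPy. Anything that is not an uppercase residue letter, optionally followed by a bracketed mass (for instance a terminal modification marker, if present), is silently skipped.

```python
import math
import re
from typing import List

# A residue letter optionally followed by a bracketed mass shift, e.g. "M[15.99]".
_RESIDUE = re.compile(r"([A-Z])(?:\[([-+]?\d+(?:\.\d+)?)\])?")


def parse_kojak_peptide(peptide_string: str):
    """Return (masses, positions, sequence), or (nan, nan, sequence) if unmodified."""
    masses: List[float] = []
    positions: List[int] = []
    residues: List[str] = []
    for index, match in enumerate(_RESIDUE.finditer(peptide_string), start=1):
        residue, mass = match.groups()
        residues.append(residue)
        if mass is not None:
            masses.append(float(mass))
            positions.append(index)  # 1-based position of the modified residue
    sequence = "".join(residues)
    if not masses:
        return math.nan, math.nan, sequence
    return masses, positions, sequence
```

For M[15.99]TDSKYFTTNK the sketch yields `[15.99]`, `[1]` and `"MTDSKYFTTNK"`.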
croco.KojakFunctions.set_decoy(xtable, decoy_string)
Set the decoy column based on whether the decoy string is present in the protein name or not.
Parameters: - xtable (pandas.DataFrame) – xTable with “prot” columns
- decoy_string (str) – Kojak decoy string
Returns: xTable with decoy column
Return type: pandas.DataFrame
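A hedged sketch of the decoy rule. is_decoy is hypothetical; it assumes that a cross-link counts as a decoy when either protein name contains decoy_string (case-sensitive substring match), and that loop-links simply have no second protein. Whether CroCo uses this either-protein rule is not stated above.

```python
from typing import Any


def is_decoy(prot1: Any, prot2: Any, decoy_string: str = "decoy") -> bool:
    """Flag a cross-link as decoy if either protein name contains decoy_string."""
    # A missing second protein (None, or NaN in a DataFrame) is not a string
    # and is therefore ignored instead of raising a TypeError.
    return any(isinstance(prot, str) and decoy_string in prot
               for prot in (prot1, prot2))
```

On a pandas xTable this would be a row-wise call such as `xtable.apply(lambda row: is_decoy(row["prot1"], row["prot2"]), axis=1)`, where the column names prot1 and prot2 are assumptions.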
pLink¶
CroCo can read data from the search engine pLink2 (http://pfind.ict.ac.cn/software/pLink1/index.html) as well as from its predecessor pLink1. Even though pLink1 has been deprecated since 2018, we provide the parser for compatibility with older analyses.
pLink1¶
Functions to read pLink1 data.
croco.pLink1.Read(plinkdirs, col_order=None, compact=False)
Read pLink report directories and return an xTable data table.
Parameters: - plinkdirs (list) – pLink report subdirectories (e.g. sample1)
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns: xTable data table
Return type: pandas.DataFrame
pLink2¶
Functions to read pLink2 files.
croco.pLink2.Read(plinkdirs, col_order=None, compact=False)
Read pLink2 report directories and return an xTable data table.
Parameters: - plinkdirs – pLink2 report subdirectories (the reports folder)
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns: xTable data table
Return type: pandas.DataFrame
StavroX¶
Currently, CroCo can only handle data from the cross-link search engine StavroX (https://www.stavrox.com/), which cannot target MS-cleavable cross-linkers. MeroX (the counterpart of StavroX for MS-cleavable cross-linkers) may be supported in future versions of CroCo.
Functions to read StavroX processed crosslink data.
croco.StavroX.Read(stavrox_files, ssf_file, col_order=None, compact=False)
Collect data from a StavroX spectrum search and return an xTable data table.
Parameters: - stavrox_files – path or list of paths to StavroX output file(s)
- ssf_file – path to the properties.ssf file, used to load modification IDs and masses
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns: xTable data table
Return type: pandas.DataFrame
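Several readers accept either a single path or a list of paths (for example stavrox_files). One common way to normalise such an argument is sketched below; as_path_list is hypothetical and not taken from CroCo.

```python
import os
from typing import Iterable, List, Union

PathLike = Union[str, os.PathLike]


def as_path_list(paths: Union[PathLike, Iterable[PathLike]]) -> List[str]:
    """Return a list of path strings from one path or an iterable of paths."""
    # A bare string is itself iterable, so it must be caught before iterating.
    if isinstance(paths, (str, os.PathLike)):
        return [os.fspath(paths)]
    return [os.fspath(path) for path in paths]
```

Without the isinstance check, a single string such as "results.csv" would be split into its individual characters.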
Xi¶
Results from the Xi cross-link search engine (http://rappsilberlab.org/rappsilber-laboratory-home-page/tools/) can also be parsed by CroCo.
Xi only¶
- Load file(s): Path to Xi results file (e.g. FILENAME_XiVersion1.6.739.csv)
Functions to read Xi processed crosslink data.
croco.Xi.Read(xi_files, col_order=None, compact=False)
Collects data from an Xi spectrum search and returns an xTable data table.
Parameters: - xi_files – path or list of paths to Xi file(s)
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns: xTable data table
Return type: pandas.DataFrame
Xi & XiFDR¶
- Load file(s): Path to xiFDR file (e.g. FILENAME_5_FDR_PSM_xiFDR1.0.22.csv)
Functions to read Xi processed crosslink data filtered with xiFDR.
This script is part of the CroCo cross-link converter project
croco.XiSearchFDR.Read(xifdr_files, xi_config, col_order=None, compact=False)
Collects data from an Xi spectrum search filtered by xiFDR and returns an xTable data table.
Parameters: - xifdr_files – path or list of paths to xiFDR file(s)
- xi_config – path to the corresponding xi_config file
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Whether to keep the columns of the original dataframe or not
Returns: xTable data table
Return type: xTable
xQuest¶
xQuest (http://prottools.ethz.ch/orinner/public/htdocs/xquest/) results are also supported by CroCo.
- Load file(s): xQuest results file exported as csv (e.g. FILENAME_xquest.csv)
Functions to read xQuest data.
croco.xQuest.Read(xQuest_files, col_order=None, compact=False)
Read xQuest results file(s) and return the data in xTable format.
Parameters: - xQuest_files (list) – path to xQuest results file(s)
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns: xTable data table
Return type: pandas.DataFrame