Documentation for read functions¶
Kojak¶
CroCo contains modules for reading data from the direct output of the Kojak search engine (http://www.kojak-ms.org) as well as for FDR-filtered results obtained by using Percolator. A detailed workflow for obtaining the FDR-filtered table can be found at http://www.kojak-ms.org/docs/validation.html.
Kojak only¶
- Load file(s): e.g. FILENAME.kojak.txt
- OptionsWindow: Rawfile title (e.g. FILENAME.raw)
Kojak does not store the rawfile name in its output, but the xTable requires it, so it has to be supplied separately.
Functions to read and process data generated with the Kojak cross-link search engine.
croco.Kojak.Read(kojak_files, rawfile=None, decoy_string='decoy', col_order=None, compact=False)
Read Kojak results file, calculate and process missing values required for xTable and return the xTable.
Parameters: - kojak_files (list) – path or paths to Kojak results file(s)
- rawfile (str) – name of the corresponding rawfile
- decoy_string (str, optional) – string used in Kojak to label decoys
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Compact the xTable to only the columns given in col_order or not
Returns: xtable data table
Return type: pandas.DataFrame
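As a rough usage sketch (not taken from the CroCo sources), the rawfile title can be derived from the Kojak file name and passed to the reader. The helpers `guess_rawfile` and `read_kojak` are hypothetical, and the assumption that the raw file shares the stem of the Kojak output rests only on the FILENAME.kojak.txt / FILENAME.raw example above.

```python
from pathlib import Path

KOJAK_SUFFIX = ".kojak.txt"


def guess_rawfile(kojak_path):
    """Hypothetical helper: FILENAME.kojak.txt -> FILENAME.raw (assumed naming)."""
    name = Path(kojak_path).name
    if not name.endswith(KOJAK_SUFFIX):
        raise ValueError(f"not a Kojak results file: {name}")
    return name[: -len(KOJAK_SUFFIX)] + ".raw"


def read_kojak(kojak_path, decoy_string="decoy"):
    """Hypothetical wrapper around croco.Kojak.Read for a single file."""
    import croco  # imported lazily so guess_rawfile works without CroCo installed

    return croco.Kojak.Read(
        [kojak_path],
        rawfile=guess_rawfile(kojak_path),
        decoy_string=decoy_string,
    )


print(guess_rawfile("results/FILENAME.kojak.txt"))  # FILENAME.raw
```

If the raw file is named differently from the Kojak output, pass the rawfile argument explicitly instead of guessing it.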
Kojak and Percolator¶
For this script to work, the unpercolated Kojak file (e.g. FILENAME.kojak.txt) has to be in the same directory as the percolated file.
- Load file(s): e.g. FILENAME.validated.txt
- OptionsWindow: Rawfile title (e.g. FILENAME.raw)
Functions to read Percolator processed Kojak data.
croco.KojakPercolator.Read(perc_files, rawfile=None, validated_string='.validated', percolator_string='.perc', decoy_string='decoy', compact=False, col_order=None)
Collects unprocessed and percolated results and returns an xtable data array.
Parameters: - perc_files (str or list) – path or list of paths to percolated Kojak file(s)
- validated_string (str) – user-defined string appended to the percolated filenames
- percolator_string (str) – user-defined string appended to the file prepared for percolating
- decoy_string (str, optional) – string used in Kojak to label decoys
- rawfile (str) – name of the corresponding rawfile
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns: xtable data table
Return type: pandas.DataFrame
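The requirement that the unpercolated file sits next to the percolated one can be checked before calling the reader. This is a minimal sketch, assuming the unpercolated file name is obtained by swapping the validated string for `.kojak`. That mapping is inferred from the FILENAME.kojak.txt and FILENAME.validated.txt examples above, not from the CroCo sources, and both helper functions are hypothetical.

```python
from pathlib import Path


def unpercolated_path(perc_file, validated_string=".validated", kojak_string=".kojak"):
    """Hypothetical helper: map FILENAME.validated.txt to FILENAME.kojak.txt."""
    perc_file = Path(perc_file)
    if validated_string not in perc_file.name:
        raise ValueError(f"{perc_file.name} does not contain {validated_string!r}")
    # Swap only the first occurrence and keep the directory unchanged.
    new_name = perc_file.name.replace(validated_string, kojak_string, 1)
    return perc_file.with_name(new_name)


def check_pair(perc_file):
    """Return the sibling Kojak path, or raise if it is missing on disk."""
    kojak_file = unpercolated_path(perc_file)
    if not kojak_file.is_file():
        raise FileNotFoundError(f"expected {kojak_file} next to {perc_file}")
    return kojak_file


print(unpercolated_path("data/FILENAME.validated.txt").name)  # FILENAME.kojak.txt
```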
Kojak Helper Functions¶
Functions that are collectively used by croco.Kojak and croco.KojakPercolator.
croco.KojakFunctions.assign_ID_and_type(xtable)
Calculate if a cross-link is of inter or of loop type, refine the inter type into inter/intra/homomultimeric, and generate an ID for the cross-links.
Parameters: xtable (pandas.DataFrame) – Table data structure with “prot”, “pos”, “pepseq”
Returns: xTable with type and ID
Return type: pandas.DataFrame
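The type assignment can be pictured with a small sketch. The rules below are assumptions chosen for illustration and are not taken from the CroCo sources: a missing second peptide is treated as a loop link, differing proteins as inter, identical peptide sequences on the same protein as homomultimeric, and everything else as intra. The function name and its arguments are hypothetical, and ID generation is left out.

```python
def classify_crosslink(prot1, prot2, pepseq1, pepseq2):
    """Return an assumed cross-link type for one row (illustrative rules only)."""
    if prot2 is None or pepseq2 is None:
        # Assumption: a loop link involves a single peptide.
        return "loop"
    if prot1 != prot2:
        return "inter"
    if pepseq1 == pepseq2:
        # Assumption: the same peptide linked to itself must come from two copies.
        return "homomultimeric"
    return "intra"


for row in [
    ("ProtA", "ProtB", "TDSKYFTTNK", "AAKLR"),
    ("ProtA", "ProtA", "TDSKYFTTNK", "AAKLR"),
    ("ProtA", "ProtA", "TDSKYFTTNK", "TDSKYFTTNK"),
    ("ProtA", None, "TDSKYFTTNK", None),
]:
    print(row[0], row[1], classify_crosslink(*row))
```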
croco.KojakFunctions.extract_peptide(xtable)
Extract peptide sequence, modification mass and position from the Peptide #1 and Peptide #2 entries
Parameters: xtable (pandas.DataFrame) – xTable data structure with “Peptide #1” and “Peptide #2” columns
Returns: xTable with modmass, modpos, pepseq and mod
Return type: pandas.DataFrame
croco.KojakFunctions.extract_protein(xtable)
Extract protein name and relative cross-link position from the Protein # entries
Parameters: xtable (pandas.DataFrame) – xTable data structure with “Protein #1”, “Protein #2”, xpos1, xlink1, and xlink2 columns
Returns: xTable with prot and xpos
Return type: pandas.DataFrame
croco.KojakFunctions.process_kojak_peptide(peptide_string)
Return modifications, their localisation and the peptide sequence from a Kojak sequence string such as M[15.99]TDSKYFTTNK.
If modifications are found, a list of modification masses, a list of their positions and the unmodified peptide sequence are returned. If no modifications are found within a peptide string, the function returns np.nan, np.nan and the sequence.
Parameters: peptide_string (str) – a Kojak peptide string
Returns: - list of float or np.nan: list of modification masses
- list of int or np.nan: list of modification positions within the peptide
- str: peptide sequence without modifications
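The parsing described above can be sketched with the standard library. This is an illustration, not the CroCo implementation: `parse_kojak_peptide` is a hypothetical name, `math.nan` stands in for np.nan, and reporting each position as the 1-based index of the residue preceding the bracket is an assumption.

```python
import math
import re

# A token is either a single residue letter or a bracketed modification mass.
TOKEN = re.compile(r"([A-Z])|\[([^\]]+)\]")


def parse_kojak_peptide(peptide_string):
    """Split a Kojak peptide string into masses, positions and bare sequence."""
    masses, positions, residues = [], [], []
    for match in TOKEN.finditer(peptide_string):
        residue, mass = match.groups()
        if residue is not None:
            residues.append(residue)
        else:
            masses.append(float(mass))
            # Assumed convention: 1-based index of the residue before the bracket.
            positions.append(len(residues))
    sequence = "".join(residues)
    if not masses:
        return math.nan, math.nan, sequence
    return masses, positions, sequence


print(parse_kojak_peptide("M[15.99]TDSKYFTTNK"))  # ([15.99], [1], 'MTDSKYFTTNK')
```

Characters that are neither uppercase residue letters nor bracketed masses are skipped by this sketch.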
croco.KojakFunctions.set_decoy(xtable, decoy_string)
Set the decoy column based on whether the decoy string is present in the protein name or not
Parameters: - xtable (pandas.DataFrame) – xTable with “prot” column titles
- decoy_string (str) – Kojak decoy string
Returns: xTable with decoy column
Return type: pandas.DataFrame
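A sketch of the decoy flagging, with plain dictionaries standing in for DataFrame rows so that it runs without pandas. The column names `prot1` and `prot2`, and the rule that a row counts as a decoy when either protein name contains the decoy string, are assumptions made for illustration. The function name is hypothetical.

```python
def flag_decoys(rows, decoy_string="decoy", prot_columns=("prot1", "prot2")):
    """Add a boolean 'decoy' entry to every row (a list of dicts)."""
    for row in rows:
        row["decoy"] = any(
            # Missing protein names are treated as empty strings.
            decoy_string in str(row.get(col) or "")
            for col in prot_columns
        )
    return rows


table = [
    {"prot1": "ProtA", "prot2": "ProtB"},
    {"prot1": "decoy_ProtA", "prot2": "ProtB"},
    {"prot1": "ProtA", "prot2": None},
]
for row in flag_decoys(table):
    print(row["prot1"], row["prot2"], row["decoy"])
```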
pLink¶
CroCo can read data from the search engine pLink2 (http://pfind.ict.ac.cn/software/pLink1/index.html) as well as from its predecessor pLink1. Even though the latter has been deprecated since 2018, we provide the parser for compatibility with old analyses.
pLink1¶
Functions to read pLink1 data.
croco.pLink1.Read(plinkdirs, col_order=None, compact=False)
Read pLink report dir and return an xTable data array.
Parameters: - plinkdirs (list) – pLink report subdir(s) (e.g. sample1)
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Compact the xTable to only the columns given in col_order or not
Returns: xTable data table
Return type: pandas.DataFrame
pLink2¶
Functions to read pLink2 files.
croco.pLink2.Read(plinkdirs, col_order=None, compact=False)
Read pLink2 report dir and return an xTable data array.
Parameters: - plinkdirs – pLink2 reports subdir (reports)
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns: xTable data table
Return type: pandas.DataFrame
StavroX¶
Currently, CroCo can only handle data from the cross-link search engine StavroX (https://www.stavrox.com/), which cannot target MS-cleavable cross-linkers. Support for MeroX (the analogue of StavroX that targets cleavable cross-linkers) may be added in future versions of CroCo.
Functions to read StavroX processed crosslink data.
croco.StavroX.Read(stavrox_files, ssf_file, col_order=None, compact=False)
Collect data from StavroX spectrum search and return an xtable data array.
Parameters: - stavrox_files – path or list of paths to StavroX output file(s)
- ssf_file – properties.ssf to load modification IDs and masses
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns: xtable data table
Return type: pandas.DataFrame
Xi¶
Results from the Xi cross-link search engine (http://rappsilberlab.org/rappsilber-laboratory-home-page/tools/) can also be parsed by CroCo.
Xi only¶
- Load file(s): Path to Xi results file (e.g. FILENAME_XiVersion1.6.739.csv)
Functions to read Xi processed crosslink data.
croco.Xi.Read(xi_files, col_order=None, compact=False)
Collects data from Xi spectrum search and returns an xtable data array.
Parameters: - xi_files – path or list of paths to xi file(s)
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns: xtable data table
Return type: pandas.DataFrame
Xi & XiFDR¶
- Load file(s): Path to xiFDR file (e.g. FILENAME_5_FDR_PSM_xiFDR1.0.22.csv)
Functions to read Xi processed crosslink data filtered with xiFDR.
This script is part of the CroCo cross-link converter project
croco.XiSearchFDR.Read(xifdr_files, xi_config, col_order=None, compact=False)
Collects data from Xi spectrum search filtered by xiFDR and returns an xtable data array.
Parameters: - xifdr_files – path or list of paths to xiFDR file(s)
- xi_config – path to corresponding xi_config file
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Whether to keep the columns of the original dataframe or not
Returns: xtable data table
Return type: pandas.DataFrame
xQuest¶
xQuest (http://prottools.ethz.ch/orinner/public/htdocs/xquest/) results are also supported by CroCo.
- Load file(s): xQuest results file exported as csv (e.g. FILENAME_xquest.csv)
Functions to read xQuest data.
croco.xQuest.Read(xQuest_files, col_order=None, compact=False)
Read xQuest results file(s) and return the data in xTable format.
Parameters: - xQuest_files (list) – path to xQuest results file(s)
- col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
- compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns: xTable data table
Return type: pandas.DataFrame
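The col_order and compact parameters shared by the readers above can be illustrated with a small sketch. Dictionaries stand in for DataFrame rows so that it runs without pandas. The function name is hypothetical, and two behaviours are assumptions: with compact=False the remaining columns are kept after the ordered ones, and columns missing from a row are filled with None.

```python
def order_columns(rows, col_order, compact=False):
    """Reorder each row by col_order; drop all other columns if compact is True."""
    ordered = []
    for row in rows:
        # Columns named in col_order come first, in the requested order.
        new_row = {col: row.get(col) for col in col_order}
        if not compact:
            # Assumption: other columns are appended in their original order.
            new_row.update((k, v) for k, v in row.items() if k not in new_row)
        ordered.append(new_row)
    return ordered


rows = [{"score": 12.5, "prot2": "ProtB", "prot1": "ProtA", "rawfile": "FILENAME.raw"}]
print(list(order_columns(rows, ["prot1", "prot2"])[0]))
print(list(order_columns(rows, ["prot1", "prot2"], compact=True)[0]))
```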