Documentation of the read functions

Kojak

CroCo contains modules for reading data from the direct output of the Kojak search engine (http://www.kojak-ms.org) as well as FDR-filtered results obtained with Percolator. A detailed workflow for obtaining the FDR-filtered table can be found at http://www.kojak-ms.org/docs/validation.html.

Kojak only

  • Load file(s): e.g. FILENAME.kojak.txt
  • OptionsWindow: Rawfile title (e.g. FILENAME.raw)

Kojak does not store the rawfile name in its output, but the xTable requires it, so it has to be supplied by the user.

Functions to read and process data generated with the Kojak cross-link search engine.

croco.Kojak.Read(kojak_files, rawfile=None, decoy_string='decoy', col_order=None, compact=False)

Read Kojak results file, calculate and process missing values required for xTable and return the xTable.

Parameters:
  • kojak_files (list) – path or paths to Kojak results file(s)
  • rawfile (str) – name of the corresponding rawfile
  • decoy_string (optional) – string used in Kojak to label decoys
  • col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
  • compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns:

xtable data table

Return type:

pandas.DataFrame
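
The interplay of col_order and compact can be illustrated with a minimal sketch. It is not CroCo's implementation: the helper order_columns and the column titles are hypothetical, and the behaviour for compact=False (unlisted columns are kept and appended) is an assumption.

```python
def order_columns(columns, col_order, compact=False):
    """Return column titles in the order implied by col_order.

    Assumption: with compact=False, columns missing from col_order are
    kept and appended after the ordered ones; with compact=True they
    are dropped.
    """
    ordered = [col for col in col_order if col in columns]
    if compact:
        return ordered
    extras = [col for col in columns if col not in col_order]
    return ordered + extras


# Illustrative column titles, not the actual xTable header
columns = ["score", "pepseq1", "rawfile", "pepseq2"]
col_order = ["rawfile", "pepseq1", "pepseq2"]

compacted = order_columns(columns, col_order, compact=True)
full = order_columns(columns, col_order, compact=False)
```

With a pandas.DataFrame, such a list could then be used to select and reorder columns, e.g. xtable[compacted].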

Kojak and Percolator

For this script to work, the unpercolated Kojak file (e.g. FILENAME.kojak.txt) has to be in the same directory as the percolated file.

  • Load file(s): e.g. FILENAME.validated.txt
  • OptionsWindow: Rawfile title (e.g. FILENAME.raw)

Functions to read Percolator processed Kojak data.

croco.KojakPercolator.Read(perc_files, rawfile=None, validated_string='.validated', percolator_string='.perc', decoy_string='decoy', compact=False, col_order=None)

Collects unprocessed and percolated results and returns an xtable data array.

Parameters:
  • perc_files (str) – path or list of paths to percolated Kojak file(s)
  • validated_string (str) – user-defined string appended to the percolated filenames
  • percolator_string (str) – user-defined string appended to the file prepared for percolating
  • decoy_string (optional) – string used in Kojak to label decoys
  • rawfile (str) – name of the corresponding rawfile
  • col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
  • compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns:

xtable data table

Return type:

pandas.DataFrame
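
Because the unpercolated Kojak file must sit next to the percolated one, it can help to check for it before calling the reader. The sketch below is an assumption about the naming relationship (FILENAME.validated.txt versus FILENAME.kojak.txt, as in the examples above). The helper expected_kojak_file and its kojak_string parameter are hypothetical and not part of CroCo.

```python
from pathlib import Path


def expected_kojak_file(perc_file, validated_string=".validated",
                        kojak_string=".kojak"):
    """Guess the path of the unpercolated Kojak file next to a percolated one.

    Assumption: both file names differ only in the marker string,
    e.g. FILENAME.validated.txt versus FILENAME.kojak.txt.
    """
    perc_path = Path(perc_file)
    if validated_string not in perc_path.name:
        raise ValueError(
            f"{perc_path.name!r} does not contain {validated_string!r}"
        )
    new_name = perc_path.name.replace(validated_string, kojak_string)
    # with_name keeps the directory, so the result is in the same folder
    return perc_path.with_name(new_name)


kojak_path = expected_kojak_file("results/FILENAME.validated.txt")
print(kojak_path.name)  # FILENAME.kojak.txt
```

Calling kojak_path.exists() then tells whether the companion file is actually present in that directory.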

Kojak Helper Functions

Functions that are collectively used by croco.Kojak and croco.KojakPercolator.

croco.KojakFunctions.assign_ID_and_type(xtable)

Calculate whether a cross-link is of inter or of loop type. Refine the inter type into inter/intra/homomultimeric. Generate an ID for the cross-links.

Parameters: xtable (pandas.DataFrame) – Table data structure with “prot”, “pos”, “pepseq”
Returns: xTable with type and ID
Return type: pandas.DataFrame

croco.KojakFunctions.extract_peptide(xtable)

Extract peptide sequence, modification mass and position from the Peptide #1 and Peptide #2 entries

Parameters: xtable (pandas.DataFrame) – xTable data structure with “Peptide #1” and “Peptide #2” columns
Returns: xTable with modmass, modpos, pepseq and mod
Return type: pandas.DataFrame

croco.KojakFunctions.extract_protein(xtable)

Extract protein name and relative cross-link position from the Protein # entries

Parameters: xtable (pandas.DataFrame) – xTable data structure with “Protein #1”, “Protein #2”, xpos1, xlink1, and xlink2 columns
Returns: xTable with prot and xpos
Return type: pandas.DataFrame

croco.KojakFunctions.process_kojak_peptide(peptide_string)

Return the modifications, their positions and the peptide sequence from a Kojak sequence string such as M[15.99]TDSKYFTTNK.

If modifications are found, the function returns a list of modification masses, a list of modification positions and the raw peptide sequence. If no modifications are found within a peptide string, it returns np.nan, np.nan and the sequence.

Parameters: peptide_string (str) – a Kojak peptide string
Returns:
  • list of float or np.nan – list of modification masses
  • list of int or np.nan – list of modification positions within the peptide
  • str – peptide sequence without modifications
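
The parsing described above can be sketched as follows. This is an illustration, not the CroCo source: the function name parse_kojak_peptide is hypothetical, float("nan") stands in for np.nan to keep the example free of dependencies, and the position convention (1-based index of the residue directly before the bracket) is an assumption.

```python
import re

# Matches one bracketed modification mass such as "[15.99]"
MOD_PATTERN = re.compile(r"\[([+-]?\d+(?:\.\d+)?)\]")


def parse_kojak_peptide(peptide_string):
    """Split a Kojak-style peptide string into masses, positions, sequence.

    Assumption: a bracketed mass annotates the residue directly before
    it, and positions are counted 1-based within the bare sequence.
    """
    masses = []
    positions = []
    residues = []
    cursor = 0
    for match in MOD_PATTERN.finditer(peptide_string):
        # Text between the previous bracket and this one is plain sequence
        residues.extend(peptide_string[cursor:match.start()])
        masses.append(float(match.group(1)))
        positions.append(len(residues))
        cursor = match.end()
    residues.extend(peptide_string[cursor:])
    pepseq = "".join(residues)
    if not masses:
        nan = float("nan")
        return nan, nan, pepseq
    return masses, positions, pepseq


masses, positions, pepseq = parse_kojak_peptide("M[15.99]TDSKYFTTNK")
```

For the example string from the description, this yields the mass list [15.99], the position list [1] and the sequence MTDSKYFTTNK.
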
croco.KojakFunctions.set_decoy(xtable, decoy_string)

Set the “decoy” column based on whether the decoy string is present in the protein name.

Parameters:
  • xtable (pandas.DataFrame) – xTable with “prot” column titles
  • decoy_string (str) – Kojak decoy string
Returns:

xTable with decoy column

Return type:

pandas.DataFrame
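
The decoy rule can be illustrated with a small, dependency-free sketch. Plain dictionaries stand in for xTable rows here. The function flag_decoys, the column names prot1 and prot2, and the protein names are hypothetical. Treating a row as a decoy when either protein name contains the string is also an assumption.

```python
def flag_decoys(rows, decoy_string="decoy", prot_columns=("prot1", "prot2")):
    """Return copies of the rows with a boolean "decoy" entry added.

    Assumption: a row counts as a decoy when the decoy string occurs in
    any of its protein names; the column names are illustrative.
    """
    flagged = []
    for row in rows:
        names = [str(row.get(col, "")) for col in prot_columns]
        is_decoy = any(decoy_string in name for name in names)
        flagged.append({**row, "decoy": is_decoy})
    return flagged


rows = [
    {"prot1": "PROT_A", "prot2": "PROT_B"},
    {"prot1": "decoy_PROT_A", "prot2": "PROT_B"},
]
flagged = flag_decoys(rows)
```

Here only the second row is flagged, because one of its protein names contains the decoy string.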

StavroX

Currently, CroCo can only handle data from the cross-link search engine StavroX (https://www.stavrox.com/), which cannot target MS-cleavable cross-linkers. MeroX, the analogue of StavroX that targets cleavable cross-linkers, may be supported in future versions of CroCo.

Functions to read StavroX processed crosslink data.

croco.StavroX.Read(stavrox_files, ssf_file, col_order=None, compact=False)

Collect data from StavroX spectrum search and return an xtable data array.

Parameters:
  • stavrox_files – path or list of paths to StavroX output file(s)
  • ssf_file – properties.ssf to load modification IDs and masses
  • col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
  • compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns:

xtable data table

Return type:

pandas.DataFrame

Xi

Results from the Xi cross-link search engine (http://rappsilberlab.org/rappsilber-laboratory-home-page/tools/) can also be parsed by CroCo.

Xi only

  • Load file(s): Path to Xi results file (e.g. FILENAME_XiVersion1.6.739.csv)

Functions to read Xi processed crosslink data.

croco.Xi.Read(xi_files, col_order=None, compact=False)

Collects data from Xi spectrum search and returns an xtable data array.

Parameters:
  • xi_files – path or list of paths to Xi file(s)
  • col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
  • compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns:

xtable data table

Return type:

pandas.DataFrame

Xi & XiFDR

  • Load file(s): Path to xiFDR file (e.g. FILENAME_5_FDR_PSM_xiFDR1.0.22.csv)

Functions to read Xi processed crosslink data filtered with xiFDR.

This script is part of the CroCo cross-link converter project

croco.XiSearchFDR.Read(xifdr_files, xi_config, col_order=None, compact=False)

Collects data from Xi spectrum search filtered by xiFDR and returns an xtable data array.

Parameters:
  • xifdr_files – path or list of paths to xiFDR file(s)
  • xi_config – path to corresponding xi_config file
  • col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
  • compact (bool) – Whether to keep the columns of the original dataframe or not
Returns:

xtable data table

Return type:

pandas.DataFrame

xQuest

xQuest (http://prottools.ethz.ch/orinner/public/htdocs/xquest/) results are also supported by CroCo.

  • Load file(s): xQuest results file exported as csv (e.g. FILENAME_xquest.csv)

Functions to read xQuest data.

croco.xQuest.Read(xQuest_files, col_order=None, compact=False)

Read xQuest results file and return file in xTable format.

Parameters:
  • xQuest_files (list) – path to xQuest results file(s)
  • col_order (list) – List of xTable column titles that are used to sort and compress the resulting datatable
  • compact (bool) – Whether to compact the xTable to only those columns listed in col_order
Returns:

xTable data table

Return type:

pandas.DataFrame