3.5. File Access
pymzML supports several kinds of mzML files. The following classes wrap access to the different file types, which makes it possible to implement file-type-specific search and data retrieval algorithms. An explanation of how to implement your own file class can be found in the advanced usage section.
3.5.1. File Interface
- class pymzml.file_interface.FileInterface(path, encoding, build_index_from_scratch=False, index_regex=None)[source]
Interface to different mzML formats.
- __getitem__(identifier)[source]
Access the item with id ‘identifier’ in the file.
- Parameters
identifier (str) – native id of the item to access
- Returns
text associated with the given identifier
- Return type
data (str)
- __init__(path, encoding, build_index_from_scratch=False, index_regex=None)[source]
Initialize an object interface to mzML files.
- Parameters
path (str) – path to the mzML file
encoding (str) – encoding of the file
- _indexed_gzip(path)[source]
Check if the given file is an indexed gzip file or not.
- Parameters
path (str) – path to the file
- Returns
True if path is a gzip file with index, else False
- Return type
bool
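A check of this kind could be done by inspecting the gzip header. The following is a minimal sketch, not pymzML's actual implementation: it assumes, hypothetically, that an indexed gzip file announces itself through the optional header comment field (the FCOMMENT flag defined in RFC 1952). The function name `looks_like_indexed_gzip` is illustrative only.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)
FCOMMENT = 0x10           # header flag bit: an optional comment field follows


def looks_like_indexed_gzip(path):
    """Return True if the file is gzip and its header declares a comment field.

    Assumption: the index is stored in the gzip header comment. A real
    check would also parse the comment to confirm that it holds an index.
    """
    with open(path, "rb") as handle:
        header = handle.read(4)  # magic (2 bytes), method (1), flags (1)
    if len(header) < 4 or header[:2] != GZIP_MAGIC:
        return False
    flags = header[3]
    return bool(flags & FCOMMENT)
```

A file written with Python's `gzip` module carries no header comment, so this sketch would report it as not indexed.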
- _open(path_or_file)[source]
Open a file-like object or a wrapper for a file-like object.
- Parameters
path (str) – path to the mzML file
- Returns
instance of StandardGzip, IndexedGzip or StandardMzml, based on the file ending of ‘path’
- Return type
file_handler
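The selection by file ending can be pictured with a small sketch. The exact rules are not documented here, so the `.gz` suffix test, the `has_gzip_index` argument and the function name `pick_handler_name` are assumptions made for illustration. The function returns class names as strings so that it runs without pymzML installed.

```python
def pick_handler_name(path, has_gzip_index=False):
    """Map a file path to the name of the wrapper class that would handle it.

    Assumption: gzip files are recognized by a '.gz' suffix, and the caller
    already knows (for example from a header check) whether an index exists.
    """
    if path.lower().endswith(".gz"):
        return "IndexedGzip" if has_gzip_index else "StandardGzip"
    return "StandardMzml"
```

Under these assumptions, `pick_handler_name("run.mzML.gz")` selects the plain gzip wrapper, while any path without a `.gz` ending falls through to the standard mzML wrapper.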
3.5.2. File Classes
3.5.2.1. mzML
- class pymzml.file_classes.standardMzml.StandardMzml(path, encoding, build_index_from_scratch=False, index_regex=None)[source]
- __getitem__(identifier)[source]
Access the item with id ‘identifier’.
Uses either linear, binary or interpolation search.
- Parameters
identifier (str) – native id of the item to access
- Returns
text associated with the given identifier
- Return type
data (str)
- __init__(path, encoding, build_index_from_scratch=False, index_regex=None)[source]
Initialize a wrapper object for standard mzML files.
- Parameters
path (str) – path to the file
encoding (str) – encoding of the file
- _binary_search(target_index)[source]
Retrieve the spectrum for a given spectrum ID using binary jumps.
- Parameters
target_index (int) – native id of the spectrum to access
- Returns
pymzML spectrum
- Return type
- _build_index(from_scratch=False)[source]
Build an index.
The index is a list of offsets to which a file pointer can seek directly, giving access to a particular spectrum or chromatogram without parsing the entire file.
- Parameters
from_scratch (bool) – Whether or not to force building the index from scratch, by parsing the file, if no existing index can be found.
- Returns
A file-like object used to access the indexed content by seeking to a particular offset in the file.
- _build_index_from_scratch(seeker)[source]
Build an index of spectra/chromatogram data with offsets by parsing the file.
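To illustrate what an offset index built by parsing can look like, here is a minimal sketch. It is not the pymzML implementation: the regular expression, the function name `build_offset_index` and the choice to read the whole file into memory are simplifying assumptions, and only spectrum elements are indexed.

```python
import re

# Matches the start of a <spectrum ...> element and captures its id attribute.
# The required whitespace after "spectrum" keeps <spectrumList> from matching.
SPECTRUM_TAG = re.compile(rb'<spectrum\s[^>]*?\bid="([^"]+)"')


def build_offset_index(handle):
    """Scan a binary file-like object and map each spectrum id to its byte offset."""
    handle.seek(0)
    data = handle.read()  # fine for a sketch; large files call for chunked reads
    return {
        match.group(1).decode("utf-8"): match.start()
        for match in SPECTRUM_TAG.finditer(data)
    }
```

With such a mapping, a spectrum can be reached with `handle.seek(index[native_id])` and read from that position, without parsing anything that precedes it.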
- _interpol_search(target_index, chunk_size=8, fallback_cutoff=100)[source]
Use linear interpolation search to find spectra faster.
- Parameters
target_index (str or int) – native id of the item to access
- Keyword Arguments
chunk_size (int) – size of the chunk to read in one go in kb
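The idea behind an interpolation search on a file can be sketched as follows. This is an illustration under stated assumptions rather than pymzML's code: spectrum ids are assumed to be increasing integers of the form `scan=N`, the chunk size is given in bytes (`chunk_bytes`) instead of kb, and the names `interpolation_seek` and `max_jumps` are hypothetical. When a jump makes no progress, the sketch falls back to a linear scan between the current bounds.

```python
import re

SPEC_ID = re.compile(rb'<spectrum\s[^>]*?\bid="scan=(\d+)"')


def interpolation_seek(handle, target, low, high, chunk_bytes=8 * 1024, max_jumps=20):
    """Return the byte offset of spectrum `target`, or None if it is absent.

    `low` and `high` are (spectrum_id, offset) tuples for the first and last
    spectrum in the file.
    """
    (lo_id, lo_off), (hi_id, hi_off) = low, high
    if not lo_id <= target <= hi_id:
        return None
    for _ in range(max_jumps):
        if target == lo_id:
            return lo_off
        if target == hi_id:
            return hi_off
        # Guess the offset, assuming spectra are spread evenly between the bounds.
        fraction = (target - lo_id) / (hi_id - lo_id)
        guess = lo_off + int(fraction * (hi_off - lo_off))
        handle.seek(guess)
        match = SPEC_ID.search(handle.read(chunk_bytes))
        if match is None:
            break  # no complete spectrum tag in this chunk
        found = (int(match.group(1)), guess + match.start())
        if found[0] == target:
            return found[1]
        if found in ((lo_id, lo_off), (hi_id, hi_off)):
            break  # the jump did not narrow the bounds
        if found[0] < target:
            lo_id, lo_off = found
        else:
            hi_id, hi_off = found
    # Fallback: linear scan of the region between the current bounds.
    handle.seek(lo_off)
    window = handle.read(hi_off - lo_off + chunk_bytes)
    for match in SPEC_ID.finditer(window):
        if int(match.group(1)) == target:
            return lo_off + match.start()
    return None
```

The two extremes passed as `low` and `high` play the role of the minimum and maximum spectrum ids with their file offsets. Each jump replaces one of them with a closer bound, so the region left for the linear fallback shrinks.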
- _read_extremes()[source]
Read min and max spectrum ids. Required for binary jumps.
- Returns
list of tuples containing spec_id and file_offset
- Return type
seek_list (list)
3.5.2.2. Gzip
- class pymzml.file_classes.standardGzip.StandardGzip(path, encoding)[source]
- __getitem__(identifier)[source]
Access the item with id ‘identifier’ in the file by iterating the xml-tree.
- Parameters
identifier (str) – native id of the item to access
- Returns
text associated with the given identifier
- Return type
data (str)
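Without an index, a gzip-compressed file has to be read from the start until the requested element turns up. A minimal sketch of that approach with the standard library is shown below. It is not the pymzML implementation, and the function name `find_element_text` is illustrative.

```python
import gzip
import xml.etree.ElementTree as ET


def find_element_text(path, identifier):
    """Return the serialized element whose id attribute equals identifier, else None."""
    with gzip.open(path, "rb") as handle:
        # "end" events fire once an element, including its children, is complete.
        for _event, element in ET.iterparse(handle, events=("end",)):
            if element.get("id") == identifier:
                return ET.tostring(element, encoding="unicode")
            # Free non-matching spectra and chromatograms to bound memory use.
            # Children are left alone so a matching parent stays intact.
            if element.tag.endswith(("spectrum", "chromatogram")):
                element.clear()
    return None
```

Because every lookup decompresses and parses the file from the beginning, the cost grows with the position of the element in the file.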
3.5.2.3. iGzip
- class pymzml.file_classes.indexedGzip.IndexedGzip(path, encoding)[source]
- __getitem__(identifier)[source]
Access the item with id ‘identifier’ in the file.
- Parameters
identifier (str) – native id of the item to access
- Returns
text associated with the given identifier
- Return type
data (str)