3.2. Spectrum and Chromatogram
The Spectrum class offers a Python object for mass spectrometry data.
The spectrum object holds the basic information of the spectrum and offers
methods to interrogate properties of the spectrum.
Data, i.e. mass over charge (m/z) and intensity, is decoded on demand
and can be accessed via the corresponding properties, e.g. peaks.
The Spectrum class is used in the Reader class.
There, each spectrum is accessible as a Spectrum object.
Theoretical spectra can also be created using the setter functions.
For example, m/z values, intensities, and peaks can be set by the
corresponding properties: pymzml.spec.Spectrum.mz, pymzml.spec.Spectrum.i, and pymzml.spec.Spectrum.peaks.
Similar to the Spectrum class, the Chromatogram class allows interrogation of profile data (time, intensity) in a total ion chromatogram.
3.2.1. Spectrum
- class pymzml.spec.Spectrum(element=<Element ''>, measured_precision=5e-06)[source]
Spectrum class which inherits from class
pymzml.spec.MS_Spectrum
- Parameters
element (xml.etree.ElementTree.Element) – spectrum as xml element
- Keyword Arguments
measured_precision (float) – in ppm, i.e. 5e-6 equals 5 ppm.
- __getitem__(accession)[source]
Access spectrum XML information by tag name
- Parameters
accession (str) – name of the XML tag
- Returns
value of the XML tag
- Return type
value (float or str)
- __add__(other_spec)[source]
Adds two pymzml spectra
- Parameters
other_spec (Spectrum) – spectrum to add to the current spectrum
- Returns
reference to the edited spectrum
- Return type
self (Spectrum)
Example:
>>> import pymzml
>>> s = pymzml.spec.Spectrum(measured_precision=20e-6)
>>> file_to_read = "../mzML_example_files/xy.mzML.gz"
>>> run = pymzml.run.Reader(
...     file_to_read,
...     MS_precisions={
...         1: 5e-6,
...         2: 20e-6
...     }
... )
>>> for spec in run:
...     s += spec
- __sub__(other_spec)[source]
Subtracts two pymzml spectra.
- Parameters
other_spec (spec.Spectrum) – spectrum to subtract from the current spectrum
- Returns
returns self after other_spec was subtracted
- Return type
self (spec.Spectrum)
- __mul__(value)[source]
Multiplies each intensity with a float, i.e. scales the spectrum.
- Parameters
value (int, float) – value to multiply the intensities with
- Returns
returns self after intensities were scaled by value
- Return type
self (spec.Spectrum)
- __truediv__(value)[source]
Divides each intensity by a float, i.e. scales the spectrum.
- Parameters
value (int, float) – value to divide the intensities by
- Returns
returns self after intensities were scaled by value.
- Return type
self (spec.Spectrum)
- property ID
Access the native id (last number in the id attribute) of the spectrum.
- Returns
native ID of the spectrum
- Return type
ID (str)
- property TIC
Property to access the total ion current for this spectrum.
- Returns
Total Ion Current of the spectrum.
- Return type
TIC (float)
- estimated_noise_level(mode='median')[source]
Calculates noise threshold for function remove_noise.
Different modes are available. Default is ‘median’
- Keyword Arguments
mode (str) – define mode for removing noise. Default = “median” (other modes: “mean”, “mad”)
- Returns
estimated noise threshold
- Return type
noise_level (float)
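The three modes can be pictured with a small standard-library sketch. It is not pymzml's implementation: the helper name estimate_noise is hypothetical, and reading "mad" as the median absolute deviation from the median is an assumption made for illustration.

```python
from statistics import mean, median


def estimate_noise(intensities, mode="median"):
    """Return a noise threshold for a list of intensities (hypothetical helper)."""
    if not intensities:
        return 0.0
    if mode == "median":
        return float(median(intensities))
    if mode == "mean":
        return float(mean(intensities))
    if mode == "mad":
        # Assumption: "mad" stands for median absolute deviation from the median.
        center = median(intensities)
        return float(median(abs(i - center) for i in intensities))
    raise ValueError(f"unknown mode: {mode!r}")


# One strong signal peak among low-intensity background values.
intensities = [1.0, 2.0, 2.0, 3.0, 100.0]
noise_by_mode = {m: estimate_noise(intensities, m) for m in ("median", "mean", "mad")}
print(noise_by_mode)
```

In this toy data the single intense peak pulls the mean far above the median, which suggests why a median-based estimate is a reasonable default for spectra dominated by background peaks.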
- extreme_values(key)[source]
Find extreme values, i.e. the minimum and maximum m/z or intensity.
- Parameters
key (str) – m/z : “mz” or intensity : “i”
- Returns
tuple of minimum and maximum m/z or intensity
- Return type
extrema (tuple)
- get(acc, default=None)[source]
Mimics the get method of dict.
- Parameters
acc (str) – accession or obo tag to return
default (None, optional) – default value if acc is not found
- has_overlapping_peak(mz)[source]
Checks if a spectrum has more than one peak for a given m/z value and within the measured precision
- Parameters
mz (float) – m/z value which should be checked
- Returns
True if a nearby peak is detected, otherwise False
- Return type
Boolean (bool)
- has_peak(mz2find)[source]
Checks if a Spectrum has a certain peak. Requires an m/z value as input and returns a list of peaks if the m/z value is found in the spectrum, otherwise [] is returned. Every peak is a tuple of m/z and intensity.
Note
Multiple peaks may be found, depending on the defined precisions.
- Parameters
mz2find (float) – m/z value which should be found
- Returns
list of m/z, i tuples
- Return type
peaks (list)
Example:
>>> import pymzml
>>> example_file = 'tests/data/example.mzML'
>>> run = pymzml.run.Reader(
...     example_file,
...     MS_precisions={
...         1: 5e-6,
...         2: 20e-6
...     }
... )
>>> for spectrum in run:
...     if spectrum.ms_level == 2:
...         peak_to_find = spectrum.has_peak(1016.5404)
...         print(peak_to_find)
[(1016.5404, 19141.735187697403)]
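Why several peaks can match a single m/z value becomes clearer with a plain-Python sketch of a relative-tolerance lookup. The helper find_peaks_within_ppm is hypothetical and only mirrors the documented behaviour (a list of matching tuples, or an empty list). pymzml itself accelerates this lookup through transformed m/z values, which is not reproduced here.

```python
def find_peaks_within_ppm(peaks, mz2find, precision=5e-6):
    """Return all (m/z, intensity) tuples within the relative tolerance (hypothetical helper)."""
    # Absolute half-width of the search window, e.g. 5 ppm of 500 m/z is 0.0025 m/z.
    tolerance = mz2find * precision
    return [(mz, i) for mz, i in peaks if abs(mz - mz2find) <= tolerance]


peaks = [(500.0000, 10.0), (500.0020, 20.0), (500.0100, 30.0)]

# Two peaks fall inside the 5 ppm window around 500.0, the third does not.
matches = find_peaks_within_ppm(peaks, 500.0, precision=5e-6)
# No peak is close to 600.0, so an empty list comes back.
no_match = find_peaks_within_ppm(peaks, 600.0, precision=5e-6)
print(matches, no_match)
```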
- highest_peaks(n)[source]
Function to retrieve the n-highest centroided peaks of the spectrum.
- Parameters
n (int) – number of highest peaks to return.
- Returns
list of mz, i tuples of the n highest peaks
- Return type
centroided peaks (list)
Example:
>>> import pymzml
>>> run = pymzml.run.Reader(
...     "tests/data/example.mzML.gz",
...     MS_precisions={
...         1: 5e-6,
...         2: 20e-6
...     }
... )
>>> for spectrum in run:
...     if spectrum.ms_level == 2:
...         if spectrum.ID == 1770:
...             for mz, i in spectrum.highest_peaks(5):
...                 print(mz, i)
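Selecting the n most intense peaks from a list of (m/z, intensity) tuples can be sketched with the standard library. The helper n_highest_peaks is hypothetical, and the descending order of its result is a choice of this sketch, not a statement about the order pymzml returns.

```python
import heapq


def n_highest_peaks(peaks, n):
    """Return the n peaks with the largest intensity (hypothetical helper)."""
    # heapq.nlargest avoids sorting the full peak list when n is small.
    return heapq.nlargest(n, peaks, key=lambda peak: peak[1])


centroided = [(100.1, 5.0), (200.2, 50.0), (300.3, 20.0), (400.4, 1.0)]
top_two = n_highest_peaks(centroided, 2)
print(top_two)
```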
- property i
Returns the list of intensity values. If the intensity values are encoded, the function _decode() is used to decode the encoded data.
The i property can also be set, e.g. for theoretical data. However, it is recommended to use the peaks property to set mz and intensity tuples at the same time.
- Returns
list of intensity values from the analyzed spectrum
- Return type
i (list)
- property id_dict
Access to all entries stored in the id attribute of a spectrum.
- Returns
key value pairs for all entries in id attribute of a spectrum
- Return type
id_dict (dict)
- property index
Access the index of the spectrum.
- Returns
index of the spectrum
- Return type
index (int)
Note
This does not necessarily correspond to the native spectrum ID
- property measured_precision
Sets the measured and internal precision
- Returns
measured precision (e.g. 5e-6)
- Return type
value (float)
- property ms_level
Property to access the ms level.
- Return type
ms_level (int)
- property mz
Returns the list of m/z values. If the m/z values are encoded, the function _decode() is used to decode the encoded data.
The mz property can also be set, e.g. for theoretical data. However, it is recommended to use the peaks property to set mz and intensity tuples at the same time.
- Returns
list of m/z values of spectrum.
- Return type
mz (list)
- peaks(peak_type)[source]
Decode and return a list of mz/i tuples.
- Parameters
peak_type (str) – currently supported types are: raw, centroided and reprofiled
- Returns
list or numpy array of mz/i tuples or arrays
- Return type
peaks (list or ndarray)
- ppm2abs(value, ppm_value, direction=1, factor=1)[source]
Returns the value plus (or minus, depending on direction) the error (measured precision) for this value.
- Parameters
value (float) – m/z value
ppm_value (int) – ppm value
- Keyword Arguments
direction (int) – plus or minus the considered m/z value. The argument direction should be 1 or -1
factor (int) – multiplication factor for the imprecision. The argument factor should be bigger than 0
- Returns
imprecision for the given value
- Return type
imprecision (float)
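The arithmetic described above can be written out as a short sketch. The standalone function ppm_to_abs is hypothetical, and it assumes that ppm_value is given as a fraction (5e-6 for 5 ppm), matching the convention used for measured_precision.

```python
import math


def ppm_to_abs(value, ppm_value, direction=1, factor=1):
    """Shift value by its relative error (hypothetical standalone helper).

    Assumption: ppm_value is a fraction such as 5e-6, not the integer 5.
    """
    return value + direction * value * ppm_value * factor


# Upper and lower bound of a 5 ppm window around 1000 m/z.
upper = ppm_to_abs(1000.0, 5e-6, direction=1)
lower = ppm_to_abs(1000.0, 5e-6, direction=-1)
# factor widens the window, here to twice the error.
wide_upper = ppm_to_abs(1000.0, 5e-6, direction=1, factor=2)
print(lower, upper, wide_upper)
```

At 1000 m/z a precision of 5 ppm corresponds to an absolute error of about 0.005 m/z, so the window spans roughly 999.995 to 1000.005.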
- property precursors
List the precursor information of this spectrum, if available.
- Returns
list of precursor ids for this spectrum.
- Return type
precursor (list)
- reduce(peak_type='raw', mz_range=(None, None))[source]
Remove all m/z values outside the given range.
- Parameters
mz_range (tuple) – tuple of min, max values
- Returns
list of mz, i tuples in the given range.
- Return type
peaks (list)
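Range filtering with optional bounds can be sketched as follows. The helper reduce_peaks is hypothetical. Treating None as "no limit on this side" and making both bounds inclusive are assumptions of this sketch.

```python
def reduce_peaks(peaks, mz_range=(None, None)):
    """Keep only peaks whose m/z lies inside mz_range (hypothetical helper)."""
    lower, upper = mz_range
    return [
        (mz, i)
        for mz, i in peaks
        # Assumption: None disables a bound, and bounds are inclusive.
        if (lower is None or mz >= lower) and (upper is None or mz <= upper)
    ]


peaks = [(100.0, 1.0), (250.0, 2.0), (400.0, 3.0)]
above_200 = reduce_peaks(peaks, mz_range=(200.0, None))
up_to_250 = reduce_peaks(peaks, mz_range=(None, 250.0))
unfiltered = reduce_peaks(peaks)
print(above_200, up_to_250, unfiltered)
```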
- remove_noise(mode='median', noise_level=None, signal_to_noise_threshold=1.0)[source]
Function to remove noise from peaks, centroided peaks and reprofiled peaks.
- Keyword Arguments
mode (str) – define mode for removing noise. Default = “median” (other modes: “mean”, “mad”)
noise_level (float) – noise threshold
signal_to_noise_threshold (float) – S/N threshold for a peak to be accepted
- Returns
Returns a list with tuples of m/z-intensity pairs above the noise threshold
- Return type
reprofiled peaks (list)
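How noise_level and signal_to_noise_threshold interact can be pictured with a sketch that keeps a peak when its intensity divided by the noise level reaches the threshold. The helper filter_noise is hypothetical. Falling back to the median intensity when no noise level is given and using a greater-or-equal comparison are assumptions, not confirmed details of pymzml.

```python
from statistics import median


def filter_noise(peaks, noise_level=None, signal_to_noise_threshold=1.0):
    """Return peaks whose signal-to-noise ratio reaches the threshold (hypothetical helper)."""
    if noise_level is None:
        # Assumption: fall back to the median intensity as the noise estimate.
        noise_level = median(i for _, i in peaks) if peaks else 0.0
    if noise_level == 0:
        return list(peaks)  # nothing to compare against, keep everything
    return [(mz, i) for mz, i in peaks if i / noise_level >= signal_to_noise_threshold]


peaks = [(100.0, 1.0), (200.0, 2.0), (300.0, 3.0), (400.0, 40.0)]

# Median intensity is 2.5, so only the peak at 400.0 has S/N >= 2.
kept = filter_noise(peaks, signal_to_noise_threshold=2.0)
# With an explicit noise level of 1.0 three peaks pass the same threshold.
kept_explicit = filter_noise(peaks, noise_level=1.0, signal_to_noise_threshold=2.0)
print(kept, kept_explicit)
```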
- property scan_time
Property to access the retention time and retention time unit. Please note that we do not assume the retention time unit if it is not correctly defined in the mzML. It is set to ‘unicorns’ in this case.
- Returns
retention time and retention time unit
- Return type
scan_time (float), scan_time_unit (str)
- scan_time_in_minutes()[source]
Returns the retention time in minutes. If the retention time unit is defined within the mzML, the retention time is converted into minutes and returned without the unit.
- Return type
scan_time (float)
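The unit conversion can be sketched as a lookup of divisors. The helper to_minutes and the unit strings "minute", "second" and "millisecond" are assumptions for illustration. The unit names that pymzml actually reads from the mzML file are not listed in this section.

```python
# Assumed unit names mapped to the number of such units per minute.
UNITS_PER_MINUTE = {"minute": 1.0, "second": 60.0, "millisecond": 60000.0}


def to_minutes(scan_time, unit):
    """Convert a retention time to minutes (hypothetical helper)."""
    try:
        return scan_time / UNITS_PER_MINUTE[unit]
    except KeyError:
        # An undefined unit (e.g. the 'unicorns' placeholder) cannot be converted.
        raise ValueError(f"cannot convert unit {unit!r} to minutes") from None


print(to_minutes(90.0, "second"))  # 1.5
print(to_minutes(2.0, "minute"))   # 2.0
```

Raising an error for an unknown unit, rather than guessing, follows the same reasoning as the ‘unicorns’ placeholder: a wrong retention time is worse than a missing one.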
- property selected_precursors
Property to access the selected precursors of an MS2 spectrum. Returns a list of dicts containing the precursor's m/z and, if available, intensity and charge for each precursor.
- Return type
selected_precursors (list)
- set_peaks(peaks, peak_type)[source]
Assign a custom peak array of type peak_type
- Parameters
peaks (list or ndarray) – list or array of mz/i values
peak_type (str) – Either raw, centroided or reprofiled
- similarity_to(spec2, round_precision=0)[source]
Compares two spectra and returns the cosine between them.
- Parameters
spec2 (Spectrum) – another pymzml spectrum that is compared to the current spectrum.
- Keyword Arguments
round_precision (int) – precision mzs are rounded to, i.e. round( mz, round_precision )
- Returns
value between 0 and 1, i.e. the cosine between the two spectra.
- Return type
cosine (float)
Note
Spectra data is transformed into an n-dimensional vector, where m/z values are binned in bins of 10 m/z and the intensities are added up. Then the cosine is calculated between those two vectors. The more similar the specs are, the closer the value is to 1.
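The binning and cosine calculation from the note can be sketched in plain Python. The helper binned_cosine is hypothetical. Assigning a peak to a bin by floor division of its m/z by the bin width is an assumption, and the round_precision argument is not modelled here.

```python
import math
from collections import defaultdict


def binned_cosine(peaks_a, peaks_b, bin_width=10.0):
    """Cosine between two spectra after summing intensities per m/z bin (hypothetical helper)."""

    def to_vector(peaks):
        vector = defaultdict(float)
        for mz, intensity in peaks:
            # Assumption: the bin index is the floor of m/z divided by the bin width.
            vector[int(mz // bin_width)] += intensity
        return vector

    a, b = to_vector(peaks_a), to_vector(peaks_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


spec_a = [(101.0, 10.0), (205.0, 20.0)]
spec_b = [(103.0, 10.0), (207.0, 20.0)]  # same bins as spec_a, slightly shifted m/z
spec_c = [(350.0, 5.0)]                  # no bin shared with spec_a

same_bins = binned_cosine(spec_a, spec_b)
disjoint = binned_cosine(spec_a, spec_c)
print(same_bins, disjoint)
```

Because both peaks of spec_b fall into the same 10 m/z bins as those of spec_a, the cosine is (up to floating-point rounding) 1 even though no m/z value matches exactly, while spectra without shared bins give 0.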
- property t_mz_set
Creates a set of integers out of transformed m/z values (including all values in the defined imprecision). This is used to accelerate the has_peak function and similar lookups.
- Returns
set of transformed m/z values
- Return type
t_mz_set (set)
- transform_mz(value)[source]
pymzml uses an internal precision for different tasks. This precision depends on the measured precision and is calculated when spec.Spectrum.measured_precision is invoked. transform_mz can be used to transform m/z values into the internal standard.
- Parameters
value (float) – m/z value
- Returns
m/z value transformed to the internal standard. This value can be used to probe internal dictionaries, lists or sets, e.g. pymzml.spec.Spectrum.t_mz_set()
- Return type
transformed value (float)
Example
>>> import pymzml
>>> run = pymzml.run.Reader(
...     "test.mzML.gz",
...     MS_precisions={
...         1: 5e-6,
...         2: 20e-6
...     }
... )
>>>
>>> for spectrum in run:
...     if spectrum.ms_level == 2:
...         peak_to_find = spectrum.has_deconvoluted_peak(
...             1044.5804
...         )
...         print(peak_to_find)
[(1044.5596, 3809.4356300564586)]
- property transformed_mz_with_error
Returns transformed m/z value with error
- Returns
Transformed m/z values in dictionary
{
m/z_with_error : [(m/z,intensity), …], …
}
- Return type
tmz values (dict)
- property transformed_peaks
m/z value is multiplied by the internal precision.
- Returns
Returns a list of peaks (tuples of mz and intensity). Float m/z values are adjusted by the internal precision to integers.
- Return type
Transformed peaks (list)
3.2.2. Chromatogram
- class pymzml.spec.Chromatogram(element, measured_precision=5e-06, param=None)[source]
Class for Chromatogram access and handling.
- peaks()[source]
Return the list of peaks of the chromatogram as tuples (time, intensity).
- Returns
list of time, intensity tuples
- Return type
peaks (list)
Example:
>>> import pymzml
>>> run = pymzml.run.Reader(
...     "spectra.mzMl.gz",
...     MS_precisions={
...         1: 5e-6,
...         2: 20e-6
...     }
... )
>>> for entry in run:
...     if isinstance(entry, pymzml.spec.Chromatogram):
...         for time, intensity in entry.peaks():
...             print(time, intensity)
Note
The peaks property can also be set, e.g. for theoretical data. It requires a list of time/intensity tuples.
- property profile
Returns the list of peaks of the chromatogram as tuples (time, intensity).
- Returns
list of time, i tuples
- Return type
peaks (list)
Example:
>>> import pymzml
>>> run = pymzml.run.Reader(
...     "spectra.mzMl.gz",
...     MS_precisions={
...         1: 5e-6,
...         2: 20e-6
...     }
... )
>>> for entry in run:
...     if isinstance(entry, pymzml.spec.Chromatogram):
...         for time, intensity in entry.profile:
...             print(time, intensity)
Note
The profile property can also be set, e.g. for theoretical data. It requires a list of time/intensity tuples.
- property time
Returns the list of time values. If the time values are encoded, the function _decode() is used to decode the encoded data.
The time property can also be set, e.g. for theoretical data. However, it is recommended to use the profile property to set time and intensity tuples at the same time.
- Returns
list of time values from the analyzed chromatogram
- Return type
time (list)
3.2.3. MS_Spectrum
- class pymzml.spec.MS_Spectrum[source]
General spectrum class for data handling.
- get_element_by_name(name)[source]
Get element from the original tree by its unit name.
- Parameters
name (str) – unit name of the mzml element.
- Keyword Arguments
obo_version (str, optional) – obo version number.
- get_element_by_path(hooks)[source]
Find elements in the spectrum by their path.
- Parameters
hooks (list) – list of parent elements for the target element.
- Returns
list of XML objects found in the path
- Return type
elements (list)
Example
To access cvParam in scanWindow tag:
>>> spec.get_element_by_path(['scanList', 'scan', 'scanWindowList',
...     'scanWindow', 'cvParam'])
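The idea of walking a list of parent tags down to the target elements can be tried without an mzML file, using only xml.etree.ElementTree. The XML snippet and the helper elements_by_path are made up for illustration. Real mzML elements carry an XML namespace, which is omitted here for brevity.

```python
import xml.etree.ElementTree as ET

# Made-up, namespace-free fragment shaped like an mzML spectrum element.
SPECTRUM_XML = """
<spectrum id="scan=1">
  <scanList>
    <scan>
      <scanWindowList>
        <scanWindow>
          <cvParam name="scan window lower limit" value="100.0"/>
          <cvParam name="scan window upper limit" value="2000.0"/>
        </scanWindow>
      </scanWindowList>
    </scan>
  </scanList>
</spectrum>
"""


def elements_by_path(element, hooks):
    """Return all elements reached by following the tag names in hooks (hypothetical helper)."""
    return element.findall("./" + "/".join(hooks))


spectrum_element = ET.fromstring(SPECTRUM_XML.strip())
cv_params = elements_by_path(
    spectrum_element,
    ["scanList", "scan", "scanWindowList", "scanWindow", "cvParam"],
)
limits = [float(param.get("value")) for param in cv_params]
print(limits)
```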
- property measured_precision
Set the measured and internal precision.
- Returns
measured Precision (e.g. 5e-6)
- Return type
value (float)
- to_string(encoding='latin-1', method='xml')[source]
Return string representation of the xml element the spectrum was initialized with.
- Keyword Arguments
encoding (str) – text encoding of the returned string. Default is latin-1.
method (str) – text format of the returned string. Default is xml, alternatives are html and text.
- Returns
xml string representation of the spectrum.
- Return type
element (str)
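The encoding and method keywords have the same names and values as those of xml.etree.ElementTree.tostring, so their effect can be explored with the standard library alone. That to_string forwards to this function is an assumption of this sketch, and the XML snippet is made up.

```python
import xml.etree.ElementTree as ET

# Made-up element standing in for the XML a spectrum was initialized with.
element = ET.fromstring('<spectrum id="scan=1"><userParam>demo</userParam></spectrum>')

# With a byte encoding such as latin-1, ElementTree.tostring returns bytes.
as_xml = ET.tostring(element, encoding="latin-1", method="xml").decode("latin-1")
# method="text" drops all tags and keeps only the text content.
as_text = ET.tostring(element, encoding="latin-1", method="text").decode("latin-1")

print(as_xml)
print(as_text)
```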