VENGEANCE (2015.12.15) |
This release improves the precision of handling variable modifications.
Several new commands and notational add-ons allow much finer control over how modifications are tested.
The methods for handling variable modifications have been extensively re-written.
|
-
The value of the command "protein, ptm complexity" (C, a floating point number 0.0–12.0)
sets the maximum number of variable modification alternatives that will be tested for a particular
peptide. The number of alternatives is 2.0^C. If this number is not specified, the default value C = 6.0 will be used.
-
The specification of a variable modification can include a value for the maximum number of
modification sites to be considered in a single peptide. For example, the modification specification
15.994915@M would normally be used to test for M oxidation. If you wish only to consider one such modification
per peptide, you can now write "15.994915@1M". Any number from 1–10 can be used in this notation. If not
specified, a default value of 10 is used.
- It is possible to specify that a variable modification NOT occur at the C-terminus of a peptide. For
example, previously "42.010565@K" would have been used to test for K acetylation. Using the new notation,
"42.010565@]K" can be used, which will not test C-terminal lysines for acetylation (a C-terminal acetyl-lysine is chemically
impossible in a tryptic peptide). This notation is useful for most lysine post-translational modifications, as well as dimethyl-arginine.
Note: monomethyl-arginine and -lysine are both susceptible to trypsin cleavage, so this notation is not
recommended for monomethyl variable modifications. It is also not recommended for use with carbamylation
— a urea artifact that can occur during tryptic digestion —
although reducing the number of carbamylations allowed per peptide, e.g., "43.005814@1K", can be quite useful.
-
The legacy command "spectrum, use noise suppression" has been removed from the project: the original
method was created for LCQ spectra and it no longer had any practical utility.
-
Limits have been introduced to the length of peptide that will be considered to be a solution to a mass spectrum.
Previous limits had only been based on the parent ion mass of a fragment ion spectrum. The new limits require a
peptide to be 6–50 residues in length, regardless of the parent ion mass.
-
The Windows version of the code has been updated and adapted for use with Microsoft Visual Studio Community 2015.
It has been fully tested for Windows 8, 8.1 & 10.
-
The Linux version of the code has been updated and adapted for use with Red Hat Enterprise Linux Workstation v.6.7,
using gcc v. 4.4.7.
-
This version was designed and tested to work with the BI GPM Fury version of the generic GPM interface.
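The variable modification notation and the "protein, ptm complexity" limit described above can be illustrated with a short Python sketch. It is not taken from the TANDEM source: the names VariableMod, parse_variable_mod and max_alternatives are hypothetical, and the assumed grammar (an optional site limit followed by an optional "]" flag) goes beyond what these notes state, since the two notations are only shown separately.

```python
import re
from dataclasses import dataclass

# Assumed grammar for one entry: mass "@" [site limit] ["]"] residue.
# Accepting the limit and the "]" flag together, in this order, is a guess
# made for this sketch; the release notes only show them separately.
_SPEC = re.compile(r"^([+-]?\d+(?:\.\d+)?)@(\d+)?(\])?([A-Z])$")


@dataclass
class VariableMod:
    mass: float
    residue: str
    max_sites: int = 10            # default per-peptide limit from the notes
    skip_c_terminus: bool = False  # True when the "]" flag is present


def parse_variable_mod(spec: str) -> VariableMod:
    """Parse one entry such as '15.994915@1M' or '42.010565@]K'."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized modification: {spec!r}")
    mass, limit, flag, residue = match.groups()
    max_sites = int(limit) if limit is not None else 10
    if not 1 <= max_sites <= 10:
        raise ValueError("site limit must be in the range 1-10")
    return VariableMod(float(mass), residue, max_sites, flag is not None)


def max_alternatives(c: float = 6.0) -> float:
    """Maximum number of alternatives tested per peptide, 2.0^C."""
    if not 0.0 <= c <= 12.0:
        raise ValueError("C must be in the range 0.0-12.0")
    return 2.0 ** c
```

With the default C = 6.0, max_alternatives() gives 64.0, and the upper bound C = 12.0 gives 4096.0. Parsing "43.005814@1K" yields a lysine modification limited to one site per peptide.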
|
TORNADO (2009.04.01) |
This release is a maintenance release that adds one new feature
that is designed to be used in analyzing SILAC experiments. It also has minor changes to improve the
detection of new input data file types.
|
- It is now possible to specify multiple sets of "complete" modifications to be applied
sequentially. This was achieved by adding new commands to the input X! file format that
look like the following:
<note type="input" label="residue, modification mass">57@C</note>
<note type="input" label="residue, modification mass 1">57@C,8@K,10@R</note>
In this case, the data would be checked both for peptides with only cysteine modified by carboxyamidomethyl and for
peptides with carboxyamidomethyl and SILAC labeled lysine and arginine residues. This applies to both the initial round of analysis
as well as all refinement rounds. Any number of sets of complete modifications
can be added by incrementing the count in the label ("residue, modification mass 2", "residue, modification mass 3", etc.). Processing
stops when a count increment label is missing (e.g., there is a "residue, modification mass 2" label but no
"residue, modification mass 3" label). Processing also stops when a zero-length string is passed; for example, the following entry would stop processing at count = 1:
<note type="input" label="residue, modification mass 1"></note>
A non-zero length string that cannot be interpreted as a residue modification is taken to mean that the data should be
analyzed with no residue modifications, for example:
<note type="input" label="residue, modification mass 1">none</note>, or
<note type="input" label="residue, modification mass 1"> </note>.
- Compatibility for version 1.1 of the CMN format has been added, allowing long description strings (> 255 characters).
- Detection of new, non-standard variants of mzXML files has been added.
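The rules above for sequential sets of complete modifications can be sketched in Python as follows. This is an illustration rather than the TANDEM implementation: the function names are hypothetical, "contains an @ in every comma-separated entry" is used as a simplified stand-in for "can be interpreted as a residue modification", and the unnumbered label is assumed to always supply the first set.

```python
import xml.etree.ElementTree as ET

BASE_LABEL = "residue, modification mass"


def _parse_set(value: str) -> list[str]:
    """Split one parameter value into entries; [] means 'no modifications'."""
    entries = [entry.strip() for entry in value.split(",")]
    # Simplification: an entry counts as interpretable if it contains "@".
    return entries if all("@" in entry for entry in entries) else []


def complete_modification_sets(xml_text: str) -> list[list[str]]:
    """Collect the modification sets in the order they would be applied."""
    root = ET.fromstring(xml_text)
    params = {
        note.get("label"): note.text or ""
        for note in root.iter("note")
        if note.get("type") == "input"
    }
    # Assumption: the unnumbered label always provides the first set.
    sets = [_parse_set(params.get(BASE_LABEL, ""))]
    count = 1
    while True:
        label = f"{BASE_LABEL} {count}"
        # Stop on a missing label or on a zero-length string.
        if label not in params or params[label] == "":
            break
        sets.append(_parse_set(params[label]))
        count += 1
    return sets


EXAMPLE = """<bioml>
<note type="input" label="residue, modification mass">57@C</note>
<note type="input" label="residue, modification mass 1">57@C,8@K,10@R</note>
<note type="input" label="residue, modification mass 2">none</note>
<note type="input" label="residue, modification mass 4">57@C</note>
</bioml>"""
```

For EXAMPLE, the function returns three sets: the cysteine-only set, the SILAC set, and an empty set for "none". The entry labeled 4 is never reached because there is no entry labeled 3.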
|
2005.08.15 |
The changes in this release are aimed at
increasing XML compliance and high accuracy mass calculation consistency.
|
System level changes |
-
The handlers for GAML spectra, taxonomy files and input parameter files have been
changed to use expat, rather than custom routines.
-
A more flexible mass calculation class has been added to improve molecular mass
consistency for high accuracy calculations.
-
The input spectrum file type detection method has been improved by adding the
possibility of forcing it to select one file type. This forcing is done using
the input parameter "spectrum, path type", which can have
the values: dta, pkl, mgf, gaml, mzxml or mzdata. If this parameter is missing
or of zero length, the normal file type detection scheme is used.
|
2005.06.01 |
This version corrects an issue that could
arise in large MudPIT data sets with large numbers of redundant
identifications. The calculation of the protein expectation value in previous
versions was susceptible to floating point overflows when making this
calculation, resulting in unpredictable values.
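These notes do not describe the calculation itself or how it was corrected. As a generic illustration of this class of problem only, the Python sketch below computes the base-10 log of a binomial coefficient in two ways: directly from floating point factorials, which overflows once the counts become large, and in log space, which does not. The choice of a binomial coefficient and both function names are assumptions made for the example.

```python
import math


def log10_binomial_naive(n: int, k: int) -> float:
    """log10 of C(n, k) via floating point factorials.

    171! already exceeds the largest double, so the conversion to float
    fails for large n. Python raises OverflowError at that point; in other
    languages the result may instead be inf or another unusable value.
    """
    numerator = float(math.factorial(n))
    denominator = float(math.factorial(k)) * float(math.factorial(n - k))
    return math.log10(numerator / denominator)


def log10_binomial_safe(n: int, k: int) -> float:
    """log10 of C(n, k) computed in log space with the log-gamma function."""
    ln_value = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return ln_value / math.log(10.0)
```

Both functions agree for small inputs such as n = 10, k = 3. For n = 500, k = 250 only the log-space version returns a value.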
|
2005.03.21 |
This release adds the ability to process
mzxml and mzdata
file formats using the expat library of
functions. Most of the changes in this release were initially made by Patrick
Lacasse (Université Laval, Dept. of Medicine, supported by Genome Québec) with
the final version and optimizations made by Brendan MacLean, from the Fred
Hutchinson Cancer Research Center. Also, the ability to define the amino acid
residue masses has been added, allowing users to change the default masses when
doing N15 experiments, for example.
|
System level changes |
-
New classes have been added to allow the processing of mzxml and mzdata file
formats.
Two of the new classes are publicly derived from loadspectrum, a custom class
specific to Tandem. Two others are publicly derived from the xml parser class
SAXSpectraHandler which is imported from the expat library of functions. The
xml parser classes use the expat functions exclusively to parse the input in
order to load it into the traditional Tandem spectra data members.
-
base64.cpp and base64.h have been added to allow b64_decode_mio() function
calls, which are needed to decode the spectra in mzxml and mzdata spectra
files.
-
Included in the src folder is the libexpat.lib which is required to compile new
versions of the executable on Windows. Linux and OSX machines should have the
required libraries as part of the core operating system.
-
A new function has been added to msequtilities that allows amino acid residue
masses to be defined by an xml input file. If the parameter 'protein, modified
residue mass file' is defined in the input.xml, the masses are taken from the
file defined by that parameter. An example of the format can be viewed
here.
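To show what decoding the spectra in an mzXML file involves, here is a minimal Python sketch. It is not a port of b64_decode_mio(): the function name decode_mzxml_peaks is hypothetical, and the sketch assumes the common mzXML layout of uncompressed, network (big-endian) byte order floats stored as alternating m/z and intensity values. mzData files and compressed peak lists are not handled.

```python
import base64
import struct


def decode_mzxml_peaks(encoded: str, precision: int = 32) -> list[tuple[float, float]]:
    """Decode a base64 peak list into (m/z, intensity) pairs.

    Assumes big-endian floats of the given precision (32 or 64 bits),
    stored as m/z, intensity, m/z, intensity, ...
    """
    if precision not in (32, 64):
        raise ValueError("precision must be 32 or 64")
    raw = base64.b64decode(encoded)
    width = precision // 8
    code = "f" if precision == 32 else "d"
    count = len(raw) // width
    # ">" selects big-endian (network) byte order with standard sizes.
    values = struct.unpack(f">{count}{code}", raw[: count * width])
    return list(zip(values[0::2], values[1::2]))
```

A round trip is an easy check: packing the four values 100.0, 5.0, 200.5, 10.0 with struct.pack(">4f", ...), base64-encoding the bytes and passing the text to decode_mzxml_peaks should return the two pairs (100.0, 5.0) and (200.5, 10.0).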
|
2005.02.01 |
This release contains modifications
necessary to insert new types of peptide scoring systems as well as to deal
effectively with high accuracy parent ion measurements, which are now available
in some types of mass spectrometers. Most of the changes in this release were
made by Brendan MacLean, from the Fred Hutchinson Cancer Research Center.
|
System level changes |
-
Several new classes have been added, to make the scoring system
"pluggable", i.e., it is now much easier to alter the scoring system
used, for the purposes of bioinformatics investigations. These changes are
mainly of interest to informatics professionals and they should not affect the
normal operation of the software for users.
-
The calculation of parent ion mass has been changed, taking more care as to the
mass of added groups and correctly accounting for electron masses.
-
Better statistical methods have been added to deal with the small number of
possible peptides generated from a list of protein sequences when the
parent ion mass has been determined very accurately.
|
2004.11.15.3 |
This release adds in several features
that were originally scheduled to appear in the 2004.11.15 release, but which
were pushed back from the initial release. The 2004.11.15.2 version was not
generally released. |
System level changes |
-
Spectra that are interpreted as being caused by a prompt neutral loss now have
the prompt loss specified in the appropriate <aa> node in the output.
-
Correction of an issue with the OS X version that resulted in improper reading
of ".pro" sequence files. Initially, the ".pro" format was
to have both little endian and big endian versions, however this became too
confusing to maintain. The current plan is to only use the little endian format
and to compensate for this on-the-fly in the OS X version.
-
The maximum parent ion charge to be used can now be specified using the
"spectrum, maximum parent charge" parameter. This parameter has a
default value of 4. This change was made necessary because of high charge
states being called by some MS peak assignment software, which caused spurious
assignments.
-
The first round of refinement (finding partially cleaved peptides) has been
extended, so that it is possible to repeat it with different sets of modifications
and motifs. These additional refinement rounds are specified by adding
parameters using the following format:
-
Round 1: "refine, potential modification mass"
"refine, potential modification motif"
-
Round 2: "refine, potential modification mass 1"
"refine, potential modification motif 1"
-
Round 3: "refine, potential modification mass 2"
"refine, potential modification motif 2"
This will continue until the next pair of parameters are both missing
or neither contains an at sign (@).
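The way the additional refinement rounds are enumerated can be sketched in Python as follows. This is an illustration under stated assumptions, not the TANDEM code: the function name refinement_rounds is hypothetical, the parameters are assumed to be available as a plain dictionary of label to value, and the same stop test is assumed to apply to every round, including the first.

```python
MASS_LABEL = "refine, potential modification mass"
MOTIF_LABEL = "refine, potential modification motif"


def refinement_rounds(params: dict[str, str]) -> list[tuple[str, str]]:
    """Return the (mass, motif) parameter pair for each refinement round."""
    rounds = []
    index = 0
    while True:
        # Round 1 uses the bare labels; later rounds append " 1", " 2", ...
        suffix = "" if index == 0 else f" {index}"
        mass = params.get(MASS_LABEL + suffix, "")
        motif = params.get(MOTIF_LABEL + suffix, "")
        # A missing parameter is treated like an empty one, so a single
        # test covers both "missing" and "no @ present".
        if "@" not in mass and "@" not in motif:
            break
        rounds.append((mass, motif))
        index += 1
    return rounds
```

For example, if only "refine, potential modification mass" and "refine, potential modification mass 1" are set, two rounds are returned, and a parameter labeled with 3 is ignored because the pair labeled with 2 is missing.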
|
2004.07.15 |
This is a major release of TANDEM,
sufficiently different from previous releases to merit a major revision number:
this release will be referred to as TANDEM 2. |
System level changes |
-
The memory management throughout the program has been analyzed and altered to
minimize the amount of memory used per spectrum. This effort has reduced the
amount of memory used in single threaded operation by as much as 60%: the
improvement for double threaded operation may be as much as 80%.
-
The threading model has been changed to allow for the use of multiple
processors in the refinement process. TANDEM 1 separated work between the
threads by dividing up the sequences to search, so that each thread would only
search a subset of the sequences in a FASTA file. TANDEM 2 divides up the mass
spectra between threads, so that each thread searches a subset of the mass
spectra. This change makes it easier to divide up the refinement job, but means
that running more than one thread on a single processor will degrade the
performance of the software. For best performance, it is now important to keep
the number of threads and the number of processors the same.
-
The refinement process has been improved in accuracy by applying a logical
filter after each step of refinement. This means that once a refinement step is
completed, the new results obtained from the refinement are examined and if the
new results are not significantly better than those obtained from a simpler
search, they are discarded and the simpler results retained. This filtering
significantly reduces the complexity of analyzing results when there may be a
variety of similar modification patterns or point mutations that explain a
particular spectrum.
-
Validation of results using reversed sequence databases has been built into
the search process. This validation may be turned on or off, using the new
input parameter "scoring, include reverse" (values = yes|no).
This validation process tabulates the number of unique high probability hits
from the reversed sequence search and places them in the output file, along
with estimates of the false positive rate based on TANDEM's stochastic
histogramming technique and the estimate derived from the reversed sequence
process. NOTE: When this validation method is used, twice as many sequences
must be processed (both forward and reversed), which may require significantly
more processing time.
-
Numerous small optimizations have been made, particularly for loading and
reporting the results for very large collections of mass spectra.
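The idea behind the reversed sequence validation described above can be sketched in Python. This is a simplified illustration, not the TANDEM implementation: both function names and the ":reversed" name suffix are hypothetical, and the ratio used for the estimate is an assumption, since these notes do not give the formula.

```python
def reversed_database(sequences: dict[str, str]) -> dict[str, str]:
    """Build a reversed copy of each protein sequence.

    The ":reversed" suffix is a hypothetical naming convention used here
    to keep forward and reversed entries apart.
    """
    return {f"{name}:reversed": seq[::-1] for name, seq in sequences.items()}


def reversed_false_positive_rate(forward_hits: int, reversed_hits: int) -> float:
    """Estimate the false positive rate as reversed hits / forward hits.

    Assumption: a hit to a reversed sequence is always wrong, and wrong
    hits to forward sequences occur about as often.
    """
    if forward_hits <= 0:
        return 0.0
    return reversed_hits / forward_hits
```

Searching both the forward and the reversed entries doubles the number of sequences processed, which is the source of the extra processing time noted above. With 200 high probability forward hits and 4 reversed hits, the estimate from this sketch is 0.02.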
|
2003.06.01 |
This release introduces a new statistical
model for multiple model correlations. |
System level changes |
-
A new statistical interpretation was added, to combine expectation values when
multiple models from the same sequence are found to be the best model in
different spectra. Using this model, expectation values for the collections of
models are now listed as the base-10 log of the expectation value, beside the
FASTA description line.
-
The way FASTA description lines are listed has been changed. Rather than
listing the descriptions in the same order they were encountered in the search,
they are now listed by length: the longest entry first. The logic to this
choice is that for the NCBI database nr, the oldest entry for a similar
sequence tends to be the longest and the first line of that entry tends to have
the best description of the protein's common name. Unfortunately, this is not
always true.
-
A new way of organizing the output was added. It can be accessed by setting the
"output, sort results by" parameter to "protein". Models corresponding
to a given sequence are grouped together, with the best set of models at the
top of the page.
|
Corrected problems |
-
FASTA file name problem fixed.
-
Multiple modification reporting problem fixed.
|
Known problems |
-
No problems known at time of release
|