ProMass can automatically calculate the theoretical molecular masses of peptide, protein, and oligonucleotide sequences, as well as molecular formulas, and display them in the web-based results report. ProMass evaluates the mass of the BioSequence string and determines whether the target mass was found in the data set. This is useful when you want to confirm the masses of known components. There are two ways to supply biomolecule sequence information so that ProMass can calculate the molecular weight: enter either the sequence itself or the name of a sequence file in the BioSequence field of the Xcalibur Sequence Setup window, as described below:
First make sure that the BioSequence and Target Info fields are present in your Xcalibur Sequence Setup window. If they are not, read this first.
Specifying a text string: If your sequence is fewer than 255 characters long, you can paste it directly into the BioSequence field of the Xcalibur Sequence Setup window. The sequence string should use the appropriate letter codes for either amino acids or nucleotides. Copy your complete sequence to the clipboard, double-click the BioSequence field, then right-click to paste the sequence into the field. You can enter multiple sequences by separating them with commas, e.g., ACTGA, TTGAC.
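The checks described above can be sketched in a few lines of Python before pasting. This is only an illustration: the helper names are hypothetical, and it assumes the 255-character limit applies to the whole field entry and that spaces around commas are not significant.

```python
MAX_FIELD_CHARS = 255  # limit quoted in the text; assumed to apply to the whole entry


def flatten(pasted: str) -> str:
    """Remove line endings and surrounding whitespace from pasted text."""
    return "".join(line.strip() for line in pasted.splitlines())


def split_sequences(field: str) -> list[str]:
    """Split a BioSequence entry into its comma-separated sequences."""
    return [part.strip() for part in field.split(",") if part.strip()]


def fits_in_field(field: str) -> bool:
    """True if the entry is short enough to paste directly into the field."""
    return len(field) < MAX_FIELD_CHARS


field = flatten("ACTGA, \nTTGAC\n")
print(split_sequences(field), fits_in_field(field))
```

If `fits_in_field` returns False, the sequence file option is the alternative.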
Specifying a sequence file: ProMass can also read a text-based sequence file. This is particularly useful if your sequence is longer than 255 characters. Specify the name of the text file that contains your sequence, including the path, in the BioSequence field (e.g., C:\Xcalibur\sequence\myoglobin.pep). For ProMass to recognize the file, it should have the extension .txt, .pep, or .fasta. ProMass automatically ignores lines in the file that begin with the characters >, #, or '. This feature allows you to read FASTA-format sequences.
Once you have specified a sequence or file, you also need to tell ProMass what type of sequence it represents. The sequence type is entered in the Target Info field of the Xcalibur Sequence Setup program.
Specifying a peptide or protein
sequence: To specify a peptide or protein sequence, enter one of
the following in the Target Info field:
sequence = peptide
sequence = protein
sequence = amino acid
Specifying an oligo sequence:
To specify an oligonucleotide sequence, enter the following in the Target
Info field:
sequence = oligo
ProMass uses the Integrated DNA Technologies (IDT) base notation format for oligonucleotide sequences. This format allows you to enter sequences of DNA, RNA, phosphorothioated DNA or RNA, O-methylated RNA, 2' fluoro-RNA, or locked nucleic acid (LNA) residues, including mixed sequences incorporating these residues. You can also enter a duplex by separating the two sequences with a comma in the BioSequence field. If sequence = duplex is entered in the Target Info field, ProMass calculates the sum of the masses of the two oligonucleotide sequences and treats it as an additional target mass alongside the target masses calculated for the individual strands. When you enter sequence = duplex, the sequence type is automatically assumed to be an oligonucleotide.
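The duplex rule amounts to simple arithmetic: two strand masses yield three target masses. The following Python sketch illustrates it. The function name is hypothetical and the strand masses are placeholders, not values calculated by ProMass.

```python
def duplex_targets(strand_masses: list[float]) -> list[float]:
    """Return each strand mass followed by their sum (the duplex target)."""
    if len(strand_masses) != 2:
        raise ValueError("a duplex needs exactly two strands")
    return [*strand_masses, sum(strand_masses)]


# Placeholder masses for illustration only.
print(duplex_targets([6000.0, 6100.5]))
```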
The following are used for base notations in oligo sequences:
DNA = A, C, G, T, U, I
Phosphorothioated DNA = A*, G*, C*, T*, U*, I*
RNA = rA, rG, rC, rU
Phosphorothioated RNA = rA*, rG*, rC*, rU*
2'O-Methyl RNA = mA, mG, mC, mU
Phosphorothioated 2'O-Methyl RNA = mA*, mG*, mC*, mU*
2' fluoro RNA = fA, fG, fC, fU
Phosphorothioated 2' fluoro RNA = fA*, fG*, fC*, fU*
Locked Nucleic Acid (LNA) = +A, +G, +C, +T
Phosphorothioated Locked Nucleic Acid (LNA) = +A*, +G*, +C*, +T*
Modified residues are entered as an alphanumeric string between forward slashes, e.g., /5Phos/ for a 5' phosphate modification. Many modified nucleotide groups are already defined in the znova_masses.ini mass definition file, as discussed below.
For more information see http://www.idtdna.com/analyzer/Applications/OligoAnalyzer/
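To make the notation concrete, the following Python sketch splits an oligo string into residues using only the rules listed above: an optional prefix (r, m, f, or +), a base letter, an optional trailing * for phosphorothioate, and /name/ for modifications. It is not ProMass's parser. It assumes whitespace is not significant, and it does not check that a base is valid for its chemistry (for example, it would accept rT).

```python
import re

# A token is either /modification/ or [prefix]base[*].
TOKEN = re.compile(
    r"/(?P<mod>[A-Za-z0-9]+)/|(?P<prefix>[rmf+]?)(?P<base>[ACGTUI])(?P<ps>\*?)"
)

CHEMISTRY = {
    "": "DNA",
    "r": "RNA",
    "m": "2'-O-methyl RNA",
    "f": "2'-fluoro RNA",
    "+": "LNA",
}


def tokenize(sequence: str) -> list[tuple[str, str, bool]]:
    """Return (kind, name, phosphorothioated) for each residue in order."""
    sequence = "".join(sequence.split())  # assumption: whitespace is ignored
    tokens = []
    pos = 0
    while pos < len(sequence):
        m = TOKEN.match(sequence, pos)
        if m is None:
            raise ValueError(f"unrecognized notation at position {pos}")
        if m.group("mod") is not None:
            tokens.append(("modification", m.group("mod"), False))
        else:
            kind = CHEMISTRY[m.group("prefix")]
            tokens.append((kind, m.group("base"), m.group("ps") == "*"))
        pos = m.end()
    return tokens


for token in tokenize("/5Phos/mA*rG+TC"):
    print(token)
```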
Specifying a molecular formula: To specify a molecular formula, enter one of the following in the Target Info field:
sequence = formula
sequence = molecule
A molecular formula may be entered like the following example: C6H12O6Br.
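As a rough illustration of how a mass follows from such a formula, the Python sketch below parses a flat formula (no parentheses or charges) and sums standard atomic weights. The small weight table and the function names are assumptions made for this example. ProMass takes its values from its own mass definitions, so its results may differ in the last digits.

```python
import re

# Standard atomic weights (average masses) for a handful of elements only.
ATOMIC_WEIGHT = {
    "C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
    "P": 30.974, "S": 32.06, "Br": 79.904,
}

ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> dict[str, int]:
    """Count atoms in a flat formula such as C6H12O6Br."""
    counts: dict[str, int] = {}
    pos = 0
    while pos < len(formula):
        m = ELEMENT.match(formula, pos)
        if m is None:
            raise ValueError(f"cannot parse formula at position {pos}")
        symbol, digits = m.groups()
        counts[symbol] = counts.get(symbol, 0) + int(digits or 1)
        pos = m.end()
    return counts


def average_mass(formula: str) -> float:
    """Sum atomic weights over all atoms in the formula."""
    total = 0.0
    for symbol, count in parse_formula(formula).items():
        if symbol not in ATOMIC_WEIGHT:
            raise ValueError(f"no atomic weight defined for {symbol}")
        total += ATOMIC_WEIGHT[symbol] * count
    return total


print(parse_formula("C6H12O6Br"))
print(round(average_mass("C6H12O6Br"), 3))
```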
You must specify one of the sequence types shown above, or ProMass will not calculate masses. For peptides and proteins, you may also specify termini in the Target Info field. Normally, ProMass assumes that H and OH must be added to the sequence to calculate the mass of a polypeptide. You therefore do not have to specify termini if, for example, your peptide has a free amino N-terminus and a free acid C-terminus. Entering no termini option is the same as entering:
termini = H, OH
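The role of the termini can be seen in a short calculation: the polypeptide mass is the sum of the residue masses plus the N-terminal and C-terminal groups. The Python sketch below uses commonly tabulated average residue masses. These values and the function name are assumptions for illustration, and the values in znova_masses.ini may differ slightly.

```python
# Average residue masses (amino acid minus water), commonly tabulated values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}

# Terminal groups corresponding to the default "termini = H, OH".
TERMINUS_MASS = {"H": 1.008, "OH": 17.007}


def peptide_mass(sequence: str, n_term: str = "H", c_term: str = "OH") -> float:
    """Average mass: residues plus N-terminal and C-terminal groups."""
    residues = sum(RESIDUE_MASS[aa] for aa in sequence)
    return residues + TERMINUS_MASS[n_term] + TERMINUS_MASS[c_term]


# With the default termini, a single G gives the mass of free glycine.
print(round(peptide_mass("G"), 3))
```

Passing different `n_term` or `c_term` keys corresponds to entering a non-default termini option, provided those groups have been given a mass.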
ProMass uses a text file to store the masses of the amino acid, nucleotide, termini, and custom groups. The text file is called znova_masses.ini and it can be found in your ZNova install directory (e.g., C:\Program Files\ProMassXcali\ZNova\znova_masses.ini). In our example above, the text strings 'H' and 'OH' have been pre-defined in the znova_masses.ini file to be equal to the masses of a hydrogen atom and a hydroxyl group, respectively. You can create your own amino acid or nucleotide groups by editing the znova_masses.ini file. Before specifying termini, make sure they have been defined in the znova_masses.ini file. Additional information about the znova_masses.ini file is also available in the ZNova Mass Configuration File help topic.
A BioSequence string may also contain what are known as user residues. User residues are residues that are not present in the znova_masses.ini file but are defined by entering their information directly in the Target Info field of the sample list. For example, your BioSequence string could contain the string ACTG/MyMod/GATAC. In the Target Info field of the sample list, you would give the MW of the modification named MyMod like this: /MyMod/ = 305.3. This lets you define custom residues on the fly without editing the znova_masses.ini file. If a modified residue is not already defined in the znova_masses.ini file, you must either add it to that file or define its MW as a user residue as described above; otherwise the masses will not be calculated correctly.
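A quick pre-check can catch a modification that has no mass defined anywhere. The Python sketch below parses a user-residue entry of the form shown above and lists any /name/ groups in a sequence that are still undefined. The function names are hypothetical, and the sketch assumes one user-residue definition per entry string.

```python
import re

USER_RESIDUE = re.compile(
    r"^\s*/(?P<name>[A-Za-z0-9]+)/\s*=\s*(?P<mw>\d+(?:\.\d+)?)\s*$"
)
MOD_IN_SEQUENCE = re.compile(r"/([A-Za-z0-9]+)/")


def parse_user_residue(entry: str) -> tuple[str, float]:
    """Parse an entry such as '/MyMod/ = 305.3' into (name, mass)."""
    m = USER_RESIDUE.match(entry)
    if m is None:
        raise ValueError(f"not a user residue definition: {entry!r}")
    return m.group("name"), float(m.group("mw"))


def undefined_mods(sequence: str, known: dict[str, float]) -> list[str]:
    """List modification names used in the sequence but missing from known."""
    return [name for name in MOD_IN_SEQUENCE.findall(sequence) if name not in known]


name, mw = parse_user_residue("/MyMod/ = 305.3")
print(undefined_mods("ACTG/MyMod/GATAC", {name: mw}))
```

In practice, `known` would also need to include the groups already defined in znova_masses.ini.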
As an example, we'll use the myoglobin LC/MS file from the Getting Started example. Set up the Xcalibur sequence as described previously in the Getting Started section with the myolcmsdata.raw data file and test.pmd processing method from the ProMass TestData directory. The amino acid sequence of horse myoglobin is shown below:
GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASEDLKKHGTVVLTALGGILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISDAIIHVLHSKHPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG
Paste this entire sequence in the BioSequence field in the Xcalibur Sequence Setup program. (Note: Xcalibur ends input to any field at a carriage return, so check that the whole sequence was entered. If it was not, copy the sequence above into a text editor such as Notepad, remove any line-ending characters, then copy and paste it into the Xcalibur Sequence Setup.) Also enter:
sequence = protein
in the Target Info field. Entering termini = H, OH in the Target Info field is optional. When you're done, your Xcalibur Sequence Setup should look something like this:
Select Row 1 with the mouse, and hit the Batch Reprocess button in Xcalibur Sequence Setup. Make sure the following options are checked in the batch reprocess dialog box: Qual - Peak detection and integration, Programs, and Replace Sample Info.
When the file has finished processing, display the resulting ProMass summary file in your browser. It should look something like the web page shown below. In the ProMass Browser summary, you should see an entry for the myolcmsdata raw file and the first 50 amino acids of the myoglobin sequence in the Sample Comments field of the summary report. Note also that ProMass has automatically treated the mass calculated from the BioSequence string as a Target Mass. The green Result Status confirms that a mass within the user-specified Mass Tolerance has been found in the data set.
Click on the sample row to navigate to the chromatogram-level summary for this data file. Note how the myoglobin sequence appears in the report along with the calculated average and monoisotopic masses.
Because of the character length limit on the BioSequence string, to confirm the masses of larger biomolecules (sequences longer than 255 characters) you will need to either specify a text-based sequence file containing your sequence or explicitly define your Target Mass as described in Configuring ProMass to Search for Target Masses.
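When preparing a sequence file, it can help to preview what remains after the ignored lines are dropped. The Python sketch below applies the rules given earlier: an extension of .txt, .pep, or .fasta, and lines beginning with >, #, or ' skipped. The function name is hypothetical, and the sketch assumes the remaining lines are simply joined together and that the extension check is not case-sensitive. ProMass's actual behavior may differ.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".txt", ".pep", ".fasta"}
IGNORED_PREFIXES = (">", "#", "'")


def read_sequence_file(path: str) -> str:
    """Return the sequence in a file, skipping header and comment lines."""
    file_path = Path(path)
    if file_path.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {file_path.suffix!r}")
    lines = file_path.read_text().splitlines()
    # Keep only lines that do not begin with one of the ignored characters.
    kept = (line.strip() for line in lines if not line.startswith(IGNORED_PREFIXES))
    return "".join(kept)
```

Calling `read_sequence_file` on a file such as C:\Xcalibur\sequence\myoglobin.pep would return the sequence as a single string, which also gives its length.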