
LC-MS Characterization of mRNA: 5′ Caps, PolyA Tails, and Sequence Confirmation

5′ mRNA caps and polyA tails play pivotal roles in translocation, translation, and the recruitment of RNA-processing complexes, and they protect the mRNA from degradation. Structural characterization of the 5′ cap species and polyA tails is therefore an important step in deploying mRNA in biopharmaceutical R&D, diagnostic, or therapeutic applications.

Novatia uses liquid chromatography high-resolution mass spectrometry with novel deisotoping and charge deconvolution software to characterize mRNA caps and polyA tails.

LC-MS analysis of 5′ capped mRNA

Recent publications from New England Biolabs highlight our LC-MS analysis of 5′ capped mRNA: Aug 2022, Oct 2022, Oct 2023

Full-length intact mRNA is too large and heterogeneous to be analyzed directly by LC-MS. To characterize the 5′ cap by mass spectrometry, the 5′ capped end must first be cleaved from the rest of the mRNA. Our LC-MS methods for 5′ capped mRNA work best for fragments that are ~13-23 nucleotides (nt) long. The two most common 5′ cleavage methods are (1) a DNA probe with RNase H, which cleaves the RNA strand of a DNA:RNA duplex, and (2) a deoxyribozyme. With a properly designed probe or deoxyribozyme, the cut site on the mRNA is relatively uniform and the resulting fragment length suits our LC-MS methods. Novatia does not design or perform the cleavage of mRNA for 5′ cap analysis; the client must do this before sending samples to Novatia for LC-MS analysis.

In order to accurately analyze and process the data, we will need the following information:

  • The method used to cap the mRNA, i.e., whether the cap was added enzymatically or incorporated during transcription.
  • Which cap is being added, e.g., m7G (monomethyl-G) or m7Gm (dimethyl-G).
  • Whether other portions of the mRNA are modified, e.g., m1Y (1-methylpseudouridine) in place of U (uridine), or whether the first and/or second base in the sequence carries a 2′-O-methyl group.
  • The expected mRNA sequence for the sample being analyzed. In the comments section of the Excel submission sheet, you can also include additional information such as the +1 and +2 nt species to account for possible non-uniform cleavage of the RNA.
  • The method used to cleave the mRNA sequence (e.g., RNase H or deoxyribozyme). If RNase H was used, providing the probe sequence also allows us to identify the probe and related species in the sample and include them in the analysis report.
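The +1 and +2 nt species mentioned in the list above can be enumerated before filling in the submission sheet. The following is a minimal sketch of that bookkeeping, not Novatia's data processing: the helper name `cleavage_variants`, the example sequence, and the cut position are all hypothetical, and the cap itself is not modeled.

```python
def cleavage_variants(mrna_5p: str, cut_after: int, slack: int = 2) -> dict[str, str]:
    """Return 5' fragments for cut sites within +/- `slack` nt of the intended site.

    mrna_5p   : 5' end of the transcript, written 5'->3' (cap not included)
    cut_after : intended fragment length in nt (hypothetical value)
    """
    variants = {}
    for offset in range(-slack, slack + 1):
        length = cut_after + offset
        if 0 < length <= len(mrna_5p):
            label = "target" if offset == 0 else f"{offset:+d} nt"
            variants[label] = mrna_5p[:length]
    return variants


# Made-up 5' sequence, for illustration only
example_seq = "GGGAGACCGGCCUCGAGCAGCUGAAGCUU"
for label, frag in cleavage_variants(example_seq, cut_after=20).items():
    print(f"{label:>7}: {frag} ({len(frag)} nt)")
```

Listing the variants this way makes it easy to paste the expected target and its neighbors into the comments section of the submission sheet.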

The information listed above will be used to complete the data processing and identify possible capped and uncapped species for the provided mRNA sequence. Once this is completed, you will receive your results as a ProMass report which will contain detailed information for identified species based on the information that was provided.

To view pricing for our routine LC-MS analysis of capped mRNA (Excel code HRMS_LCMS) click here.

If you have questions about this routine service contact us at

LC-MS analysis of mRNA PolyA tails

Full-length intact mRNA is too large and heterogeneous to be analyzed directly by LC-MS. To analyze the polyA tail of an mRNA, the tail must be cleaved from the full-length mRNA. The most common approach is enzymatic digestion with RNase T1, which cleaves only on the 3′ side of rG and therefore leaves the polyA sequence intact. Our current methods are well suited to analyzing polyA tail species up to 150 nucleotides long.
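To see which fragment carries the tail after an RNase T1 digest, the cleavage rule can be sketched in a few lines. This is an idealized illustration that assumes complete digestion after every G and ignores missed cleavages, modified bases, and end-group chemistry; the function name `rnase_t1_digest` and the sequence are made up for the example.

```python
def rnase_t1_digest(rna: str) -> list[str]:
    """Split an RNA string after every G (idealized, complete RNase T1 digest)."""
    fragments, start = [], 0
    for i, base in enumerate(rna):
        if base == "G":
            fragments.append(rna[start:i + 1])
            start = i + 1
    if start < len(rna):
        fragments.append(rna[start:])  # 3'-terminal fragment with no trailing G
    return fragments


# Made-up 3' end: a short stretch of UTR followed by a 30-nt polyA tail
rna_3p = "AUGCCGUACGCUU" + "A" * 30
tail_fragment = rnase_t1_digest(rna_3p)[-1]
print(tail_fragment, len(tail_fragment))
```

In this toy case the tail fragment is `CUU` followed by 30 A residues: the nucleotides after the last rG stay attached to the tail, which is why they matter for the theoretical target mass.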

We strongly recommend providing the expected length of the polyA tail so that we can accurately analyze and process the data. Providing the sequence, including the nucleotides between the last rG and the start of the polyA tail and, if applicable, any non-rA nucleotides that interrupt or follow the tail, allows us to calculate a theoretical target mass.

The information listed above will be used to complete the data processing and identify PolyA species in the provided sample. Once this is complete, you will receive your results as a ProMass report, which will contain detailed information for identified species based on the information that was provided.

Example data for a PolyA tail analysis can be viewed here. A walkthrough of the data is also provided below.

The top level HTML summary page of each ProMass data set contains hyperlinks to a detailed report for each sample analyzed. At the top of each sample report, a concise summary of relevant information gleaned from the analysis is shown. Theoretical average and monoisotopic masses of the provided sequence(s) are shown along with a Target Mass Summary Table, which lists identified species based on the observed average masses of the provided target sequence. Because polyA tails are usually nonuniform in length, any additional polyA tail species identified will appear in the target mass summary as “plus/minus X rA”, where X denotes the number of rA nucleotides longer or shorter than the target sequence.
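The "plus/minus X rA" labels follow from the fact that each additional rA residue adds a fixed mass. The sketch below illustrates the arithmetic only; it is not the ProMass algorithm. It assumes an average residue mass of 329.21 Da for rA (AMP minus water, C10H12N5O6P), and the matching tolerance, function name, and example masses are hypothetical.

```python
RA_AVG_MASS = 329.21  # Da, average mass of one rA residue (AMP minus H2O)


def label_polya_species(observed_mass: float, target_mass: float, tol: float = 1.0):
    """Label an observed average mass relative to the target as 'plus/minus X rA'.

    Returns None when the mass difference is not within `tol` Da (an assumed
    tolerance) of a whole number of rA residues.
    """
    delta = observed_mass - target_mass
    x = round(delta / RA_AVG_MASS)
    if abs(delta - x * RA_AVG_MASS) > tol:
        return None
    if x == 0:
        return "target"
    return f"plus {x} rA" if x > 0 else f"minus {-x} rA"


target = 39000.0  # made-up target average mass in Da
for observed in (target, target + 2 * RA_AVG_MASS, target - RA_AVG_MASS, target + 150.0):
    print(observed, label_polya_species(observed, target))
```

A mass that does not sit on the rA ladder, such as the last example, is left unlabeled rather than forced onto the nearest length.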

The summary table contains hyperlinks to the corresponding spectra for the species that were identified. Clicking a hyperlink in the target mass summary table takes you to the deconvoluted mass spectrum (below). Clicking the [View Data] hyperlink above that spectrum opens an interactive data viewer that lets you zoom or pan the mass spectra.

Clicking on the hyperlinked [Deconvolution Peak Report] will display a table of masses observed in that spectrum. A listing of presumed identities will be shown based on the information that was provided in the submission.

When provided with the correct information, detailed above, it is possible to determine the most abundant length, the weighted average, and the range of lengths of polyA species present in the sample (Note: The weighted average is not included in the ProMass report, but was manually calculated based on the species distribution).
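Because the weighted average is calculated by hand from the species distribution, a short script can do the same arithmetic. This is a minimal sketch assuming the distribution is available as tail length mapped to relative abundance; the function name and the numbers are invented for illustration and do not come from a real report.

```python
def polya_length_stats(abundance_by_length: dict[int, float]) -> dict:
    """Summarize a polyA length distribution (length in nt -> relative abundance)."""
    total = sum(abundance_by_length.values())
    weighted_average = sum(n * a for n, a in abundance_by_length.items()) / total
    most_abundant = max(abundance_by_length, key=abundance_by_length.get)
    return {
        "most_abundant": most_abundant,
        "weighted_average": weighted_average,
        "range": (min(abundance_by_length), max(abundance_by_length)),
    }


# Made-up distribution: tail length (nt) -> relative abundance
distribution = {98: 5.0, 99: 15.0, 100: 40.0, 101: 30.0, 102: 10.0}
print(polya_length_stats(distribution))
```

The abundances do not need to sum to 100, since the weighted average is normalized by their total.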

To view pricing for our routine LC-MS analysis of polyA tails (Excel code HRMS_LCMS_PolyAtail) click here.

If you have questions about this routine service contact us at

LC-MS/MS sequence confirmation of mRNA

Using a bottom-up LC-MS/MS strategy similar to the one typically employed in peptide mapping, it is possible to confirm the sequence of the open reading frame (ORF) of full-length mRNA strands. This is accomplished with RNase T1 enzymatic digestion, which cleaves on the 3′ side of rG in a sequence and leaves a 3′ phosphate. RNase T1 digestion yields shorter RNA sequences that are more readily analyzed by LC-MS. UPLC enables chromatographic separation of the majority of the species, and data-dependent MS/MS scans further support their identification and differentiate isobaric sequences.

An LC-MS/MS analysis of eGFP mRNA digested by RNase T1 reveals an information-rich chromatogram, shown below. Because RNA is composed of only 4 different residues (rC, rU, rA, rG), several digestion products are expected to have identical sequences after an RNase T1 digest. These ambiguous sequences (shown in grey in the figure below) still provide sequence coverage but are less valuable for confirming the sequence. More important for sequence confirmation are the digestion products of eGFP mRNA that are unique sequences (shown in Green).
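The split between unique and ambiguous digestion products can be predicted from the sequence alone. The sketch below assumes an idealized, complete RNase T1 digest (cleavage after every G, no missed cleavages or modifications); the function names and the 20-nt sequence are hypothetical and are not the eGFP data.

```python
from collections import Counter


def rnase_t1_digest(rna: str) -> list[str]:
    """Split an RNA string after every G (idealized, complete RNase T1 digest)."""
    fragments, start = [], 0
    for i, base in enumerate(rna):
        if base == "G":
            fragments.append(rna[start:i + 1])
            start = i + 1
    if start < len(rna):
        fragments.append(rna[start:])
    return fragments


def classify_products(rna: str) -> tuple[list[str], list[str]]:
    """Return (unique, ambiguous) digestion products, in 5'->3' order."""
    fragments = rnase_t1_digest(rna)
    counts = Counter(fragments)
    unique = [f for f in fragments if counts[f] == 1]
    ambiguous = [f for f in fragments if counts[f] > 1]
    return unique, ambiguous


# Made-up sequence, for illustration only
unique, ambiguous = classify_products("AUGCAGUUCGAUGCCAGAUG")
print("unique:   ", unique)
print("ambiguous:", ambiguous)
```

Here `AUG` occurs three times, so observing it confirms that the product is present but not where in the sequence it came from.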

These unique sequences may still be isobaric (i.e., the same base composition but different sequences), so high-resolution accurate mass measurements alone are insufficient to differentiate them. In the example of eGFP mRNA, accurate mass measurements alone would identify unique sequences covering only 39% (279 nt/714 nt) of the sequence. However, high-resolution MS/MS can be used to discern the sequence identities of isobaric species, as shown in the figure below for two isobaric sequences. MS/MS data analysis increases the expected sequence coverage from unique sequences in eGFP mRNA to 55% (396 nt/714 nt), providing a higher level of confidence in sequence confirmation.
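The gain from MS/MS can be estimated in silico by comparing two kinds of coverage: unique sequences whose base composition is shared with no other product (distinguishable by accurate mass alone) and all unique sequences (distinguishable once MS/MS resolves isobaric species). The sketch below is one plausible way to do this bookkeeping, under the assumption of an idealized complete RNase T1 digest; the function names and the 12-nt toy sequence are hypothetical.

```python
from collections import Counter


def rnase_t1_digest(rna: str) -> list[str]:
    """Split an RNA string after every G (idealized, complete RNase T1 digest)."""
    fragments, start = [], 0
    for i, base in enumerate(rna):
        if base == "G":
            fragments.append(rna[start:i + 1])
            start = i + 1
    if start < len(rna):
        fragments.append(rna[start:])
    return fragments


def unique_coverage(rna: str) -> tuple[float, float]:
    """Return (mass-only, MS/MS) fractional coverage from unique sequences."""
    fragments = rnase_t1_digest(rna)
    seq_counts = Counter(fragments)
    # Base composition as a sorted string; isobaric sequences share this key
    comp_counts = Counter("".join(sorted(f)) for f in set(fragments))
    unique = [f for f in fragments if seq_counts[f] == 1]
    mass_only = [f for f in unique if comp_counts["".join(sorted(f))] == 1]
    return (
        sum(len(f) for f in mass_only) / len(rna),
        sum(len(f) for f in unique) / len(rna),
    )


# Toy sequence: ACUG and CAUG are isobaric, UUAG is not
mass_only, with_msms = unique_coverage("ACUGCAUGUUAG")
print(f"mass only: {mass_only:.0%}, with MS/MS: {with_msms:.0%}")
```

The eGFP percentages quoted above follow the same division: 279/714 rounds to 39% and 396/714 rounds to 55%.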

The LC-MS/MS strategy enables the sequence confirmation of the entire ORF of eGFP mRNA as shown below with unique sequences (indicated by Green bars) and sequences that appear in multiple locations (indicated by Yellow bars). The cut sites for RNase T1 are indicated in the sequence with rG.

The LC-MS/MS strategy identified all 53 expected unique sequences, covering a total of 55% of the target sequence, which is the maximum that could be observed. The rest of the expected sequence consisted of ambiguous sequences that could not be differentiated by MS/MS, but including them provided 100% sequence coverage, confirming the eGFP mRNA sequence.

Please email to inquire about this type of project work.