Methods and devices for proteomics data complexity reduction
Abstract
Provided are methods and systems for identification of proteins using high mass accuracy mass spectrometry. Not only do high mass accuracy measurements provide greater confidence in protein identification assignments, but they also enable proteins to be identified with either less sequence coverage or fewer additional tandem MS experiments. High mass measurement accuracy also optionally allows protein identifications to be made on the basis of the mass of a single peptide, providing higher throughput in the analysis of mixtures because significantly less time is spent on additional tandem MS experiments. A concomitant time saving would also be achieved when cross-correlating mass spectral data with in silico digested databases.
43 Citations
117 Claims
1. A method of reducing a number of peaks to further be analyzed in a mass spectrum for a sample, the method comprising:
generating a first amino acid sequence database comprising an amino acid sequence of at least one protein known to be present in the sample;
calculating a first list of theoretical masses for a first set of in silico peptides generated from one or more of the amino acid sequences in the first database; and
correlating the first list of theoretical masses with positions of the unidentified MS peaks and identifying one or more MS peaks that correspond to masses for the in silico peptides, thereby reducing the number of peaks to further be analyzed in the mass spectrum.
Dependent claims: 2-44
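The three steps of claim 1 (sequence database, theoretical masses, correlation with peaks) can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a simplified trypsin-like rule (cleave after K or R, no proline exception, no missed cleavages), neutral monoisotopic peak masses rather than raw m/z values, and a 5 ppm matching window; the function names `digest`, `peptide_mass`, and `reduce_peaks` are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def digest(sequence, cleave_after="KR"):
    """Split a sequence after each cleavage residue (assumed, simplified rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def reduce_peaks(peak_masses, known_sequences, tol_ppm=5.0):
    """Return (assigned, remaining) peaks given proteins known to be present."""
    theoretical = [peptide_mass(p) for s in known_sequences for p in digest(s)]
    assigned, remaining = [], []
    for m in peak_masses:
        hit = any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
        (assigned if hit else remaining).append(m)
    return assigned, remaining
```

Only the peaks in `remaining` would need further analysis, for example by tandem MS; the peaks in `assigned` are explained by proteins already known to be in the sample.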
45. A method of reducing a number of peaks to further be analyzed in a mass spectrum for a sample, the method comprising:
generating a first amino acid sequence database comprising an amino acid sequence of at least one protein present in the sample;
calculating a first list of theoretical masses for a first set of known in silico proteolytic peptides generated from the first database;
correlating a first theoretical mass with a position of an unidentified MS peak in a mass spectrum for the sample, thereby determining the presence in the sample of a first protein that comprises a peptide having a mass equal to the first theoretical mass; and
identifying one or more MS peaks that correspond to masses for the known in silico proteolytic peptides, thereby reducing the number of peaks to further be analyzed in the mass spectrum.
46. A method of identifying members of a plurality of proteins in a sample, the method comprising:
contacting a sample comprising a plurality of proteins with at least a first proteolytic agent that cleaves member proteins at defined cleavage sites to form proteolytic peptides;
contacting the sample with a first derivatizing agent comprising at least two isotopic forms, wherein the first derivatizing agent specifically labels a selected amino acid or functional moiety when the selected amino acid or functional moiety is present in a protein in the sample, thereby isotopically labeling one or more members of the plurality of proteins or proteolytic peptides;
fractionating the sample and depositing a plurality of fractions of an eluent onto a solid support suitable for LDI;
performing LDI-FT ICR mass spectrometry on the isotopically-labeled peptides in one or more of the fractions and determining masses of at least one pair of peaks of interest using a mass spectrometer that provides a mass accuracy of 5 ppm or better;
calculating a list of theoretical molecular masses for a plurality of in silico derivatized proteolytic peptides, wherein the member proteolytic peptides i) are derived from the amino acid sequences in a protein sequence database by predicted action of the proteolytic reagent upon members of the database;
ii) encompass peptides having up to three missed proteolytic cleavage sites;
iii) range in size between 1000 Da and 6000 Da; and
iv) comprise one or more derivatized amino acids; and
correlating the list of theoretical molecular masses to the mass peak list of experimental mass peaks, wherein a match between an experimental mass peak of a sample proteolytic peptide and a theoretical molecular mass for an in silico proteolytic peptide is indicative of the presence in the sample of the protein from which the in silico proteolytic peptide is derived, thereby assigning MS peaks in the mass peak list and identifying the members of the plurality of proteins.
Dependent claims: 47, 48
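The four constraints on the in silico peptide list in claim 46 (predicted cleavage, up to three missed cleavage sites, 1000 Da to 6000 Da, at least one derivatized residue) can be sketched as a filter. This is an illustration under stated assumptions only: cleavage after K or R, cysteine as the labeled residue, and placeholder label mass shifts of 100.0 and 108.0 Da for the two isotopic forms. These shifts are hypothetical numbers, not the masses of any real reagent, and the mass-range test is applied to the light form as an arbitrary choice.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def digest_with_missed(sequence, cleave_after="KR", max_missed=3):
    """All peptides spanning up to max_missed uncleaved sites (criteria i, ii)."""
    if not sequence:
        return []
    # Cut positions: sequence start, after each cleavage residue, sequence end.
    cuts = [0]
    cuts += [i + 1 for i, r in enumerate(sequence[:-1]) if r in cleave_after]
    cuts.append(len(sequence))
    peptides = []
    for i in range(len(cuts) - 1):
        for j in range(i + 1, min(i + 2 + max_missed, len(cuts))):
            peptides.append(sequence[cuts[i]:cuts[j]])
    return peptides


def derivatized_masses(sequence, target="C", label_shifts=(100.0, 108.0),
                       mass_range=(1000.0, 6000.0), max_missed=3):
    """Return (peptide, (light_mass, heavy_mass)) for labeled peptides in range."""
    results = []
    for pep in digest_with_missed(sequence, max_missed=max_missed):
        n_sites = pep.count(target)
        if n_sites == 0:
            continue  # criterion iv: must carry at least one derivatized residue
        base = sum(RESIDUE_MASS[r] for r in pep) + WATER
        masses = tuple(base + n_sites * shift for shift in label_shifts)
        if mass_range[0] <= masses[0] <= mass_range[1]:  # criterion iii (light form)
            results.append((pep, masses))
    return results
```

Each retained peptide yields a pair of masses, one per isotopic form, which corresponds to the "pair of peaks of interest" the claim expects in the spectrum.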
49. A method for identifying two or more members of a plurality of proteins in a sample, the method comprising:
a) providing a sample comprising a plurality of proteolytic polypeptides;
b) ionizing member polypeptides by LDI and obtaining a mass of at least a first polypeptide using a mass spectrometer that provides a mass accuracy of 5 ppm or better;
c) comparing the mass of the first polypeptide to members of a database of theoretical molecular masses for a plurality of in silico proteolytic peptides, wherein each member in silico peptide has a unique theoretical mass, and wherein a match between the mass obtained for the first polypeptide and the unique theoretical mass for an in silico proteolytic peptide indicates that a parent protein comprising the in silico polypeptide is present in the sample, thereby identifying a first protein in the sample; and
d) repeating the comparing step for one or more masses obtained for additional sample polypeptides, thereby identifying additional proteins in the sample.
Dependent claims: 50-91
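Steps c) and d) of claim 49 rely on a database in which each in silico peptide has a unique theoretical mass, so that a single accurate mass can point to one parent protein. A rough Python sketch of that lookup follows. It assumes the theoretical masses have already been computed and are supplied as `(mass, protein_id)` pairs, and it interprets "unique" as "no peptide from a different protein within the matching tolerance"; both the interpretation and the function names are assumptions made for illustration.

```python
from bisect import bisect_left


def build_unique_index(entries, tol_ppm=5.0):
    """Keep only (mass, protein) entries with no other protein within tolerance.

    Quadratic scan, which is adequate for a sketch but not for a full proteome.
    """
    entries = sorted(entries)
    unique = []
    for mass, protein in entries:
        tol = mass * tol_ppm * 1e-6
        clash = any(other != protein and abs(m - mass) <= tol
                    for m, other in entries)
        if not clash:
            unique.append((mass, protein))
    return unique


def identify(measured_masses, index, tol_ppm=5.0):
    """Map each matched protein to the measured masses that identified it."""
    masses = [m for m, _ in index]  # index is sorted by mass
    found = {}
    for measured in measured_masses:  # step d): repeat for each additional mass
        tol = measured * tol_ppm * 1e-6
        i = bisect_left(masses, measured - tol)
        while i < len(masses) and masses[i] <= measured + tol:
            found.setdefault(index[i][1], []).append(measured)
            i += 1
    return found
```

The tighter the instrument's mass accuracy, the fewer entries are discarded as ambiguous when the index is built, which is why the claim ties the method to a mass accuracy of 5 ppm or better.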
92. A method for identifying two or more proteins in a sample, the method comprising:
a) contacting a sample that comprises a plurality of proteins with at least a first proteolytic reagent that cleaves proteins at defined cleavage sites to form sample proteolytic peptides;
b) subjecting at least a first proteolytic peptide to mass spectrometry to determine a mass of the first proteolytic peptide;
c) comparing the mass determined for the first proteolytic peptide to theoretical molecular masses for a plurality of in silico proteolytic peptides that are derived from amino acid sequences for a plurality of proteins, wherein a match between the mass determined for the first proteolytic peptide and the theoretical molecular mass for an in silico proteolytic peptide is indicative of the presence in the sample of the protein from which the in silico proteolytic peptide is derived;
d) calculating theoretical molecular masses for additional in silico proteolytic peptides derived from the protein identified in the comparison of the mass determined for the first proteolytic peptide to the theoretical molecular masses; and
e) repeating the comparing step for a mass obtained for a second proteolytic peptide, and disregarding mass spectral data for the second proteolytic peptide if the mass spectral data is within 5 ppm of that which would be obtained for one or more of the additional in silico proteolytic peptides from the previously identified protein.
Dependent claims: 93, 94
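Steps c) through e) of claim 92 describe an iterative loop: once a protein is identified, its other predicted peptides are used to disregard later peaks. A minimal Python sketch of that loop is given below. It assumes the theoretical peptide masses per protein are precomputed in a dictionary, that measured values are neutral masses, and that a mass matching several proteins identifies all of them; the name `identify_with_exclusion` is hypothetical.

```python
def within_ppm(measured, theoretical, tol_ppm):
    """True if the measured mass lies within tol_ppm of the theoretical mass."""
    return abs(measured - theoretical) / theoretical * 1e6 <= tol_ppm


def identify_with_exclusion(measured_masses, protein_peptides, tol_ppm=5.0):
    """Return (identified proteins in order, disregarded measured masses).

    protein_peptides maps a protein id to its theoretical peptide masses.
    """
    identified = []   # proteins found so far
    explained = []    # theoretical masses of peptides from identified proteins
    disregarded = []  # measured masses attributed to an already-found protein
    for measured in measured_masses:
        # Step e): skip data explained by a previously identified protein.
        if any(within_ppm(measured, t, tol_ppm) for t in explained):
            disregarded.append(measured)
            continue
        # Step c): compare against the whole database.
        for protein, masses in protein_peptides.items():
            if protein in identified:
                continue
            if any(within_ppm(measured, t, tol_ppm) for t in masses):
                identified.append(protein)
                # Step d): add this protein's other predicted peptides.
                explained.extend(masses)
    return identified, disregarded
```

For example, with `{"P1": [1200.5, 1800.9], "P2": [2200.1]}` and measured masses `[1200.501, 1800.902, 2200.1]`, the first mass identifies P1, the second is disregarded as another P1 peptide, and the third identifies P2.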
95. An integrated system for identifying a plurality of member proteins in a sample, the system comprising:
an ionization source and a mass spectrometer that provides a mass accuracy of 5 ppm or better;
an interface for receiving mass spectral data from the mass spectrometer, wherein the mass spectral data comprises mass peaks representing masses of a plurality of proteolytic peptides generated by treating the sample with at least a first proteolytic reagent;
a database of theoretical molecular masses of in silico-generated proteolytic peptides, wherein the peptides are derived by predicted action of the proteolytic reagent upon members of a database of protein sequences; and
a computer or computer-readable medium in communication with the interface and the database, the computer or computer-readable medium comprising instructions for determining a mass of a member proteolytic peptide from the mass spectral data and comparing the determined mass to members of the database of theoretical molecular masses, wherein a match between the mass determined for the proteolytic peptide and a theoretical molecular mass for an in silico proteolytic peptide is indicative of the presence in the sample of the protein from which the in silico proteolytic peptide is derived.
Dependent claims: 96-117
Specification