Method and apparatus for analyzing data files derived from emission spectra from fluorophore tagged nucleotides
Abstract
An automated method and apparatus are provided for analyzing data files derived from fluorophore emissions detected during observation of fluorophore-labeled nucleotide polymers, as is done when sequencing the bases of nucleotide polymers. The analysis depends on a key step of quantifying features of the emission peaks, whereby subsequent steps such as base calling can be performed, and whereby individual emission spectra within the data files of two or more samples can be automatically synchronized and compared, with differences detected and signaled. The quantified peak information provides for the use of fuzzy logic and the assignment of truth values or scores to the base calls. Optimally, individual peaks within the data file are corrected for distortions, the peaks are enhanced, and the overall data file information is augmented to further improve the accuracy of the analysis and reduce manual labor requirements.
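As an illustration of what "quantifying features of the emission peaks" could look like in practice, the following is a minimal sketch and not the patented implementation. It assumes a single-channel trace given as a list of intensities, detects peaks as simple local maxima, and records three parameters per peak: height, a finite-difference estimate of the second time-domain derivative at the apex, and the width at half height as a crude shape measure. The names `PeakFeatures` and `extract_features` and the `min_height` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PeakFeatures:
    """Hypothetical feature vector for one peak (three quantified parameters)."""
    position: int             # sample index of the apex
    height: float             # intensity at the apex
    second_derivative: float  # finite-difference curvature at the apex
    width: float              # width at half height, in samples (shape measure)


def extract_features(trace: Sequence[float], min_height: float = 0.0) -> List[PeakFeatures]:
    """Detect local maxima above min_height and quantify each one."""
    peaks: List[PeakFeatures] = []
    for i in range(1, len(trace) - 1):
        y = trace[i]
        # Assumption: a peak is a sample higher than its left neighbour
        # and at least as high as its right neighbour.
        if y > min_height and y > trace[i - 1] and y >= trace[i + 1]:
            # Central finite difference approximates the second derivative.
            curvature = trace[i - 1] - 2 * y + trace[i + 1]
            # Walk outwards until the signal drops to half height or below.
            half = y / 2
            left = i
            while left > 0 and trace[left] > half:
                left -= 1
            right = i
            while right < len(trace) - 1 and trace[right] > half:
                right += 1
            peaks.append(PeakFeatures(i, float(y), float(curvature), float(right - left)))
    return peaks


if __name__ == "__main__":
    demo_trace = [0, 1, 4, 9, 4, 1, 0, 0, 2, 6, 2, 0]
    for peak in extract_features(demo_trace):
        print(peak)
```

A real chromatogram would need baseline correction and noise filtering before this step, which corresponds to the correcting and enhancing of peak information that the abstract mentions.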
14 Claims
-
1. An automated method for comparing a first DNA sequence and a second DNA sequence wherein the features of one or more peaks from each digital data file that is representative of a chromatogram generated during DNA sequencing are quantified and compared, comprising the steps of:
-
a) obtaining two digital data files each containing at least one peak;
b) importing said data files into the memory of a digital computer device;
c) for each peak of each data file, extracting a feature vector wherein three or more peak parameters are quantified;
d) synchronizing said digital data files to be compared based on the feature vectors extracted for each data file;
e) comparing the feature vectors of corresponding peaks in the synchronized data files; and
f) detecting differences in the feature vectors.
-
2. [A method according to claim 1, further comprising] allowing a user to designate one of said digital data files as a reference and the remaining file or files as samples.
-
3. A method according to claim 2 wherein the differences are assigned truth values.
-
7. A method according to claim 1 wherein one feature vector is extracted selected from the group of features including peak shape, area under the peak, and the second time domain derivative of the peak.
-
8. A method according to claim 1 wherein the extracted feature vectors are derived from peak height, second time domain derivative, and peak shape.
-
9. A method according to claim 1 including a step of calling a base from said feature vector.
-
10. A method according to claim 1 including detecting the at least one peak, further comprising the step of correcting the peak information prior to detecting the peak.
-
11. A method according to claim 1 including detecting the at least one peak, further comprising the step of enhancing the peak information prior to detecting the peak.
-
12. A method according to claim 1 including detecting the at least one peak, further comprising the steps of correcting and enhancing the peak information prior to detecting the peak.
-
13. A method according to claim 2, wherein the two or more digital data files include data files from differing runs of the same nucleotide polymers.
-
14. A method according to claim 2, wherein the two or more digital data files include data files of different nucleotide polymers.
-
4. A digital computer device for use with at least two digital data files representing chromatograms from a first DNA sequence and a second DNA sequence generated during DNA sequencing, each data file containing at least one peak, said computer configured to:
-
a) import said digital data files into the memory of the digital computer device;
b) calculate a vector for each peak which quantifies three or more peak parameters;
c) synchronize said digital data files to be compared based on the feature vectors extracted for each data file;
d) compare the feature vectors of corresponding peaks in the synchronized data files; and
e) detect differences in the feature vectors.
-
5. [A device according to claim 4, said computer further configured to] allow a user to designate one or more imported digital data files as samples and another digital data file as a reference.
-
6. A device according to claim 4, wherein the vector is calculated upon the extracted features of peak height, second time domain derivative, and peak shape.
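The synchronize, compare, and detect steps recited in the independent claims can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method disclosed in the specification: synchronization is reduced to finding one integer shift that best lines up peak positions, comparison is a relative-difference score used as a fuzzy truth value in [0, 1], and a difference is signaled when that score falls below a threshold. The `Peak` tuple layout and the names `best_shift`, `synchronize`, `similarity`, and `detect_differences`, along with all default parameter values, are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed feature vector layout: (position, height, second_derivative, width)
Peak = Tuple[int, float, float, float]


def best_shift(ref: List[Peak], sample: List[Peak], max_shift: int = 20, tol: int = 1) -> int:
    """Pick the integer shift of the sample that aligns the most peaks with the reference."""
    if not ref or not sample:
        return 0

    def score(shift: int) -> Tuple[int, int]:
        dists = [min(abs(s[0] + shift - r[0]) for r in ref) for s in sample]
        close = [d for d in dists if d <= tol]
        # More aligned peaks first, then smaller total residual.
        return (len(close), -sum(close))

    return max(range(-max_shift, max_shift + 1), key=score)


def synchronize(ref: List[Peak], sample: List[Peak], max_shift: int = 20,
                tol: int = 1) -> List[Tuple[Peak, Optional[Peak]]]:
    """Pair each reference peak with the nearest shifted sample peak, or None."""
    shift = best_shift(ref, sample, max_shift, tol)
    pairs: List[Tuple[Peak, Optional[Peak]]] = []
    for r in ref:
        near = [s for s in sample if abs(s[0] + shift - r[0]) <= tol]
        partner = min(near, key=lambda s: abs(s[0] + shift - r[0])) if near else None
        pairs.append((r, partner))
    return pairs


def similarity(a: Peak, b: Peak) -> float:
    """Fuzzy truth value in [0, 1]; 1.0 means all non-position features agree."""
    score = 1.0
    for x, y in zip(a[1:], b[1:]):
        scale = max(abs(x), abs(y))
        if scale > 0:
            score = min(score, 1.0 - abs(x - y) / scale)
    return max(score, 0.0)


def detect_differences(ref: List[Peak], sample: List[Peak],
                       threshold: float = 0.8) -> List[Tuple[int, float]]:
    """Return (reference position, truth value) for peaks that differ or are missing."""
    flagged: List[Tuple[int, float]] = []
    for r, s in synchronize(ref, sample):
        truth = 0.0 if s is None else similarity(r, s)
        if truth < threshold:
            flagged.append((r[0], truth))
    return flagged


if __name__ == "__main__":
    reference = [(10, 100.0, -20.0, 4.0), (22, 90.0, -18.0, 4.0), (35, 110.0, -22.0, 4.0)]
    sample = [(13, 100.0, -20.0, 4.0), (25, 40.0, -8.0, 4.0), (38, 108.0, -22.0, 4.0)]
    print(detect_differences(reference, sample))
```

A single global shift is the simplest possible stand-in for synchronization. Real traces drift non-uniformly along the run, so a practical system would likely need a local or piecewise alignment instead.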
Specification