Method for sequencing nucleic acids with reduced errors
Abstract
Nucleic acid polymers are sequenced by obtaining forward and reverse data sets for forward and reverse strands of a sample nucleic acid polymer. The apparent base sequences of the forward and reverse data sets are determined, and the apparent sequences are compared to identify any deviations from perfect complementarity. Any such deviation presents a choice between two bases, only one of which is correct. A confidence algorithm is applied to the peaks in the data sets associated with a deviation to arrive at a numerical confidence value for each of the two base choices. These confidence values are compared to each other and to a predetermined threshold, and the base represented by the peak with the better confidence value is assigned as the “correct” base, provided that its confidence value is better than the threshold. The confidence value takes into account at least one, and preferably more than one, of several specific characteristics of the peaks in the data sets that were not complementary.
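The complementarity comparison described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the function names `reverse_complement` and `find_deviations` are hypothetical, both base calls are assumed to be written 5'→3' and to cover exactly the same region, and insertions or deletions are not handled.

```python
# Watson-Crick pairing for the four residues named in the claims.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the sequence the opposite strand implies, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_deviations(forward: str, reverse: str) -> list[tuple[int, str, str]]:
    """List (position, forward_base, base_implied_by_reverse) for each mismatch.

    Assumption: the two calls are already aligned and equal in length.
    """
    implied = reverse_complement(reverse)
    if len(implied) != len(forward):
        raise ValueError("this sketch assumes pre-aligned, equal-length calls")
    return [
        (i, f, r)
        for i, (f, r) in enumerate(zip(forward, implied))
        if f != r
    ]


# The reverse call implies "ACGATGCA", which disagrees with the forward call
# at position 3: a choice between T (forward) and A (implied by reverse).
print(find_deviations("ACGTTGCA", "TGCATCGT"))  # [(3, 'T', 'A')]
```

Each tuple returned corresponds to one deviation, that is, one position where a choice between two bases has to be resolved by the confidence values.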
9 Claims
1. A method for determining the sequence of a sample nucleic acid polymer comprising the steps of:
(a) obtaining forward and reverse data sets for forward and reverse strands of the sample nucleic acid, each data set containing a plurality of peaks reflecting the positions of A, C, G and T residues in the sample nucleic acid polymer, said data sets providing a migration time for each peak which migration time is related to the position of an A, C, G or T residue;
(b) determining the apparent sequence of bases from the forward and reverse data sets;
(c) comparing the apparent forward and reverse sequences of bases for perfect complementarity to identify any deviations from complementarity in the apparent sequences, any such deviation presenting a choice between two bases, only one of which is correct, and if a deviation is present identifying the peaks that are associated with the deviation;
(d) applying a confidence algorithm to peaks associated with a deviation to arrive at a numerical confidence value for those peaks; and
(e) comparing the numerical confidence values to each other and to a predetermined threshold, and selecting as the correct base the base represented by the peak which has a better numerical confidence value, provided that the numerical confidence value is better than the threshold.
Dependent Claims (2, 3, 4, 5, 6, 7)
separation distance between peaks;
regularity/evenness of peak separation;
peak height compared to neighbors;
peak area compared to neighbors;
distance to neighbors compared to the local average distance to neighbors;
resolution of the peak; and
signal-to-noise ratio in the region around the peak.
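As a rough illustration of how peak characteristics such as those listed above might be combined into one numerical confidence value, here is a hedged Python sketch. The source does not give a formula, so everything specific here is an assumption: the `Peak` record, the `confidence` function, the equal weighting of three characteristics (height relative to neighbors, spacing relative to the local average, signal-to-noise), the window of three peaks on each side, and the saturation of signal-to-noise at a ratio of 10.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Peak:
    base: str
    time: float    # migration time, arbitrary units
    height: float  # signal height, arbitrary units


def confidence(peaks: list[Peak], i: int, noise: float, window: int = 3) -> float:
    """Score peak i in [0, 1]; higher is better. Weights are illustrative only.

    Assumes at least two peaks, sorted by strictly increasing migration time.
    """
    lo, hi = max(0, i - window), min(len(peaks), i + window + 1)
    neighbors = [p for j, p in enumerate(peaks[lo:hi], start=lo) if j != i]

    # Peak height compared to neighbors, capped at 1.
    height_score = min(1.0, peaks[i].height / mean(p.height for p in neighbors))

    # Distance to adjacent peaks compared to the local average spacing.
    gaps = [b.time - a.time for a, b in zip(peaks[lo:hi], peaks[lo + 1:hi])]
    local_gap = mean(gaps)
    own = [
        peaks[j + 1].time - peaks[j].time
        for j in (i - 1, i)
        if 0 <= j < len(peaks) - 1
    ]
    spacing_score = max(0.0, 1.0 - abs(mean(own) - local_gap) / local_gap)

    # Signal-to-noise ratio, saturating at an assumed ratio of 10.
    snr_score = min(1.0, (peaks[i].height / noise) / 10.0)

    return (height_score + spacing_score + snr_score) / 3.0


trace = [
    Peak("A", 10.0, 100.0),
    Peak("C", 20.0, 100.0),
    Peak("G", 30.0, 20.0),   # weak peak among strong neighbors
    Peak("T", 40.0, 100.0),
    Peak("A", 50.0, 100.0),
]
strong = confidence(trace, 1, noise=5.0)
weak = confidence(trace, 2, noise=5.0)
print(strong > weak)  # True
```

A real scoring function would likely weight the characteristics differently and calibrate them against traces of known sequence; the sketch only shows the shape of the computation.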
8. An apparatus for sequencing a sample nucleic acid polymer comprising:
(a) a data processor;
(b) means for obtaining forward and reverse data sets for the forward and reverse strands of the sample nucleic acid polymer for processing by the data processor, each data set containing a plurality of peaks reflecting the positions of A, C, G and T residues in the sample nucleic acid polymer, said data sets providing a migration time for each peak which migration time is related to the position of an A, C, G or T residue; and
(c) means for providing output of sequencing information from the data processor;
wherein the data processor is operatively programmed to process the forward and reverse data sets by a method including the steps of:
determining the apparent sequence of bases from the forward and reverse data sets;
comparing the apparent forward and reverse sequences of bases for perfect complementarity to identify any deviations from complementarity in the apparent sequences, any such deviation presenting a choice between two bases, only one of which is correct, and if a deviation is present identifying the peaks that are associated with the deviation;
applying a confidence algorithm to the peaks associated with a deviation to arrive at a numerical confidence value for those peaks; and
comparing the numerical confidence values to each other and to a predetermined threshold, and selecting as the correct base the base represented by the peak which has a better numerical confidence value, provided that the numerical confidence value is better than the threshold.
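The final selection step that the data processor is programmed to perform could look like the following minimal Python sketch. The claims say only that one value must be "better" than the other and than the threshold. Treating "better" as "numerically higher", returning `None` when no base qualifies, and leaving ties unresolved are all assumptions of this sketch, as is the function name `select_base`.

```python
from typing import Optional


def select_base(
    base_fwd: str,
    conf_fwd: float,
    base_rev: str,
    conf_rev: float,
    threshold: float,
) -> Optional[str]:
    """Pick the base whose peak has the better confidence value.

    Returns None when the values tie or when the better value does not
    exceed the threshold (assumed handling; the claims do not specify it).
    """
    if conf_fwd == conf_rev:
        return None
    if conf_fwd > conf_rev:
        best_base, best_conf = base_fwd, conf_fwd
    else:
        best_base, best_conf = base_rev, conf_rev
    return best_base if best_conf > threshold else None


print(select_base("T", 0.9, "A", 0.4, threshold=0.6))  # T
print(select_base("T", 0.5, "A", 0.4, threshold=0.6))  # None
```

In the second call the forward peak still has the better value, but it does not clear the threshold, so no base is selected.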
9. A method for determining the sequence of a sample nucleic acid polymer comprising the steps of:
(a) obtaining forward and reverse data sets for forward and reverse strands of the sample nucleic acid by multiple cycles of a primer extension reaction in which two labeled primers are extended in the presence of chain terminating nucleotides in a single reaction mixture, each data set containing a plurality of peaks reflecting the positions of A, C, G and T residues in the sample nucleic acid, said data sets providing a migration time for each peak which migration time is related to the position of an A, C, G or T residue;
(b) determining the apparent sequence of bases from the forward and reverse data sets;
(c) comparing the apparent forward and reverse sequences of bases for perfect complementarity to identify any deviations from complementarity in the apparent sequences, any such deviation presenting a choice between two bases, only one of which is correct, and if a deviation is present identifying the peaks that are associated with the deviation;
(d) applying a confidence algorithm to the peaks associated with a deviation to arrive at a numerical confidence value for those peaks; and
(e) comparing the numerical confidence values to each other and to a predetermined threshold, and selecting as the correct base the base represented by the peak which has a better numerical confidence value, provided that the numerical confidence value is better than the threshold.
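Steps (a) and (b) treat each data set as a collection of peaks whose migration times relate to residue positions. One simple way to picture step (b) is sketched below in Python. It assumes that peak detection has already been done, that each data set provides one list of peak migration times per base, and that shorter migration time means a position closer to the primer. The function name `apparent_sequence` and the input layout are hypothetical.

```python
def apparent_sequence(channels: dict[str, list[float]]) -> str:
    """Read off bases in order of increasing migration time.

    `channels` maps each base (A, C, G, T) to the migration times of its
    detected peaks; this layout is an assumption made for the sketch.
    """
    peaks = [(time, base) for base, times in channels.items() for time in times]
    return "".join(base for _, base in sorted(peaks))


forward_data = {
    "A": [10.0, 41.2],
    "C": [20.3],
    "G": [30.9],
    "T": [52.0],
}
print(apparent_sequence(forward_data))  # ACGAT
```

The same function would be applied separately to the forward and the reverse data set before the two apparent sequences are compared.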
Specification