Method for sequencing nucleic acids with reduced errors

US 6,404,907 B1
Filed: 06/25/1999
Issued: 06/11/2002
Est. Priority Date: 06/26/1998
Status: Expired due to Fees

Abstract

Nucleic acid polymers are sequenced by obtaining forward and reverse data sets for forward and reverse strands of a sample nucleic acid polymer. The apparent base sequences of these forward and reverse sets are determined and the apparent sequences are compared to identify any deviations from perfect complementarity. Any such deviation presents a choice between two bases, only one of which is correct. A confidence algorithm is applied to the peaks in the data sets associated with a deviation to arrive at a numerical confidence value for each of the two base choices. These confidence values are compared to each other and to a predetermined threshold, and the base represented by the peak with the better confidence value is assigned as the “correct” base, provided that its confidence value is better than the threshold. The confidence value takes into account at least one, and preferably more than one of several specific characteristics of the peaks in the data set that were not complementary.
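The comparison and selection steps summarized in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the `Peak` class, the `resolve` function, the "higher is better" confidence scale and the requirement that both reads are already aligned and of equal length are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


@dataclass
class Peak:
    base: str          # apparent base called for this peak: A, C, G or T
    confidence: float  # hypothetical numerical confidence, higher is better


def resolve(forward: List[Peak], reverse: List[Peak],
            threshold: float) -> List[Optional[str]]:
    """Return the forward-strand sequence; None marks an unresolved deviation."""
    if len(forward) != len(reverse):
        raise ValueError("sketch assumes aligned reads of equal length")
    result: List[Optional[str]] = []
    # The reverse read runs along the opposite strand, so walk it backwards
    # and complement each base before comparing it with the forward call.
    for fwd, rev in zip(forward, reversed(reverse)):
        rev_as_fwd = COMPLEMENT[rev.base]
        if fwd.base == rev_as_fwd:
            result.append(fwd.base)  # perfectly complementary, nothing to decide
            continue
        # Deviation: a choice between two bases, only one of which is correct.
        if fwd.confidence == rev.confidence:
            result.append(None)      # neither peak is better
            continue
        if fwd.confidence > rev.confidence:
            best_base, best_conf = fwd.base, fwd.confidence
        else:
            best_base, best_conf = rev_as_fwd, rev.confidence
        # Accept the better peak only if it also beats the threshold.
        result.append(best_base if best_conf > threshold else None)
    return result


# Forward read ACGGT; the reverse read should be ACCGT but is miscalled ACTGT.
forward = [Peak(b, 0.9) for b in "ACGGT"]
reverse = [Peak("A", 0.9), Peak("C", 0.9), Peak("T", 0.4),
           Peak("G", 0.9), Peak("T", 0.9)]
print(resolve(forward, reverse, threshold=0.5))
# prints ['A', 'C', 'G', 'G', 'T']
```

Raising the threshold above the better peak's confidence leaves the position unresolved (`None`) instead of forcing a call, which mirrors the proviso that the winning confidence value must also be better than the threshold.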

9 Claims

1. A method for determining the sequence of a sample nucleic acid polymer comprising the steps of:
- (a) obtaining forward and reverse data sets for forward and reverse strands of the sample nucleic acid, each data set containing a plurality of peaks reflecting the positions of A, C, G and T residues in the sample nucleic acid polymer, said data sets providing a migration time for each peak which migration time is related to the position of an A, C, G or T residue;
  
  (b) determining the apparent sequence of bases from the forward and reverse data sets;
  
  (c) comparing the apparent forward and reverse sequences of bases for perfect complementarity to identify any deviations from complementarity in the apparent sequences, any such deviation presenting a choice between two bases, only one of which is correct, and if a deviation is present identifying the peaks that are associated with the deviation;
  
  (d) applying a confidence algorithm to peaks associated with a deviation to arrive at a numerical confidence value for those peaks; and
  
  (e) comparing the numerical confidence values to each other and to a predetermined threshold, and selecting as the correct base the base represented by the peak which has a better numerical confidence value, provided that the numerical confidence value is better than the threshold.
- - 2. The method of claim 1, wherein the numerical confidence value results from a selected combination of two or more selected characteristics of each peak associated with a deviation.
  - 3. The method of claim 2 wherein the characteristics are selected from among the following:
    - separation between peaks;
    - regularity/evenness of peak separation;
    - peak height compared to neighbors, wherein a higher confidence is assigned if the peak heights are similar;
    - peak area compared to neighbors, wherein a higher confidence is assigned if the peak areas are similar;
    - distance to neighbors compared to the local average distance to neighbors;
    - resolution of the peak, wherein a lower confidence is assigned for lower resolution; and
    - signal to noise ratio in the region around the peak, wherein a lower confidence is assigned as the peak's size is more similar to the noise level.
  - 4. The method of claim 2, wherein the numerical confidence value is a weighted combination of the selected characteristics.
  - 5. The method of claim 4 wherein weights applied to each of the selected characteristics are initially determined for a given combination of chemistry and instrumentation from a plurality of calibration runs performed using the given combination of chemistry and instrumentation.
  - 6. The method of claim 4, wherein weights applied to each of the selected characteristics are updated based upon accumulated data obtained when sequencing sample nucleic acid polymer.
  - 7. The method of claim 1, wherein the numerical confidence results from a combination of at least the following characteristics of each peak associated with a deviation:
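Claims 2 to 4 describe the confidence value as a weighted combination of peak characteristics. A minimal sketch of that idea follows. Everything concrete in it is an assumption made for illustration: the three scoring functions, their 0-to-1 scaling, the feature names and the weight values are hypothetical, and the claims do not prescribe any particular formula. All inputs are assumed to be positive numbers.

```python
from typing import Dict

# Hypothetical weights. Claim 5 says real weights are determined from
# calibration runs for a given chemistry and instrument combination.
WEIGHTS: Dict[str, float] = {
    "height_similarity": 0.4,
    "spacing_regularity": 0.3,
    "signal_to_noise": 0.3,
}


def height_similarity(height: float, left: float, right: float) -> float:
    """1.0 when the peak height equals the mean neighbor height, lower otherwise."""
    neighbor_mean = (left + right) / 2
    return min(height, neighbor_mean) / max(height, neighbor_mean)


def spacing_regularity(gap_left: float, gap_right: float,
                       local_mean_gap: float) -> float:
    """1.0 when both neighbor distances equal the local average distance."""
    deviation = (abs(gap_left - local_mean_gap)
                 + abs(gap_right - local_mean_gap)) / (2 * local_mean_gap)
    return max(0.0, 1.0 - deviation)


def signal_to_noise_score(height: float, noise: float) -> float:
    """Falls toward 0.0 as the peak height approaches the noise level."""
    if height <= noise:
        return 0.0
    return 1.0 - noise / height


def confidence(features: Dict[str, float],
               weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the selected characteristics (higher is better)."""
    return sum(weights[name] * features[name] for name in weights)


# Score one peak that is shorter than its neighbors and slightly crowded.
features = {
    "height_similarity": height_similarity(60.0, 100.0, 110.0),
    "spacing_regularity": spacing_regularity(8.0, 11.0, 10.0),
    "signal_to_noise": signal_to_noise_score(60.0, 12.0),
}
score = confidence(features)
```

Keeping each characteristic on the same 0-to-1 scale is a design choice of this sketch; it lets the weights express relative importance directly and makes them easy to replace with calibrated values.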

8. An apparatus for sequencing a sample nucleic acid polymer comprising:
- (a) a data processor;
  
  (b) means for obtaining forward and reverse data sets for the forward and reverse strands of the sample nucleic acid polymer for processing by the data processor, each data set containing a plurality of peaks reflecting the positions of A, C, G and T residues in the sample nucleic acid polymer, said data sets providing a migration time for each peak which migration time is related to the position of an A, C, G or T residue; and
  
  (c) means for providing output of sequencing information from the data processor;
  
  wherein the data processor is operatively programmed to process the forward and reverse data sets by a method including the steps of:
  
  determining the apparent sequence of bases from the forward and reverse data sets;
  
  comparing the apparent forward and reverse sequences of bases for perfect complementarity to identify any deviations from complementarity in the apparent sequences, any such deviation presenting a choice between two bases, only one of which is correct, and if a deviation is present identifying the peaks that are associated with the deviation;
  
  applying a confidence algorithm to the peaks associated with a deviation to arrive at a numerical confidence value for those peaks; and
  
  comparing the numerical confidence values to each other and to a predetermined threshold, and selecting as the correct base the base represented by the peak which has a better numerical confidence value, provided that the numerical confidence value is better than the threshold.

9. A method for determining the sequence of a sample nucleic acid polymer comprising the steps of:
- (a) obtaining forward and reverse data sets for forward and reverse strands of the sample nucleic acid by multiple cycles of a primer extension reaction in which two labeled primers are extended in the presence of chain terminating nucleotides in a single reaction mixture, each data set containing a plurality of peaks reflecting the positions of A, C, G and T residues in the sample nucleic acid, said data sets providing a migration time for each peak which migration time is related to the position of an A, C, G or T residue;
  
  (b) determining the apparent sequence of bases from the forward and reverse data sets;
  
  (c) comparing the apparent forward and reverse sequences of bases for perfect complementarity to identify any deviations from complementarity in the apparent sequences, any such deviation presenting a choice between two bases, only one of which is correct, and if a deviation is present identifying the peaks that are associated with the deviation;
  
  (d) applying a confidence algorithm to the peaks associated with a deviation to arrive at a numerical confidence value for those peaks; and
  
  (e) comparing the numerical confidence values to each other and to a predetermined threshold, and selecting as the correct base the base represented by the peak which has a better numerical confidence value, provided that the numerical confidence value is better than the threshold.

Current Assignee: Siemens Healthcare Diagnostics Incorporated (Siemens AG)
Original Assignee: Visible Genetics, Inc. (Bayer AG)
Inventors: Gilchrist, Rodney D.; Dunn, James M.
Primary Examiner(s): Allen, Marianne P.
Application Number: US09/345,613
Time in Patent Office: 1,082 Days
Field of Search: 435/6, 436/86, 436/89, 364/500, 392/129, 392/173, 392/190
US Class Current: 382/129
CPC Class Codes:
C12Q 1/6869   Methods for sequencing
G16B 30/00   ICT specially adapted for s...
G16B 30/10   Sequence alignment; Homolog...
