Method for alignment of DNA sequences with enhanced accuracy and read length
4 Assignments
0 Petitions
Abstract
To align DNA sequence data traces, an experimental data trace representing the positions of a first species of base within a target polynucleotide, and a reference data trace representing the positions of a second species of base (which may be the same as or different from the first species) within a reference polynucleotide, are obtained by separating the appropriate sequencing fragments generated from the target and reference polynucleotides on an electrophoresis gel. For each reference data trace, a plurality of peaks corresponding to fragments 40 to 1200 bases in size are selected. A base number is assigned to each selected peak in the reference data trace, and a numerical “peak file” is created recording the peak number and migration time (or distance) of each peak. This peak file is analyzed to determine a set of polynomial coefficients that allow a plot of peak number versus separation between adjacent peaks to be substantially linearized, and the traces to be aligned with respect to each other. These coefficients are used to create a corrected time scale identifying where peaks should be located on a given experimental gel. The corrected time scale then guides the sampling of the experimental data and the assignment of base numbers to peaks within it.
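A minimal Python sketch of how the numerical peak file described above might be assembled. It assumes, hypothetically, that the base number of a peak equals its 1-based position in the known reference sequence, and that `peak_times` lists every detected peak for one base species in migration order, with no missed or spurious peaks. `PeakRecord` and `build_peak_file` are illustrative names, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeakRecord:
    peak_number: int        # base number assigned from the known reference sequence
    migration_time: float   # time (or distance) at which the peak was detected


def build_peak_file(reference_seq, base, peak_times, min_size=40, max_size=1200):
    """Pair detected peaks of one base species with base numbers from the reference.

    Hypothetical assumptions: base number == 1-based position in reference_seq
    (taken as the fragment size), and peak_times holds exactly one detected peak
    per occurrence of `base`, in migration order.
    """
    positions = [i + 1 for i, b in enumerate(reference_seq.upper()) if b == base.upper()]
    if len(positions) != len(peak_times):
        raise ValueError("peak count does not match occurrences of the base in the reference")
    # Keep only peaks whose fragment size lies in the selected window (40-1200 by default).
    return [
        PeakRecord(n, t)
        for n, t in zip(positions, peak_times)
        if min_size <= n <= max_size
    ]
```

Real peak detection is imperfect, so a practical version would need to tolerate missing or merged peaks; this sketch simply rejects mismatched counts.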
18 Citations
22 Claims
1. A method for assignment of base numbers to peaks within one or more experimental DNA sequencing data traces derived from the separation of experimental DNA sequencing fragments, comprising the steps of:
(a) obtaining one or more reference DNA sequencing data traces derived from the separation of reference DNA sequencing fragments reflecting the position of at least one base in a reference polynucleotide of known sequence;
(b) evaluating the reference DNA sequencing data traces to determine a corrected time scale indicative of migration times at which peaks should occur;
(c) sampling the experimental DNA sequencing data trace(s) at time points determined by the corrected time scale, and
(d) assigning a base number to each peak found in the experimental DNA sequencing data trace(s) based upon the corrected time scale.
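Steps (c) and (d) can be illustrated with a short Python sketch. It assumes, hypothetically, that the corrected time scale is available as a list of `(base_number, expected_time)` pairs sorted by time, that the experimental trace is sampled uniformly every `dt` time units starting at time 0, and that each detected peak receives the base number whose expected time is nearest. Linear interpolation and nearest-neighbour assignment are choices made for this example; the claim does not specify them.

```python
import bisect


def sample_at(trace, dt, times):
    """Linearly interpolate a uniformly sampled trace at the requested times.

    trace[k] is assumed to be the intensity recorded at time k * dt; times outside
    the recorded window are clamped to its ends. Requires at least two samples.
    """
    last_time = (len(trace) - 1) * dt
    values = []
    for t in times:
        t = min(max(t, 0.0), last_time)
        x = t / dt
        k = min(int(x), len(trace) - 2)  # index of the left neighbouring sample
        frac = x - k
        values.append(trace[k] * (1.0 - frac) + trace[k + 1] * frac)
    return values


def assign_base_numbers(peak_times, scale):
    """Give each detected peak the base number whose expected time is nearest.

    scale: (base_number, expected_time) pairs sorted by expected_time.
    """
    expected = [t for _, t in scale]
    numbers = [n for n, _ in scale]
    assigned = []
    for p in peak_times:
        i = bisect.bisect_left(expected, p)
        # The nearest expected time is either just before or just after p.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(expected)]
        best = min(candidates, key=lambda j: abs(expected[j] - p))
        assigned.append(numbers[best])
    return assigned
```

For example, with a scale of `[(1, 10.0), (2, 12.0), (3, 14.5)]`, peaks detected at 10.4 and 13.9 are assigned base numbers 1 and 3.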
(i) identifying a plurality of peaks in the reference DNA sequencing data traces, and creating a data table containing the number of each peak based on the known sequence of the polynucleotide, and the position of each peak in the reference DNA sequencing data trace;
(ii) identifying a set of coefficients for a polynomial effective to substantially linearize a plot of peak number versus separation between adjacent peaks; and
(iii) creating from the coefficients and the polynomial a corrected time scale which reflects the positions at which a peak should occur at any given point in a sequencing data trace.
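Steps (i) to (iii) can be sketched in Python as a least-squares polynomial fit. This sketch assumes one plausible reading of the claim: migration time is fitted as a polynomial in peak (base) number, so the spacing between adjacent peaks, t(n + 1) - t(n), varies smoothly along the trace and the fitted curve gives the expected position of every base. The solver (normal equations with Gaussian elimination, after rescaling base numbers to [-1, 1] for numerical stability) and the names `fit_polynomial` and `corrected_time_scale` are illustrative choices, not the patent's stated method.

```python
def fit_polynomial(xs, ys, order):
    """Least-squares coefficients c[0..order] (ascending powers) via normal equations."""
    m = order + 1
    if len(xs) < m:
        raise ValueError(f"need at least {m} peaks for an order-{order} polynomial")
    # Normal equations A c = b, with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i.
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        coeffs[i] = (b[i] - sum(A[i][j] * coeffs[j] for j in range(i + 1, m))) / A[i][i]
    return coeffs


def corrected_time_scale(peak_file, order=3, first=None, last=None):
    """Expected migration time for every base number from `first` to `last`.

    peak_file: (base_number, migration_time) pairs taken from a reference trace.
    """
    ns = [n for n, _ in peak_file]
    ts = [t for _, t in peak_file]
    lo, hi = min(ns), max(ns)
    mid = (lo + hi) / 2
    half = max((hi - lo) / 2, 1)
    xs = [(n - mid) / half for n in ns]  # rescale base numbers to [-1, 1]
    coeffs = fit_polynomial(xs, ts, order)
    first = lo if first is None else first
    last = hi if last is None else last

    def expected_time(n):
        x = (n - mid) / half
        return sum(c * x ** i for i, c in enumerate(coeffs))

    return [(n, expected_time(n)) for n in range(first, last + 1)]
```

Evaluating the curve outside the range of the reference peaks extrapolates the polynomial and is likely to be unreliable, which is one reason to choose reference peaks spanning the whole read.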
3. The method of claim 2, wherein the polynomial is a third or higher order polynomial.
4. The method of claim 2, wherein a defined number of bands are selected for evaluation from each of the reference DNA sequencing data traces.
5. The method of claim 4, wherein the defined number of bands selected is from 3 to 40.
6. The method of claim 4, wherein the defined number of bands is at least equal to the order of the polynomial, plus 1.
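The lower bound in claim 6 follows from counting unknowns: a polynomial of order d has d + 1 coefficients, so at least d + 1 reference bands are needed to determine it (a third-order polynomial, as in claim 3, therefore needs at least four). The Python sketch below picks a defined number of bands spread evenly across a reference peak list and enforces both that bound and the 3-to-40 range of claim 5. The even spacing and the name `select_bands` are assumptions made for illustration.

```python
def select_bands(peaks, count, order, min_count=3, max_count=40):
    """Choose `count` reference bands spread evenly across `peaks` (sorted by base number)."""
    if not min_count <= count <= max_count:
        raise ValueError(f"band count must be between {min_count} and {max_count}")
    if count < order + 1:
        raise ValueError(
            f"an order-{order} polynomial has {order + 1} coefficients; "
            f"at least {order + 1} bands are required"
        )
    if count > len(peaks):
        raise ValueError("the reference trace has fewer peaks than requested")
    last = len(peaks) - 1
    # Integer spacing keeps the first and last peaks and never repeats an index.
    return [peaks[i * last // (count - 1)] for i in range(count)]
```

For instance, choosing 4 bands for a cubic from ten peaks indexed 0 to 9 selects indices 0, 3, 6 and 9.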
7. The method of claim 1, wherein the experimental DNA sequencing data traces and a first reference DNA sequencing data trace are derived from analysis of sequencing fragments in a common lane of a sequencing gel.
8. The method of claim 1, wherein a plurality of reference DNA sequencing data traces are obtained, each derived from the separation of the same set of reference DNA sequencing fragments.
9. The method of claim 1, wherein base numbers are assigned to peaks within a plurality of experimental DNA sequencing data traces derived from the separation of experimental DNA sequencing fragments indicative of the positions of a plurality of types of bases.
10. The method of claim 9, wherein base numbers are assigned to peaks within four experimental DNA sequencing data traces derived from the separation of experimental DNA sequencing fragments indicative of the positions of four types of bases.
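Once base numbers have been assigned in each of four traces (one per base type, as in claim 10), the traces can be merged into a sequence read. In this hypothetical Python sketch, `assignments` maps each base letter to the base numbers assigned in its trace; positions claimed by no trace, or by more than one, are reported as `N`. That ambiguity rule is an assumption of the example, not something the claims specify.

```python
def call_sequence(assignments, first, last):
    """Merge per-base traces into a read covering base numbers first..last.

    assignments: dict mapping a base letter to the base numbers assigned in its trace.
    """
    calls = {}
    for base, numbers in assignments.items():
        for n in numbers:
            calls.setdefault(n, set()).add(base)
    read = []
    for n in range(first, last + 1):
        bases = calls.get(n, set())
        # Exactly one trace claiming this position gives a call; otherwise 'N'.
        read.append(next(iter(bases)) if len(bases) == 1 else "N")
    return "".join(read)
```

For example, `call_sequence({"A": [1, 4], "C": [2], "G": [3, 4], "T": [5]}, 1, 6)` returns `"ACGNTN"`: position 4 is claimed by both the A and G traces, and position 6 by none.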
11. A method for evaluating the sequence of a target polynucleotide, comprising the steps of:
(a) obtaining one or more experimental DNA sequencing data traces derived from the separation of experimental DNA sequencing fragments reflecting the position of at least one base in the target polynucleotide and one or more reference DNA sequencing data traces derived from the separation of reference DNA sequencing fragments reflecting the position of at least one base in a reference polynucleotide of known sequence;
(b) evaluating the reference DNA sequencing data traces to determine a corrected time scale indicative of migration times at which peaks should occur;
(c) sampling the experimental DNA sequencing data traces at time points determined by the corrected time scale, and
(d) assigning a base number to each peak found in the experimental DNA sequencing data traces based upon the corrected time scale, thereby obtaining information about the sequence of the target polynucleotide.
(i) identifying a plurality of peaks in the reference DNA sequencing data traces, and creating a data table containing the number of each peak based on the known sequence of the polynucleotide, and the position of each peak in the reference DNA sequencing data trace;
(ii) identifying a set of coefficients for a polynomial effective to substantially linearize a plot of peak number versus separation between adjacent peaks; and
(iii) creating from the coefficients and the polynomial a corrected time scale which reflects the positions at which a peak should occur at any given point in a sequencing data trace.
13. The method of claim 12, wherein the polynomial is a third or higher order polynomial.
14. The method of claim 12, wherein a defined number of bands are selected for evaluation from each of the reference DNA sequencing data traces.
15. The method of claim 14, wherein the defined number of bands selected is from 3 to 40.
16. The method of claim 14, wherein the defined number of bands is at least equal to the order of the polynomial, plus 1.
17. The method of claim 11, wherein the reference DNA sequencing traces and the experimental DNA sequencing data traces are derived from analysis of sequencing fragments in a common sequencing gel.
18. The method of claim 17, wherein the experimental DNA sequencing data traces and a first reference DNA sequencing data trace are derived from analysis of sequencing fragments in a common lane of the common sequencing gel.
19. The method of claim 11, wherein a plurality of reference DNA sequencing data traces are obtained, each derived from the separation of the same set of reference DNA sequencing fragments.
20. The method of claim 11, wherein base numbers are assigned to peaks within a plurality of experimental DNA sequencing data traces derived from the separation of experimental DNA sequencing fragments indicative of the positions of a plurality of types of bases.
21. An apparatus for evaluating the sequence of a target polynucleotide, comprising:
(a) an input for receiving information about one or more experimental DNA sequencing data traces derived from the separation of experimental DNA sequencing fragments reflecting the position of at least one base in the target polynucleotide and one or more reference DNA sequencing data traces derived from the separation of reference DNA sequencing fragments reflecting the position of at least one base in a reference polynucleotide of known sequence;
(b) a processor, operatively programmed to evaluate the reference DNA sequencing data traces to determine a corrected time scale indicative of migration times at which peaks should occur;
(c) a processor, operatively programmed to sample the experimental DNA sequencing data traces at time points determined by the corrected time scale;
(d) a processor, operatively programmed to assign a base number to each peak found in the experimental DNA sequencing data traces based upon the corrected time scale, thereby obtaining information about the sequence of the target polynucleotide; and
(e) an output for communicating the information about the sequence of the target polynucleotide.
(i) identifying a plurality of peaks in the reference DNA sequencing data traces, and creating a data table containing the number of each peak based on the known sequence of the polynucleotide, and the position of each peak in the reference DNA sequencing data trace;
(ii) identifying a set of coefficients for a polynomial effective to substantially linearize a plot of peak number versus separation between adjacent peaks; and
(iii) creating from the coefficients and the polynomial a corrected time scale which reflects the positions at which a peak should occur at any given point in a sequencing data trace.
Specification