Methods and compositions for base calling nucleic acids
Abstract
The invention provides methods and compositions, including, without limitation, algorithms, computer readable media, computer programs, apparatus, and systems for determining the identity of nucleic acids in nucleotide sequences using, for example, data obtained from sequencing by synthesis methods. The methods of the invention include correcting one or more phenomena that are encountered during nucleotide sequencing, for example when sequencing by synthesis methods are used. These phenomena include, without limitation, sequence lead, sequence lag, spectral crosstalk, and noise resulting from variations in illumination and/or filter responses.
38 Claims
1. A method for determining an identity of a nucleic acid at an interrogation position in a nucleotide sequence from data acquired from one or more channels in parallel, comprising:
a) sequencing a nucleic acid by DNA sequencing by synthesis utilizing nucleotides with fluorescent dyes which are incorporated into a complementary strand,
b) obtaining a data set for one or more dye intensities at one or more nucleic acid positions in said complementary sequence, wherein each dye corresponds to a nucleotide,
c) determining the ratio contribution to dye intensity at said interrogation position from dye intensities at the interrogation position and at one or both of
  i) at least one subsequent nucleic acid positions in said complementary sequence, and
  ii) at least one preceding nucleic acid positions in said complementary sequence, and
d) applying said ratio contribution to said data set to arrive at an identity for a nucleotide at said interrogation position in said nucleotide sequence.
Dependent claims: 2 to 25.
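Steps c) and d) of claim 1 can be illustrated with a small numerical sketch. The claim does not state a formula, so the model below is an assumption made for illustration only: the intensity observed at an interrogation position is taken to be a fixed fraction `lag` of the preceding position, a fixed fraction `lead` of the subsequent position, and the remainder from the position itself. The names `mixing_matrix` and `correct_channel` and the fraction values are hypothetical and do not come from the patent.

```python
import numpy as np

def mixing_matrix(n_positions, lag, lead):
    """Assumed ratio contributions: row i holds the fractions of positions
    i-1, i and i+1 that make up the intensity observed at position i."""
    K = np.zeros((n_positions, n_positions))
    for i in range(n_positions):
        K[i, i] = 1.0 - lag - lead       # strands that are in phase
        if i > 0:
            K[i, i - 1] = lag            # lagging strands show the preceding position
        if i + 1 < n_positions:
            K[i, i + 1] = lead           # leading strands show the subsequent position
    return K

def correct_channel(observed, lag, lead):
    """Apply the ratio contributions to one dye channel by solving K @ x = observed."""
    K = mixing_matrix(len(observed), lag, lead)
    return np.linalg.solve(K, observed)

# Illustrative data for one dye channel (made-up values, not measurements).
true_signal = np.array([1.0, 0.0, 0.0, 1.0, 0.0])
K = mixing_matrix(len(true_signal), lag=0.10, lead=0.05)
observed = K @ true_signal               # what the assumed model says is measured
recovered = correct_channel(observed, lag=0.10, lead=0.05)
```

In this toy setting `recovered` matches `true_signal` because the same fractions are used to simulate and to correct. With real data the fractions would have to be estimated and would likely change from cycle to cycle.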
26. A method for processing data for nucleic acids in a nucleotide sequence, wherein said data is acquired from one or more channels in parallel, the method comprising:
a) sequencing a nucleic acid by DNA sequencing by synthesis utilizing nucleotides with fluorescent dyes which are incorporated into a complementary strand,
b) determining the ratio contribution to intensity in said one or more channels for one or more interrogation positions, from dye intensities at the interrogation position and at one or both of
  i) at least one subsequent nucleic acid positions in said complementary sequence, and
  ii) at least one preceding nucleic acid positions in said complementary sequence,
c) processing data from said one or more channels to correct for sequence lead and sequence lag, and
d) reconstructing said data in said one or more channels.
Dependent claims: 27 to 32.
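Claim 26 applies the correction to several channels at once and then reconstructs the channel data. A minimal sketch of that idea follows, assuming a constant-fraction lead/lag model that the claim itself does not specify. The function names, the channel order A, C, G, T, and the numeric fractions are all hypothetical.

```python
import numpy as np

def lead_lag_matrix(n_positions, lag, lead):
    """Assumed model: each observed position mixes in `lag` of the preceding
    position and `lead` of the subsequent position."""
    return ((1.0 - lag - lead) * np.eye(n_positions)
            + lag * np.eye(n_positions, k=-1)    # sub-diagonal: preceding position
            + lead * np.eye(n_positions, k=1))   # super-diagonal: subsequent position

def reconstruct_channels(observed, lag, lead):
    """observed has shape (n_positions, n_channels); every channel (column)
    is corrected with the same matrix in a single solve."""
    K = lead_lag_matrix(observed.shape[0], lag, lead)
    return np.linalg.solve(K, observed)

# Illustrative ideal intensities for the sequence A, C, G, T, A (one-hot rows).
ideal = np.eye(4)[[0, 1, 2, 3, 0]]
observed = lead_lag_matrix(5, lag=0.08, lead=0.04) @ ideal
reconstructed = reconstruct_channels(observed, lag=0.08, lead=0.04)
```

Solving the linear system once for all columns avoids forming an explicit inverse and keeps the channels aligned position by position.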
33. An apparatus that processes data for nucleic acids in a nucleotide sequence to determine an identity of a nucleic acid at an interrogation position in said nucleotide sequence in parallel DNA sequencing by synthesis utilizing nucleotides with fluorescent dyes, said apparatus comprising:
a) means for inputting data from one or more channels for one or more intensities, wherein each channel corresponds to a dye, and each dye corresponds to a nucleic acid base,
b) means for applying to the input data a sequence lead-lag compensation equation to correct for sequence lead and sequence lag,
c) means for comparing intensities in said one or more channels that have been corrected for sequence lead and sequence lag,
d) means for determining the highest intensity of the compared dye intensities, and
e) means for identifying a nucleic acid base at said interrogation position according to said highest intensity.
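Elements c) to e) of claim 33 amount to comparing the corrected channel intensities at each position and calling the base whose channel is highest. The sketch below shows that step only. It assumes the intensities have already been corrected for lead and lag, and it assumes a channel order of A, C, G, T. The function name `call_bases` and the sample values are hypothetical.

```python
import numpy as np

BASES = ("A", "C", "G", "T")  # assumed channel-to-base order

def call_bases(corrected):
    """corrected: (n_positions, 4) array of lead/lag-corrected dye intensities.
    Returns one base per position, chosen from the highest-intensity channel."""
    corrected = np.asarray(corrected, dtype=float)
    winners = corrected.argmax(axis=1)   # index of the highest channel per position
    return "".join(BASES[i] for i in winners)

# Made-up corrected intensities for three positions.
corrected = np.array([
    [0.9, 0.1, 0.0, 0.1],
    [0.1, 0.2, 0.8, 0.0],
    [0.0, 0.1, 0.1, 0.7],
])
sequence = call_bases(corrected)
```

A practical caller would probably also report how far the winning channel is ahead of the runner-up, since a plain maximum gives no indication of confidence.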
36. A system for processing data to determine an identity of a nucleic acid at an interrogation position in a nucleotide sequence in parallel DNA sequencing by synthesis utilizing fluorescent nucleotides, said system comprising:
a) a processor, and
b) a computer readable medium readable by said processor, said computer readable medium storing a computer program that comprises
  code that receives as input a plurality of nucleic acid base calls at an interrogation position in a nucleotide sequence,
  code that applies to the input data a sequence lead-lag compensation equation to correct for sequence lead and sequence lag, and
  code that identifies a nucleic acid at said interrogation position according to the corrected data.
Specification