Methods And Compositions For Base Calling Nucleic Acids
Abstract
The invention provides methods and compositions, including, without limitation, algorithms, computer readable media, computer programs, apparatus, and systems for determining the identity of nucleic acids in nucleotide sequences using, for example, data obtained from sequencing by synthesis methods. The methods of the invention include correcting one or more phenomena that are encountered during nucleotide sequencing, such as using sequencing by synthesis methods. These phenomena include, without limitation, sequence lead, sequence lag, spectral crosstalk, and noise resulting from variations in illumination and/or filter responses.
52 Claims
-
1. A method for determining an identity of a nucleic acid at an interrogation position in a nucleotide sequence from data acquired from one or more channels, comprising
a) obtaining a data set for one or more probe intensities at one or more nucleic acid positions in said sequence, wherein each probe corresponds to a nucleic acid, b) determining the ratio contribution to probe intensity at said interrogation position from probe intensities at the interrogation position and at one or both of i) at least one subsequent nucleic acid positions in said sequence, and ii) at least one preceding nucleic acid positions in said sequence, and c) applying said ratio contribution to probe intensity to said data set to arrive at an identity for a nucleic acid at said interrogation position in said nucleotide sequence.
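Illustrative note on claim 1: the steps of determining a ratio contribution and applying it to the data set can be sketched for a single channel as follows. This is a minimal sketch under stated assumptions, not the claimed method itself. It assumes a constant per-cycle lead fraction and lag fraction and a three-position model (one preceding and one subsequent position). The names `ratio_matrix`, `mix`, and `solve` are hypothetical and do not appear in the source.

```python
def ratio_matrix(n_cycles, lead, lag):
    """R[i][j] is the assumed fraction of the intensity observed at cycle i
    that originates from sequence position j (simplified three-tap model)."""
    R = [[0.0] * n_cycles for _ in range(n_cycles)]
    for i in range(n_cycles):
        R[i][i] = 1.0 - lead - lag        # strands in phase with the cycle
        if i + 1 < n_cycles:
            R[i][i + 1] = lead            # strands that ran ahead (sequence lead)
        if i > 0:
            R[i][i - 1] = lag             # strands that fell behind (sequence lag)
    return R


def mix(R, true_signal):
    """Forward model: observed intensity per cycle for one channel."""
    return [sum(r * s for r, s in zip(row, true_signal)) for row in R]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


if __name__ == "__main__":
    # Hypothetical single-channel signal: the probe is present at positions 0 and 3.
    true_signal = [1.0, 0.0, 0.0, 1.0, 0.0]
    R = ratio_matrix(len(true_signal), lead=0.1, lag=0.2)
    observed = mix(R, true_signal)      # intensities smeared by lead and lag
    corrected = solve(R, observed)      # ratio contributions applied to the data
    print(observed)
    print(corrected)
```

In this sketch the first and last rows of `R` do not sum to one because the model has no position before the first cycle or after the last. A real data set would also repeat the correction for each channel before comparing channels at the interrogation position.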
-
26. The method of claim 26, wherein said equation is solved to determine spectral crosstalk matrix K⁻¹ using equation
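Illustrative note on claim 26: the equation this claim refers to is not reproduced in this listing, so the sketch below does not show it. It only illustrates, as a general assumption, how an inverse crosstalk matrix K⁻¹ is commonly applied: if measured channel intensities are modeled as K multiplied by the true dye signals, multiplying the measurement by K⁻¹ recovers the dye signals. The matrix values and the names `invert` and `matvec` are hypothetical.

```python
def invert(K):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(K)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(K)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("crosstalk matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def matvec(A, v):
    """Multiply matrix A by column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# Hypothetical crosstalk matrix: K[i][j] is the fraction of dye j's emission
# that is detected in channel i. The values are made up for illustration.
K = [
    [1.0, 0.3, 0.0, 0.0],
    [0.2, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.4],
    [0.0, 0.0, 0.3, 1.0],
]
K_INV = invert(K)

if __name__ == "__main__":
    dye_signal = [0.0, 1.0, 0.0, 0.0]      # only the second dye is present
    measured = matvec(K, dye_signal)       # signal bleeds into adjacent channels
    print(matvec(K_INV, measured))         # approximately the original dye signal
```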
-
27. An algorithm for processing data for nucleic acids in a nucleotide sequence, wherein said data is acquired from one or more channels, the algorithm comprising
a) determining the ratio contribution to probe intensity in said one or more channels for one or more interrogation positions, from probe intensities at the interrogation position and at one or both of iv) at least one subsequent nucleic acid positions in said sequence, and v) at least one preceding nucleic acid positions in said sequence, b) processing data from said one or more channels to correct for sequence lead and sequence lag, and c) reconstructing said data in said one or more channels.
-
35. A computer program product for processing data for nucleic acids in a nucleotide sequence to determine an identity of a nucleic acid at an interrogation position in said nucleotide sequence, said computer program product comprising
a) computer code that inputs data from one or more channels for one or more probe intensities, wherein each channel corresponds to a probe, and each probe corresponds to a nucleic acid, b) computer code that applies to the input data a sequence lead-lag compensation equation to correct for sequence lead and sequence lag, c) computer code that compares probe intensities in said one or more channels that have been corrected for sequence lead and sequence lag, d) computer code that determines the highest probe intensity of the compared probe intensities, and e) computer code that identifies a nucleic acid at said interrogation position according to said highest probe intensity.
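Illustrative note on claim 35, elements c) to e): once the channel intensities have been corrected for sequence lead and lag, the comparison and identification steps reduce to picking the channel with the highest corrected intensity. The following is a minimal sketch of that final step only. The channel-to-base order in `CHANNEL_TO_BASE` and the function names are assumptions made for illustration and are not taken from the source.

```python
# Hypothetical mapping from channel index to the nucleic acid its probe reports.
CHANNEL_TO_BASE = ("A", "C", "G", "T")


def call_base(corrected):
    """Return the base whose channel has the highest corrected intensity."""
    best_channel = max(range(len(corrected)), key=lambda ch: corrected[ch])
    return CHANNEL_TO_BASE[best_channel]


def call_sequence(corrected_by_cycle):
    """Call one base per interrogation position from per-cycle channel intensities."""
    return "".join(call_base(cycle) for cycle in corrected_by_cycle)


if __name__ == "__main__":
    # Made-up corrected intensities: one row per cycle, one column per channel.
    corrected = [
        [0.9, 0.1, 0.0, 0.1],
        [0.1, 0.2, 0.8, 0.0],
        [0.0, 0.1, 0.1, 0.7],
    ]
    print(call_sequence(corrected))
```

Ties are resolved in favor of the lowest channel index here, which is an arbitrary choice; a production base caller would more likely flag such positions as ambiguous.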
-
38. An apparatus that processes data for nucleic acids in a nucleotide sequence to determine an identity of a nucleic acid at an interrogation position in said nucleotide sequence, said apparatus comprising
a) means for inputting data from one or more channels for one or more probe intensities, wherein each channel corresponds to a probe, and each probe corresponds to a nucleic acid, b) means for applying to the input data a sequence lead-lag compensation equation to correct for sequence lead and sequence lag, c) means for comparing probe intensities in said one or more channels that have been corrected for sequence lead and sequence lag, d) means for determining the highest probe intensity of the compared probe intensities, and e) means for identifying a nucleic acid at said interrogation position according to said highest probe intensity.
-
41. A system for processing data to determine an identity of a nucleic acid at an interrogation position in said nucleotide sequence, said system comprising
a) a processor, and b) a computer readable medium readable by said processor, said computer readable medium storing a computer program that comprises code that receives as input a plurality of nucleic acid base calls at an interrogation position in a nucleotide sequence, code that applies to the input data a sequence lead-lag compensation equation to correct for sequence lead and sequence lag, and code that identifies a nucleic acid at said interrogation position according to the corrected data.
-
44. A method for field flattening an image of a probe on a solid support, comprising
a) obtaining a first data set for a plurality of pixel intensities of a first raw image of a probe at a first concentration on a solid support, wherein said first raw image is produced using a first spectral filter for detecting a first probe, b) obtaining a second data set for a plurality of pixel intensities of a second smoothed image of said probe on said solid support, wherein said second smoothed image is produced using a low-pass filter, c) determining a field flattening intensity value for a plurality of pixels of said first raw image, and d) generating a third field flattened image of said probe on said solid support using said field flattening intensity of said plurality of pixels, wherein the correlation of intensity of a plurality of pixels to their spatial location on said third field flattened image is reduced compared to the intensity of a plurality of pixels at a corresponding location on said first raw image.
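Illustrative note on claim 44: this listing does not state how the field flattening intensity value is computed, so the sketch below substitutes a common flat-field approach as an assumption. It smooths the raw image with a box filter (one possible low-pass filter), then divides each raw pixel by the smoothed pixel and rescales by the mean smoothed level. The function names and the choice of a box filter are hypothetical.

```python
def box_blur(img, radius=1):
    """Low-pass filter: replace each pixel by the mean of its neighborhood."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def field_flatten(raw, smoothed):
    """Divide out the slowly varying illumination estimated by `smoothed`,
    then rescale so the overall brightness stays comparable (assumed scheme)."""
    h, w = len(raw), len(raw[0])
    mean_level = sum(sum(row) for row in smoothed) / (h * w)
    return [
        [raw[y][x] * mean_level / smoothed[y][x] if smoothed[y][x] > 0 else 0.0
         for x in range(w)]
        for y in range(h)
    ]


if __name__ == "__main__":
    # Made-up image of a uniform probe under an illumination gradient.
    raw = [
        [10.0, 12.0, 14.0, 16.0],
        [11.0, 13.0, 15.0, 17.0],
        [12.0, 14.0, 16.0, 18.0],
    ]
    smoothed = box_blur(raw, radius=1)
    for row in field_flatten(raw, smoothed):
        print([round(v, 2) for v in row])
```

After flattening, pixel intensity should depend less on pixel location than in the raw image, which is the property element d) of the claim describes.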
Specification