Methods and compositions for base calling nucleic acids
Abstract
The invention provides methods and compositions, including, without limitation, algorithms, computer readable media, computer programs, apparatus, and systems, for determining the identity of nucleic acids in nucleotide sequences using, for example, data obtained from sequencing by synthesis methods. The methods of the invention include correcting one or more phenomena encountered during nucleotide sequencing, for example when sequencing by synthesis methods are used. These phenomena include, without limitation, sequence lead, sequence lag, spectral crosstalk, and noise resulting from variations in illumination and/or filter responses.
6 Claims
1. A method for correcting miscalls of nucleotides incorporated during DNA sequencing, comprising
a) sequencing a nucleic acid sequence by DNA sequencing by synthesis, wherein said sequencing comprises using a polymerase to incorporate four different nucleotides each nucleotide containing a probe associated with a color, wherein said DNA sequencing is associated with miscalls comprising dephasing and color crosstalk,
b) field flattening an image of the probe comprising the steps of
   using a first spectral filter for detecting a first probe at a first concentration on a solid support to produce a first raw image of said probe,
   obtaining a first data set for a plurality of pixel intensities of said first raw image,
   using a low-pass filter to produce a second smoothed image of said probe on said solid support,
   obtaining a second data set for a plurality of pixel intensities of said second smoothed image,
   determining a field flattening intensity value for a plurality of pixels of said first raw image, wherein said determining comprises using the equation
   $$F_{x,y} = \frac{R_{x,y}\, M_{x_0,y_0}}{M_{x,y}}$$
   where $F_{x,y}$ is a field flattening intensity value, $R_{x,y}$ is the intensity of a pixel of the plurality of pixels on the first raw image, $M_{x,y}$ is the intensity of a pixel of the plurality of pixels on the second smoothed image at a corresponding spatial location to the pixel on the first raw image, and $M_{x_0,y_0}$ is the intensity of a reference pixel on said second smoothed image or is any other scale factor of interest, and
   generating a third field flattened image of said probe on said solid support using said field flattening intensity of said plurality of pixels, wherein the correlation of intensity of a plurality of pixels to their spatial location on said third field flattened image is reduced compared to the intensity of a plurality of pixels at a corresponding location on said first raw image,
c) correcting said color crosstalk between the four probes after step b),
d) correcting said dephasing after step c),
e) correcting said miscalls of said four different nucleotides incorporated in step a) using the corrected color crosstalk of step c) and the corrected dephasing of step d), and
f) base calling said four different nucleotides of step a) using the corrected miscalls of step e).
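The field flattening equation of step b) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes an image is a nested list of strictly positive floats, and it uses a simple box (mean) filter as a stand-in for the low-pass filter, which the claim does not specify. The names `box_low_pass` and `field_flatten` are hypothetical and chosen only for this illustration.

```python
from typing import List, Tuple

Image = List[List[float]]


def box_low_pass(image: Image, radius: int = 1) -> Image:
    """Smooth an image with a mean filter (assumed stand-in for the low-pass filter).

    The window is (2 * radius + 1) pixels wide and is clipped at the borders.
    """
    height, width = len(image), len(image[0])
    smoothed = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            rows = range(max(0, y - radius), min(height, y + radius + 1))
            cols = range(max(0, x - radius), min(width, x + radius + 1))
            window = [image[j][i] for j in rows for i in cols]
            smoothed[y][x] = sum(window) / len(window)
    return smoothed


def field_flatten(raw: Image, smoothed: Image,
                  ref: Tuple[int, int] = (0, 0)) -> Image:
    """Apply F[x,y] = R[x,y] * M[x0,y0] / M[x,y] to every pixel.

    `ref` is the (row, column) of the reference pixel in the smoothed image.
    Assumes every smoothed intensity is non-zero.
    """
    ref_row, ref_col = ref
    scale = smoothed[ref_row][ref_col]  # M[x0,y0]
    return [
        [r * scale / m for r, m in zip(raw_row, smoothed_row)]
        for raw_row, smoothed_row in zip(raw, smoothed)
    ]


if __name__ == "__main__":
    # Synthetic raw image: uniform signal under a left-to-right illumination ramp.
    raw = [[100.0 * (1.0 + 0.1 * x) for x in range(8)] for _ in range(8)]
    smoothed = box_low_pass(raw, radius=1)
    flat = field_flatten(raw, smoothed)
    print("raw row:      ", [round(v, 1) for v in raw[0]])
    print("flattened row:", [round(v, 1) for v in flat[0]])
```

Dividing by the smoothed image removes the slowly varying illumination component, and multiplying by the reference intensity restores an overall scale. In this sketch the reference pixel defaults to the top-left corner, which is an arbitrary choice; the claim allows any reference pixel or other scale factor.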
Specification