Methods And Compositions For Base Calling Nucleic Acids

US 20100063743A1
Filed: 03/17/2009
Published: 03/11/2010
Est. Priority Date: 03/19/2008
Status: Active Grant

4 Assignments

0 Petitions

Abstract

The invention provides methods and compositions, including, without limitation, algorithms, computer readable media, computer programs, apparatus, and systems for determining the identity of nucleic acids in nucleotide sequences using, for example, data obtained from sequencing-by-synthesis methods. The methods of the invention include correcting one or more phenomena encountered during nucleotide sequencing, for example when sequencing by synthesis. These phenomena include, without limitation, sequence lead, sequence lag, spectral crosstalk, and noise resulting from variations in illumination and/or filter responses.

52 Claims

1. A method for determining an identity of a nucleic acid at an interrogation position in a nucleotide sequence from data acquired from one or more channels, comprising a) obtaining a data set for one or more probe intensities at one or more nucleic acid positions in said sequence, wherein each probe corresponds to a nucleic acid, b) determining the ratio contribution to probe intensity at said interrogation position from probe intensities at the interrogation position and at one or both of i) at least one subsequent nucleic acid position in said sequence, and ii) at least one preceding nucleic acid position in said sequence, and c) applying said ratio contribution to probe intensity to said data set to arrive at an identity for a nucleic acid at said interrogation position in said nucleotide sequence.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 34):
  - 2. The method of claim 1, wherein said determining the ratio contribution to probe intensity comprises measuring the rate at which a lag occurs at one or more nucleotide positions in said nucleotide sequence.
  - 3. The method of claim 2, wherein said determining the ratio contribution to probe intensity comprises measuring the rate at which a lag occurs at each nucleotide position in said nucleotide sequence.
  - 4. The method of claim 1, wherein said determining the ratio contribution to probe intensity comprises measuring the rate at which a lead occurs at one or more nucleotide positions in said nucleotide sequence.
  - 5. The method of claim 1, further comprising calling a nucleic acid at said interrogation position in said nucleotide sequence.
  - 6. The method of claim 1, further comprising repeating steps b) and c) to arrive at an identity for a nucleic acid at more than one interrogation position in said nucleotide sequence.
  - 7. The method of claim 1, further comprising a) applying a sequence lead-lag compensation equation to determine the ratio contribution to probe intensity from probe at i) said interrogation position, ii) each position preceding said interrogation position, and iii) each position subsequent to said interrogation position, and b) summing up said ratio contribution to probe intensity.
  - 8. The method of claim 1, further comprising applying a sequence lead-lag compensation equation to said ratio contribution to probe intensity at a plurality of positions in said sequence.
  - 9. The method of claim 8, wherein said sequence lead-lag compensation equation is determined by applying equation
  - 10. The method of claim 8, wherein said sequence lead-lag compensation equation is determined by applying equation
  - 11. The method of claim 10, wherein said sequence lead-lag compensation equation is determined by applying equation
  - 12. The method of claim 1, wherein the nucleic acid comprises a base selected from the group consisting of adenine (A), guanine (G), cytosine (C), and thymine (T).
  - 13. The method of claim 1, wherein said probe is fluorescent.
  - 14. The method of claim 1, further comprising field flattening of background data for said data set.
  - 15. The method of claim 14, wherein said field flattening comprises a) obtaining a first data set for a plurality of pixel intensities of a first raw image of a probe at a first concentration on a solid support, wherein said first raw image is produced using a first spectral filter for detecting a first probe, b) obtaining a second data set for a plurality of pixel intensities of a second smoothed image of said probe on said solid support, wherein said second smoothed image is produced using a low pass filter, c) determining a field flattening intensity value for a plurality of pixels of said first raw image, and d) generating a third field flattened image of said probe on said solid support using said field flattening intensity of said plurality of pixels, wherein the correlation of intensity of a plurality of pixels to their spatial location on said third field flattened image is reduced compared to the intensity of a plurality of pixels at a corresponding location on said first raw image.
  - 16. The method of claim 15, wherein said field flattening intensity value of a pixel is determined by the equation
    $F_{x,y} = \frac{R_{x,y} \, M_{x_0,y_0}}{M_{x,y}}$,
    where $F_{x,y}$ is a field flattening intensity value, $R_{x,y}$ is the intensity of a pixel of the plurality of pixels on the first raw image, $M_{x,y}$ is the intensity of a pixel of the plurality of pixels on the second smoothed image at a corresponding spatial location to the pixel on the first raw image, and $M_{x_0,y_0}$ is the intensity of a reference pixel on said second smoothed image or is an arbitrary scale factor.
  - 17. The method of claim 15, further comprising repeating steps a) to d), using a second spectral filter for detecting a second probe.
  - 18. The method of claim 15, further comprising repeating steps a) to d), using said probe at a second concentration on said solid support.
  - 19. The method of claim 14, wherein said solid support comprises a microscope slide.
  - 20. The method of claim 14, wherein said solid support comprises a silicon chip.
  - 21. The method of claim 1, further comprising reducing spectral crosstalk in said one or more channels.
  - 22. The method of claim 21, wherein said reducing spectral crosstalk comprises a) determining spectral crosstalk factors for each of said one or more probes in its corresponding channel from one or more adjacent channels, b) applying said spectral crosstalk factors to determine a spectral crosstalk matrix, and c) applying said spectral crosstalk matrix to said data set for said one or more probe intensities.
  - 23. The method of claim 21, wherein said reducing spectral crosstalk comprises a) determining probe intensity for one or more probes from one or more channels, wherein each channel corresponds to a probe; b) determining the ratios of said probe intensities in said one or more channels to arrive at a signature ratio for said probe intensity in its corresponding channel, c) applying said signature ratios in a matrix equation, and d) inverting said matrix equation to arrive at an inverted matrix.
  - 24. The method of claim 23, further comprising e) applying said inverted matrix to data from said one or more channels.
  - 25. The method of claim 22, wherein said determining spectral crosstalk matrix comprises using equation
  - 34. A computer readable medium containing a computer program for performing the method of claim 1.

26. The method of claim 25, wherein said equation is solved to determine spectral crosstalk matrix $K^{-1}$ using equation
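Claims 21 to 26 describe crosstalk correction through signature ratios, a crosstalk matrix and its inverse $K^{-1}$, but the equations themselves are not reproduced on this page. The following is a minimal Python sketch under stated assumptions: each probe is imaged alone to obtain its intensity in every channel, its signature ratios are those intensities divided by its own-channel intensity, and the ratios form the columns of a crosstalk matrix K that is then inverted and applied to each cycle's observed channel vector. The function names, the calibration-data layout and the column-normalisation convention are hypothetical, and a plain Gauss-Jordan inversion stands in for whatever solver the patent uses.

```python
from typing import List

Matrix = List[List[float]]


def crosstalk_matrix(pure_probe_readings: List[List[float]]) -> Matrix:
    """Build K with K[i][j] = fraction of probe j's signal seen in channel i.

    pure_probe_readings[j] holds the intensity in every channel when only
    probe j is present (hypothetical calibration data). Each column is divided
    by the probe's own-channel intensity, so the diagonal of K is 1.0.
    """
    n = len(pure_probe_readings)
    k = [[0.0] * n for _ in range(n)]
    for j, readings in enumerate(pure_probe_readings):
        own = readings[j]
        if own == 0:
            raise ValueError(f"probe {j} shows no signal in its own channel")
        for i in range(n):
            k[i][j] = readings[i] / own  # signature ratio of probe j in channel i
    return k


def invert(m: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("crosstalk matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def remove_crosstalk(observed: List[float], k_inv: Matrix) -> List[float]:
    """Apply K^-1 to one cycle's observed channel intensities."""
    return [sum(k_inv[i][j] * observed[j] for j in range(len(observed)))
            for i in range(len(k_inv))]


# Example: four channels (A, C, G, T), each probe bleeding into a neighbour.
readings = [[100, 20, 0, 0], [10, 100, 15, 0], [0, 5, 100, 25], [0, 0, 10, 100]]
k_inv = invert(crosstalk_matrix(readings))
print(remove_crosstalk([0.0, 2.5, 50.0, 12.5], k_inv))  # approximately [0, 0, 50, 0]
```

In the example, a pure G signal of 50 leaks 2.5 into the C channel and 12.5 into the T channel; multiplying by $K^{-1}$ assigns the whole signal back to G.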

27. An algorithm for processing data for nucleic acids in a nucleotide sequence, wherein said data is acquired from one or more channels, the algorithm comprising a) determining the ratio contribution to probe intensity in said one or more channels for one or more interrogation positions, from probe intensities at the interrogation position and at one or both of iv) at least one subsequent nucleic acid position in said sequence, and v) at least one preceding nucleic acid position in said sequence, b) processing data from said one or more channels to correct for sequence lead and sequence lag, and c) reconstructing said data in said one or more channels.
- Dependent claims (28, 29, 30, 31, 32, 33):
  - 28. The algorithm of claim 27, wherein said processing data comprises applying said ratio contribution to probe intensity to determine, for said probe at said one or more interrogation positions, a sequence lead-lag compensation equation.
  - 29. The algorithm of claim 28, wherein said sequence lead-lag compensation equation is determined by applying equation
  - 30. The algorithm of claim 27, wherein said sequence lead-lag compensation equation is determined by applying equation
  - 31. The algorithm of claim 30, wherein said sequence lead-lag compensation equation is determined by applying equation
  - 32. The algorithm of claim 27, further comprising field flattening of background data.
  - 33. The algorithm of claim 27, further comprising reducing spectral crosstalk between said data comprised in a plurality of channels.
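The lead-lag compensation equations of claims 9 to 11 and 29 to 31 are not reproduced on this page. As a hedged illustration of the idea in claims 1 and 27 (attributing part of each cycle's intensity to the preceding and subsequent positions), the sketch below assumes a simple first-order model with constant per-cycle rates: the observed intensity of a channel at cycle k is (1 - lag - lead)·x[k] + lag·x[k-1] + lead·x[k+1], where x is the corrected signal. Solving that tridiagonal system per channel with the Thomas algorithm recovers x. The model, the parameters `lag` and `lead`, and the function names are assumptions, not the patent's equations.

```python
from typing import List


def solve_tridiagonal(sub: float, diag: float, sup: float,
                      rhs: List[float]) -> List[float]:
    """Thomas algorithm for sub*x[k-1] + diag*x[k] + sup*x[k+1] = rhs[k]."""
    n = len(rhs)
    if n == 0:
        return []
    c = [0.0] * n  # modified super-diagonal coefficients
    d = [0.0] * n  # modified right-hand side
    c[0] = sup / diag
    d[0] = rhs[0] / diag
    for k in range(1, n):  # forward sweep
        denom = diag - sub * c[k - 1]
        c[k] = sup / denom
        d[k] = (rhs[k] - sub * d[k - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):  # back substitution
        x[k] = d[k] - c[k] * x[k + 1]
    return x


def lead_lag_correct(channels: List[List[float]], lag: float,
                     lead: float) -> List[List[float]]:
    """channels[ch][k] is the observed intensity of channel ch at cycle k.

    Assumes no signal outside the read (x[-1] = x[n] = 0) and
    lag + lead < 0.5, which keeps the system diagonally dominant.
    """
    if lag < 0 or lead < 0 or lag + lead >= 0.5:
        raise ValueError("sketch assumes lag, lead >= 0 and lag + lead < 0.5")
    diag = 1.0 - lag - lead  # share of strands exactly in phase
    return [solve_tridiagonal(lag, diag, lead, series) for series in channels]
```

For example, with lag = 0.1 and lead = 0.05, a single true signal of 1.0 at one cycle is observed as 0.05, 0.85 and 0.1 at the cycles before, at and after it; the correction returns the single 1.0. The diagonal-dominance guard is a property of this particular solver, not a limit stated in the claims.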

35. A computer program product for processing data for nucleic acids in a nucleotide sequence to determine an identity of a nucleic acid at an interrogation position in said nucleotide sequence, said computer program product comprising a) computer code that inputs data from one or more channels for one or more probe intensities, wherein each channel corresponds to a probe, and each probe corresponds to a nucleic acid, b) computer code that applies to the input data a sequence lead-lag compensation equation to correct for sequence lead and sequence lag, c) computer code that compares probe intensities in said one or more channels that have been corrected for sequence lead and sequence lag, d) computer code that determines the highest probe intensity of the compared probe intensities, and e) computer code that identifies a nucleic acid at said interrogation position according to said highest probe intensity.
- Dependent claims (36, 37):
  - 36. The computer program product of claim 35, further comprising f) computer code that applies field flattening of background data.
  - 37. The computer program product of claim 35, further comprising f) computer code that reduces spectral crosstalk between data comprised in said one or more channels.
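Steps c) to e) of claim 35 reduce to choosing, at each position, the channel with the highest corrected intensity. A minimal sketch, assuming the input has already been corrected for sequence lead and lag (and, optionally, crosstalk) and assuming a hypothetical fixed channel order A, C, G, T. The `min_signal` threshold and the "N" no-call are additions for illustration and are not described in the claims.

```python
from typing import List, Sequence

BASES = "ACGT"  # hypothetical channel order: channel i reports base BASES[i]


def call_bases(corrected: Sequence[Sequence[float]],
               min_signal: float = 0.0) -> str:
    """Call one base per position from corrected channel intensities.

    corrected[c][k] is the corrected intensity of channel c at position k.
    """
    n_channels = len(corrected)
    if n_channels != len(BASES):
        raise ValueError("expected one channel per base in BASES")
    calls: List[str] = []
    for k in range(len(corrected[0])):
        column = [corrected[c][k] for c in range(n_channels)]
        # Compare the channels and keep the highest (first one wins a tie).
        best = max(range(n_channels), key=lambda c: column[c])
        # Assumption, not from the claims: no call if nothing exceeds min_signal.
        calls.append(BASES[best] if column[best] > min_signal else "N")
    return "".join(calls)
```

For instance, `call_bases([[1, 0, 0, 0.9, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 1]])` returns `"ACGAT"`.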

38. An apparatus that processes data for nucleic acids in a nucleotide sequence to determine an identity of a nucleic acid at an interrogation position in said nucleotide sequence, said apparatus comprising a) means for inputting data from one or more channels for one or more probe intensities, wherein each channel corresponds to a probe, and each probe corresponds to a nucleic acid, b) means for applying to the input data a sequence lead-lag compensation equation to correct for sequence lead and sequence lag, c) means for comparing probe intensities in said one or more channels that have been corrected for sequence lead and sequence lag, d) means for determining the highest probe intensity of the compared probe intensities, and e) means for identifying a nucleic acid at said interrogation position according to said highest probe intensity.
- Dependent claims (39, 40):
  - 39. The apparatus of claim 38, further comprising f) means for applying field flattening of background data.
  - 40. The apparatus of claim 38, further comprising f) means for reducing spectral crosstalk between data comprised in said one or more channels.

41. A system for processing data to determine an identity of a nucleic acid at an interrogation position in said nucleotide sequence, said system comprising a) a processor, and b) a computer readable medium readable by said processor, said computer readable medium storing a computer program that comprises 1) code that receives as input a plurality of nucleic acid base calls at an interrogation position in a nucleotide sequence, 2) code that applies to the input data a sequence lead-lag compensation equation to correct for sequence lead and sequence lag, and 3) code that identifies a nucleic acid at said interrogation position according to the corrected data.
- Dependent claims (42, 43):
  - 42. The system of claim 41, wherein said computer readable medium further comprises 4) code that applies field flattening of background data.
  - 43. The system of claim 41, wherein said computer readable medium further comprises 4) code that reduces spectral crosstalk between data comprised in said one or more channels.

44. A method for field flattening an image of a probe on a solid support, comprising a) obtaining a first data set for a plurality of pixel intensities of a first raw image of a probe at a first concentration on a solid support, wherein said first raw image is produced using a first spectral filter for detecting a first probe, b) obtaining a second data set for a plurality of pixel intensities of a second smoothed image of said probe on said solid support, wherein said second smoothed image is produced using a low-pass filter, c) determining a field flattening intensity value for a plurality of pixels of said first raw image, and d) generating a third field flattened image of said probe on said solid support using said field flattening intensity of said plurality of pixels, wherein the correlation of intensity of a plurality of pixels to their spatial location on said third field flattened image is reduced compared to the intensity of a plurality of pixels at a corresponding location on said first raw image.
- Dependent claims (45, 46, 47, 48, 49, 50, 51, 52):
  - 45. The method of claim 44, wherein said field flattening intensity value of a pixel is determined by the equation
    $F_{x,y} = \frac{R_{x,y} \, M_{x_0,y_0}}{M_{x,y}}$,
    where $F_{x,y}$ is a field flattening intensity value, $R_{x,y}$ is the intensity of a pixel of the plurality of pixels on the first raw image, $M_{x,y}$ is the intensity of a pixel of the plurality of pixels on the second smoothed image at a corresponding spatial location to the pixel on the first raw image, and $M_{x_0,y_0}$ is the intensity of a reference pixel on said second smoothed image or is any other scale factor of interest.
  - 46. The method of claim 44, further comprising repeating steps a) to d), using a second spectral filter for detecting a second probe.
  - 47. The method of claim 44, further comprising repeating steps a) to d), using said probe at a second concentration on said solid support.
  - 48. The method of claim 44, wherein said probe is fluorescent.
  - 49. The method of claim 48, wherein said probe corresponds to a nucleic acid.
  - 50. The method of claim 49, wherein said nucleic acid comprises a base selected from the group consisting of adenine (A), guanine (G), cytosine (C), and thymine (T).
  - 51. The method of claim 44, wherein said solid support comprises a microscope slide.
  - 52. The method of claim 44, wherein said solid support comprises a silicon chip.
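Claims 44 and 45 define field flattening as $F_{x,y} = R_{x,y} M_{x_0,y_0} / M_{x,y}$, where M is a low-pass-filtered (smoothed) image. The claims do not name the filter, so the sketch below assumes a simple box (moving-average) filter applied to the raw image itself, and by default takes the smoothed value at the image centre as the reference $M_{x_0,y_0}$; both choices and the function names are assumptions. Images are indexed as `img[y][x]`.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]


def box_smooth(img: Image, radius: int) -> Image:
    """Low-pass filter: mean over a (2r+1) x (2r+1) window, clipped at edges."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def field_flatten(raw: Image, radius: int,
                  ref: Optional[Tuple[int, int]] = None,
                  scale: Optional[float] = None) -> Image:
    """Return F[y][x] = R[y][x] * M0 / M[y][x], with M the smoothed raw image.

    M0 is `scale` if given, otherwise the smoothed intensity at the reference
    pixel `ref` = (x0, y0), which defaults to the image centre.
    """
    smooth = box_smooth(raw, radius)
    if scale is None:
        x0, y0 = ref if ref is not None else (len(raw[0]) // 2, len(raw) // 2)
        scale = smooth[y0][x0]
    flat: Image = []
    for y, row in enumerate(raw):
        flat_row = []
        for x, r in enumerate(row):
            m = smooth[y][x]
            if m == 0:
                raise ValueError(f"smoothed intensity is zero at ({x}, {y})")
            flat_row.append(r * scale / m)
        flat.append(flat_row)
    return flat
```

With `radius=1`, a 3 x 3 image whose rows are each the ramp 1, 2, 3 comes out with rows of roughly 1.33, 2.0, 2.4: the spread across the field shrinks, and the reference pixel keeps its raw value because $M_{x_0,y_0}/M_{x_0,y_0} = 1$.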

Current Assignee
QIAGEN Sciences, LLC (Qiagen NV)
Original Assignee
Intelligent Bio-Systems Incorporated (Qiagen NV)
Inventors
GORDON, STEVEN; VEATCH, PHILLIP A.

Granted Patent

US 8,612,161 B2
US Class Current

702/19
CPC Class Codes

C12Q 1/6806   Preparing nucleic acids for...

C12Q 1/6825   Nucleic acid detection invo...

C12Q 1/6869   Methods for sequencing

C12Q 2525/117   incorporating modified base

C12Q 2525/186   incorporating a non-extenda...

C12Q 2527/137   Concentration of a componen...

C12Q 2537/165   Mathematical modelling, e.g...

C12Q 2563/107   fluorescence

C12Q 2565/601   being a microscope, e.g. at...

G16B 25/00   ICT specially adapted for h...
