Method and device for identifying a biological sample

US 7,917,301 B1
Filed: 09/19/2000
Issued: 03/29/2011
Est. Priority Date: 09/19/2000
Status: Expired due to Fees

Abstract

The method and system for identifying a biological sample generates a data set indicative of the composition of the biological sample. In a particular example, the data set is DNA spectrometry data received from a mass spectrometer. The data set is denoised, and a baseline is deleted. Since possible compositions of the biological sample may be known, expected peak areas may be determined. Using the expected peak areas, a residual baseline is generated to further correct the data set. Probable peaks are then identifiable in the corrected data set, which are used to identify the composition of the biological sample. In a disclosed example, statistical methods are employed to determine the probability that a probable peak is an actual peak, not an actual peak, or that the data are too inconclusive to call.
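The first two steps named in the abstract, denoising and baseline removal, can be illustrated with a minimal sketch. The abstract does not say which denoising or baseline method is used, so the moving median and rolling minimum below are stand-ins chosen for brevity, and the function names and window sizes are hypothetical.

```python
from statistics import median


def denoise(signal, window=5):
    """Moving-median smoothing.

    Stand-in technique: the abstract does not name a denoising method.
    The window is truncated at the edges of the signal.
    """
    half = window // 2
    return [median(signal[max(0, i - half): i + half + 1])
            for i in range(len(signal))]


def estimate_baseline(signal, window=51):
    """Rolling minimum as a crude baseline estimate (assumed, not from the source)."""
    half = window // 2
    return [min(signal[max(0, i - half): i + half + 1])
            for i in range(len(signal))]


def correct_baseline(signal, window=51):
    """Subtract the estimated baseline from every point."""
    baseline = estimate_baseline(signal, window)
    return [s - b for s, b in zip(signal, baseline)]
```

With a flat offset of 10, a three-point peak, and one isolated spike, `correct_baseline(denoise(signal))` would remove the spike and the offset while leaving the (flattened) peak in place. A moving median clips narrow peaks, so the window has to stay well below the expected peak width.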

9 Claims

1. An automated method for identifying a component in a DNA sample, comprising:
- using a mass spectrometer to generate a computer readable data set comprising data representing components in the biological sample for analysis by a computer, and using the computer to:
  
  denoise the data set to generate denoised data;
  
  correct a baseline from the denoised data to generate an intermediate data set, the intermediate data set having a plurality of data values associated with respective points in an array of data;
  
  compress the intermediate data set to obtain compressed data;
  
  define putative peaks in the compressed data, wherein the putative peaks represent components in the DNA sample;
  
  generate a residual baseline by removing the putative peaks from the compressed data;
  
  remove the residual baseline from the compressed data to generate a corrected data set;
  
  locate a putative peak in the corrected data set; and
  
  identify the component that corresponds to the located putative peak;
  
  wherein the compressed data comprises compressed data points and wherein a compressed data point is a real number that includes a whole number portion that is determined by calculating the difference between the whole number portions of two consecutive points in the array of data.
Dependent claims (5, 6, 7, 8, 9):

5. The method of claim 1, 2, 3, or 4 further comprising:
- determining a peak probability for the putative peak; and
  
  multiplying the peak probability by an allelic penalty to obtain a final peak probability.

6. The method of claim 1, 2, 3, or 4 further comprising:
- calculating a peak probability that a putative peak in the corrected data is a peak indicating composition of the DNA sample;
  
  calculating a peak probability for each of a plurality of putative peaks in the corrected data; and
  
  comparing the highest peak probability to a second-highest peak probability to generate a calling ratio.

7. The method according to claim 6, wherein the calling ratio is used to determine if the composition of the DNA sample will be called.

8. The method according to claim 5, wherein the peak probability is determined from a probability profile.

9. The method according to claim 5, comprising determining an allelic ratio, wherein the allelic ratio is a comparison of two peak heights in the corrected data, and assigning the allelic penalty to the allelic ratio.
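
The arithmetic in claims 5 to 9 can be sketched as follows. The claims state only that a peak probability is multiplied by an allelic penalty, that the allelic ratio compares two peak heights, and that the highest and second-highest probabilities form a calling ratio. The penalty curve, the `min_ratio` and `threshold` values, and all function names below are hypothetical.

```python
def allelic_ratio(height_a, height_b):
    """Compare two peak heights as smaller / larger (one possible comparison)."""
    return min(height_a, height_b) / max(height_a, height_b)


def allelic_penalty(ratio, min_ratio=0.3):
    """Hypothetical penalty: 1.0 for balanced peaks, scaled down linearly below min_ratio."""
    return 1.0 if ratio >= min_ratio else ratio / min_ratio


def final_peak_probability(peak_probability, penalty):
    """Claim 5: multiply the peak probability by the allelic penalty."""
    return peak_probability * penalty


def calling_ratio(peak_probabilities):
    """Claim 6: highest peak probability compared to the second highest."""
    if len(peak_probabilities) < 2:
        raise ValueError("need at least two putative peaks")
    best, second = sorted(peak_probabilities, reverse=True)[:2]
    return float("inf") if second == 0 else best / second


def should_call(peak_probabilities, threshold=2.0):
    """Claim 7: use the calling ratio to decide whether to call (threshold is assumed)."""
    return calling_ratio(peak_probabilities) >= threshold
```

For example, probabilities of 0.5, 0.25, and 0.125 give a calling ratio of 2.0, which meets the assumed threshold, so the composition would be called under this sketch.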

2. An automated method for identifying a component in a DNA sample, comprising:
- using a mass spectrometer to generate a computer readable data set comprising data representing components in the biological sample for analysis by a computer, and using the computer to:
  
  denoise the data set to generate denoised data;
  
  correct a baseline from the denoised data to generate an intermediate data set, the intermediate data set having a plurality of data values associated with respective points in an array of data;
  
  compress the intermediate data set to obtain compressed data;
  
  define putative peaks in the compressed data, wherein the putative peaks represent components in the DNA sample;
  
  generate a residual baseline by removing the putative peaks from the compressed data;
  
  remove the residual baseline from the compressed data to generate a corrected data set;
  
  locate a putative peak in the corrected data set; and
  
  identify the component that corresponds to the located putative peak;
  
  wherein the compressed data comprises compressed data points and wherein a compressed data point is a real number that includes a decimal portion representing the difference between a maximum value of all the data values and a value at a particular point in the array.
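
One way to read the compressed data points of claims 1 and 2 is sketched below: each retained point is stored as a single real number whose whole number portion is the gap between consecutive array positions and whose decimal portion encodes the difference between the maximum data value and the value at that point. This is an interpretation, not the patented encoding. The retention threshold, the scaling by `peak + 1` (used to keep the decimal portion below 1), and the function names are all assumptions.

```python
import math


def compress(values, threshold=0.0):
    """Pack points above `threshold` as gap + (max - value) / scale."""
    peak = max(values)
    scale = peak + 1.0  # assumption: keeps the decimal portion strictly below 1
    packed, last = [], 0
    for i, v in enumerate(values):
        if v > threshold:
            packed.append((i - last) + (peak - v) / scale)
            last = i
    return packed, peak, len(values)


def decompress(packed, peak, length, fill=0.0):
    """Invert compress(); dropped points are restored as `fill`."""
    scale = peak + 1.0
    out = [fill] * length
    pos = 0
    for p in packed:
        gap = math.floor(p)  # whole number portion: distance from previous point
        pos += gap
        out[pos] = peak - (p - gap) * scale  # decimal portion: max minus value
    return out
```

The round trip is exact only up to floating-point precision, and the scheme is lossy for any point at or below the threshold.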

3. An automated method for identifying a component in a DNA sample, comprising:
- using a mass spectrometer to generate a computer readable data set comprising data representing components in the biological sample for analysis by a computer, and using the computer to:
  
  denoise the data set to generate denoised data;
  
  correct a baseline from the denoised data to generate an intermediate data set;
  
  define putative peaks in the intermediate data set, wherein the putative peaks represent components in the DNA sample;
  
  generate a residual baseline by removing the putative peaks from the intermediate data set, comprising the steps of:
  
  a) identifying the center line of each putative peak;
  
  b) removing an area to the right of the center line of each putative peak; and
  
  c) removing an area equal to twice the width of the Gaussian curve fit to each putative peak from the left of the center line of each putative peak;
  
  remove the residual baseline from the intermediate data set to generate a corrected data set;
  
  locate a putative peak in the corrected data set; and
  
  identify the component that corresponds to the located putative peak.

4. An automated method for identifying a component in a DNA sample, comprising:
- using a mass spectrometer to generate a computer readable data set comprising data representing components in the biological sample for analysis by a computer, and using the computer to:
  
  denoise the data set to generate denoised data;
  
  correct a baseline from the denoised data to generate an intermediate data set;
  
  define putative peaks in the intermediate data set, wherein the putative peaks represent components in the DNA sample;
  
  generate a residual baseline by removing the putative peaks from the intermediate data set, comprising the steps of:
  
  a) identifying the center line of each putative peak;
  
  b) removing an area equal to the area corresponding to 50 Daltons along the x-axis to the right of the center line of each putative peak; and
  
  c) removing an area to the left of the center line of each putative peak;
  
  remove the residual baseline from the intermediate data set to generate a corrected data set;
  
  locate a putative peak in the corrected data set; and
  
  identify the component that corresponds to the located putative peak.
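
The peak-removal steps of claims 3 and 4 can be combined into one illustrative sketch: around each putative peak, drop twice the Gaussian width to the left of the center line (claim 3) and 50 Daltons to the right (claim 4), then estimate the residual baseline from what remains. The claims do not say how the baseline is fitted to the remaining points, so linear interpolation is used here as a stand-in. The function names and the `(center, width)` peak representation are hypothetical, and masses are assumed to be strictly increasing.

```python
def peak_mask(masses, peaks, right_da=50.0):
    """Return keep flags; points within [center - 2*width, center + right_da] are dropped."""
    keep = [True] * len(masses)
    for center, width in peaks:
        lo, hi = center - 2.0 * width, center + right_da
        for i, m in enumerate(masses):
            if lo <= m <= hi:
                keep[i] = False
    return keep


def residual_baseline(masses, intensities, keep):
    """Linearly interpolate the baseline through the kept points (assumed method)."""
    kept = [(m, y) for m, y, k in zip(masses, intensities, keep) if k]
    if not kept:
        raise ValueError("every point was masked; no baseline can be estimated")
    baseline, j = [], 0
    for m in masses:
        # Advance to the last kept point at or before this mass.
        while j + 1 < len(kept) and kept[j + 1][0] <= m:
            j += 1
        m0, y0 = kept[j]
        if m <= m0 or j + 1 == len(kept):
            baseline.append(y0)  # on a kept point, or outside the kept range
        else:
            m1, y1 = kept[j + 1]
            baseline.append(y0 + (y1 - y0) * (m - m0) / (m1 - m0))
    return baseline


def remove_residual_baseline(masses, intensities, peaks, right_da=50.0):
    """Subtract the residual baseline to obtain the corrected data set."""
    keep = peak_mask(masses, peaks, right_da)
    baseline = residual_baseline(masses, intensities, keep)
    return [y - b for y, b in zip(intensities, baseline)]
```

On a spectrum with a slow linear drift and one peak, the interpolated baseline would follow the drift under the masked region, so the corrected data keeps the peak height and returns to zero elsewhere.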

Current Assignee: Agena Bioscience Inc. (Mesa Laboratories Inc.)
Original Assignee: Sequenom Incorporated (Laboratory Corporation Of America Holdings)
Inventors: Yip, Ping
Primary Examiner(s): Skowronek; Karlheinz R
Application Number: US09/663,968
Time in Patent Office: 3,843 Days
Field of Search: 702/19, 435/5, 435/6
US Class Current: 702/20
CPC Class Codes:
G06F 17/00   Digital computing or data p...
G16Z 99/00   Subject matter not provided...
H01J 49/0036   Step by step routines descr...
