Method and device for identifying a biological sample
Abstract
The method and system for identifying a biological sample generate a data set indicative of the composition of the biological sample. In a particular example, the data set is DNA mass spectrometry data received from a mass spectrometer. The data set is denoised, and a baseline is removed. Because the possible compositions of the biological sample may be known in advance, expected peak areas may be determined. Using the expected peak areas, a residual baseline is generated to further correct the data set. Probable peaks can then be identified in the corrected data set and used to identify the composition of the biological sample. In a disclosed example, statistical methods are employed to determine the probability that a probable peak is an actual peak, is not an actual peak, or that the data are too inconclusive to call.
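The abstract does not name the statistical method. The Python sketch below only illustrates what a three-way call could look like, assuming the noise left after baseline correction is Gaussian with a known standard deviation. The function names `peak_probability` and `call_peak` and the thresholds `hi` and `lo` are hypothetical and do not come from the patent.

```python
import math


def peak_probability(height: float, noise_sd: float) -> float:
    """Fraction of noise-only values expected to fall below `height`.

    Assumes (hypothetically) that corrected-baseline noise is N(0, noise_sd**2).
    """
    if noise_sd <= 0:
        raise ValueError("noise_sd must be positive")
    z = height / noise_sd
    # Standard normal CDF written with math.erf.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def call_peak(height: float, noise_sd: float,
              hi: float = 0.99, lo: float = 0.5) -> str:
    """Three-way call using hypothetical thresholds `hi` and `lo`."""
    p = peak_probability(height, noise_sd)
    if p >= hi:
        return "peak"
    if p <= lo:
        return "no peak"
    return "inconclusive"
```

For example, a height of five noise standard deviations is called a peak, while a height of one standard deviation falls between the thresholds and is left uncalled.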
9 Claims
1. An automated method for identifying a component in a DNA sample, comprising:
using a mass spectrometer to generate a computer readable data set comprising data representing components in the biological sample for analysis by a computer, and using the computer to:
denoise the data set to generate denoised data;
correct a baseline from the denoised data to generate an intermediate data set, the intermediate data set having a plurality of data values associated with respective points in an array of data;
compress the intermediate data set to obtain compressed data;
define putative peaks in the compressed data, wherein the putative peaks represent components in the DNA sample;
generate a residual baseline by removing the putative peaks from the compressed data;
remove the residual baseline from the compressed data to generate a corrected data set;
locate a putative peak in the corrected data set; and
identify the component that corresponds to the located putative peak;
wherein the compressed data comprises compressed data points and wherein a compressed data point is a real number that includes a whole number portion that is determined by calculating the difference between the whole number portions of two consecutive points in the array of data.
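The wherein clause of claim 1 defines only the whole-number portion of a compressed data point. A minimal Python sketch of that portion follows. It assumes `math.trunc` as the meaning of "whole number portion" (this matters only for negative values) and omits the fractional portion, which this claim does not define. `whole_number_deltas` and `restore_wholes` are hypothetical names.

```python
import math
from typing import List, Sequence


def whole_number_deltas(values: Sequence[float]) -> List[int]:
    """Difference between the whole-number portions of consecutive points.

    Truncation toward zero is an assumed reading of "whole number portion".
    """
    wholes = [math.trunc(v) for v in values]
    return [b - a for a, b in zip(wholes, wholes[1:])]


def restore_wholes(first_whole: int, deltas: Sequence[int]) -> List[int]:
    """Invert the delta encoding by a running sum, given the first point."""
    out = [first_whole]
    for d in deltas:
        out.append(out[-1] + d)
    return out
```

Storing the first whole number plus the successive differences is ordinary delta encoding. For a smooth spectrum the differences are usually small integers, which is presumably what makes the representation compact.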
2. An automated method for identifying a component in a DNA sample, comprising:
using a mass spectrometer to generate a computer readable data set comprising data representing components in the biological sample for analysis by a computer, and using the computer to:
denoise the data set to generate denoised data;
correct a baseline from the denoised data to generate an intermediate data set, the intermediate data set having a plurality of data values associated with respective points in an array of data;
compress the intermediate data set to obtain compressed data;
define putative peaks in the compressed data, wherein the putative peaks represent components in the DNA sample;
generate a residual baseline by removing the putative peaks from the compressed data;
remove the residual baseline from the compressed data to generate a corrected data set;
locate a putative peak in the corrected data set; and
identify the component that corresponds to the located putative peak;
wherein the compressed data comprises compressed data points and wherein a compressed data point is a real number that includes a decimal portion representing the difference between a maximum value of all the data values and a value at a particular point in the array.
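Claim 2 instead specifies the decimal portion: the difference between the maximum of all data values and the value at a given point. The claim does not say how that difference is made to fit below 1. The sketch below assumes a decimal shift by `k` places, where `10**k` exceeds the largest difference, and returns `k` so that the values can be decoded. `decimal_portions` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def decimal_portions(values: Sequence[float]) -> Tuple[List[float], int]:
    """Encode max(values) - v as a fraction in [0, 1) for every point.

    The decimal shift by k places is an assumption; claim 2 does not state
    a scaling rule. Returns the fractions and k.
    """
    if not values:
        return [], 0
    vmax = max(values)
    diffs = [vmax - v for v in values]
    # 10**k > max(diffs), so every shifted difference lies in [0, 1).
    k = math.ceil(math.log10(max(diffs) + 1))
    scale = 10 ** k
    return [d / scale for d in diffs], k
```

Decoding a point then needs the stored maximum as well: `v = vmax - f * 10**k`.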
3. An automated method for identifying a component in a DNA sample, comprising:
using a mass spectrometer to generate a computer readable data set comprising data representing components in the biological sample for analysis by a computer, and using the computer to:
denoise the data set to generate denoised data;
correct a baseline from the denoised data to generate an intermediate data set;
define putative peaks in the intermediate data set, wherein the putative peaks represent components in the DNA sample;
generate a residual baseline by removing the putative peaks from the intermediate data set, comprising the steps of
a) identifying the center line of each putative peak;
b) removing an area to the right of the center line of each putative peak; and
c) removing an area equal to twice the width of the Gaussian curve fit to each putative peak from the left of the center line of each putative peak;
remove the residual baseline from the intermediate data set to generate a corrected data set;
locate a putative peak in the corrected data set; and
identify the component that corresponds to the located putative peak.
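The residual-baseline step of claim 3 can be sketched as masking and bridging. Only the left-hand extent, twice the fitted Gaussian width, comes from the claim. The rest is assumed: each putative peak is given as a center index plus the width (in array points) of a Gaussian already fitted to it, the right-hand region spans a hypothetical `right_width` points, and masked points are replaced by linear interpolation between the nearest unmasked neighbors. `residual_baseline` is a hypothetical name.

```python
import bisect
from typing import List, Sequence, Tuple


def residual_baseline(y: Sequence[float],
                      peaks: Sequence[Tuple[int, float]],
                      right_width: int) -> List[float]:
    """Mask each putative peak, then bridge the gaps linearly."""
    n = len(y)
    keep = [True] * n
    for center, gauss_width in peaks:
        # Left extent: twice the fitted Gaussian width (claim 3, step c).
        lo = max(0, center - int(round(2 * gauss_width)))
        # Right extent: hypothetical fixed number of points (step b).
        hi = min(n - 1, center + right_width)
        for i in range(lo, hi + 1):
            keep[i] = False
    kept = [i for i in range(n) if keep[i]]
    if not kept:
        raise ValueError("all points were removed; no baseline remains")
    base: List[float] = []
    for i in range(n):
        if keep[i]:
            base.append(float(y[i]))
            continue
        # Nearest unmasked neighbors on each side of the masked point.
        pos = bisect.bisect_left(kept, i)
        left = kept[pos - 1] if pos > 0 else None
        right = kept[pos] if pos < len(kept) else None
        if left is None:
            base.append(float(y[right]))
        elif right is None:
            base.append(float(y[left]))
        else:
            t = (i - left) / (right - left)
            base.append(y[left] + t * (y[right] - y[left]))
    return base
```

Subtracting the returned baseline point by point (`[a - b for a, b in zip(y, base)]`) gives the corrected data set used in the final two steps.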
4. An automated method for identifying a component in a DNA sample, comprising:
using a mass spectrometer to generate a computer readable data set comprising data representing components in the biological sample for analysis by a computer, and using the computer to:
denoise the data set to generate denoised data;
correct a baseline from the denoised data to generate an intermediate data set;
define putative peaks in the intermediate data set, wherein the putative peaks represent components in the DNA sample;
generate a residual baseline by removing the putative peaks from the intermediate data set, comprising the steps of
a) identifying the center line of each putative peak;
b) removing an area equal to the area corresponding to 50 Daltons along the x-axis to the right of the center line of each putative peak; and
c) removing an area to the left of the center line of each putative peak;
remove the residual baseline from the intermediate data set to generate a corrected data set;
locate a putative peak in the corrected data set; and
identify the component that corresponds to the located putative peak.
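Claim 4 sizes the right-hand region in mass units (50 Daltons) rather than in array points. The spacing of a mass axis is typically not uniform (in time-of-flight instruments, mass grows roughly with the square of flight time), so the sketch below converts the 50 Da window into index bounds by searching a sorted mass axis with `bisect`. The left-hand size `left_da` is a hypothetical default because claim 4 does not state it, and `mask_bounds` is likewise a hypothetical name.

```python
import bisect
from typing import Sequence, Tuple


def mask_bounds(masses: Sequence[float], center: int,
                right_da: float = 50.0,
                left_da: float = 10.0) -> Tuple[int, int]:
    """Inclusive index bounds [lo, hi] of the region removed around a peak.

    `masses` must be sorted ascending, in Daltons. right_da = 50 follows
    claim 4; left_da is a hypothetical value.
    """
    c = masses[center]
    # First index whose mass is >= c - left_da.
    lo = bisect.bisect_left(masses, c - left_da)
    # Last index whose mass is <= c + right_da.
    hi = bisect.bisect_right(masses, c + right_da) - 1
    return lo, hi
```

The returned bounds can replace a fixed point count when building the mask for the residual baseline.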
Specification