System, method, and computer software product for genotype determination using probe array data
First Claim
1. A method for genotyping a plurality of single nucleotide polymorphisms (SNPs) in a nucleic acid sample using seed genotype cluster estimates derived without requiring mismatch probe data, the method comprising:
hybridizing a nucleic acid sample with a plurality of allele-specific perfect-match probes provided in an array of perfect-match probes for a plurality of target sequences which the array is designed to genotype, wherein, for substantially all of the plurality of target sequences, the array is without corresponding mismatch probes;
acquiring intensity data associated with the hybridizing, wherein the intensity data comprises intensity values;
summarizing the intensity values to obtain a signal value for each allele for each of the plurality of SNPs;
transforming the signal values by discarding size information from the signal values, thereby generating transformed signal values represented in one-dimensional contrast space;
evaluating all plausible divisions of the transformed signal values into seed genotypes by applying a Gaussian likelihood model;
averaging the plausible divisions over most likely plausible divisions to derive a plurality of seed genotype clusters; and
genotyping the plurality of SNPs, wherein genotyping comprises a comparison of the transformed signal values with a set of typical values for each genotype, wherein the set of typical values comprises prior values, wherein the prior values further comprise estimates of genotype cluster center locations and genotype cluster center variances of the plurality of seed genotype clusters determined from the clustering properties of the transformed signal values;
wherein the steps of summarizing, transforming, evaluating, averaging, and genotyping are performed on a computer, and wherein the computer comprises a computer processor.
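By way of illustration only, the transforming, evaluating, and averaging steps recited above can be sketched in Python as follows. The sketch rests on assumptions not stated in the claim: the contrast is taken as (A - B)/(A + B), which is only one possible way of discarding size information; each division is a split of the sorted contrast values into three contiguous groups labeled BB, AB, and AA; all groups share one pooled variance; and the "most likely" divisions are the top_k highest-scoring ones. The names contrast, division_loglik, seed_clusters, top_k, and min_var are hypothetical. No prior or penalty on the number of occupied clusters is included, which a practical implementation would likely need.

```python
import math
from itertools import combinations_with_replacement


def contrast(a, b):
    """Map allele signals (a, b) to one dimension, discarding overall size.

    Assumed form: (a - b) / (a + b), which lies in [-1, 1] for positive signals.
    """
    return (a - b) / (a + b)


def division_loglik(groups, min_var=1e-4):
    """Gaussian log-likelihood of one division, using a pooled, floored variance."""
    n = sum(len(g) for g in groups)
    ss = 0.0  # within-group sum of squared deviations
    for g in groups:
        if g:
            mean = sum(g) / len(g)
            ss += sum((v - mean) ** 2 for v in g)
    var = max(ss / n, min_var)
    return -0.5 * n * math.log(2 * math.pi * var) - ss / (2 * var)


def seed_clusters(contrasts, top_k=10):
    """Score every contiguous three-way division, then average the best ones."""
    x = sorted(contrasts)
    n = len(x)
    scored = []
    # Cut points i <= j give groups x[:i], x[i:j], x[j:]; groups may be empty.
    for i, j in combinations_with_replacement(range(n + 1), 2):
        groups = (x[:i], x[i:j], x[j:])
        scored.append((division_loglik(groups), groups))
    scored.sort(key=lambda t: t[0], reverse=True)
    best = scored[:top_k]
    top = best[0][0]
    # Relative likelihood weights, shifted by the maximum for numerical stability.
    weights = [math.exp(ll - top) for ll, _ in best]
    centers = {}
    for k, name in enumerate(("BB", "AB", "AA")):
        num = den = 0.0
        for w, (_, groups) in zip(weights, best):
            if groups[k]:  # only divisions that populate this genotype contribute
                num += w * sum(groups[k]) / len(groups[k])
                den += w
        centers[name] = num / den if den > 0 else None
    return centers


# Hypothetical summarized signals (A allele, B allele) for one SNP across samples.
signals = [(100, 900), (120, 880), (90, 910),
           (500, 500), (520, 480), (480, 520),
           (900, 100), (880, 120), (910, 90)]
seeds = seed_clusters([contrast(a, b) for a, b in signals])
```

Exhaustive enumeration is quadratic in the number of samples per SNP, which is tolerable for a sketch; the seed centers returned here would then serve as the cluster center estimates used in the genotyping step.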
Abstract
An embodiment of a method of analyzing data from processed images of biological probe arrays is described that comprises receiving a plurality of files comprising a plurality of intensity values associated with a probe on a biological probe array; normalizing the intensity values in each of the data files; determining an initial assignment for a plurality of genotypes using one or more of the intensity values from each file for each assignment; estimating a distribution of cluster centers using the plurality of initial assignments; combining the normalized intensity values with the cluster centers to determine a posterior estimate for each cluster center; and assigning a plurality of genotype calls using a distance of the one or more intensity values from the posterior estimate.
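As a minimal sketch of the last two steps of the abstract, the following Python fragment combines a prior cluster center with observed values to form a posterior estimate, then assigns a call by distance from the posterior centers. It assumes a simple pseudo-count update (a conjugate normal update with known variance), a single common standard deviation sd, and a no-call threshold max_dist; none of these details are given in the abstract, and the names posterior_center, call_genotype, prior_weight, sd, and max_dist are hypothetical.

```python
def posterior_center(prior_mean, prior_weight, observed):
    """Shrink the observed cluster mean toward the prior center.

    prior_weight acts as a pseudo-count: the prior counts as that many observations.
    """
    n = len(observed)
    if n == 0:
        return prior_mean  # no data for this cluster, so the prior stands
    xbar = sum(observed) / n
    return (prior_weight * prior_mean + n * xbar) / (prior_weight + n)


def call_genotype(value, centers, sd=0.1, max_dist=3.0):
    """Assign the genotype whose posterior center is nearest, or 'NoCall' if too far."""
    name, center = min(centers.items(), key=lambda kv: abs(value - kv[1]))
    dist = abs(value - center) / sd  # distance in units of the assumed common sd
    return name if dist <= max_dist else "NoCall"


# Hypothetical posterior centers in contrast space.
centers = {"BB": -0.7, "AB": 0.0, "AA": 0.7}
call = call_genotype(0.75, centers)
```

The pseudo-count form makes the trade-off explicit: a cluster with few observations stays close to its prior center, while a well-populated cluster is dominated by its own data.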
24 Claims
Specification