System, method, and computer software product for genotype determination using probe array data
First Claim
1. A method for determining the genotype of a plurality of nucleic acid samples at a plurality of SNPs, comprising:
(a) hybridizing each nucleic acid sample in said plurality of nucleic acid samples to an array of allele specific probes to obtain a plurality of raw probe intensity measurements;
(b) normalizing said raw probe intensity measurements to obtain normalized probe intensities;
(c) summarizing the normalized probe intensities to obtain an allele signal estimate for each allele of each SNP in each nucleic acid sample, wherein the allele signal estimate for each allele of each SNP in each nucleic acid sample comprises a first value SA and a second value SB where SA is the summary value for a first allele of the SNP and SB is the summary value for a second allele of the SNP;
(d) transforming said allele signal estimates in clustering space in one-dimension to obtain a transformed allele signal estimate for each SNP in each nucleic acid sample using the following equation:
transformed allele signal estimate = asinh(K(SA − SB)/(SA + SB))/asinh(K), where K is a tuning constant;
(e) obtaining a prior distribution of genotype cluster characteristics;
(f) evaluating all possible assignments of the transformed allele signal estimates to the genotype clusters of the prior distribution;
(g) calculating an optimal assignment from the possible assignments of step (f) for each transformed allele signal estimate, to one or more genotype clusters using a Gaussian cluster model;
(h) updating the prior distribution with optimal assignments calculated in (g) to obtain a posterior distribution of genotype cluster characteristics for each SNP; and
(i) using the posterior distribution of genotype cluster characteristics for each SNP to make genotype calls for that SNP in each nucleic acid sample.
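The transformation recited in step (d) can be illustrated with a short Python sketch. The function name `transform_contrast` and the default value `k=2.0` are assumptions chosen for illustration only; the claim says only that K is a tuning constant and does not fix its value.

```python
import math


def transform_contrast(s_a: float, s_b: float, k: float = 2.0) -> float:
    """Map allele summaries (SA, SB) to a one-dimensional value.

    Implements asinh(K * (SA - SB) / (SA + SB)) / asinh(K) from step (d).
    The default k=2.0 is a hypothetical choice, not taken from the claim.
    """
    total = s_a + s_b
    if total <= 0:
        raise ValueError("SA + SB must be positive")
    if k == 0:
        raise ValueError("K must be non-zero")
    contrast = (s_a - s_b) / total  # lies in [-1, 1] for non-negative signals
    return math.asinh(k * contrast) / math.asinh(k)


# Signal only on the first allele maps to +1, balanced signal to 0,
# and signal only on the second allele to -1.
print(transform_contrast(1000.0, 0.0))
print(transform_contrast(500.0, 500.0))
print(transform_contrast(0.0, 1000.0))
```

Dividing by asinh(K) rescales the output so that the extreme contrasts of +1 and −1 map to +1 and −1 for any non-zero K, which keeps the one-dimensional clustering space on a fixed range while K changes the shape of the curve in between.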
Abstract
An embodiment of a method of analyzing data from processed images of biological probe arrays is described that comprises receiving a plurality of files comprising a plurality of intensity values associated with a probe on a biological probe array; normalizing the intensity values in each of the data files; determining an initial assignment for a plurality of genotypes using one or more of the intensity values from each file for each assignment; estimating a distribution of cluster centers using the plurality of initial assignments; combining the normalized intensity values with the cluster centers to determine a posterior estimate for each cluster center; and assigning a plurality of genotype calls using a distance of the one or more intensity values from the posterior estimate.
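As a rough illustration of the flow described above (evaluating candidate assignments, combining them with a prior on cluster centers to reach a posterior estimate, and calling genotypes by distance to that estimate), the following Python sketch handles a single SNP in the one-dimensional transformed space. It is a minimal sketch under stated assumptions, not the patented implementation: the prior centers `(-0.66, 0.0, 0.66)`, the `prior_weight` pseudo-count, the shared fixed variance, and the function names are all hypothetical. It also assumes that cluster order follows the sorted values, so every candidate assignment is a pair of cut points.

```python
from itertools import combinations_with_replacement

GENOTYPES = ("BB", "AB", "AA")  # ordered from lowest to highest contrast


def posterior_mean(values, prior_mean, prior_weight):
    """Conjugate update of a Gaussian mean with known, shared variance."""
    return (prior_weight * prior_mean + sum(values)) / (prior_weight + len(values))


def fit_snp(values, prior_means=(-0.66, 0.0, 0.66), prior_weight=1.0):
    """Return posterior cluster centers for one SNP.

    Every order-preserving split of the sorted values into (BB, AB, AA)
    is scored; prior_weight must be positive so empty clusters fall back
    to their prior center.
    """
    xs = sorted(values)
    best_score, best_means = float("-inf"), tuple(prior_means)
    # (i, j) with i <= j are the two cut points in the sorted list.
    for i, j in combinations_with_replacement(range(len(xs) + 1), 2):
        groups = (xs[:i], xs[i:j], xs[j:])
        means = tuple(
            posterior_mean(g, m0, prior_weight)
            for g, m0 in zip(groups, prior_means)
        )
        # Log posterior up to constants: data fit plus pull toward the prior.
        score = 0.0
        for g, mu, m0 in zip(groups, means, prior_means):
            score -= sum((x - mu) ** 2 for x in g)
            score -= prior_weight * (mu - m0) ** 2
        if score > best_score:
            best_score, best_means = score, means
    return best_means


def call_genotypes(values, centers):
    """Call each sample by its nearest posterior cluster center."""
    return [
        GENOTYPES[min(range(3), key=lambda c: abs(x - centers[c]))]
        for x in values
    ]


samples = [-0.90, -0.85, 0.02, -0.05, 0.88, 0.92]
centers = fit_snp(samples)
print(call_genotypes(samples, centers))
```

Scoring every pair of cut points is cubic in the number of samples as written, which is acceptable for a sketch; a practical version would more likely use running sums and would also model cluster variances rather than fixing them.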
12 Claims
Specification