Methods for identifying DNA copy number changes
First Claim
1. A method of estimating in a sample the copy number of a plurality of genomic regions in a genome, wherein each genomic region contains at least one single nucleotide polymorphism (SNP) from a plurality of SNPs, wherein each SNP in the plurality has an A and a B allele in a population, said method comprising:
(a) genotyping the sample using a high density genotyping array comprising a plurality of perfect match and mismatch probes for the A allele of each SNP in the plurality of SNPs (PMA and MMA) and a plurality of perfect match and mismatch probes for the B allele (PMB and MMB) to obtain a raw intensity measurement for each PMA, MMA, PMB and MMB probe for each SNP in the plurality of SNPs, and to obtain a genotyping call for each SNP in the plurality of SNPs;
(b) transforming each raw intensity measurement to its natural log to obtain a transformed intensity value for each probe;
(c) normalizing the transformed intensity values using the MMB transformed intensity values for all SNPs from the plurality of SNPs that are called BB in the sample to obtain normalized PMA intensities;
(d) normalizing each PMB transformed intensity value using the MMA transformed intensities for all SNPs from the plurality that are called AA in the sample to obtain normalized PMB intensities;
(e) using a plurality of reference samples, identifying a set of PMA probes and a set of PMB probes for each SNP in the plurality of SNPs that show linear correlation between copy number and intensity;
(f) calculating for each SNP in the plurality of SNPs an average of the PMA probes in the set of PMA probes and an average of the PMB probes in the set of PMB probes to obtain a PMA average intensity and a PMB average intensity for each SNP in the plurality of SNPs;
(g) performing linear regression against a model equation derived from a plurality of reference samples to obtain an estimated A allele copy number and an estimated B allele copy number for each SNP in the plurality of SNPs;
(h) adding the estimated A allele copy number to the estimated B allele copy number to obtain an estimated total copy number of the genomic region of each SNP in the plurality of SNPs, thereby calculating an estimated total copy number for each of a plurality of genomic regions in a genome; and
(i) applying regression tree analysis to the estimated total copy numbers obtained in (h) to partition the genome into genomic regions having the same estimated total copy number, wherein steps (b)-(i) are performed by a computer and wherein the computer outputs the estimated total copy number of a plurality of genomic regions in a computer readable format.
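Steps (b) through (d) and (h) of claim 1 can be illustrated with a minimal sketch. The claim does not spell out the normalization formula, so the sketch assumes a shift-and-scale against the background probes (the zero-mean, unit-variance standardization recited in claim 8, steps (k) and (l)). The function names, the intensity values, and the per-allele estimates below are hypothetical and are not taken from the patent.

```python
import math
from statistics import mean, stdev


def log_transform(raw):
    """Step (b): natural log of each raw probe intensity."""
    return [math.log(x) for x in raw]


def standardize(values, background):
    """Assumed normalization: shift and scale `values` so that the
    `background` probes would have mean 0 and variance 1."""
    mu = mean(background)
    sigma = stdev(background)
    return [(v - mu) / sigma for v in values]


# Hypothetical raw intensities (illustrative numbers only).
raw_pma = [1200.0, 950.0, 1800.0]          # PMA probes of one SNP
raw_mmb_bb = [210.0, 180.0, 250.0, 195.0]  # MMB probes from SNPs called BB

# Step (c) as worded in claim 1: PMA values normalized against MMB
# values from BB-called SNPs.
background = log_transform(raw_mmb_bb)
norm_pma = standardize(log_transform(raw_pma), background)

# Step (h): total copy number is the sum of the per-allele estimates.
est_a, est_b = 1.1, 0.9  # hypothetical outputs of step (g)
total = est_a + est_b
```

Step (d) would reuse `standardize` with the PMB values and the MMA background from AA-called SNPs. Probe selection (e), the reference regression (g), and the regression tree partition (i) are not covered by this sketch.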
Abstract
Methods of identifying allele-specific changes in genomic DNA copy number are disclosed. Methods for identifying homozygous deletions and genetic amplifications are disclosed. An array of probes designed to detect presence or absence of a plurality of different sequences is also disclosed. The probes are designed to hybridize to sequences that are predicted to be present in a reduced complexity sample. The methods may be used to detect copy number changes in cancerous tissue compared to normal tissue. The methods may be used to diagnose cancer and other diseases associated with chromosomal anomalies.
8 Claims
8. A method for estimating the copy number of a genomic region in an experimental sample comprising:
(a) isolating nucleic acid from the experimental sample;
(b) fragmenting the nucleic acid sample with a restriction enzyme;
(c) ligating an adaptor to the fragments;
(d) amplifying at least some of the adaptor ligated fragments;
(e) labeling the amplified products;
(f) hybridizing the labeled amplified products to an array to obtain a hybridization pattern, wherein the array comprises a plurality of genotyping probe sets for a plurality of SNPs, wherein a probe set comprises: (i) a plurality of perfect match probes to a first allele of a SNP, (ii) a plurality of perfect match probes to a second allele of the SNP, (iii) a plurality of mismatch probes to the first allele of the SNP, and (iv) a plurality of mismatch probes to the second allele of the SNP;
(g) obtaining a raw intensity measurement for each perfect match and each mismatch probe in each probe set for each SNP;
(h) calculating the natural log (ln) of the raw intensity measurement for each probe;
(i) standardizing the natural log of the raw intensity measurement for each probe using as background the mismatch probe intensities from the opposite allele in SNPs with a genotype call homozygous for the opposite allele;
(j) obtaining a standardized measurement for each perfect match probe by a method comprising obtaining a first background intensity by calculating an average of a plurality of B allele mismatch probes for a plurality of SNPs called homozygous A in the sample, and obtaining a second background intensity by calculating an average of a plurality of A allele mismatch probes for a plurality of SNPs called homozygous B in the sample;
(k) standardize the PMa probes so that the MMa probes for SNPs with BB genotype calls have a variance of one and a mean of zero;
(l) standardize the PMb probes so that the mismatch B probes for homozygous A SNPs have a variance of one and a mean of zero;
(m) select probes to be included in calculation by identifying probes that show a linear response between copy number and intensity above a threshold and calculate an average intensity across selected probes in a probe set;
(n) perform regression analysis on the reference set mean intensities for a given probe set and genotype for each SNP;
(o) compare the intensity of the target sample against the mean intensity values of samples from the reference set with the same genotype call;
(p) apply a linear regression to adjust the target intensity so that it falls on the line Y = X, performed separately on PMA and PMB probe intensities;
(q) model copy number from the reference samples using the following equations
$$\ln(C_{a,rm} + \delta_{a,m}) = \gamma_{a0,m} + \gamma_{a1,m} I_{a,rm} + \varepsilon_{a,rm}$$
$$\ln(C_{b,rm} + \delta_{b,m}) = \gamma_{b0,m} + \gamma_{b1,m} I_{b,rm} + \varepsilon_{b,rm}$$
(r) use the values of $\gamma_{a0,m}$, $\gamma_{a1,m}$, $\gamma_{b0,m}$, $\gamma_{b1,m}$ obtained in (q) to estimate copy number of the unknown sample for each allele of each SNP using the following equations
$$\hat{C}_{a,lm} = \max\left(\exp\left(\hat{\gamma}_{a0,m} + \hat{\gamma}_{a1,m} I_{a,lm}\right) - \hat{\delta}_{a,lm},\ 0\right)$$
and
$$\hat{C}_{b,lm} = \max\left(\exp\left(\hat{\gamma}_{b0,m} + \hat{\gamma}_{b1,m} I_{b,lm}\right) - \hat{\delta}_{b,lm},\ 0\right)$$
(s) perform kernel smoothing on the estimated copy number applying significance with a 1 Mb window and Gaussian kernel.
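Steps (q) through (s) of claim 8 can be sketched for a single allele of a single SNP as follows. Several points are assumptions rather than statements of the claimed method: the offset δ is treated as a known constant (the claim does not say how it is estimated), the fit is ordinary least squares, the Gaussian standard deviation is set to one quarter of the 1 Mb window, and the "significance" part of step (s) is not modeled. All function and parameter names are hypothetical.

```python
import math


def fit_gamma(intensities, copy_numbers, delta):
    """Step (q), assuming delta is given: least-squares fit of
    ln(C + delta) = gamma0 + gamma1 * I over the reference samples."""
    ys = [math.log(c + delta) for c in copy_numbers]
    n = len(intensities)
    mx = sum(intensities) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in intensities)
    sxy = sum((x - mx) * (y - my) for x, y in zip(intensities, ys))
    gamma1 = sxy / sxx
    gamma0 = my - gamma1 * mx
    return gamma0, gamma1


def estimate_copy_number(intensity, gamma0, gamma1, delta):
    """Step (r): C_hat = max(exp(gamma0 + gamma1 * I) - delta, 0)."""
    return max(math.exp(gamma0 + gamma1 * intensity) - delta, 0.0)


def gaussian_smooth(positions, values, window=1_000_000, sigma=None):
    """Step (s), sketched as a Gaussian-weighted average of the SNPs
    lying within half a window on either side of each SNP."""
    if sigma is None:
        sigma = window / 4.0  # assumption: not specified in the claim
    half = window / 2.0
    smoothed = []
    for p in positions:
        num = den = 0.0
        for q, v in zip(positions, values):
            d = q - p
            if abs(d) <= half:
                w = math.exp(-0.5 * (d / sigma) ** 2)
                num += w * v
                den += w
        smoothed.append(num / den)  # den >= 1: each SNP weights itself by 1
    return smoothed
```

A possible usage: fit `gamma0, gamma1 = fit_gamma(ref_intensities, ref_copy_numbers, delta)` on reference samples of known copy number, apply `estimate_copy_number` to each SNP of the target sample, then pass the genomic positions (in base pairs) and the estimates to `gaussian_smooth`.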
Specification