Methods for identifying DNA copy number changes

US 7,822,555 B2
Filed: 12/05/2005
Issued: 10/26/2010
Est. Priority Date: 11/11/2002
Status: Active Grant

5 Assignments

0 Petitions

Abstract

Methods of identifying allele-specific changes in genomic DNA copy number are disclosed. Methods for identifying homozygous deletions and genetic amplifications are disclosed. An array of probes designed to detect presence or absence of a plurality of different sequences is also disclosed. The probes are designed to hybridize to sequences that are predicted to be present in a reduced complexity sample. The methods may be used to detect copy number changes in cancerous tissue compared to normal tissue. The methods may be used to diagnose cancer and other diseases associated with chromosomal anomalies.

173 Citations

8 Claims

1. A method of estimating in a sample the copy number of a plurality of genomic regions in a genome, wherein each genomic region contains at least one single nucleotide polymorphism (SNP) from a plurality of SNPs, wherein each SNP in the plurality has an A and a B allele in a population, said method comprising:
- (a) genotyping the sample using a high density genotyping array comprising a plurality of perfect match and mismatch probes for the A allele of each SNP in the plurality of SNPs (PMA and MMA) and a plurality of perfect match and mismatch probes for the B allele (PMB and MMB) to obtain a raw intensity measurement for each PMA, MMA, PMB and MMB probe for each SNP in the plurality of SNPs, and to obtain a genotyping call for each SNP in the plurality of SNPs;
  
  (b) transforming each raw intensity measurement to its natural log to obtain a transformed intensity value for each probe;
  
  (c) normalizing the transformed intensity values using the MMB transformed intensity values for all SNPs from the plurality of SNPs that are called BB in the sample to obtain normalized PMA intensities;
  
  (d) normalizing the PMB transformed intensity values using the MMA transformed intensities for all SNPs from the plurality that are called AA in the sample to obtain normalized PMB intensities;
  
  (e) using a plurality of reference samples, identifying a set of PMA probes and a set of PMB probes for each SNP in the plurality of SNPs that show linear correlation between copy number and intensity;
  
  (f) calculating for each SNP in the plurality of SNPs an average of the PMA probes in the set of PMA probes and an average of the PMB probes in the set of PMB probes to obtain a PMA average intensity and a PMB average intensity for each SNP in the plurality of SNPs;
  
  (g) performing linear regression against a model equation derived from a plurality of reference samples to obtain an estimated A allele copy number and an estimated B allele copy number for each SNP in the plurality of SNPs;
  
  (h) adding the estimated A allele copy number to the estimated B allele copy number to obtain an estimated total copy number of the genomic region of each SNP in the plurality of SNPs, thereby calculating an estimated total copy number for each of a plurality of genomic regions in a genome; and
  
  (i) applying regression tree analysis to the estimated total copy numbers obtained in (h) to partition the genome into genomic regions having the same estimated total copy number, wherein steps (b)-(i) are performed by a computer and wherein the computer outputs the estimated total copy number of a plurality of genomic regions in a computer readable format.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1 wherein the high density genotyping array comprises a plurality of probe sets comprising at least 100,000 different probe sets, wherein a probe set comprises at least three perfect match probes for allele A, at least 3 perfect match probes for allele B, at least 3 mismatch probes for allele A and at least 3 mismatch probes for allele B.
  - 3. The method of claim 2 wherein a probe set comprises at least 7 perfect match probes for allele A, at least 7 perfect match probes for allele B, at least 7 mismatch probes for allele A and at least 7 mismatch probes for allele B.
  - 4. The method of claim 1 wherein probes are selected for the set of PMA probes and for the set of PMB probes by identifying SNPs that show a correlation greater than 0.6 between allelic dosages based on genotype calls and probe intensity using the equation
  - 5. The method of claim 1 wherein the step of estimating the A allele copy number and the B allele copy number in step (g) further comprises calculating a value for C in the following equation
    $$I_m = \alpha_{m,0} + \alpha_{m,1} \ln(\delta_m + C) + A_m + \varepsilon$$
    for each SNP in the plurality of SNPs, where $m = 1, \ldots, M$ is the SNP index, $I$ is the probe intensity, $\alpha_{m,0}$ is the SNP-specific optical background, $\alpha_{m,1}$ is the scaling factor, $\delta_m$ is the non-specific hybridization, T is the DNA target concentration, $A_m$ is an affinity term determined by probe and target fragment sequences, and $\varepsilon$ is a random noise term.
  - 6. The method of claim 1 wherein the step of estimating the copy number of the A allele in the unknown sample in step (g) further comprises using the equation:
    $$\hat{C}_{a,lm} = \max\left(\exp\left(\hat{\gamma}_{a0,m} + \hat{\gamma}_{a1,m} I_{a,lm}\right) - \hat{\delta}_{a,lm},\ 0\right),$$
    and the step of estimating the copy number of the B allele in the unknown sample in step (g) further comprises using the equation:
    $$\hat{C}_{b,lm} = \max\left(\exp\left(\hat{\gamma}_{b0,m} + \hat{\gamma}_{b1,m} I_{b,lm}\right) - \hat{\delta}_{b,lm},\ 0\right),$$
    wherein $\delta$ is fixed and $\gamma_{a0,m}$, $\gamma_{a1,m}$, $\gamma_{b0,m}$, $\gamma_{b1,m}$ are estimated using least squares regression with the normal reference as the training set.
  - 7. The method of claim 6 further comprising performing regression tree analysis to partition the genome further based on allele-specific copy number for a plurality of genomic regions into regions that share the same allele-specific copy number and to assign allele-specific copy number to regions that show alteration from the diploid state.
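
The estimation and partitioning steps recited in claims 1, 6 and 7 can be illustrated with a small sketch. It is not the patented implementation: the function names, the `min_gain` and `min_size` stopping parameters, and the choice of a greedy recursive binary split as the "regression tree" are all assumptions made for illustration. The copy number estimate follows the form max(exp(γ0 + γ1·I) − δ, 0) from claim 6, with δ held fixed and γ0, γ1 fitted by least squares on reference samples.

```python
import math


def fit_gammas(intensities, copy_numbers, delta):
    """Least squares fit of ln(C + delta) = gamma0 + gamma1 * I.

    intensities and copy_numbers come from reference samples whose allele
    copy number is known from their genotype calls; delta is held fixed.
    """
    ys = [math.log(c + delta) for c in copy_numbers]
    n = len(ys)
    mean_x = sum(intensities) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in intensities)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(intensities, ys))
    gamma1 = sxy / sxx
    gamma0 = mean_y - gamma1 * mean_x
    return gamma0, gamma1


def estimate_allele_copy_number(intensity, gamma0, gamma1, delta):
    """Claim 6 form: max(exp(gamma0 + gamma1 * I) - delta, 0)."""
    return max(math.exp(gamma0 + gamma1 * intensity) - delta, 0.0)


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def partition(values, start=0, end=None, min_gain=1.0, min_size=2):
    """Greedy one-dimensional regression tree over position-ordered SNPs.

    Splits [start, end) at the point that most reduces the squared error,
    then recurses. min_gain and min_size are hypothetical stopping rules.
    Returns a list of (start, end, mean_copy_number) segments.
    """
    if end is None:
        end = len(values)
    segment = values[start:end]
    total = sse(segment)
    best_gain, best_split = 0.0, None
    for k in range(start + min_size, end - min_size + 1):
        gain = total - sse(values[start:k]) - sse(values[k:end])
        if gain > best_gain:
            best_gain, best_split = gain, k
    if best_split is None or best_gain < min_gain:
        return [(start, end, sum(segment) / len(segment))]
    return (partition(values, start, best_split, min_gain, min_size)
            + partition(values, best_split, end, min_gain, min_size))


if __name__ == "__main__":
    # Hypothetical per-SNP totals (A estimate + B estimate), ordered by position.
    totals = [2.0] * 4 + [4.0] * 4 + [2.0] * 4
    for seg_start, seg_end, mean in partition(totals):
        print(seg_start, seg_end, mean)
```

In this sketch the total copy number passed to `partition` is simply the sum of the two allele estimates, as in step (h) of claim 1; running the same routine on the A or B estimates alone would give an allele-specific partition along the lines of claim 7.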

8. A method for estimating the copy number of a genomic region in an experimental sample comprising:
- (a) isolating nucleic acid from the experimental sample;
  
  (b) fragmenting the nucleic acid sample with a restriction enzyme;
  
  (c) ligating an adaptor to the fragments;

  (d) amplifying at least some of the adaptor ligated fragments;

  (e) labeling the amplified products;
  
  (f) hybridizing the labeled amplified products to an array to obtain a hybridization pattern, wherein the array comprises a plurality of genotyping probe sets for a plurality of SNPs, wherein a probe set comprises:

    (i) a plurality of perfect match probes to a first allele of a SNP,
    (ii) a plurality of perfect match probes to a second allele of the SNP,
    (iii) a plurality of mismatch probes to the first allele of the SNP, and
    (iv) a plurality of mismatch probes to the second allele of the SNP;

  (g) obtaining a raw intensity measurement for each perfect match and each mismatch probe in each probe set for each SNP;
  
  (h) calculating the natural log(ln) of the raw intensity measurement for each probe;
  
  (i) standardizing the natural log of the raw intensity measurement for each probe using as background the mismatch probe intensities from the opposite allele in SNPs with a genotype call homozygous for the opposite allele;
  
  (j) obtaining a standardized measurement for each perfect match probe by a method comprising obtaining a first background intensity by calculating an average of a plurality of B allele mismatch probes for a plurality of SNPs called homozygous A in the sample, obtaining a second background intensity by calculating an average of a plurality of A allele mismatch probes for a plurality of SNPs called homozygous B in the sample;

  (k) standardize the PMa probes so that the MMa probes for SNPs with BB genotype calls have a variance of one and a mean of zero;
  
  (l) standardize the PMb probes so that the mismatch B probes for homozygous A SNPs have a variance of one and a mean of zero;
  
  (m) select probes to be included in calculation by identifying probes that show a linear response between copy number and intensity above a threshold and calculate an average intensity across selected probes in a probe set;
  
  (n) perform regression analysis on the reference set mean intensities for a given probe set and genotype for each SNP;
  
  (o) compare the intensity of the target sample against the mean intensity values of samples from the reference set with the same genotype call;
  
  (p) apply a linear regression to adjust the target intensity so that it falls on the line Y=X, performed separately on PMA and PMB probe intensities;
  
  (q) model copy number from the reference samples using the following equations
  $$\ln(C_{a,rm} + \delta_{a,m}) = \gamma_{a0,m} + \gamma_{a1,m} I_{a,rm} + \varepsilon_{a,rm}$$
  $$\ln(C_{b,rm} + \delta_{b,m}) = \gamma_{b0,m} + \gamma_{b1,m} I_{b,rm} + \varepsilon_{b,rm};$$

  (r) use the values of $\gamma_{a0,m}$, $\gamma_{a1,m}$, $\gamma_{b0,m}$, $\gamma_{b1,m}$ obtained in (q) to estimate copy number of the unknown sample for each allele of each SNP using the following equations
  $$\hat{C}_{a,lm} = \max\left(\exp\left(\hat{\gamma}_{a0,m} + \hat{\gamma}_{a1,m} I_{a,lm}\right) - \hat{\delta}_{a,lm},\ 0\right)$$
  and
  $$\hat{C}_{b,lm} = \max\left(\exp\left(\hat{\gamma}_{b0,m} + \hat{\gamma}_{b1,m} I_{b,lm}\right) - \hat{\delta}_{b,lm},\ 0\right);$$

  (s) perform kernel smoothing on the estimated copy number applying significance with a 1 Mb window and Gaussian kernel.
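
Two of the numerical steps in claim 8 lend themselves to a short sketch: the background standardization of steps (h) to (l) and the kernel smoothing of step (s). The code below is only an illustration under stated assumptions. The function names are hypothetical, the population variance is used for the "variance of one" condition, and the Gaussian bandwidth `sigma_bp` (set here to a quarter of the window) is an assumption because the claim specifies only a 1 Mb window and a Gaussian kernel. The significance calculation mentioned in step (s) is not modeled.

```python
import math


def standardize(pm_log, background_log):
    """Shift and scale log intensities against a background probe set.

    background_log holds natural-log mismatch intensities from SNPs called
    homozygous for the opposite allele. After the transform, that background
    would have mean 0 and (population) variance 1 on the same scale.
    """
    n = len(background_log)
    mean = sum(background_log) / n
    var = sum((v - mean) ** 2 for v in background_log) / n
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in pm_log]


def gaussian_smooth(positions, values, window_bp=1_000_000, sigma_bp=None):
    """Gaussian-weighted average of per-SNP estimates along a chromosome.

    Only SNPs within half a window of the center SNP contribute. sigma_bp
    defaults to window_bp / 4, which is an assumption of this sketch.
    """
    if sigma_bp is None:
        sigma_bp = window_bp / 4
    half_window = window_bp / 2
    smoothed = []
    for center in positions:
        weighted_sum = 0.0
        weight_total = 0.0
        for pos, val in zip(positions, values):
            distance = pos - center
            if abs(distance) <= half_window:
                weight = math.exp(-0.5 * (distance / sigma_bp) ** 2)
                weighted_sum += weight * val
                weight_total += weight
        # weight_total is at least 1.0 because each SNP weights itself by 1.
        smoothed.append(weighted_sum / weight_total)
    return smoothed


if __name__ == "__main__":
    # Hypothetical raw intensities; step (h) takes the natural log first.
    raw_pm = [1200.0, 950.0, 3100.0]
    raw_background = [300.0, 280.0, 350.0, 310.0]
    pm_std = standardize([math.log(v) for v in raw_pm],
                         [math.log(v) for v in raw_background])
    print(pm_std)

    # Hypothetical SNP positions (bp) and estimated copy numbers.
    positions = [100_000, 350_000, 600_000, 2_400_000]
    copy_numbers = [2.1, 1.9, 3.8, 2.0]
    print(gaussian_smooth(positions, copy_numbers))
```

Truncating the kernel at the window edge keeps an isolated SNP, one with no neighbors within 0.5 Mb, at its original estimate, which is one reasonable reading of "a 1 Mb window".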

Current Assignee
Affymetrix, Inc. (Thermo Fisher Scientific Incorporated)
Original Assignee
Affymetrix, Inc. (Thermo Fisher Scientific Incorporated)
Inventors
Jones, Keith W.; Huang, Jing; Shapero, Michael H.
Primary Examiner(s)
Zeman, Mary K

Application Number

US11/295,225
Publication Number

US 20060134674A1
Time in Patent Office

1,786 Days
Field of Search

702/19-20, 703/11, 707/102, 435/6, 536/24.5
US Class Current

702/19
CPC Class Codes

C12Q 1/6827   for detection of mutation o...

C12Q 1/6837   using probe arrays or probe...

C12Q 2545/101   with an internal standard/c...

G16B 20/00   ICT specially adapted for f...

G16B 20/10   Ploidy or copy number detec...

G16B 20/20   Allele or variant detection...

G16B 20/40   Population genetics; Linkag...

G16B 25/00   ICT specially adapted for h...

G16B 25/10   Gene or protein expression ...

G16B 25/20   Polymerase chain reaction [...
