Using cell-free DNA fragment size to determine copy number variations

US 10,095,831 B2
Filed: 12/16/2016
Issued: 10/09/2018
Est. Priority Date: 02/03/2016
Status: Active Grant

Abstract

Disclosed are methods for determining copy number variation (CNV) known or suspected to be associated with a variety of medical conditions. In some embodiments, methods are provided for determining copy number variation of fetuses using maternal samples comprising maternal and fetal cell-free DNA. In some embodiments, methods are provided for determining CNVs known or suspected to be associated with a variety of medical conditions. Some embodiments disclosed herein provide methods to improve the sensitivity and/or specificity of sequence data analysis by deriving a fragment size parameter. In some implementations, information from fragments of different sizes is used to evaluate copy number variations. In some implementations, one or more t-statistics obtained from coverage information of the sequence of interest are used to evaluate copy number variations. In some implementations, one or more fetal fraction estimates are combined with one or more t-statistics to determine copy number variations.

33 Claims

1. A method, implemented using a computer system comprising one or more processors and system memory, for determining a copy number variation (CNV) of a nucleic acid sequence of interest in a test sample comprising cell-free nucleic acid fragments originating from two or more genomes, the method comprising:
- (a) receiving, by the computer system, sequence reads obtained by sequencing the cell-free nucleic acid fragments in the test sample;
  
  (b) aligning, by the one or more processors, the sequence reads of the cell-free nucleic acid fragments or aligning fragments containing the sequence reads to bins of a reference genome comprising the sequence of interest, thereby providing test sequence tags, wherein the reference genome is divided into a plurality of bins;
  
  (c) determining fragment sizes of at least some of the cell-free nucleic acid fragments present in the test sample;
  
  (d) for cell-free nucleic acid fragments determined as being in a first size domain, calculating, by the one or more processors, first coverages of the sequence tags for the bins of the reference genome by, for each bin:
  
  (i) determining a number of sequence tags aligning to the bin, and
  
  (ii) normalizing the number of sequence tags aligning to the bin by accounting for bin-to-bin variations due to factors other than copy number variation;
  
  (e) for cell-free nucleic acid fragments determined as being in a second size domain, calculating, by the one or more processors, second coverages of the sequence tags for the bins of the reference genome by, for each bin:
  
  (i) determining a number of sequence tags aligning to the bin, and
  
  (ii) normalizing the number of sequence tags aligning to the bin by accounting for bin-to-bin variations due to factors other than copy number variation; and
  
  (f) determining a copy number variation in the sequence of interest using a likelihood ratio calculated from the first coverages and the second coverages.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 26, 27, 28, 29, 30, 31, 32, 33)
- - 2. The method of claim 1, wherein the likelihood ratio is calculated from a t-statistic of the first coverages and a t-statistic of the second coverages, wherein the t-statistic is calculated using coverages of bins in the sequence of interest and coverages of bins in a reference region for the sequence of interest.
  - 3. The method of claim 1, wherein the first size domain comprises cell-free nucleic acid fragments of substantially all sizes in the test sample, and the second size domain comprises only cell-free nucleic acid fragments smaller than a defined size.
  - 4. The method of claim 1, wherein the second size domain comprises only the cell-free nucleic acid fragments smaller than about 150 bp.
  - 5. The method of claim 1, wherein the likelihood ratio is calculated as a first likelihood that the test sample is an aneuploid sample over a second likelihood that the test sample is a euploid sample.
  - 6. The method of claim 1, wherein the likelihood ratio is calculated from one or more values of fetal fraction in addition to the first coverages and the second coverages.
  - 7. The method of claim 6, wherein the one or more values of fetal fraction comprise a value of fetal fraction calculated using information about the sizes of the cell-free nucleic acid fragments.
  - 8. The method of claim 7, wherein the value of fetal fraction is calculated by:
    - obtaining a frequency distribution of the sizes of the cell-free nucleic acid fragments; and
      
      applying the frequency distribution to a model relating fetal fraction to frequency of fragment size to obtain the fetal fraction value.
  - 9. The method of claim 6, wherein the one or more values of fetal fraction comprise a value of fetal fraction calculated using coverage information for the bins of the reference genome.
  - 10. The method of claim 9, wherein the value of fetal fraction is calculated by:
    - applying coverage values of a plurality of bins to a model relating fetal fraction to coverage of bin to obtain the fetal fraction value.
  - 11. The method of claim 6, wherein the one or more values of fetal fraction comprise a value of fetal fraction calculated using coverage information for the bins of a sex chromosome.
  - 12. The method of claim 6, wherein the likelihood ratio is calculated from a fetal fraction, a t-statistic of short fragments, and a t-statistic of all fragments, wherein the short fragments are cell-free nucleic acid fragments in a first size range smaller than a criterion size, and the all fragments are cell-free nucleic acid fragments including the short fragments and fragments longer than the criterion size.
  - 13. The method of claim 12, wherein the likelihood ratio is calculated:
  - 14. The method of claim 1, wherein the likelihood ratio is calculated for monosomy X, trisomy X, trisomy 13, trisomy 18, or trisomy 21.
  - 15. The method of claim 1, wherein normalizing the number of sequence tags comprises:
    - normalizing for GC content of the test sample, normalizing for a global wave profile of variation of a training set, and/or normalizing for one or more components obtained from a principal component analysis.
  - 16. The method of claim 2, wherein the reference region is selected from the group consisting of:
    - all robust chromosomes, robust chromosomes not including the sequence of interest, at least a chromosome outside of the sequence of interest, and a subset of chromosomes selected from the robust chromosomes, wherein the robust chromosomes are autosomal chromosomes other than chromosomes 13, 18, and 21.
  - 17. The method of claim 16, wherein the reference region comprises robust chromosomes that have been determined to provide the best signal detection ability for a set of training samples.
  - 18. The method of claim 2, further comprising:
    - calculating values of a size parameter for the bins by, for each bin:
      
      (i) determining a value of the size parameter from sizes of cell-free nucleic acid fragments in the bin, and
      
      (ii) normalizing the value of the size parameter by accounting for bin-to-bin variations due to factors other than copy number variation; and
      
      determining a size-based t-statistic for the sequence of interest using values of the size parameter of bins in the sequence of interest and values of the size parameter of bins in the reference region for the sequence of interest.
  - 19. The method of claim 18, wherein the likelihood ratio of (f) is calculated from the first t-statistic, the second t-statistic, and the size-based t-statistic.
  - 20. The method of claim 18, wherein the likelihood ratio of (f) is calculated from the size-based t-statistic and a fetal fraction.
  - 21. The method of claim 1, further comprising comparing the likelihood ratio to a call criterion to determine a copy number variation in the sequence of interest.
  - 22. The method of claim 1, further comprising obtaining a plurality of likelihood ratios and applying the plurality of likelihood ratios to a decision tree to determine a ploidy case for the test sample.
  - 26. The method of claim 2, wherein the t-statistic is calculated as follows:
  - 27. The method of claim 1, further comprising, before (a):
    - extracting the cell-free nucleic acid fragments in the test sample from a plasma sample of a pregnant female carrying a fetus, wherein the cell-free nucleic acid fragments in the test sample comprise nucleic acid originating from the fetus and nucleic acid originating from the pregnant female; and
      
      sequencing the cell-free nucleic acid fragments to obtain the sequence reads.
  - 28. The method of claim 27, further comprising:
    - determining that the fetus is affected by a genetic abnormality associated with the copy number variation in the sequence of interest.
  - 29. The method of claim 28, further comprising:
    - prescribing, initiating, and/or altering a treatment regimen, wherein the treatment regimen is designed to treat the genetic abnormality affecting the fetus.
  - 30. The method of claim 1, further comprising, before (a):
    - extracting the cell-free nucleic acid fragments in the test sample from an individual, wherein the cell-free nucleic acid fragments comprise nucleic acid originating from cancer cells; and
      
      sequencing the cell-free nucleic acid fragments to obtain the sequence reads.
  - 31. The method of claim 30, further comprising:
    - determining that the individual is affected by a cancer associated with the copy number variation in the sequence of interest.
  - 32. The method of claim 31, further comprising:
    - prescribing, initiating, and/or altering a treatment regimen, wherein the treatment regimen is designed to treat the cancer affecting the individual.
  - 33. The method of claim 30, wherein the cell-free nucleic acid fragments in the test sample are extracted from a plasma sample of the individual.
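
Steps (c) through (e) of claim 1, together with the t-statistic of claim 2, can be illustrated with a minimal Python sketch. It is not the patented implementation. The bin width, the median-based normalization (a stand-in for the GC, global-wave, or PCA normalization of claim 15), and the use of Welch's t-statistic (the formula of claim 26 is not reproduced in this record) are all assumptions made for illustration, as are the function names.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median, variance

BIN_SIZE = 100_000   # assumed bin width in bp (hypothetical)
SHORT_CUTOFF = 150   # "smaller than about 150 bp", per claim 4


def bin_counts(fragments, n_bins, in_domain, bin_size=BIN_SIZE):
    """Count tags per bin for fragments whose size satisfies in_domain.

    fragments: iterable of (start_position, fragment_size) pairs
    on a single chromosome.
    """
    counts = Counter(
        start // bin_size for start, size in fragments if in_domain(size)
    )
    return [counts.get(b, 0) for b in range(n_bins)]


def normalize(counts):
    """Placeholder normalization: divide each bin by the median count."""
    med = median(counts)
    if med == 0:
        raise ValueError("median bin count is zero; cannot normalize")
    return [c / med for c in counts]


def welch_t(roi_coverages, ref_coverages):
    """Welch's t-statistic between bins of interest and reference bins."""
    se = sqrt(
        variance(roi_coverages) / len(roi_coverages)
        + variance(ref_coverages) / len(ref_coverages)
    )
    return (mean(roi_coverages) - mean(ref_coverages)) / se


# Toy fragments: (start, size). First domain = all sizes, second = short only.
fragments = [(0, 120), (10, 170), (100_000, 140), (100_500, 200), (250_000, 100)]
all_counts = bin_counts(fragments, 3, lambda size: True)
short_counts = bin_counts(fragments, 3, lambda size: size < SHORT_CUTOFF)
first_coverages = normalize(all_counts)
second_coverages = normalize(short_counts)
```

In this sketch the first size domain holds fragments of all sizes and the second holds only short fragments, matching the arrangement described in claim 3. `welch_t` would be applied separately to each domain's coverages, comparing bins in the sequence of interest against bins in the reference region.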

23. A system for evaluation of copy number of a nucleic acid sequence of interest in a test sample, the system comprising:
- a sequencer for receiving cell-free nucleic acid fragments from the test sample and providing nucleic acid sequence information of the test sample;
  
  a processor; and
  
  one or more computer-readable storage media having stored thereon instructions for execution on said processor to:
  
  (a) receive sequence reads obtained by sequencing the cell-free nucleic acid fragments in the test sample;
  
  (b) align the sequence reads of the cell-free nucleic acid fragments or aligning fragments containing the sequence reads to bins of a reference genome comprising the sequence of interest, thereby providing test sequence tags, wherein the reference genome is divided into a plurality of bins;
  
  (c) determine fragment sizes of at least some of the cell-free nucleic acid fragments present in the test sample;
  
  (d) for cell-free nucleic acid fragments determined as being in a first size domain, calculate first coverages of the sequence tags for the bins of the reference genome by, for each bin:
  
  (i) determining a number of sequence tags aligning to the bin, and
  
  (ii) normalizing the number of sequence tags aligning to the bin by accounting for bin-to-bin variations due to factors other than copy number variation;
  
  (e) for cell-free nucleic acid fragments determined as being in a second size domain, calculate second coverages of the sequence tags for the bins of the reference genome by, for each bin:
  
  (i) determining a number of sequence tags aligning to the bin, and
  
  (ii) normalizing the number of sequence tags aligning to the bin by accounting for bin-to-bin variations due to factors other than copy number variation; and
  
  (f) determine a copy number variation in the sequence of interest using a likelihood ratio calculated from the first coverages and the second coverages.
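
Step (f) relies on a likelihood ratio, which claim 5 describes as the likelihood of an aneuploid sample over the likelihood of a euploid sample, and which claims 6 and 12 allow to depend on fetal fraction and on t-statistics for all fragments and for short fragments. The formula of claim 13 is not reproduced in this record, so the following Python sketch substitutes a simple Gaussian model purely for illustration. The unit-variance assumption, the independence of the two t-statistics, the linear dependence of the expected shift on fetal fraction, and the constants `k_all` and `k_short` are all hypothetical.

```python
from math import exp


def gaussian_lr(t, shift):
    """Ratio N(t; shift, 1) / N(t; 0, 1), assuming unit variance."""
    return exp(-0.5 * (t - shift) ** 2 + 0.5 * t ** 2)


def likelihood_ratio(t_all, t_short, fetal_fraction, k_all=50.0, k_short=70.0):
    """Aneuploid-over-euploid likelihood ratio under a toy Gaussian model.

    Under the euploid hypothesis each t-statistic is taken to be N(0, 1).
    Under the aneuploid hypothesis its mean is assumed to shift in
    proportion to fetal fraction (k_all and k_short are made-up constants).
    The two statistics are treated as independent, which is a simplification
    because short fragments are a subset of all fragments.
    """
    return (
        gaussian_lr(t_all, k_all * fetal_fraction)
        * gaussian_lr(t_short, k_short * fetal_fraction)
    )


def call(lr, criterion=1.0):
    """Compare the likelihood ratio to a call criterion (see claim 21)."""
    return "aneuploid" if lr > criterion else "euploid"


print(call(likelihood_ratio(t_all=5.0, t_short=7.0, fetal_fraction=0.1)))
print(call(likelihood_ratio(t_all=0.0, t_short=0.0, fetal_fraction=0.1)))
```

The call criterion of 1.0 is only a placeholder. In practice a threshold would be chosen from training data to balance sensitivity and specificity.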

24. A method for determining a copy number variation (CNV) of a nucleic acid sequence of interest in a test sample comprising cell-free nucleic acid fragments originating from two or more genomes, the method comprising:
- (a) receiving sequence reads obtained by sequencing the cell-free nucleic acid fragments in the test sample;
  
  (b) aligning the sequence reads of the cell-free nucleic acid fragments or aligning fragments containing the sequence reads to bins of a reference genome comprising the sequence of interest, thereby providing test sequence tags, wherein the reference genome is divided into a plurality of bins;
  
  (c) determining fragment sizes of the cell-free nucleic acid fragments existing in the test sample;
  
  (d) calculating coverages of the sequence tags for the bins of the reference genome using sequence tags for the cell-free nucleic acid fragments having sizes in a first size domain;
  
  (e) calculating coverages of the sequence tags for the bins of the reference genome using sequence tags for the cell-free nucleic acid fragments having sizes in a second size domain, wherein the second size domain is different from the first size domain;
  
  (f) calculating size characteristics for the bins of the reference genome using the fragment sizes determined in (c); and
  
  (g) determining a copy number variation in the sequence of interest using the coverages calculated in (d) and (e) and the size characteristics calculated in (f).
- View Dependent Claims (25)
- - 25. The method of claim 24, wherein (g) comprises calculating a t-statistic for the sequence of interest using the size characteristics of bins in the sequence of interest calculated in (f).
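
Step (f) of claim 24 and the t-statistic of claim 25 can be sketched as follows. The claims do not specify which size characteristic is used, so this Python example assumes one plausible choice, the fraction of fragments in each bin shorter than a cutoff, and again uses Welch's t-statistic as a stand-in. The bin width, cutoff, and function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def short_fraction_per_bin(fragments, n_bins, bin_size=100_000, cutoff=150):
    """Per-bin fraction of fragments shorter than cutoff (None if bin is empty).

    fragments: iterable of (start_position, fragment_size) pairs.
    """
    sizes = defaultdict(list)
    for start, size in fragments:
        sizes[start // bin_size].append(size)
    result = []
    for b in range(n_bins):
        in_bin = sizes.get(b, [])
        if in_bin:
            result.append(sum(s < cutoff for s in in_bin) / len(in_bin))
        else:
            result.append(None)
    return result


def size_t_statistic(values, roi_bins, ref_bins):
    """Welch's t-statistic on a size characteristic, ROI bins vs reference bins."""
    roi = [values[b] for b in roi_bins if values[b] is not None]
    ref = [values[b] for b in ref_bins if values[b] is not None]
    se = sqrt(variance(roi) / len(roi) + variance(ref) / len(ref))
    return (mean(roi) - mean(ref)) / se


fragments = [(0, 120), (10, 170), (100_000, 140), (100_500, 130), (250_000, 200)]
fractions = short_fraction_per_bin(fragments, 3)
```

A positive statistic here would indicate that bins in the sequence of interest carry a higher share of short fragments than the reference region. This is the kind of size signal that step (g) combines with the two sets of coverages.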

Current Assignee
Verinata Health Incorporated (Illumina Incorporated)
Original Assignee
Verinata Health Incorporated (Illumina Incorporated)
Inventors
Duenwald, Sven; Comstock, David A.; Barbacioru, Catalin; Chudova, Darya I.; Rava, Richard P.; Jones, Keith W.; Chen, Gengxin; Skvortsov, Dimitri
Primary Examiner(s)
Brusca, John S

Application Number

US15/382,508
Publication Number

US 20170220735A1
Time in Patent Office

662 Days
Field of Search

None
US Class Current
CPC Class Codes

C12Q 1/6869   Methods for sequencing

C12Q 1/6883   for diseases caused by alte...

C12Q 2537/16   Assays for determining copy...

C12Q 2537/165   Mathematical modelling, e.g...

C12Q 2600/154   Methylation markers

C12Q 2600/156   Polymorphic or mutational m...

G16B 20/00   ICT specially adapted for f...

G16B 20/10   Ploidy or copy number detec...

G16B 20/20   Allele or variant detection...

G16B 25/00   ICT specially adapted for h...

G16B 30/00   ICT specially adapted for s...

G16B 30/10   Sequence alignment; Homolog...

G16B 40/00   ICT specially adapted for b...

G16B 40/10   Signal processing, e.g. fro...

G16H 10/40   for data related to laborat...

G16H 20/10   relating to drugs or medica...

G16H 50/20   for computer-aided diagnosi...

G16H 50/30   for calculating health indi...

G16Z 99/00   Subject matter not provided...
