Using cell-free DNA fragment size to determine copy number variations
1 Assignment
0 Petitions
Abstract
Disclosed are methods for determining copy number variation (CNV) known or suspected to be associated with a variety of medical conditions. In some embodiments, methods are provided for determining copy number variations in fetuses using maternal samples comprising maternal and fetal cell-free DNA. Some embodiments disclosed herein provide methods to improve the sensitivity and/or specificity of sequence data analysis by deriving a fragment size parameter. In some implementations, information from fragments of different sizes is used to evaluate copy number variations. In some implementations, one or more t-statistics obtained from coverage information of the sequence of interest are used to evaluate copy number variations. In some implementations, one or more fetal fraction estimates are combined with one or more t-statistics to determine copy number variations.
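The abstract mentions t-statistics computed from coverage information of the sequence of interest but does not give the formula. As a hedged illustration only, the sketch below uses a Welch two-sample t-statistic to compare per-bin coverages inside the sequence of interest with bins from a reference region. The choice of Welch's test, the function name `welch_t`, the idea of a copy-number-neutral reference region, and the example values are all assumptions, not details taken from the patent.

```python
import math
from statistics import mean, variance
from typing import Sequence


def welch_t(test_cov: Sequence[float], ref_cov: Sequence[float]) -> float:
    """Welch two-sample t-statistic comparing per-bin coverages.

    test_cov: normalized coverages of bins inside the sequence of interest.
    ref_cov:  normalized coverages of bins in a reference region assumed to
              be copy-number neutral (a hypothetical choice of comparison).
    """
    m_test, m_ref = mean(test_cov), mean(ref_cov)
    # Sample variances (n - 1 denominator); each input needs >= 2 bins.
    se = math.sqrt(variance(test_cov) / len(test_cov)
                   + variance(ref_cov) / len(ref_cov))
    return (m_test - m_ref) / se


# Illustrative (made-up) coverages: bins in the sequence of interest sit
# about 5% above the reference bins.
t = welch_t([1.05, 1.07, 1.04, 1.06], [1.00, 0.99, 1.01, 1.00])
print(round(t, 2))
```

A large positive value suggests the sequence of interest is over-represented relative to the reference bins. How such a statistic would be thresholded or combined with a fetal fraction estimate is left open here.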
33 Claims
1. A method, implemented using a computer system comprising one or more processors and system memory, for determining a copy number variation (CNV) of a nucleic acid sequence of interest in a test sample comprising cell-free nucleic acid fragments originating from two or more genomes, the method comprising:
(a) receiving, by the computer system, sequence reads obtained by sequencing the cell-free nucleic acid fragments in the test sample;
(b) aligning, by the one or more processors, the sequence reads of the cell-free nucleic acid fragments or aligning fragments containing the sequence reads to bins of a reference genome comprising the sequence of interest, thereby providing test sequence tags, wherein the reference genome is divided into a plurality of bins;
(c) determining fragment sizes of at least some of the cell-free nucleic acid fragments present in the test sample;
(d) for cell-free nucleic acid fragments determined as being in a first size domain, calculating, by the one or more processors, first coverages of the sequence tags for the bins of the reference genome by, for each bin:
(i) determining a number of sequence tags aligning to the bin, and (ii) normalizing the number of sequence tags aligning to the bin by accounting for bin-to-bin variations due to factors other than copy number variation;
(e) for cell-free nucleic acid fragments determined as being in a second size domain, calculating, by the one or more processors, second coverages of the sequence tags for the bins of the reference genome by, for each bin:
(i) determining a number of sequence tags aligning to the bin, and (ii) normalizing the number of sequence tags aligning to the bin by accounting for bin-to-bin variations due to factors other than copy number variation; and
(f) determining a copy number variation in the sequence of interest using a likelihood ratio calculated from the first coverages and the second coverages.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 26, 27, 28, 29, 30, 31, 32, 33.
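Steps (c) to (e) of claim 1 split the aligned fragments by size and compute a normalized coverage per bin for each size domain. The claim does not fix the domain boundaries or the normalization method. The sketch below is a minimal illustration under stated assumptions: each fragment is a hypothetical `(bin index, size in bp)` pair, the 150 bp boundary between domains is an arbitrary example, and bin-to-bin effects are removed by dividing by a hypothetical `expected_fraction` profile taken from unaffected reference samples.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical input format: one (bin index, fragment size in bp) pair per
# aligned cell-free fragment, i.e. one sequence tag per fragment.
Fragment = Tuple[int, int]


def domain_coverages(fragments: Sequence[Fragment],
                     size_domain: Tuple[int, int],
                     expected_fraction: Sequence[float]) -> List[float]:
    """Normalized per-bin coverage for fragments with lo <= size < hi.

    expected_fraction[i] is the share of tags that bin i receives in
    unaffected reference samples (an assumed normalization profile); dividing
    by it removes bin-to-bin effects such as GC content or mappability.
    A copy-number-neutral bin then has coverage close to 1.0.
    """
    lo, hi = size_domain
    # (i) count the sequence tags in each bin for this size domain
    counts = Counter(b for b, size in fragments if lo <= size < hi)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fragments fall in this size domain")
    # (ii) normalize for library size, then for the expected bin profile
    return [counts[i] / total / expected_fraction[i]
            for i in range(len(expected_fraction))]


frags = [(0, 140), (1, 145), (2, 130), (2, 135), (3, 148),
         (0, 170), (1, 180), (2, 175), (3, 190)]
profile = [0.25, 0.25, 0.25, 0.25]
short_cov = domain_coverages(frags, (0, 150), profile)     # first size domain
long_cov = domain_coverages(frags, (150, 1000), profile)   # second size domain
print(short_cov, long_cov)
```

In this toy input, bin 2 stands out only among the short fragments. Computing coverage separately per size domain is meant to expose that kind of difference, which an all-fragment count would dilute.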
23. A system for evaluation of copy number of a nucleic acid sequence of interest in a test sample, the system comprising:
a sequencer for receiving cell-free nucleic acid fragments from the test sample and providing nucleic acid sequence information of the test sample;
a processor; and
one or more computer-readable storage media having stored thereon instructions for execution on said processor to:
(a) receive sequence reads obtained by sequencing the cell-free nucleic acid fragments in the test sample;
(b) align the sequence reads of the cell-free nucleic acid fragments or align fragments containing the sequence reads to bins of a reference genome comprising the sequence of interest, thereby providing test sequence tags, wherein the reference genome is divided into a plurality of bins;
(c) determine fragment sizes of at least some of the cell-free nucleic acid fragments present in the test sample;
(d) for cell-free nucleic acid fragments determined as being in a first size domain, calculate first coverages of the sequence tags for the bins of the reference genome by, for each bin:
(i) determining a number of sequence tags aligning to the bin, and (ii) normalizing the number of sequence tags aligning to the bin by accounting for bin-to-bin variations due to factors other than copy number variation;
(e) for cell-free nucleic acid fragments determined as being in a second size domain, calculate second coverages of the sequence tags for the bins of the reference genome by, for each bin:
(i) determining a number of sequence tags aligning to the bin, and (ii) normalizing the number of sequence tags aligning to the bin by accounting for bin-to-bin variations due to factors other than copy number variation; and
(f) determine a copy number variation in the sequence of interest using a likelihood ratio calculated from the first coverages and the second coverages.
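Step (f), shared by claims 1 and 23, determines the CNV using a likelihood ratio calculated from the first and second coverages. The claims do not define the statistical model. Purely as an illustration, the sketch below assumes that each size domain yields one z-score for the sequence of interest. It also assumes that z follows N(0, 1) without a CNV and N(shift, 1) with one, and that the domains are independent, which is a simplification. The functions and the `shifts_if_cnv` values are hypothetical. In practice, the expected shifts would plausibly depend on factors such as fetal fraction, which is not modeled here.

```python
import math
from typing import Sequence


def z_score(observed: float, ref_mean: float, ref_sd: float) -> float:
    """Standardize a coverage against unaffected reference samples."""
    return (observed - ref_mean) / ref_sd


def log_normal_pdf(x: float, mu: float, sd: float) -> float:
    """Log density of a normal distribution N(mu, sd**2) at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def log_likelihood_ratio(z_scores: Sequence[float],
                         shifts_if_cnv: Sequence[float]) -> float:
    """log[ P(z | CNV) / P(z | no CNV) ], one z-score per size domain.

    Hypothetical model: z ~ N(0, 1) without a CNV, z ~ N(shift, 1) with one,
    and the size domains treated as independent (a simplification).
    """
    return sum(log_normal_pdf(z, s, 1.0) - log_normal_pdf(z, 0.0, 1.0)
               for z, s in zip(z_scores, shifts_if_cnv))


# Made-up numbers: mean coverage of the sequence of interest in the first
# (short) and second (long) size domains, with reference mean 1.0, sd 0.02.
z_short = z_score(1.06, 1.0, 0.02)
z_long = z_score(1.02, 1.0, 0.02)
llr = log_likelihood_ratio([z_short, z_long], shifts_if_cnv=[3.0, 1.0])
print(round(llr, 3))
```

Positive log ratios favor the CNV hypothesis and negative ones favor no CNV. A decision threshold would have to be chosen separately, for example from validation data. The larger assumed shift for the short domain reflects the illustrative premise that short fragments carry more of the signal.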
24. A method for determining a copy number variation (CNV) of a nucleic acid sequence of interest in a test sample comprising cell-free nucleic acid fragments originating from two or more genomes, the method comprising:
(a) receiving sequence reads obtained by sequencing the cell-free nucleic acid fragments in the test sample;
(b) aligning the sequence reads of the cell-free nucleic acid fragments or aligning fragments containing the sequence reads to bins of a reference genome comprising the sequence of interest, thereby providing test sequence tags, wherein the reference genome is divided into a plurality of bins;
(c) determining fragment sizes of the cell-free nucleic acid fragments existing in the test sample;
(d) calculating coverages of the sequence tags for the bins of the reference genome using sequence tags for the cell-free nucleic acid fragments having sizes in a first size domain;
(e) calculating coverages of the sequence tags for the bins of the reference genome using sequence tags for the cell-free nucleic acid fragments having sizes in a second size domain, wherein the second size domain is different from the first size domain;
(f) calculating size characteristics for the bins of the reference genome using the fragment sizes determined in (c); and
(g) determining a copy number variation in the sequence of interest using the coverages calculated in (d) and (e) and the size characteristics calculated in (f).
Dependent claim: 25.
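Step (f) of claim 24 calculates "size characteristics" for each bin from the fragment sizes, but the claim does not say which characteristic. One plausible example, offered here only as an assumption, is the fraction of fragments in each bin that are shorter than a cutoff. The sketch below computes that fraction. The input format, the function name, and the 150 bp default cutoff are hypothetical. How step (g) combines these values with the coverages is not shown.

```python
import math
from typing import List, Sequence, Tuple


def short_fraction_per_bin(fragments: Sequence[Tuple[int, int]],
                           n_bins: int,
                           cutoff_bp: int = 150) -> List[float]:
    """Fraction of fragments in each bin shorter than cutoff_bp.

    fragments holds hypothetical (bin index, fragment size in bp) pairs.
    Bins with no fragments get NaN rather than a misleading 0.
    """
    short = [0] * n_bins
    total = [0] * n_bins
    for b, size in fragments:
        total[b] += 1
        if size < cutoff_bp:
            short[b] += 1
    return [s / t if t else math.nan for s, t in zip(short, total)]


frags = [(0, 140), (1, 145), (2, 130), (2, 135), (3, 148),
         (0, 170), (1, 180), (2, 175), (3, 190)]
print(short_fraction_per_bin(frags, n_bins=5))
```

Unlike a coverage, this ratio does not depend on how many fragments a bin receives. It could therefore, in principle, provide evidence that is partly independent of the coverage-based calls.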
Specification