Method for improving the sensitivity of detection in determining copy number variations
Abstract
Disclosed are methods for determining copy number variation (CNV) known or suspected to be associated with a variety of medical conditions. In some embodiments, methods are provided for determining CNV of fetuses using maternal samples comprising maternal and fetal cell-free DNA. Some embodiments disclosed herein provide methods to improve the sensitivity and/or specificity of sequence data analysis by removing within-sample GC-content bias. In some embodiments, removal of within-sample GC-content bias is based on sequence data corrected for systematic variation common across unaffected training samples. Also disclosed are systems and computer program products for evaluation of CNV of sequences of interest.
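The within-sample GC correction mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the "relation" between GC content and coverage is captured by the median coverage of bins sharing a similar GC fraction, and that the adjustment is a division by that median. Neither choice is stated in this excerpt, and the function name `sample_gc_correct` and the parameter `gc_step` are hypothetical.

```python
from collections import defaultdict
from statistics import median


def sample_gc_correct(coverages, gc_fractions, gc_step=0.01):
    """Divide each bin's coverage by the median coverage of bins with similar GC.

    Both inputs hold one value per bin and come from the test sample alone,
    so no GC-coverage relation from any other sample is used.
    Assumption: grouping by rounded GC fraction and dividing by the group
    median is one possible correction, not necessarily the patented one.
    """
    # Assign every bin to a GC group of width gc_step.
    keys = [round(gc / gc_step) for gc in gc_fractions]
    groups = defaultdict(list)
    for key, cov in zip(keys, coverages):
        groups[key].append(cov)
    # Expected coverage for a GC group is the median over its bins.
    expected = {key: median(vals) for key, vals in groups.items()}
    # Bins whose group has zero expected coverage cannot be corrected.
    return [
        cov / expected[key] if expected[key] > 0 else float("nan")
        for key, cov in zip(keys, coverages)
    ]
```

In this sketch, a sample whose coverage differs only because of GC content comes out flat: `sample_gc_correct([0.8, 0.8, 1.2, 1.2], [0.40, 0.40, 0.60, 0.60])` yields a value of 1.0 for every bin, while a bin that is elevated relative to other bins of the same GC content stays elevated.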
42 Citations
39 Claims
1. A method, implemented at a computer system that includes one or more processors and system memory, for evaluation of copy number of a nucleic acid sequence of interest in a test sample, the method comprising:
(a) providing, at the computer system, at least 10,000 sequence reads obtained by a nucleic acid sequencer from the test sample, which test sample comprises nucleic acid molecules from one or more genomes;
(b) aligning, by the computer system, the at least 10,000 sequence reads of the test sample to a reference genome comprising the nucleic acid sequence of interest, thereby providing test sequence tags;
(c) determining, by the computer system, a coverage of the test sequence tags located in each bin, wherein each chromosome of the reference genome is divided into a plurality of bins, and wherein the coverage indicates a quantity of sequence tags in a bin;
(d) providing, by the computer system, a global profile for the nucleic acid sequence of interest, wherein the global profile comprises an expected coverage in each bin, and wherein the expected coverage is obtained from a training set of training samples unaffected by a copy number variation of the nucleic acid sequence of interest, the expected coverage exhibiting variation from bin to bin;
(e) adjusting, by the computer system, the coverage of the test sequence tags in each bin of at least the nucleic acid sequence of interest using the expected coverage in each bin, thereby obtaining global-profile-corrected coverages for the nucleic acid sequence of interest;
(f) adjusting, by the computer system, the global-profile-corrected coverages based on a relation between GC content levels of the test sample and the global-profile-corrected coverages of the test sample, thereby obtaining sample-GC-corrected coverages for the nucleic acid sequence of interest, wherein the adjusting is not based on GC-coverage relations of samples other than the test sample; and
(g) evaluating, by the computer system, a copy number of the nucleic acid sequence of interest in the test sample based on the sample-GC-corrected coverages, wherein the sample-GC-corrected coverages improve a signal level and/or reduce a noise level for determining the copy number of the nucleic acid sequence of interest.
Dependent Claims: 2-38
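Steps (c) through (e) of claim 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the claim does not say how the expected coverage is derived from the training set or how it is applied, so the sketch assumes depth normalization to a mean of 1.0, a per-bin median across unaffected training samples, and correction by division. All function names are hypothetical.

```python
from statistics import median


def bin_coverage(tag_positions, chrom_length, bin_size):
    """Step (c): count aligned sequence tags per fixed-width bin of one chromosome."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in tag_positions:  # 0-based start coordinate of each tag
        counts[pos // bin_size] += 1
    return counts


def normalize(counts):
    """Scale counts to a mean of 1.0 so samples of different depth are comparable."""
    mean_count = sum(counts) / len(counts)
    return [c / mean_count for c in counts]


def global_profile(training_counts):
    """Step (d): expected coverage per bin (assumed here: median over training samples)."""
    normalized = [normalize(sample) for sample in training_counts]
    return [median(per_bin) for per_bin in zip(*normalized)]


def global_profile_correct(test_counts, profile):
    """Step (e): divide normalized test coverage by the expected coverage in each bin."""
    return [
        c / e if e > 0 else float("nan")  # bins with no expected coverage are masked
        for c, e in zip(normalize(test_counts), profile)
    ]
```

With three training samples that share the same bin-to-bin pattern, such as `[[10, 20, 30], [20, 40, 60], [5, 10, 15]]`, the profile is `[0.5, 1.0, 1.5]`, and a test sample with counts `[30, 60, 90]` is flattened to 1.0 in every bin. A real pipeline would also need to handle multiple chromosomes and exclude unreliable bins, which this sketch omits.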
39. A system for evaluation of copy number of a nucleic acid sequence of interest in a test sample, the system comprising:
a sequencer for receiving nucleic acids from the test sample providing nucleic acid sequence information from the sample;
one or more processors; and
one or more computer-readable storage media having stored thereon instructions for execution on said processor to evaluate copy number in the test sample using a method comprising:
(a) providing, at the system, at least 10,000 sequence reads of the test sample;
(b) aligning, by the one or more processors, the at least 10,000 sequence reads of the test sample to a reference genome comprising the nucleic acid sequence of interest, thereby providing test sequence tags;
(c) determining, by the one or more processors, a coverage of the test sequence tags located in each bin, wherein each chromosome of the reference genome is divided into a plurality of bins;
(d) providing, by the one or more processors, a global profile for the nucleic acid sequence of interest, wherein the global profile comprises an expected coverage in each bin, and wherein the expected coverage is obtained from a training set of training samples unaffected by a copy number variation of the nucleic acid sequence of interest, the expected coverage exhibiting variation from bin to bin;
(e) adjusting, by the one or more processors, the coverage of the test sequence tags in each bin of at least the nucleic acid sequence of interest according to the expected coverage in each bin, thereby obtaining a global-profile-corrected coverage in each bin of the test sequence tags;
(f) adjusting, by the one or more processors, the global-profile-corrected coverages based on a relation between GC content level of the test sample and the global-profile-corrected coverage of the test sample for the bins of the test sequence tags, thereby obtaining a sample-GC-corrected coverage of the test sequence tags on the nucleic acid sequence of interest, wherein the adjusting is not based on GC-coverage relations of samples other than the test sample; and
(g) evaluating, by the one or more processors, copy number of the nucleic acid sequence of interest in the test sample based on the sample-GC-corrected coverage.
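Step (g) of the claims above does not state how copy number is evaluated from the corrected coverages. One common pattern, shown here purely as an assumption, is to compare the mean corrected coverage in the bins of the sequence of interest with the mean in the remaining bins, then standardize that ratio against ratios from unaffected samples. The names `dosage_ratio` and `z_score` are hypothetical.

```python
from statistics import mean, stdev


def dosage_ratio(corrected, bins_of_interest):
    """Mean corrected coverage in the bins of interest over the mean in all other bins."""
    interest = set(bins_of_interest)
    inside = [c for i, c in enumerate(corrected) if i in interest]
    outside = [c for i, c in enumerate(corrected) if i not in interest]
    return mean(inside) / mean(outside)


def z_score(test_ratio, unaffected_ratios):
    """Standardize the test ratio against ratios observed in unaffected samples."""
    return (test_ratio - mean(unaffected_ratios)) / stdev(unaffected_ratios)
```

Under these assumptions a ratio near 1.0 suggests no copy number change in the region, and a ratio well above or below the spread seen in unaffected samples suggests a gain or loss. Any decision threshold on the z-score would be a further assumption not covered by this excerpt.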
Specification