Methods for estimating genome-wide copy number variations
First Claim
1. A method for determining copy number variation of a genomic region at a detection position of a target polynucleotide sequence in a sample, said method comprising:
obtaining, using a computer, coverage values for each given position in a baseline or reference sample for the sequence coverage of said target polynucleotide using data generated from mate-pair mappings;
correcting, using the computer, the coverage values for each given position for sequence coverage bias, wherein correcting the coverage values for each given position in a baseline or reference sample comprises performing ploidy-aware baseline correction; and
estimating, using the computer, a total copy number value and region-specific copy number value for each of a plurality of genomic regions based at least on the corrected coverage values for each given position in a baseline or reference sample.
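The three steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that each mate-pair mapping has been reduced to half-open (start, end) intervals of aligned bases, that the baseline's expected ploidy is known at every position (for example 1 on chrX for a male baseline), and that most of the test sample is diploid, so that its median ratio can fix the depth scale. The claim does not say how the total and region-specific copy number values are derived. The sketch therefore reports, for each region, a continuous mean estimate and its nearest-integer call. All function names are hypothetical.

```python
import numpy as np


def coverage_from_mappings(intervals, length):
    """Per-position depth from half-open (start, end) aligned intervals.

    Hypothetical input: each mate of each mapped pair contributes one interval.
    """
    diff = np.zeros(length + 1, dtype=np.int64)
    for start, end in intervals:
        diff[start] += 1  # coverage begins at start
        diff[end] -= 1    # and stops before end
    return np.cumsum(diff[:-1])


def ploidy_aware_correction(sample_cov, baseline_cov, baseline_ploidy, sample_ploidy=2):
    """Per-position copy estimates from sample vs. baseline coverage.

    Baseline depth is divided by its known ploidy to get depth per copy, so a
    haploid baseline region (e.g. chrX in a male) is not read as a gain or loss.
    """
    sample_cov = np.asarray(sample_cov, dtype=float)
    per_copy = np.asarray(baseline_cov, dtype=float) / np.asarray(baseline_ploidy, dtype=float)
    usable = per_copy > 0
    ratio = np.full(sample_cov.shape, np.nan)  # NaN where baseline has no depth
    ratio[usable] = sample_cov[usable] / per_copy[usable]
    # Assumption: the typical position in the sample has `sample_ploidy` copies.
    scale = np.nanmedian(ratio) / sample_ploidy
    return ratio / scale


def estimate_region_copy_numbers(copy_by_pos, regions):
    """For each region: (continuous mean estimate, nearest-integer call)."""
    results = {}
    for name, (start, end) in regions.items():
        values = copy_by_pos[start:end]
        values = values[~np.isnan(values)]
        if values.size == 0:
            results[name] = None  # no usable positions, so no call
            continue
        mean = float(values.mean())
        results[name] = (mean, int(round(mean)))
    return results
```

With a male baseline, passing baseline_ploidy = 2 on chrX would halve the assumed per-copy depth there and make a normal diploid sample look like a gain. Supplying the true baseline ploidy of 1 removes that artifact, which is the kind of error a ploidy-aware correction is meant to avoid.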
Abstract
Methods for determining the copy number of a genomic region at a detection position of a target sequence in a sample are disclosed. Genomic regions of a target sequence in a sample are sequenced and measurement data for sequence coverage is obtained. Sequence coverage bias is corrected and may be normalized against a baseline sample. Hidden Markov Model (HMM) segmentation, scoring, and output are performed, and in some embodiments population-based no-calling and identification of low-confidence regions may also be performed. A total copy number value and region-specific copy number value for a plurality of regions are then estimated.
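The HMM segmentation and scoring summarized above might be sketched as follows. This is an illustrative model, not the one the patent discloses. It assumes integer copy-number states from 0 to max_cn, Gaussian emissions on per-position copy estimates (such as corrected coverage values) with a fixed sigma, and a single "stay" probability for transitions. The function names and the log-likelihood-ratio score against a diploid state are hypothetical choices.

```python
import numpy as np


def viterbi_segment(obs, max_cn=4, sigma=0.3, stay=0.99):
    """Most likely copy-number path under a simple Gaussian HMM.

    Hypothetical model: hidden states are integer copy numbers 0..max_cn,
    state k emits N(k, sigma**2), and each position keeps its state with
    probability `stay`, otherwise moving uniformly to another state.
    """
    obs = np.asarray(obs, dtype=float)
    states = np.arange(max_cn + 1)
    k = states.size
    log_trans = np.full((k, k), np.log((1.0 - stay) / (k - 1)))
    np.fill_diagonal(log_trans, np.log(stay))
    # Gaussian log-densities, dropping the constant shared by all states.
    log_emit = -0.5 * ((obs[:, None] - states[None, :]) / sigma) ** 2
    n = obs.size
    score = np.full(k, -np.log(k)) + log_emit[0]  # uniform start
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans  # cand[i, j]: best path ending i -> j
        back[t] = np.argmax(cand, axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = np.empty(n, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n - 1, 0, -1):  # follow back-pointers
        path[t - 1] = back[t, path[t]]
    return states[path], log_emit


def segments_with_scores(path, log_emit, reference_state=2):
    """Collapse runs of equal states into (start, end, cn, score) segments.

    Score (hypothetical choice): log-likelihood of the called state minus
    that of the reference (diploid) state, summed over the segment.
    """
    segments = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            cn = int(path[start])
            score = float(np.sum(log_emit[start:t, cn] - log_emit[start:t, reference_state]))
            segments.append((start, t, cn, score))
            start = t
    return segments
```

Raising `stay` yields longer segments that are less sensitive to short events. In practice, sigma would presumably need to be fitted to the noise of the corrected coverage rather than fixed.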
27 Claims
1. A method for determining copy number variation of a genomic region at a detection position of a target polynucleotide sequence in a sample, said method comprising:
obtaining, using a computer, coverage values for each given position in a baseline or reference sample for the sequence coverage of said target polynucleotide using data generated from mate-pair mappings;
correcting, using the computer, the coverage values for each given position for sequence coverage bias, wherein correcting the coverage values for each given position in a baseline or reference sample comprises performing ploidy-aware baseline correction; and
estimating, using the computer, a total copy number value and region-specific copy number value for each of a plurality of genomic regions based at least on the corrected coverage values for each given position in a baseline or reference sample.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
25. A non-transitory computer-readable medium comprising instructions tangibly embodied thereon, the instructions when executed by a computer processor causing the processor to perform the operations of:
obtaining, using the processor, coverage values for each given position in a baseline or reference sample for the sequence coverage of said target polynucleotide using data generated from mate-pair mappings;
correcting, using the processor, the coverage values for each given position for sequence coverage bias, wherein correcting the coverage values for each given position in a baseline or reference sample comprises performing ploidy-aware baseline correction; and
estimating, using the processor, a total copy number value and region-specific copy number value for each of a plurality of genomic regions based at least on the corrected coverage values for each given position in a baseline or reference sample.
26. A non-transitory computer-readable medium comprising instructions tangibly embodied thereon, the instructions when executed by a computer processor causing the processor to perform the operations of:
obtaining, using the processor, coverage values for each given position in a baseline or reference sample for the sequence coverage of said target polynucleotide using data generated from mate-pair mappings;
correcting, using the processor, the coverage values for each given position for sequence coverage bias, wherein correcting the coverage values for each given position in a baseline or reference sample comprises performing ploidy-aware baseline correction; and
performing Hidden Markov Model (HMM) segmentation, scoring, and output based on the corrected coverage values for each given position in a baseline or reference sample;
based on the HMM scoring and output, performing population-based no-calling and identification of low-confidence regions; and
based on the HMM scoring and output, estimating a total copy number value and region-specific copy number value for a plurality of regions.
27. A system of determining copy number variation of a genomic region at a detection position of a target polynucleotide sequence, comprising:
a computer processor; and
a computer-readable storage medium coupled to said processor, the storage medium having instructions tangibly embodied thereon, the instructions when executed by said processor causing said processor to perform the operations of:
obtaining, using the processor, coverage values for each given position in a baseline or reference sample for the sequence coverage of said target polynucleotide using data generated from mate-pair mappings;
correcting, using the processor, the coverage values for each given position for sequence coverage bias, wherein correcting the coverage values for each given position in a baseline or reference sample comprises performing ploidy-aware baseline correction; and
estimating, using a computer, a total copy number value and region-specific copy number value for each of a plurality of genomic regions based at least on the corrected coverage values for each given position in a baseline or reference sample.
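Claim 26 recites population-based no-calling and identification of low-confidence regions but does not say how either step works. Below is a minimal sketch. It assumes that corrected copy estimates are available for a population of reference samples presumed normal at the positions tested. A position is marked low-confidence when the population median departs from the expected copy number or when samples disagree widely (large interquartile range). A segment is no-called when most of its positions are flagged. The thresholds, inputs, and function names are hypothetical.

```python
import numpy as np


def low_confidence_mask(population_cn, expected_cn=2.0, max_dev=0.5, max_spread=0.4):
    """Flag positions that behave poorly across a reference population.

    population_cn: (n_samples, n_positions) corrected copy estimates from
    samples assumed normal at these positions (hypothetical input).
    """
    pop = np.asarray(population_cn, dtype=float)
    median = np.median(pop, axis=0)
    q75, q25 = np.percentile(pop, [75, 25], axis=0)
    biased = np.abs(median - expected_cn) > max_dev  # systematic offset
    noisy = (q75 - q25) > max_spread                 # samples disagree
    return biased | noisy


def apply_no_calls(segments, low_conf, max_low_fraction=0.5):
    """Replace a segment's call with None when too many positions are low-confidence."""
    called = []
    for start, end, cn in segments:
        frac = float(np.mean(low_conf[start:end]))
        called.append((start, end, None if frac > max_low_fraction else cn))
    return called
```

Using the median and interquartile range rather than the mean and standard deviation keeps a single aberrant reference sample from flagging an otherwise well-behaved position.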
Specification