Determination of copy number variations using binomial probability calculations
Abstract
This invention relates to a binomial calculation of copy number from data obtained from a mixed sample having a first source and a second source.
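As a rough illustration of the kind of binomial calculation the abstract refers to, the sketch below estimates the contribution of the minor source from read counts at informative loci. It is not taken from the patent: the function name `estimate_minor_fraction`, the input layout, and the modeling assumption (the major source is homozygous and the minor source heterozygous at every informative locus, so a distinguishing read appears with probability f/2) are all hypothetical choices made for the example.

```python
from math import sqrt


def estimate_minor_fraction(loci):
    """Estimate the minor-source fraction f from informative loci.

    loci: iterable of (distinguishing_reads, total_reads) pairs, one per locus.
    Assumption (not from the source): each distinguishing read is a Bernoulli
    draw with probability f / 2, so the pooled count is Binomial(n, f / 2).
    Returns (f_hat, standard_error).
    """
    loci = list(loci)
    k = sum(distinguishing for distinguishing, _ in loci)
    n = sum(total for _, total in loci)
    if n == 0:
        raise ValueError("no reads at informative loci")
    p_hat = k / n                      # binomial maximum-likelihood estimate
    f_hat = 2.0 * p_hat                # undo the f / 2 scaling
    se = 2.0 * sqrt(p_hat * (1.0 - p_hat) / n)  # normal approximation
    return f_hat, se


# Hypothetical counts at three informative loci.
f_hat, se = estimate_minor_fraction([(50, 1000), (60, 1000), (40, 1000)])
print(f"estimated minor-source fraction: {f_hat:.3f} +/- {se:.4f}")
```

Pooling counts across loci keeps the example short; a per-locus likelihood would be needed if loci differ in genotype configuration or amplification efficiency.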
7 Claims
1. A computer-implemented process for determining copy number variation (CNV) of one or more genomic regions in a single source in a mixed sample, wherein at least one processor coupled to a memory executes a software component that performs the process comprising:
- accessing by the software component a first data set comprising frequency data based on identification of distinguishing regions of two or more informative loci from a first source in the single source in the mixed sample;
- accessing by the software component a second data set comprising frequency data based on identification of distinguishing regions of two or more informative loci from a second source in the single source in the mixed sample;
- calculating by the software component an estimated source contribution of cell free nucleic acids based on a binomial distribution of counts of the distinguishing regions from first and second data sets;
- accessing by the software component a third data set comprising frequency data for one or more genomic regions from the single source in the mixed sample; and
- calculating by the software component a presence or absence of a CNV for the one or more genomic regions by comparison of the frequency data from the single source to the estimated contribution of cell free nucleic acids in the mixed sample.
3. A computer-implemented process for determining copy number variation (CNV) of one or more genomic regions in a single source in a mixed sample, wherein at least one processor coupled to a memory executes a software component that performs the process, the process comprising:
- accessing by the software component a first data set comprising frequency data for one or more informative loci from a maternal source in the mixed sample;
- accessing by the software component a second data set comprising frequency data for one or more informative loci from a fetal source in the mixed sample;
- calculating by the software component an estimated fetal source contribution of cell free nucleic acids based on a binomial distribution of the counts of distinguishing regions from first and second data sets;
- accessing by the software component a third data set comprising frequency data for one or more genomic regions from the single source in the mixed sample; and
- calculating by the software component the presence or absence of a CNV for the one or more genomic regions in the fetus by comparison of the frequency of the genomic regions from the single source to the estimated fetal source contribution of cell free nucleic acids in the mixed sample.
4. An executable software product stored on a non-transitory computer-readable medium containing program instructions, which when executed by a computer directs performance of steps for estimating copy number variation (CNV) of one or more genomic regions in a mixed sample, the steps comprising:
- accessing by the software component a first data set comprising frequency data based on identification of distinguishing regions from copies of one or more informative loci from a first source;
- accessing by the software component a second data set comprising frequency data based on identification of distinguishing regions from copies of one or more informative loci from a second source;
- calculating by the software component an estimated source contribution of cell free nucleic acids based on a binomial distribution of the first and second data sets; and
- calculating by the software component the CNV for the one or more genomic regions by comparison of the at least one of the first source and the second source counts for the genomic region to the estimated source contribution of cell free nucleic acids from the at least one of the first source and the second source.
6. A system, comprising:
- a memory;
- a processor coupled to the memory; and
- a software component executed by the processor that is configured to:
  - access a first data set comprising frequency data based on identification of distinguishing regions from copies of one or more informative loci from a first source in a mixed sample;
  - access a second data set comprising frequency data based on identification of distinguishing regions from copies of one or more informative loci from a second source in the mixed sample;
  - calculate an estimated contribution of cell free nucleic acids from at least one of the first source and the second source based on a binomial distribution of counts of the distinguishing regions from the first and second data sets; and
  - calculate a copy number variation for one or more genomic regions of a single source in the mixed sample by comparison of the frequency data for the one or more genomic regions to the estimated source contribution of cell free nucleic acids in the mixed sample.
7. A computer software product including a non-transitory computer-readable storage medium having fixed therein a sequence of instructions which when executed by a computer directs performance of steps of:
- creating a first data set representing a quantity of informative loci from a first source in a mixed sample;
- creating a second data set representing a quantity of informative loci from a second source in the mixed sample;
- calculating an estimated source contribution of cell free nucleic acids from the first source and the second source in the mixed sample based on a binomial distribution of the quantities of informative loci from the first and second data sets; and
- calculating a presence or absence of a copy number variation for a genomic region by comparison of the quantity of one or more informative loci from the first and second data sets to the estimated source contribution of cell free nucleic acids in the mixed sample.
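The final step shared by the claims above, comparing frequency data for a genomic region to the estimated source contribution, can be sketched as a binomial likelihood comparison. This is only one plausible reading and not the claimed method: the function names, the dosage model (a gain or loss in the minor source shifts the region's read share by f/2), and the log-likelihood-ratio threshold of 3.0 are all assumptions introduced for illustration.

```python
from math import log


def expected_proportion(p0, f, copies):
    """Expected share of reads from the region when the minor source carries
    `copies` copies (2 = normal) and contributes fraction f of the sample.
    p0 is the share expected when both sources are normal (assumed known)."""
    dosage = 1.0 + f * (copies - 2) / 2.0
    return p0 * dosage / (p0 * dosage + (1.0 - p0))


def binomial_llr(k, n, p_alt, p_null):
    """Binomial log-likelihood ratio of p_alt versus p_null for k of n reads.
    The binomial coefficient cancels, so it is omitted."""
    return k * log(p_alt / p_null) + (n - k) * log((1.0 - p_alt) / (1.0 - p_null))


def call_cnv(region_reads, total_reads, p0, f, threshold=3.0):
    """Return 'gain', 'loss', or 'none' for the region in the minor source.
    The threshold is an arbitrary illustrative value, not a validated cutoff."""
    p_gain = expected_proportion(p0, f, 3)
    p_loss = expected_proportion(p0, f, 1)
    if binomial_llr(region_reads, total_reads, p_gain, p0) > threshold:
        return "gain"
    if binomial_llr(region_reads, total_reads, p_loss, p0) > threshold:
        return "loss"
    return "none"


# Hypothetical counts: 100,000 reads, half expected from the region, f = 0.10.
for reads in (51220, 50000, 48720):
    print(reads, call_cnv(reads, 100_000, p0=0.5, f=0.10))
```

The example makes the dependence on the estimated contribution explicit: with f = 0.10 an extra copy in the minor source moves the expected share only from 0.500 to about 0.512, so the call depends on both the read depth and the accuracy of f.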
Specification