METHOD AND SYSTEM FOR CALLING VARIATIONS IN A SAMPLE POLYNUCLEOTIDE SEQUENCE WITH RESPECT TO A REFERENCE POLYNUCLEOTIDE SEQUENCE
Abstract
Embodiments for calling variations in a sample polynucleotide sequence compared to a reference polynucleotide sequence are provided. Aspects of the embodiments include executing an application on at least one computer that locates local areas in the reference polynucleotide sequence where a likelihood exists that one or more bases of the sample polynucleotide sequence are changed from corresponding bases in the reference polynucleotide sequence, where the likelihood is determined at least in part based on mapped mated reads of the sample polynucleotide sequence; generating at least one sequence hypothesis for each of the local areas, and optimizing the at least one sequence hypothesis for at least a portion of the local areas to find one or more optimized sequence hypotheses of high probability for the local areas; and analyzing the optimized sequence hypotheses to identify a series of variation calls in the sample polynucleotide sequence.
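The first step named in the abstract, locating local areas where the sample likely differs from the reference, can be illustrated with a minimal sketch. It is not the patented method: it assumes ungapped reads given as hypothetical `(start, sequence)` pairs, ignores mate-pair information entirely, and uses a simple mismatch count as a stand-in for the likelihood. The names `locate_local_areas`, `min_support`, and `pad` are illustrative assumptions.

```python
from collections import Counter


def locate_local_areas(reference, mapped_reads, min_support=2, pad=2):
    """Return merged (start, end) half-open intervals on the reference.

    An interval is opened around every reference position where at least
    `min_support` reads carry a base that differs from the reference.
    Assumption: reads are ungapped and given as (start, sequence) pairs;
    mate pairing is ignored in this sketch.
    """
    mismatches = Counter()
    for start, seq in mapped_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos < len(reference) and base != reference[pos]:
                mismatches[pos] += 1

    # Positions with enough disagreeing reads, in reference order.
    hot = sorted(p for p, n in mismatches.items() if n >= min_support)

    areas = []
    for pos in hot:
        lo = max(0, pos - pad)
        hi = min(len(reference), pos + pad + 1)
        if areas and lo <= areas[-1][1]:
            # Overlaps or touches the previous area: extend it.
            areas[-1] = (areas[-1][0], max(areas[-1][1], hi))
        else:
            areas.append((lo, hi))
    return areas


reference = "ACGTACGTACGT"
reads = [(2, "GTTC"), (3, "TTCG"), (8, "ACGT")]
print(locate_local_areas(reference, reads))  # [(2, 7)]
```

Two reads disagree with the reference at position 4, so one area padded by two bases on each side is reported; the third read matches the reference and contributes nothing.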
37 Claims
1. A computer-implemented method for calling variations in a sample polynucleotide sequence with respect to a reference polynucleotide sequence, the method comprising:
    executing an application on at least one computer that locates local areas in the reference polynucleotide sequence where a likelihood exists that one or more bases of the sample polynucleotide sequence are changed from corresponding bases in the reference polynucleotide sequence, where the likelihood is determined at least in part based on mapped mated reads of the sample polynucleotide sequence;
    generating at least one sequence hypothesis for each of the local areas, and optimizing the at least one sequence hypothesis for at least a portion of the local areas to find one or more optimized sequence hypotheses of high probability for the local areas; and
    analyzing the optimized sequence hypotheses to identify a series of variation calls in the sample polynucleotide sequence.
    Dependent claims: 2-27.
28. A system, comprising:
    a data repository that stores a reference polynucleotide sequence and mapped mated reads obtained from a sample polynucleotide sequence that are mapped to locations in the reference polynucleotide sequence;
    a computer cluster comprising a plurality of computers coupled to the data repository via a network; and
    a variation caller executing in parallel on the plurality of computers, the variation caller configured to:
        locate local areas in the sample polynucleotide sequence based on the mapped mated reads where one or more bases are likely to have changed from corresponding bases in the reference polynucleotide sequence;
        optimize a sequence hypothesis for each of the local areas to find a set of sequence hypotheses of high probability for each of the local areas; and
        analyze each of the sets of sequence hypotheses to identify a series of variation calls in the sample polynucleotide sequence.
29. An executable software product stored on a computer-readable medium containing program instructions for calling variations in mapped mated reads obtained from a sample polynucleotide sequence compared to a reference polynucleotide sequence, the program instructions for:
    locating local areas in the sample polynucleotide sequence based on the mapped mated reads where one or more bases are likely to have changed from corresponding bases in the reference polynucleotide sequence;
    optimizing a sequence hypothesis for each of the local areas to find a set of sequence hypotheses of high probability for each of the local areas; and
    analyzing each of the sets of sequence hypotheses to statistically identify a series of variation calls in the sample polynucleotide sequence and storing the variation calls in a memory.
30. A system, comprising:
    a data repository that stores a reference polynucleotide sequence and mapped mated reads obtained from a sample polynucleotide sequence that are mapped to locations in the reference polynucleotide sequence;
    a computer cluster comprising a plurality of computers coupled to the data repository via a network; and
    a variation caller executing in parallel on the plurality of computers, the variation caller configured to:
        perform statistical probability analysis on the reference polynucleotide sequence and on the mapped mated reads based in part on a combination of evidential reasoning performed by a Bayesian formulation and de Bruijn graph based algorithms;
        use the statistical probability analysis to identify and call variations detected in the mapped mated reads in relation to the reference polynucleotide sequence; and
        output a list of the variations, each describing a manner in which the mapped mated reads are observed to differ from the reference polynucleotide sequence at or near a specific location.
    Dependent claims: 31-34.
35. A computer-implemented method for calling variations in mapped mated reads obtained from a sample polynucleotide sequence compared to a reference polynucleotide sequence, the method comprising:
    performing statistical probability analysis on the reference polynucleotide sequence and on the mapped mated reads based in part on a combination of evidential reasoning performed by a Bayesian formulation and de Bruijn graph based algorithms;
    using the statistical probability analysis to identify and call variations detected in the mapped mated reads in relation to the reference polynucleotide sequence; and
    outputting a list of the variations, each describing a manner in which the mapped mated reads are observed to differ from the reference polynucleotide sequence at or near a specific location and storing the variation calls in a memory.
    Dependent claims: 36.
37. An executable software product stored on a computer-readable medium containing program instructions for calling variations in mapped mated reads obtained from a sample polynucleotide sequence compared to a reference polynucleotide sequence, the program instructions for:
    performing statistical probability analysis on the reference polynucleotide sequence and on the mapped mated reads based in part on a combination of evidential reasoning performed by a Bayesian formulation and de Bruijn graph based algorithms;
    using the statistical probability analysis to identify and call variations detected in the mapped mated reads in relation to the reference polynucleotide sequence; and
    outputting a list of the variations, each describing a manner in which the mapped mated reads are observed to differ from the reference polynucleotide sequence at or near a specific location.
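Claims 30, 35, and 37 combine a Bayesian formulation with de Bruijn graph based algorithms. The sketch below shows one generic way such a combination could look. It is not taken from the patent. It builds a de Bruijn graph from reads, enumerates candidate sequences between two anchor k-mers as hypotheses, and ranks them by posterior probability under a uniform per-base error model. All function names, the error model, the uniform prior, and the restriction to ungapped reads and substitution calls are assumptions made for illustration.

```python
import math
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each k-mer to the set of k-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph


def enumerate_paths(graph, source, sink, max_len):
    """Yield sequences spelled by paths from source to sink (bounded DFS)."""
    stack = [(source, source)]
    while stack:
        node, seq = stack.pop()
        if node == sink:
            yield seq
            continue
        if len(seq) >= max_len:
            continue  # the length bound also cuts off cycles
        for nxt in graph.get(node, ()):
            stack.append((nxt, seq + nxt[-1]))


def read_log_likelihood(read, hypothesis, error_rate=0.01):
    """Best ungapped placement of a read on a hypothesis (assumed model)."""
    best = -math.inf
    for off in range(len(hypothesis) - len(read) + 1):
        window = hypothesis[off:off + len(read)]
        mism = sum(a != b for a, b in zip(read, window))
        ll = (mism * math.log(error_rate / 3)
              + (len(read) - mism) * math.log(1 - error_rate))
        best = max(best, ll)
    return best


def posteriors(hypotheses, reads, error_rate=0.01):
    """Posterior over hypotheses with a uniform prior (an assumption)."""
    hyps = sorted(set(hypotheses))
    log_post = {h: sum(read_log_likelihood(r, h, error_rate) for r in reads)
                for h in hyps}
    top = max(log_post.values())
    weights = {h: math.exp(lp - top) for h, lp in log_post.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def call_substitutions(reference, hypothesis):
    """List (position, ref_base, alt_base); equal-length sequences only."""
    if len(reference) != len(hypothesis):
        raise ValueError("this sketch only handles substitutions")
    return [(i, r, h) for i, (r, h) in enumerate(zip(reference, hypothesis))
            if r != h]


reference = "GATCGTACCA"
reads = ["GATCATAC", "TCATACCA", "ATCATACC"]
# Adding the reference keeps the no-change hypothesis in the graph.
graph = build_de_bruijn(reads + [reference], k=3)
hypotheses = list(enumerate_paths(graph, "GAT", "CCA", max_len=12))
post = posteriors(hypotheses, reads)
best = max(post, key=post.get)
print(call_substitutions(reference, best))  # [(4, 'G', 'A')]
```

In this toy input every read supports an A at position 4, so the variant path receives nearly all of the posterior mass and a single substitution is reported. Reads longer than a hypothesis get a log-likelihood of negative infinity here; handling that case, along with indels and mate-pair constraints, is left out of the sketch.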
Specification