Bambam: parallel comparative analysis of high-throughput sequencing data
First Claim
1. A method of deriving a differential genetic sequence object, the method comprising:
accessing a genetic database storing a first set of genetic sequence strings and associated reads representing a first tissue and a second set of genetic sequence strings and associated reads representing a second tissue, wherein the first set and the second set include genomic location information, wherein the accessing is executed by a hardware processor;
aligning the first set of genetic sequence strings and the second set of genetic sequence strings using the genomic location information in at least one of the first set or the second set, the first set of genetic sequence strings and the second set of genetic sequence strings being analyzed against each other, wherein the aligning is executed by the hardware processor, the analyzing comprising:
determining base probabilities of possible locations of sequence reads in the first and second genetic sequence strings as a function of error rates of at least one sequencer;
identifying a difference between the first set and the second set of genetic sequence strings by comparing genotypes from the first and the second sets that, overlapping at a particular genomic position, maximize a likelihood probability function identifying the genotypes as being different and that are located at the particular genomic position, where the likelihood probability function operates as a probability distribution of a likelihood that unmapped sequence reads of both the first set, representing the first tissue, and the second set, representing the second tissue, align to possible junction sequences, modeled over the base probabilities and associated sequence reads;
generating a local differential string that represents a difference between synchronized sub-strings of corresponding first and second sets of sequence strings within local alignment, based on the identifying the difference between the first set and the second set of genetic sequence strings by comparing the genotypes;
updating a differential genetic sequence object in a differential sequence database with information according to the local differential string; and
generating a patient specific clinical instruction based on information of the differential genetic sequence object.
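The genotype-comparison step recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation. It assumes diploid genotypes, a uniform substitution error model derived from a per-read sequencer error rate, and independence between the two tissues, and it ignores the unmapped-read and junction-sequence terms that the claim includes. All function names (`base_likelihood`, `genotype_log_likelihood`, `best_joint_genotypes`, `local_differential`) are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"
# The 10 unordered diploid genotypes: "AA", "AC", ..., "TT".
GENOTYPES = ["".join(p) for p in combinations_with_replacement(BASES, 2)]


def base_likelihood(observed, allele, error_rate):
    """P(observed base | true allele). Assumption: an erroneous call is
    equally likely to be any of the three other bases."""
    return 1.0 - error_rate if observed == allele else error_rate / 3.0


def genotype_log_likelihood(reads, genotype):
    """Log-likelihood of the reads overlapping one genomic position.

    reads: list of (base, error_rate) pairs.
    Assumption: each read samples either allele with probability 0.5.
    """
    total = 0.0
    for base, err in reads:
        p = 0.5 * base_likelihood(base, genotype[0], err) \
            + 0.5 * base_likelihood(base, genotype[1], err)
        total += math.log(p)
    return total


def best_joint_genotypes(first_reads, second_reads):
    """Return the (first, second) genotype pair that maximizes the joint
    log-likelihood. Assumption: the two tissues are treated as independent,
    so the joint likelihood is the sum of the two log-likelihoods."""
    return max(
        ((g1, g2) for g1 in GENOTYPES for g2 in GENOTYPES),
        key=lambda pair: genotype_log_likelihood(first_reads, pair[0])
        + genotype_log_likelihood(second_reads, pair[1]),
    )


def local_differential(position, first_reads, second_reads):
    """Return a small record describing the difference at this position,
    or None when the most likely genotypes agree."""
    g1, g2 = best_joint_genotypes(first_reads, second_reads)
    if g1 == g2:
        return None
    return {"position": position, "first": g1, "second": g2}


if __name__ == "__main__":
    # Hypothetical pileups: the first tissue shows only A,
    # the second shows an even mix of A and G.
    first = [("A", 0.01)] * 20
    second = [("A", 0.01)] * 10 + [("G", 0.01)] * 10
    print(local_differential(100, first, second))
```

Only positions where the maximizing genotypes differ yield a record, which mirrors the idea of storing a local differential string rather than the full sequence of either tissue.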
Abstract
The present invention relates to methods for evaluating and/or predicting the outcome of a clinical condition, such as cancer, metastasis, AIDS, autism, Alzheimer's, and/or Parkinson's disorder. The methods can also be used to monitor and track changes in a patient's DNA and/or RNA during and following a clinical treatment regime. The methods may also be used to evaluate protein and/or metabolite levels that correlate with such clinical conditions. The methods are also of use to ascertain the probability outcome for a patient's particular prognosis.
41 Citations
31 Claims
1. A method of deriving a differential genetic sequence object (independent claim; its full text appears under First Claim).
Dependent claims: 2 through 31.
Specification