BamBam: parallel comparative analysis of high-throughput sequencing data
First Claim
1. A computer-based sequence analysis system comprising:
a computer readable memory configured to store at least a first and a second genomic sequence datasets, the sequence datasets comprising genomic reads associated with respective first and second tissues; and
a sequence analysis engine having a processor coupled with the computer readable memory and configured to:
determine a common genomic location in the first and second genomic sequence datasets;
generate at least a pair of pileups by:
reading a first set of pileups that includes genomic reads from the first genomic sequence dataset and that overlap the common genomic location; and
reading a second set of pileups that includes genomic reads from the second genomic sequence dataset and that also overlap the common genomic location;
infer at least a pair of genotypes for the common genomic location based on the at least the pair of pileups, the at least the pair of genotypes including a first genotype associated with the first tissue and a second genotype associated with the second tissue;
identify a genomic difference between the first genotype and the second genotype in the at least the pair of genotypes;
filter false positives based on a skewing from a random distribution; and
store the genomic difference in a device memory.
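The steps recited in the claim can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the read layout `(start, sequence, strand)`, the function names, the fixed allele-fraction threshold used for genotyping, and the choice of strand bias (tested with an exact two-sided binomial test at p = 0.5) as the "skewing from a random distribution" are all hypothetical choices made for illustration. The claim itself does not specify which distribution is tested or how genotypes are inferred.

```python
from collections import Counter
from math import comb

# Hypothetical read record: (start, sequence, strand), 0-based start, no indels.

def pileup(reads, pos):
    """Collect (base, strand) for every read that overlaps position pos."""
    return [(seq[pos - start], strand)
            for start, seq, strand in reads
            if start <= pos < start + len(seq)]

def genotype(column, min_fraction=0.2):
    """Naive genotype (assumption): bases seen in at least min_fraction of reads."""
    if not column:
        return ()
    counts = Counter(base for base, _ in column)
    depth = len(column)
    return tuple(sorted(b for b, n in counts.items() if n / depth >= min_fraction))

def strand_skew_pvalue(column, allele):
    """Exact two-sided binomial p-value for the strand balance of reads carrying allele."""
    strands = [s for b, s in column if b == allele]
    n, k = len(strands), strands.count("+")
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    # Symmetric binomial (p = 0.5): double the smaller tail, capped at 1.
    return min(1.0, 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n)

def compare(reads_first, reads_second, pos, alpha=0.05):
    """Return the genotype difference at pos, or None if absent or filtered."""
    col_first, col_second = pileup(reads_first, pos), pileup(reads_second, pos)
    gt_first, gt_second = genotype(col_first), genotype(col_second)
    if not gt_first or not gt_second or gt_first == gt_second:
        return None
    # Filter: drop the call if any differing allele sits on strand-skewed reads.
    for allele in set(gt_first) ^ set(gt_second):
        col = col_first if allele in gt_first else col_second
        if strand_skew_pvalue(col, allele) < alpha:
            return None
    return {"pos": pos, "first": gt_first, "second": gt_second}

if __name__ == "__main__":
    first = [(0, "AAAAA", s) for s in "+-+-+-"]
    second = [(0, "AAAAA", s) for s in "+-+"] + [(0, "AAGAA", s) for s in "+-+"]
    print(compare(first, second, 2))
```

Both datasets are read at the same position in a single pass, so the two pileups are always compared side by side rather than genotyped in separate runs. A production system would read aligned reads from BAM files and use a likelihood-based genotype model in place of the fixed threshold.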
1 Assignment
0 Petitions
Abstract
The present invention relates to methods for evaluating and/or predicting the outcome of a clinical condition, such as cancer, metastasis, AIDS, autism, Alzheimer's, and/or Parkinson's disorder. The methods can also be used to monitor and track changes in a patient's DNA and/or RNA during and following a clinical treatment regime. The methods may also be used to evaluate protein and/or metabolite levels that correlate with such clinical conditions. The methods are also of use to ascertain the probability outcome for a patient's particular prognosis.
38 Citations
18 Claims
Claim 1 is reproduced above under First Claim. Claims 2 through 18 are dependent claims.
Specification