Bambam: parallel comparative analysis of high-throughput sequencing data
First Claim
1. A parallel genomic comparative analysis system comprising:
a memory; and
a sequence analysis engine coupled with the memory and configured to:
identify a genomic position within a reference genome;
access a first file storing tumor sequence data including short reads associated with a tumor tissue;
access a second file storing matched normal sequence data short reads associated with a matched normal tissue;
store in the memory a tumor dataset having tumor short read sequences from the first file where the tumor short read sequences overlap the genomic position;
store in the memory a matched normal dataset having matched normal short read sequences from the second file and that overlap the genomic position;
select a tumor genotype and a matched normal genotype that maximize a joint probability as a function of the tumor short read sequences and the matched normal short read sequences at the genomic position, wherein the joint probability depends on one of a probability calculated as a multinomial operating as a function of the matched normal genotype or as a probability calculated as a multinomial operating as a function of the tumor genotype; and
store a difference between the tumor genotype and the matched normal genotype in a device memory.
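The genotype selection step in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the diploid genotype space, the uniform sequencing error rate `error`, the normal-contamination fraction `contamination`, the flat genotype prior, and the function names `base_probs`, `log_multinomial`, and `call_joint` are all assumptions introduced here for illustration. The sketch scores every (matched normal, tumor) genotype pair with multinomial likelihoods of the observed base counts at one genomic position and keeps the pair with the highest joint log-probability.

```python
from itertools import combinations_with_replacement, product
from math import lgamma, log

BASES = "ACGT"
# The 10 unordered diploid genotypes, e.g. ('A', 'A'), ('A', 'G'), ...
GENOTYPES = list(combinations_with_replacement(BASES, 2))


def base_probs(genotype, error=0.01):
    """Probability of reading each base given a genotype (assumed error model)."""
    probs = {}
    for base in BASES:
        p = 0.0
        for allele in genotype:
            # Each allele contributes half of the reads; errors are spread
            # evenly over the three other bases.
            p += 0.5 * ((1 - error) if base == allele else error / 3)
        probs[base] = p
    return probs


def log_multinomial(counts, probs):
    """Log of the multinomial probability of the base counts."""
    n = sum(counts.values())
    total = lgamma(n + 1)
    for base in BASES:
        k = counts.get(base, 0)
        total += k * log(probs[base]) - lgamma(k + 1)
    return total


def call_joint(tumor_counts, normal_counts, contamination=0.2, error=0.01):
    """Return (tumor genotype, normal genotype, difference) maximizing the joint score."""
    best = None
    for g_normal, g_tumor in product(GENOTYPES, repeat=2):
        p_normal = base_probs(g_normal, error)
        p_pure = base_probs(g_tumor, error)
        # Assumption: tumor reads are a mixture of tumor and normal cells.
        p_tumor = {b: contamination * p_normal[b] + (1 - contamination) * p_pure[b]
                   for b in BASES}
        score = (log_multinomial(normal_counts, p_normal)
                 + log_multinomial(tumor_counts, p_tumor))
        if best is None or score > best[0]:
            best = (score, g_tumor, g_normal)
    _, g_tumor, g_normal = best
    difference = None if g_tumor == g_normal else (g_normal, g_tumor)
    return g_tumor, g_normal, difference
```

Calling `call_joint({"A": 15, "G": 15}, {"A": 30})` scores all 100 genotype pairs for a position where the normal reads show only A and the tumor reads are split between A and G. The `difference` value is `None` when both genotypes agree, which corresponds to having nothing to store for that position.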
Abstract
The present invention relates to methods for evaluating and/or predicting the outcome of a clinical condition, such as cancer, metastasis, AIDS, autism, Alzheimer's, and/or Parkinson's disorder. The methods can also be used to monitor and track changes in a patient's DNA and/or RNA during and following a clinical treatment regime. The methods may also be used to evaluate protein and/or metabolite levels that correlate with such clinical conditions. The methods are also of use to ascertain the probability outcome for a patient's particular prognosis.
19 Claims
1. A parallel genomic comparative analysis system comprising:
a memory; and
a sequence analysis engine coupled with the memory and configured to:
identify a genomic position within a reference genome;
access a first file storing tumor sequence data including short reads associated with a tumor tissue;
access a second file storing matched normal sequence data short reads associated with a matched normal tissue;
store in the memory a tumor dataset having tumor short read sequences from the first file where the tumor short read sequences overlap the genomic position;
store in the memory a matched normal dataset having matched normal short read sequences from the second file and that overlap the genomic position;
select a tumor genotype and a matched normal genotype that maximize a joint probability as a function of the tumor short read sequences and the matched normal short read sequences at the genomic position, wherein the joint probability depends on one of a probability calculated as a multinomial operating as a function of the matched normal genotype or as a probability calculated as a multinomial operating as a function of the tumor genotype; and
store a difference between the tumor genotype and the matched normal genotype in a device memory.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A parallel genomic comparative analysis system comprising:
a memory; and
a sequence analysis engine coupled with the memory and configured to:
access a first file storing tumor sequence data including short reads associated with a tumor tissue;
access a second file storing matched normal sequence data short reads associated with a matched normal tissue;
align, relative to a first genomic position within a reference genome, the short reads associated with the tumor tissue with the short reads associated with the matched normal tissue;
process at the same time all aligned short reads to determine a difference between the tumor sequence data and the matched normal sequence data;
store a difference between the tumor sequence data and the matched normal sequence data in a device memory;
align, relative to a second genomic position within the reference genome, the short reads associated with the tumor tissue with the short reads associated with the matched normal tissue;
process at the same time all aligned short reads to determine a second difference between the tumor sequence data and the matched normal sequence data; and
store the second difference between the tumor sequence data and the matched normal sequence data in the device memory.
Dependent claims: 15, 16, 17
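The position-by-position processing recited in claim 14 can be sketched as a synchronized walk over two read streams. This is only an illustration under stated assumptions: reads are modeled as hypothetical `(start, sequence)` tuples already sorted by start coordinate, positions are visited in ascending order, and a "difference" is simplified to a disagreement between the most frequent tumor base and the most frequent normal base. The names `walk_positions` and `differences` are invented for this sketch and do not come from the patent.

```python
from collections import Counter, deque


def walk_positions(tumor_reads, normal_reads, positions):
    """Yield (position, tumor_counts, normal_counts) for ascending positions.

    Both read streams are advanced together, and only reads overlapping the
    current position are kept buffered in memory.
    """
    streams = [iter(tumor_reads), iter(normal_reads)]
    buffers = [deque(), deque()]
    pending = [next(s, None) for s in streams]
    for pos in positions:
        counts = []
        for i in range(2):
            # Pull in reads that start at or before the current position.
            while pending[i] is not None and pending[i][0] <= pos:
                buffers[i].append(pending[i])
                pending[i] = next(streams[i], None)
            # Drop reads that end before the current position.
            buffers[i] = deque(r for r in buffers[i] if r[0] + len(r[1]) > pos)
            # Count the base each remaining read shows at this position.
            counts.append(Counter(seq[pos - start] for start, seq in buffers[i]))
        yield pos, counts[0], counts[1]


def differences(tumor_reads, normal_reads, positions):
    """Map each differing position to (normal majority base, tumor majority base)."""
    found = {}
    for pos, tumor, normal in walk_positions(tumor_reads, normal_reads, positions):
        if tumor and normal:
            t_base = tumor.most_common(1)[0][0]
            n_base = normal.most_common(1)[0][0]
            if t_base != n_base:
                found[pos] = (n_base, t_base)
    return found
```

Because both buffers are trimmed as the walk advances, memory use in this sketch depends on read depth at the current position and not on the size of either input.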
18. A parallel genomic comparative analysis system comprising:
a memory; and
a sequence analysis engine coupled with the memory and configured to:
identify a genomic position within a reference genome;
access a first file storing tumor sequence data including short reads associated with a tumor tissue;
access a second file storing matched normal sequence data short reads associated with a matched normal tissue;
store in the memory a tumor dataset having tumor short read sequences from the first file where the tumor short read sequences overlap the genomic position, wherein the tumor dataset comprises all tumor short read sequences in the first file that overlap the genomic position;
store in the memory a matched normal dataset having matched normal short read sequences from the second file and that overlap the genomic position;
select a tumor genotype and a matched normal genotype that maximize a joint probability as a function of the tumor short read sequences and the matched normal short read sequences at the genomic position; and
store a difference between the tumor genotype and the matched normal genotype in a device memory.
Dependent claims: 19
Specification