BamBam: parallel comparative analysis of high-throughput sequencing data

US 9,721,062 B2
Filed: 05/27/2016
Issued: 08/01/2017
Est. Priority Date: 05/25/2010
Status: Active Grant

Abstract

The present invention relates to methods for evaluating and/or predicting the outcome of a clinical condition, such as cancer, metastasis, AIDS, autism, Alzheimer's, and/or Parkinson's disorder. The methods can also be used to monitor and track changes in a patient's DNA and/or RNA during and following a clinical treatment regime. The methods may also be used to evaluate protein and/or metabolite levels that correlate with such clinical conditions. The methods are also of use to ascertain the probability outcome for a patient's particular prognosis.

18 Claims

1. A computer-based sequence analysis system comprising:
- a computer readable memory configured to store at least a first and a second genomic sequence datasets, the sequence datasets comprising genomic reads associated with respective first and second tissues; and
- a sequence analysis engine having a processor coupled with the computer readable memory and configured to:
  - determine a common genomic location in the first and second genomic sequence datasets;
  - generate at least a pair of pileups by:
    - reading a first set of pileups that includes genomic reads from the first genomic sequence dataset and that overlap the common genomic location; and
    - reading a second set of pileups that includes genomic reads from the second genomic sequence dataset and that also overlap the common genomic location;
  - infer at least a pair of genotypes for the common genomic location based on the at least the pair of pileups, the at least the pair of genotypes including a first genotype associated with the first tissue and a second genotype associated with the second tissue;
  - identify a genomic difference between the first genotype and the second genotype in the at least the pair of genotypes;
  - filter false positives based on a skewing from a random distribution; and
  - store the genomic difference in a device memory.
- Dependent claims (2 through 18):
  - 2. The system of claim 1, wherein the sequence analysis engine is further configured to infer the at least the pair of genotypes based on a joint probability derived based on the at least the pair of pileups.
  - 3. The system of claim 2, wherein the sequence analysis engine is further configured to select the first and second genotype based on maximizing the joint probability.
  - 4. The system of claim 1, wherein the sequence analysis engine is further configured to infer the at least the pair of genotypes based on reads in the pair of pileups exceeding mapping quality thresholds.
  - 5. The system of claim 1, wherein the sequence analysis engine is further configured to infer the at least the pair of genotypes based on reads in the pair of pileups exceeding a user-defined base.
  - 6. The system of claim 1, wherein the genomic difference is selected from the group consisting of:
    - a somatic mutation, a copy number alteration, an allele-specific copy number, a sequence variant, and a sequence loss of heterozygosity.
  - 7. The system of claim 1, wherein the first tissue and the second tissue are from the same patient.
  - 8. The system of claim 1, wherein the first tissue comprises a tumor tissue and the second tissue comprises a matched normal tissue.
  - 9. The system of claim 1, wherein at least one of the first and the second genomic sequence datasets comprises data associated with at least one of the following:
    - DNA, RNA, mRNA, tRNA, rRNA, miRNA, and asRNA.
  - 10. The system of claim 1, wherein the sequence analysis engine is further configured to keep the first and the second sequence datasets synchronized with respect to a genome.
  - 11. The system of claim 1, wherein the sequence analysis engine is further configured to read the at least the pair of pileups at the same time.
  - 12. The system of claim 1, wherein the at least one of the first set of pileups and the second set of pileups include short reads.
  - 13. The system of claim 1, wherein the at least a pair of pileups includes at least three sets of pileups that include a third set of pileups representing a third genome.
  - 14. The system of claim 13, wherein the third set of pileups represent a relapsed sequence.
  - 15. The system of claim 1, wherein the at least one of the first and the second genomic sequence datasets comprises at least one of a BAM file and a SAM file.
  - 16. The system of claim 1, wherein the common genomic location is relative to a reference genome.
  - 17. The system of claim 1, wherein the sequence analysis engine is further configured to determine the common genomic location by incrementally moving to a next position in the reference genome.
  - 18. The system of claim 17, wherein the common genomic location comprises a next common genomic location within the reference genome.
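
The steps recited in claims 1 through 4 (paired pileups at a common location, genotype inference by maximizing a joint probability, a mapping quality threshold, and a filter on skew from a random distribution) can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything specific in it is an assumption made for illustration: the `(base, mapq, strand)` read tuples, the function names, the uniform per-base error model, the `somatic_rate` prior, and the reading of "skewing from a random distribution" as an exact binomial test on the strand of reads supporting a new allele.

```python
import math
from itertools import combinations_with_replacement

# The 10 unordered diploid genotypes over A, C, G, T, e.g. ('A', 'G').
GENOTYPES = list(combinations_with_replacement("ACGT", 2))


def log_likelihood(pileup, genotype, error=0.01):
    """log P(pileup | genotype) under an assumed uniform per-base error rate."""
    total = 0.0
    for base, _mapq, _strand in pileup:
        # Average the emission probability over the two alleles of the genotype.
        p = sum((1 - error) if base == allele else error / 3
                for allele in genotype) / 2
        total += math.log(p)
    return total


def joint_genotype(tumor, normal, min_mapq=20, somatic_rate=1e-6):
    """Return the (tumor, normal) genotype pair with the highest joint score.

    The score is log P(tumor | g_t) + log P(normal | g_n) + log P(g_t | g_n),
    with a flat prior on g_n. `somatic_rate` is a hypothetical prior.
    """
    # Keep only reads at or above the mapping quality threshold.
    tumor = [r for r in tumor if r[1] >= min_mapq]
    normal = [r for r in normal if r[1] >= min_mapq]
    log_same = math.log(1 - somatic_rate)
    log_diff = math.log(somatic_rate / (len(GENOTYPES) - 1))
    best, best_score = None, -math.inf
    for g_n in GENOTYPES:
        ll_n = log_likelihood(normal, g_n)
        for g_t in GENOTYPES:
            prior = log_same if g_t == g_n else log_diff
            score = ll_n + log_likelihood(tumor, g_t) + prior
            if score > best_score:
                best, best_score = (g_t, g_n), score
    return best


def strand_skew_pvalue(pileup, allele):
    """Two-sided exact binomial p-value (p = 0.5) for the strand of reads
    carrying `allele`; a small value means the reads are strand-skewed."""
    strands = [s for b, _q, s in pileup if b == allele]
    n, k = len(strands), strands.count("+")
    if n == 0:
        return 1.0
    observed = math.comb(n, k)
    # Sum the outcomes that are no more likely than the observed one.
    tail = sum(math.comb(n, i) for i in range(n + 1)
               if math.comb(n, i) <= observed)
    return tail / 2 ** n


def somatic_differences(tumor_pileups, normal_pileups, alpha=0.01):
    """Walk positions present in both datasets and keep genotype differences
    whose newly observed alleles are not strand-skewed."""
    calls = {}
    for pos in sorted(tumor_pileups.keys() & normal_pileups.keys()):
        g_t, g_n = joint_genotype(tumor_pileups[pos], normal_pileups[pos])
        if g_t == g_n:
            continue
        novel = set(g_t) - set(g_n)  # alleles seen only in the tumor genotype
        if any(strand_skew_pvalue(tumor_pileups[pos], a) < alpha for a in novel):
            continue  # treated as a likely false positive
        calls[pos] = (g_t, g_n)
    return calls
```

In this sketch each dataset is a dictionary that maps a reference position to its pileup, so the intersection of the two key sets stands in for the "common genomic location" of claim 1. A real pipeline would instead stream two coordinate-sorted BAM files in step, as claims 10, 11, and 15 describe.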

Current Assignee: Regents of the University of California (University of California)
Original Assignee: Regents of the University of California (University of California)
Inventors: Sanborn, John Zachary; Haussler, David
Primary Examiner(s): Smith, Paulinho E
Application Number: US15/167,530
Publication Number: US 20160275257A1
Time in Patent Office: 431 Days
Field of Search: None

CPC Class Codes

- C12Q 1/6886   for cancer immunoassay for ...
- C12Q 2600/106   Pharmacogenomics, i.e. gene...
- C12Q 2600/118   Prognosis of disease develo...
- C12Q 2600/156   Polymorphic or mutational m...
- G06F 2203/04806   Zoom, i.e. interaction tech...
- G06F 3/04845   for image manipulation, e.g...
- G06F 40/169   Annotation, e.g. comment da...
- G06N 7/01   Probabilistic graphical mod...
- G06T 11/206   Drawing of charts or graphs
- G16B 20/20   Allele or variant detection...
- G16B 30/00   ICT specially adapted for s...
- G16B 30/10   Sequence alignment; Homolog...
- G16B 40/00   ICT specially adapted for b...
- G16H 10/40   for data related to laborat...
- G16H 10/60   for patient-specific data, ...
- G16H 50/20   for computer-aided diagnosi...
- G16H 50/30   for calculating health indi...
- G16H 70/20   relating to practices or gu...
- Y02A 90/10   Information and communicati...
- Y02A 90/30   Assessment of water resources
