BAMBAM: parallel comparative analysis of high-throughput sequencing data
First Claim
1. A genomic analysis system comprising:
a data storage device storing a first aligned read file and a second aligned read file, where each file comprises genetic aligned sequence reads that include genomic location information; and
an analysis engine coupled with the data storage device and configured to:
obtain a first aligned sequence read from the first aligned read file and a second aligned sequence read from the second aligned read file;
align the first aligned sequence read and the second aligned sequence read through their respective genomic location information to form a local alignment between the first and second aligned sequence reads by using at least one known position of a sub-string of the first aligned sequence read or the second aligned sequence read in incremental synchronizing of the first aligned sequence read and the second aligned sequence read that moves from a previous local alignment to a next local alignment to bring the first aligned sequence read and the second aligned sequence read into alignment with each other and with each next local alignment depending upon the previous local alignment except for one or more mismatched portions of the first aligned sequence read or the second aligned sequence read;
generate a local differential sequence between the first and the second aligned sequence reads within the local alignment;
update a differential genetic sequence object according to the local differential sequence, wherein the updated differential genetic sequence object comprises the one or more mismatched portions of the first aligned sequence read or the second aligned sequence read; and
cause the differential genetic sequence object to be stored on a second storage device.
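The steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: `AlignedRead`, `local_differences`, and `synchronized_diff` are hypothetical names, reads are assumed to be gap-free and already sorted by chromosome and start position, and a `Counter` stands in for the differential genetic sequence object. The sketch walks both read lists in step using only their genomic coordinates, compares bases where two reads overlap, and records the mismatched positions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical stand-in for one record of an aligned read file."""
    chrom: str
    start: int   # 0-based reference position of the first base
    bases: str   # read sequence, assumed gap-free in this sketch

    @property
    def end(self) -> int:
        return self.start + len(self.bases)


def local_differences(a: AlignedRead, b: AlignedRead) -> List[Tuple[int, str, str]]:
    """Return (position, base_a, base_b) for mismatches in the shared interval."""
    if a.chrom != b.chrom:
        return []
    lo, hi = max(a.start, b.start), min(a.end, b.end)
    return [
        (pos, a.bases[pos - a.start], b.bases[pos - b.start])
        for pos in range(lo, hi)
        if a.bases[pos - a.start] != b.bases[pos - b.start]
    ]


def synchronized_diff(first: List[AlignedRead], second: List[AlignedRead]) -> Counter:
    """Sweep two coordinate-sorted read lists and count mismatched positions.

    Keys of the result are (chrom, position, base_in_first, base_in_second).
    """
    diff: Counter = Counter()
    j = 0  # first read in `second` that can still overlap the current read
    for a in first:
        # Drop reads of `second` that end before the current read starts.
        while j < len(second) and (second[j].chrom, second[j].end) <= (a.chrom, a.start):
            j += 1
        # Visit every read of `second` that starts before the current read ends.
        k = j
        while k < len(second) and (second[k].chrom, second[k].start) < (a.chrom, a.end):
            for pos, base_a, base_b in local_differences(a, second[k]):
                diff[(a.chrom, pos, base_a, base_b)] += 1
            k += 1
    return diff


# Toy data, invented for illustration only.
first_file = [AlignedRead("chr1", 0, "ACGTAC"), AlignedRead("chr1", 4, "ACGGT")]
second_file = [AlignedRead("chr1", 2, "GTTC"), AlignedRead("chr1", 6, "GGA")]
differential = synchronized_diff(first_file, second_file)
```

With these toy reads, position 4 is recorded twice (A in the first file against T in the second) and position 8 once (T against A). Because each step resumes from where the previous one left off, neither list is rescanned from the beginning, which is the point of synchronizing by genomic location.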
Abstract
A differential sequence object is constructed on the basis of alignment of sub-strings via incremental synchronization of sequence strings using known positions of the sub-strings relative to a reference genome sequence. An output file is then generated that comprises only relevant changes with respect to the reference genome.
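To make the idea of an output restricted to changes relative to the reference concrete, the following sketch filters a set of observed bases against a reference string and writes only the differing positions. The function names, the tab-separated layout, and the `chrom` default are assumptions made for illustration; the abstract does not specify an output format.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def changes_vs_reference(reference: str, calls: Dict[int, str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, reference_base, observed_base) only where they differ."""
    for pos in sorted(calls):
        if 0 <= pos < len(reference) and calls[pos] != reference[pos]:
            yield pos, reference[pos], calls[pos]


def write_changes(reference: str, calls: Dict[int, str], out: TextIO, chrom: str = "chr1") -> int:
    """Write one tab-separated line per change and return the number written."""
    out.write("chrom\tpos\tref\talt\n")
    written = 0
    for pos, ref_base, alt_base in changes_vs_reference(reference, calls):
        out.write(f"{chrom}\t{pos}\t{ref_base}\t{alt_base}\n")
        written += 1
    return written


# Toy data, invented for illustration only.
buffer = io.StringIO()
count = write_changes("ACGTACGT", {0: "A", 3: "G", 5: "C", 7: "A"}, buffer)
```

Positions that match the reference (0 and 5 in the toy data) never reach the output, so the file grows with the number of changes rather than with the length of the genome.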
18 Claims
1. A genomic analysis system (independent claim, quoted in full under First Claim above).
Dependent claims: 2 through 18.
Specification