Methods and apparatus for high-speed approximate sub-string searches
First Claim
Patent Images
1. A method for comparing a plurality of query sequences against a sequence database comprising the steps of:
- (a) combining said plurality of query sequences into a combined query sequence;
(b) determining a plurality of subdivisions of said database;
(c) performing a plurality of searches, wherein each search comprises a comparison of said combined query sequence against one of said plurality of subdivisions of said database, producing a plurality of word matches;
(d) extending the length of plurality of word matches produced in step (c), producing a plurality of High-scoring Segment Pairs;
(e) combining said plurality of High-scoring Segment Pairs; and
(f) producing a plurality of reports, each report representing the highest scoring matches for one of said plurality of query sequences.
3 Assignments
0 Petitions
Accused Products
Abstract
The present invention provides methods and systems for performing improved implementations of the BLAST algorithm for high-speed sequence database searching, e.g., on commodity Beowulf-class parallel computing hardware. The present invention also provides methods and systems for query packing, dynamic database division, and improved hit extension.
116 Citations
9 Claims
-
1. A method for comparing a plurality of query sequences against a sequence database comprising the steps of:
-
(a) combining said plurality of query sequences into a combined query sequence;
(b) determining a plurality of subdivisions of said database;
(c) performing a plurality of searches, wherein each search comprises a comparison of said combined query sequence against one of said plurality of subdivisions of said database, producing a plurality of word matches;
(d) extending the length of plurality of word matches produced in step (c), producing a plurality of High-scoring Segment Pairs;
(e) combining said plurality of High-scoring Segment Pairs; and
(f) producing a plurality of reports, each report representing the highest scoring matches for one of said plurality of query sequences. - View Dependent Claims (2, 3, 9)
-
-
4. A method for conducting sequence searches in a database with a BLAST algorithm using the following steps:
- (a) compile a list of high scoring words of length w from a query sequence;
(b) for each sequence in the database, scan for word hits with scores greater than a threshold T; and
(c) for each word hit, extend it in both directions to form a High Scoring Pair (“
HSP”
) of score greater than or equal to S, wherein the improvement comprises combining a plurality of query sequences into a combined query sequence before conducting said search.
- (a) compile a list of high scoring words of length w from a query sequence;
-
5. A method for conducting sequence searches in a database with the BLAST algorithm using the following steps:
- (a) compile a list of high scoring words of length w from the query sequence;
(b) for each sequence in the database, scan for word hits with scores greater than a threshold T; and
(c) for each word hit, extend it in both directions to form a High Scoring Pair (“
HSP”
) of score greater than or equal to S, wherein the improvement comprises determining a pluarlity of subdivisions of said database before conducting said search.
- (a) compile a list of high scoring words of length w from the query sequence;
-
6. A method for conducting sequence searches in a database with the BLAST algorithm using the following steps:
- (a) compile a list of high scoring words of length w from the query sequence;
(b) for each sequence in the database, scan for word hits with scores greater than a threshold T; and
(c) for each word hit, extend it in both directions to form a High Scoring Pair (“
HSP”
) of score greater than or equal to S, wherein the improvement comprises restructuring coding so that whenever possible, a hash table stays in the processor'"'"'s cache, rather than bouncing back and forth between processor and memory.
- (a) compile a list of high scoring words of length w from the query sequence;
-
7. A method for conducting sequence searches in a database with the BLAST algorithm using the following steps:
- (a) compile a list of high scoring words of length w from the query sequence;
(b) for each sequence in the database, scan for word hits with scores greater than a threshold T; and
(c) for each word hit, extend it in both directions to form a High Scoring Pair (“
HSP”
) of score greater than or equal to S, wherein the improvement comprises dividing query sequence that equals to or is longer than a megabase into smaller pieces before conducting said search. - View Dependent Claims (8)
- (a) compile a list of high scoring words of length w from the query sequence;
Specification