Method and apparatus for performing biosequence similarity searching
First Claim
1. A digital logic circuit for performing biological sequence similarity searching, the circuit comprising:
- a programmable logic device configured to include a pipeline comprising a Bloom filter stage, the Bloom filter stage being configured to receive a data stream comprising a plurality of biological sequence data strings and filter the biological sequence data stream to identify a plurality of possible matches between the sequence data strings and a plurality of substrings of a query string.
3 Assignments
0 Petitions
Accused Products
Abstract
A system and method for performing biological sequence similarity searching is disclosed. This includes a programmable logic device configured to include a pipeline that comprises a matching stage, the matching stage being configured to receive a data stream comprising a plurality of possible matches between a plurality of biological sequence data strings and a plurality of substrings of a query string. The pipeline may further include a ungapped extension prefilter stage located downstream from the matching stage, the prefilter stage being configured to shift through pattern matches between the biological sequence data strings and the plurality of substrings of a query string and provide a score so that only pattern matches that exceed a user defined score will pass downstream from the prefilter stage. The matching stage may include at least one Bloom filter.
-
Citations
114 Claims
-
1. A digital logic circuit for performing biological sequence similarity searching, the circuit comprising:
a programmable logic device configured to include a pipeline comprising a Bloom filter stage, the Bloom filter stage being configured to receive a data stream comprising a plurality of biological sequence data strings and filter the biological sequence data stream to identify a plurality of possible matches between the sequence data strings and a plurality of substrings of a query string. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
-
25. A system for performing seeded alignment searching, the system comprising:
a programmable logic device configured to implement a seeded alignment searching pipeline, the pipeline comprising a Bloom filter stage, the Bloom filter stage being configured to receive a data stream, the data stream comprising a plurality of data strings and filter the data stream to identify a plurality of possible matches between the stream data strings and a plurality of substrings of a query string. - View Dependent Claims (26)
-
27. A method for performing biological sequence similarity searching, the method comprising:
processing a biological sequence data stream through a Bloom filter stage implemented on a programmable logic device, the biological sequence data stream comprising a plurality of biological sequence data strings, the Bloom filter stage comprising a Bloom filter that is programmed with a plurality of substrings of a query string and configured to identify a plurality of possible matches between the biological sequence data strings and the query substrings. - View Dependent Claims (28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46)
-
47. A system for generating a hash table for use in mapping a set of strings to keys, the system comprising:
-
a processor configured to provide near perfect hashing on the set of strings to identify a location in a hash table, utilizing an H3 family of hash functions, corresponding to each string and to generate the hash table, the hash table being configured to store, at each identified location therein is a key; and
a memory in communication with the processor for storing the hash table, wherein the processor is configured to provide near perfect hashing via hash functions, utilizing the H3 family of hash functions, that are guaranteed to be of full rank.
-
-
48. A system for generating a hash table for use in mapping a set of substrings to a position in a larger string, the system comprising:
-
a processor configured to provide near perfect hashing on the set of substrings to identify a position in a hash table, corresponding to each substring and to generate the hash table, the hash table being configured to store, at each identified location therein, a pointer to a position of each substring in the larger string; and
a memory in communication with the processor for storing the hash table.
-
-
49. A digital logic circuit for performing biological sequence similarity searching, the circuit comprising:
a programmable logic device configured to include a pipeline that comprises a matching stage, the matching stage being configured to receive a data stream comprising a plurality of possible matches between a plurality of biological sequence data strings and a plurality of substrings of a query string and the pipeline further comprises a prefilter stage located downstream from the matching stage, the ungapped extension prefilter stage being configured to shift through pattern matches between the biological sequence data strings and the plurality of substrings of a query string and provide a score so that only pattern matches that exceed a user defined score will pass downstream from the ungapped extension prefilter stage. - View Dependent Claims (50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81)
-
82. A method for performing biological sequence similarity searching, the method comprising:
-
processing a biological sequence data stream with a programmable logic device configured to include a pipeline that comprises a matching stage, the matching stage being configured to receive a data stream comprising a plurality of possible matches between a plurality of biological sequence data strings and a plurality of substrings of a query string; and
utilizing a ungapped extension prefilter stage located downstream from the matching stage in the pipeline, the ungapped extension prefilter stage being configured to shift through pattern matches between the biological sequence data strings and the plurality of substrings of a query string and provide a score so that only pattern matches that exceed a user defined score will pass downstream from the ungapped extension prefilter stage. - View Dependent Claims (83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114)
-
Specification