Method and apparatus for performing similarity searching
First Claim
1. A system for generating a hash table for use in comparing a first biosequence string with a second biosequence string to assess similarity between the first and second biosequence strings, the system comprising:
- a processor configured to provide hashing on a plurality of substrings of the first biosequence string to (1) map each substring of the first biosequence string to a location in a hash table, and (2) generate the hash table, the hash table being configured to store an entry at each mapped location that is populated with a pointer to a position in the first biosequence string for the substring of the first biosequence string mapped to that location;
a memory for storing the hash table; and
a field programmable gate array (FPGA) configured to (1) detect substrings of the second biosequence string that are possible matches to substrings of the first biosequence string, and (2) link the detected substrings of the second biosequence string to corresponding positions in the first biosequence string where the detected substrings are located by applying hashing logic to the detected substrings as against the hash table to retrieve the pointers from the hash table entries to which the hashing logic maps the detected substrings.
2 Assignments
0 Petitions
Accused Products
Abstract
A system and method for performing similarity searching is disclosed wherein programmable logic devices such as field programmable gate arrays (FPGAs) can be used to implement Bloom filters for identifying possible matches between a query and data. The Bloom filters can be implemented in a parallel architecture where the different parallel Bloom filters share access to the same memory units. Further, a hash table may be generated to map a set of strings to keys. In other examples, the hash table may be used to map a set of substrings to a position in a larger string.
385 Citations
34 Claims
-
1. A system for generating a hash table for use in comparing a first biosequence string with a second biosequence string to assess similarity between the first and second biosequence strings, the system comprising:
-
a processor configured to provide hashing on a plurality of substrings of the first biosequence string to (1) map each substring of the first biosequence string to a location in a hash table, and (2) generate the hash table, the hash table being configured to store an entry at each mapped location that is populated with a pointer to a position in the first biosequence string for the substring of the first biosequence string mapped to that location; a memory for storing the hash table; and a field programmable gate array (FPGA) configured to (1) detect substrings of the second biosequence string that are possible matches to substrings of the first biosequence string, and (2) link the detected substrings of the second biosequence string to corresponding positions in the first biosequence string where the detected substrings are located by applying hashing logic to the detected substrings as against the hash table to retrieve the pointers from the hash table entries to which the hashing logic maps the detected substrings. - View Dependent Claims (2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
-
-
4. A method for generating a hash table for use in comparing a first biosequence string with a second biosequence string to assess similarity between the first and second biosequence strings, the method comprising:
-
hashing a plurality of substrings of the first biosequence string with a processor to map each substring of the first biosequence string to a location a hash table; generating the hash table, the hash table being storing an entry at each mapped location that is populated with a pointer to a position in the first biosequence string for the substring of the first biosequence string mapped to that location; storing the hash table within a memory; a field programmable gate array (FPGA) detecting substrings of the second biosequence string that are possible matches to substrings of the first biosequence string; and the FPGA linking the detected substrings of the second biosequence string to corresponding positions in the first biosequence string where the detected substrings are located by applying hashing logic to the detected substrings as against the hash table to retrieve the pointers from the hash table entries to which the hashing logic maps the detected substrings. - View Dependent Claims (5, 6, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34)
-
Specification