Method and apparatus for performing similarity searching on a data stream with respect to a query string
First Claim
1. A method for performing similarity searching on a data stream with respect to a query string, the data stream comprising a plurality of data substrings, the query string comprising a plurality of query substrings, the method comprising:
filtering the data stream using a programmable logic device configured to find a plurality of possible matches between the data substrings and a plurality of the query substrings, wherein the data substrings and the query substrings comprise a plurality of characters;
for each data substring that was found to be a possible match as a result of the filtering step, identifying at least one corresponding query substring for which that data substring is a possible match; and
determining a similarity between the query string and at least a portion of the data stream based on the possible matches found by the filtering step, wherein the determining step comprises (1) comparing the characters of a window of the data stream that encompasses the data substring found to be a possible match with the characters of a window of the query string that encompasses the identified query substring corresponding to the data substring that was found to be a possible match, and (2) assessing whether the data stream portion and the query string qualify as being similar to each other based on the comparing step such that a controlled number of mismatches are permitted between the characters of the data stream window and the characters the query string window.
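The filtering and identifying steps recited above can be illustrated in software. The following is a minimal sketch only, not the claimed hardware: it assumes fixed-length substrings of `w` characters and an exact dictionary lookup standing in for the programmable logic device. The names `build_query_index` and `filter_stream` and the parameter `w` are hypothetical and do not appear in the patent.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def build_query_index(query: str, w: int) -> Dict[str, List[int]]:
    """Map each length-w query substring to the query offsets where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for q in range(len(query) - w + 1):
        index[query[q:q + w]].append(q)
    return dict(index)


def filter_stream(
    stream: Iterable[str], index: Dict[str, List[int]], w: int
) -> Iterator[Tuple[int, int]]:
    """Yield (data_offset, query_offset) for each possible match.

    The stream is consumed one character at a time, keeping only the last
    w characters, so the whole data stream never has to be held in memory.
    """
    window = ""
    for i, ch in enumerate(stream):
        window = (window + ch)[-w:]  # slide the w-character window
        if len(window) == w:
            # Identify every query substring this data substring matches.
            for q in index.get(window, ()):
                yield (i - w + 1, q)


# Assumed toy inputs: a 7-character query and an 8-character data stream.
index = build_query_index("GATTACA", 3)
hits = list(filter_stream("CCTTACGG", index, 3))
print(hits)  # [(2, 2), (3, 3)]
```

Each pair records where the data substring starts and which query substring it corresponds to, which is the information the later determining step needs.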
Abstract
An apparatus and method for performing similarity searching on a data stream with respect to a query string are disclosed, where the data stream comprises a plurality of data substrings, and where the query string comprises a plurality of query substrings. A programmable logic device is used to filter the data stream to find a plurality of possible matches between the data substrings and a plurality of the query substrings, wherein the data substrings and the query substrings comprise a plurality of characters. From these possible matches, a determination can be made as to a similarity between the query string and at least a portion of the data stream.
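The determination made from the possible matches is described in claim 1 as comparing a window of the data stream with a window of the query string while permitting a controlled number of mismatches. A hedged sketch of one way to do this follows. It assumes a position-by-position comparison with no insertions or deletions, and it assumes the windows are formed by extending the matched substrings by a fixed number of characters on each side. The function `windows_similar` and the parameters `flank` and `max_mismatches` are hypothetical names chosen for illustration.

```python
def windows_similar(
    data: str,
    query: str,
    d: int,
    q: int,
    w: int,
    flank: int,
    max_mismatches: int,
) -> bool:
    """Decide whether the windows around a possible match are similar.

    d and q are the start offsets of the matching substrings (length w)
    in the buffered data and in the query string respectively.
    """
    # Extend both substrings by up to `flank` characters on each side,
    # clipped so the two windows stay in bounds and stay aligned.
    left = min(flank, d, q)
    right = min(flank, len(data) - (d + w), len(query) - (q + w))

    data_window = data[d - left:d + w + right]
    query_window = query[q - left:q + w + right]

    # Count the positions at which the two windows differ.
    mismatches = sum(a != b for a, b in zip(data_window, query_window))
    return mismatches <= max_mismatches


# Assumed toy inputs: the substring "TTA" occurs at offset 2 in both strings.
print(windows_similar("CCTTACGG", "GATTACA", 2, 2, 3, 2, 3))  # True
print(windows_similar("CCTTACGG", "GATTACA", 2, 2, 3, 2, 2))  # False
```

In this toy case the two seven-character windows differ at three positions, so the pair qualifies as similar only when the threshold is at least three.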
26 Claims
1. A method for performing similarity searching on a data stream with respect to a query string, the data stream comprising a plurality of data substrings, the query string comprising a plurality of query substrings, the method comprising:
filtering the data stream using a programmable logic device configured to find a plurality of possible matches between the data substrings and a plurality of the query substrings, wherein the data substrings and the query substrings comprise a plurality of characters;
for each data substring that was found to be a possible match as a result of the filtering step, identifying at least one corresponding query substring for which that data substring is a possible match; and
determining a similarity between the query string and at least a portion of the data stream based on the possible matches found by the filtering step, wherein the determining step comprises (1) comparing the characters of a window of the data stream that encompasses the data substring found to be a possible match with the characters of a window of the query string that encompasses the identified query substring corresponding to the data substring that was found to be a possible match, and (2) assessing whether the data stream portion and the query string qualify as being similar to each other based on the comparing step such that a controlled number of mismatches are permitted between the characters of the data stream window and the characters the query string window.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
15. An apparatus for performing similarity searching on a data stream with respect to a query string, the data stream comprising a plurality of data substrings, the query string comprising a plurality of query substrings, the apparatus comprising:
a programmable logic device configured to:
(1) filter the data stream to find a plurality of possible matches between the data substrings and a plurality of the query substrings, wherein the data substrings and the query substrings comprise a plurality of characters,
(2) for each data substring that was found to be a possible match as a result of the filtering operation, identify at least one corresponding query substring for which that data substring is a possible match, and
(3) determine a similarity between the query string and at least a portion of the data stream based on the filtered possible matches by (i) comparing the characters of a window of the data stream that encompasses the data substring found to be a possible match with the characters of a window of the query string that encompasses the identified query substring corresponding to the data substring that was found to be a possible match, and (ii) assessing whether the data stream portion and the query string qualify as being similar to each other based on the comparison such that a controlled number of mismatches are permitted between the characters of the data stream window and the characters the query string window.
Dependent Claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26)
Specification