METHODS AND SYSTEMS FOR IMPLEMENTING APPROXIMATE STRING MATCHING WITHIN A DATABASE
First Claim
1. A computer-based method for character string matching of a candidate character string with a plurality of character string records stored within a database, said method comprising:
- a) identifying a set of reference character strings in the database, the reference character strings identified utilizing an optimization search for a set of dissimilar character strings;
b) generating an n-gram representation for one of the reference character strings in the set of reference character strings;
c) generating an n-gram representation for the candidate character string;
d) determining a similarity between the n-gram representations;
e) repeating steps b) and d) for the remaining reference character strings in the set of identified reference character strings; and
f) indexing the candidate character string within the database based on the determined similarities between the n-gram representation of the candidate character string and the n-gram representation of the reference character strings in the identified set.
1 Assignment
0 Petitions
Accused Products
Abstract
A computer-based method for character string matching of a candidate character string with a plurality of character string records stored in a database is described. The method includes a) identifying a set of reference character strings in the database, the reference character strings identified utilizing an optimization search for a set of dissimilar character strings, b) generating an n-gram representation for one of the reference character strings in the set of reference character strings, c) generating an n-gram representation for the candidate character string, d) determining a similarity between the n-gram representations, e) repeating steps b) and d) for the remaining reference character strings in the set of identified reference character strings, and f) indexing the candidate character string within the database based on the determined similarities between the n-gram representation of the candidate character string and the reference character strings in the identified set.
-
Citations
20 Claims
-
1. A computer-based method for character string matching of a candidate character string with a plurality of character string records stored within a database, said method comprising:
-
a) identifying a set of reference character strings in the database, the reference character strings identified utilizing an optimization search for a set of dissimilar character strings; b) generating an n-gram representation for one of the reference character strings in the set of reference character strings; c) generating an n-gram representation for the candidate character string; d) determining a similarity between the n-gram representations; e) repeating steps b) and d) for the remaining reference character strings in the set of identified reference character strings; and f) indexing the candidate character string within the database based on the determined similarities between the n-gram representation of the candidate character string and the n-gram representation of the reference character strings in the identified set. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A computer programmed to:
-
utilize an optimization search to identify a set of dissimilar reference character strings in a database; generate an n-gram representation for a candidate character string; generate an n-gram representation for each of the dissimilar reference character strings in the set; determine a similarity between the n-gram representation of the candidate character string and each n-gram representation of the set of dissimilar reference character strings; and index the candidate character string within the database based on the similarities determined in the n-gram representations. - View Dependent Claims (12, 13, 14, 15)
-
-
16. A computer-based method for approximate matching of a candidate character string to a set of reference character strings within a database, said method comprising:
-
individually comparing an n-gram representation of the candidate character string to n-gram representations for each reference character string in the set of reference character strings; and generating a binary index value that is associated with the candidate character string, the binary index value indicating a similarity between the candidate character string and each of the reference character strings. - View Dependent Claims (17, 18, 19, 20)
-
Specification