Determining a known character string equivalent to a query string
First Claim
1. A computer-implemented method comprising:
- modifying a query string of characters using a predetermined set of heuristics;
- performing a character-by-character comparison of the modified query string with at least one known string of characters in a corpus in order to locate an exact match for the modified query string; and
- responsive to not finding an exact match, performing the following steps in order to locate an equivalent for the modified query string:
  - forming a plurality of sub-strings of characters from the query string, the sub-strings having varying lengths such that at least two of the formed sub-strings differ in length; and
  - using an information retrieval technique on the sub-strings formed from the query string to identify a known string of characters equivalent to the query string, wherein the information retrieval technique further comprises:
    - weighting the sub-strings;
    - scoring known strings of characters; and
    - retrieving information associated with the known string having the highest score.
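
The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the heuristics (lowercasing, stripping punctuation, dropping a leading "the"), the sub-string lengths (2 and 3 characters), and the inverse-document-frequency style weighting are all choices made for the example, and the names `normalize`, `substrings`, and `find_equivalent` are hypothetical.

```python
import math
import re
from collections import Counter


def normalize(text):
    """Assumed heuristics: lowercase, strip punctuation, drop a leading 'the'."""
    text = re.sub(r"[^a-z0-9 ]", "", text.lower())
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r"^the ", "", text)


def substrings(text, lengths=(2, 3)):
    """Form sub-strings of differing lengths (here, character 2- and 3-grams)."""
    return [text[i:i + n] for n in lengths for i in range(len(text) - n + 1)]


def find_equivalent(query, corpus):
    """Return the known string equivalent to `query`, or None if nothing scores."""
    modified = normalize(query)
    known = {normalize(s): s for s in corpus}

    # Step 1: look for an exact match on the modified query.
    if modified in known:
        return known[modified]

    # Step 2: weight sub-strings; rarer sub-strings count more (an assumption).
    doc_grams = {s: set(substrings(norm)) for norm, s in known.items()}
    df = Counter(g for grams in doc_grams.values() for g in grams)
    n_docs = len(doc_grams)
    weights = {g: math.log(1 + n_docs / df[g]) for g in df}

    # Step 3: score each known string by the summed weight of shared sub-strings.
    query_grams = set(substrings(modified))
    scores = {
        s: sum(weights[g] for g in grams & query_grams)
        for s, grams in doc_grams.items()
    }

    # Step 4: retrieve the highest-scoring known string, if any overlap exists.
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

With a hypothetical corpus of `["The Beatles", "Beach Boys", "Rolling Stones"]`, the query `"the beatles!"` is resolved by the exact-match step, while the misspelling `"Beetles"` falls through to sub-string scoring and still resolves to `"The Beatles"`.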
Abstract
A system, method, and computer program product perform text equivalencing. Text equivalencing is performed by modifying a string of characters by applying a set of heuristics and comparing the modified string of characters to known strings of characters. If a match is found, the text equivalencing engine performs a database update and exits. If no match is found, sub-strings are formed by grouping together frequently occurring sets of characters. An information retrieval technique is then performed on the sub-strings to determine equivalent text.
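
One way to read "grouping together frequently occurring sets of characters" is sketched below. The abstract does not say how the groups are chosen, so everything here is an assumption: groups are character runs of length 2 to 3 that occur at least twice in the known strings, the query is split greedily into the longest such group at each position, and the names `frequent_groups` and `segment` are hypothetical.

```python
from collections import Counter


def frequent_groups(corpus, max_len=3, min_count=2):
    """Collect character groups (length 2..max_len) seen at least min_count times."""
    counts = Counter(
        s[i:i + n]
        for s in corpus
        for n in range(2, max_len + 1)
        for i in range(len(s) - n + 1)
    )
    return {g for g, c in counts.items() if c >= min_count}


def segment(text, groups, max_len=3):
    """Greedily split text into the longest frequent group at each position."""
    pieces, i = [], 0
    while i < len(text):
        for n in range(max_len, 1, -1):
            if text[i:i + n] in groups:
                pieces.append(text[i:i + n])
                i += n
                break
        else:
            # No frequent group starts here; emit a single character.
            pieces.append(text[i])
            i += 1
    return pieces
```

Because frequent groups and leftover single characters are mixed, the resulting sub-strings naturally differ in length, which is consistent with the varying-length requirement in claim 1.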
231 Citations
12 Claims
Specification