METHOD AND SYSTEM FOR APPROXIMATE STRING MATCHING
First Claim
1. A method for approximate string matching of an input pattern to a trie data structure, comprising:
- traversing a trie data structure to find approximate partial and full character string matches of the input pattern, wherein traversing a node of the trie data structure to process a character of the string applies any applicable correction rules to the character, wherein each correction rule has an associated cost, adjusted after each character processed;
accumulating costs as a string of characters is gathered; and
restricting the traverse through the trie data structure according to the accumulated cost of a gathered string and potential costs of applicable correction rules.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and system for approximate string matching are provided for generating approximate matches whilst supporting compounding and correction rules. The method for approximate string matching of an input pattern to a trie data structure, includes traversing a trie data structure to find approximate partial and full character string matches of the input pattern. Traversing a node of the trie data structure to process a character of the string applies any applicable correction rules to the character, wherein each correction rule has an associated cost, adjusted after each character processed. The method includes accumulating costs as a string of characters is gathered, and restricting the traverse through the trie data structure according to the accumulated cost of a gathered string and potential costs of applicable correction rules.
28 Citations
30 Claims
-
1. A method for approximate string matching of an input pattern to a trie data structure, comprising:
-
traversing a trie data structure to find approximate partial and full character string matches of the input pattern, wherein traversing a node of the trie data structure to process a character of the string applies any applicable correction rules to the character, wherein each correction rule has an associated cost, adjusted after each character processed; accumulating costs as a string of characters is gathered; and restricting the traverse through the trie data structure according to the accumulated cost of a gathered string and potential costs of applicable correction rules. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
-
-
17. A data structure for use in approximate string matching of an input pattern to a trie data structure, comprising:
-
a data structure element for each applicable correction rule for a character of an input pattern, the element being indexed by a position of the character; a matrix of costs indexed by the character position, wherein the matrix of costs is updated during the traverse of the trie data structure to reflect accumulated costs of applied correction rules. - View Dependent Claims (18, 19, 20, 21, 22)
-
-
23. A system for approximate string matching of an input pattern to a trie data structure, comprising:
-
a trie data structure having nodes representing characters in a string, the trie data structure storing allowed character strings; a plurality of character correction rules to be applied to the input pattern including a transition of one or more characters in the input pattern; and a means for generating a correction rule structure for applicable correction rules for an input pattern, the correction rule structure having a plurality of rule elements indexed by a position of the character. - View Dependent Claims (24, 25, 26, 27, 28, 29)
-
-
30. A computer program product stored on a computer readable storage medium, comprising computer readable program code means for performing the steps of:
-
traversing a trie data structure to find approximate partial and full character string matches of the input pattern, wherein traversing a node of the trie data structure to process a character of the string applies any applicable correction rules to the character, wherein each correction rule has an associated cost, adjusted after each character processed; accumulating costs as a string of characters is gathered; and restricting the traverse through the trie data structure according to the accumulated cost of a gathered string and potential costs of applicable correction rules.
-
Specification