Method and system for approximate string matching
First Claim
1. A method of approximate string matching of a target string to a trie data structure, the trie data structure having a root node and generations of child nodes each node representing at least one character in an alphabet, the method comprising:
- traversing a trie data structure starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string;
adding characters traversed in a branch of the trie data structure to a gathered string;
reaching a node flagged as a node for a word or a word fragment and, if the target string is longer than the gathered string, looping back to the root node, and continuing the traverse from the root node.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and system are provided for approximate string matching of a target string to a trie data structure. The trie data structure has a root node and generations of child nodes each node representing at least one character in an alphabet to provide a lexicon of words and word fragments. The method involves traversing the trie data structure starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string and adding characters traversed in a branch of the trie data structure to a gathered string to provide suggestions of approximate matches. If the method reaches a node flagged as a node for a word or a word fragment and, if the target string is longer than the gathered string, the method loops back to the root node, and continues the traverse from the root node. This enables the trie data structure to use word fragments for compound words and to split non-delimited words where appropriate. The method also includes, at each node, determining if there is a correction rule for one or more characters in the remainder of the target string from the current node, and if so, applying the correction rule to the target string to obtain a modified target string.
-
Citations
34 Claims
-
1. A method of approximate string matching of a target string to a trie data structure, the trie data structure having a root node and generations of child nodes each node representing at least one character in an alphabet, the method comprising:
-
traversing a trie data structure starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string;
adding characters traversed in a branch of the trie data structure to a gathered string;
reaching a node flagged as a node for a word or a word fragment and, if the target string is longer than the gathered string, looping back to the root node, and continuing the traverse from the root node. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A system for approximate string matching of a target string to a trie data structure, the system comprising:
-
a trie data structure having a root node and generations of child nodes each node representing at least one character in an alphabet;
means for traversing a trie data structure starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string and adding characters traversed in a branch of the trie data structure to a gathered string;
means for determining if a node is flagged as a node for a word or a word fragment and means for comparing the length of the target string to the length of the gathered string;
means for looping back to the root node when the means for determining determines a node flagged as a node for a word or a word fragment and the means for comparing determines that the target string is longer than the gathered string. - View Dependent Claims (12, 13, 14)
-
-
15. A computer program product stored on a computer readable storage medium, comprising computer readable program code means for performing the steps of:
-
traversing a trie data structure starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string;
adding characters traversed in a branch of the trie data structure to a gathered string;
reaching a node flagged as a node for a word or a word fragment and, if the target string is longer than the gathered string, looping back to the root node, and continuing the traverse from the root node.
-
-
16. A method of approximate string matching of a target string to a trie data structure, the trie data structure having a root node and generations of child nodes each node representing a character in an alphabet, the method comprising:
-
traversing a trie data structure starting from a root node by comparing each node of a branch of the trie data structure to characters in the target string;
at each node, determining if there is a correction rule for one or more characters in the remainder of the target string from the current node;
if so, applying the correction rule to the target string to obtain a modified target string. - View Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
-
-
28. A system for approximate string matching of a target string to a trie data structure, the system comprising:
-
a trie data structure having a root node and generations of child nodes each node representing at least one character in an alphabet;
means for traversing a trie data structure starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string and adding characters traversed in a branch of the trie data structure to a gathered string; and
a set of correction rules for applying a sequence to sequence substitution within a target string. - View Dependent Claims (29, 30, 31, 32, 33)
-
-
34. A computer program product stored on a computer readable storage medium, comprising computer readable program code means for performing the steps of:
-
traversing a trie data structure starting from a root node by comparing each node of a branch of the trie data structure to characters in the target string;
at each node, determining if there is a correction rule for one or more characters in the remainder of the target string from the current node;
if so, applying the correction rule to the target string to obtain a modified target string.
-
Specification