Method and system for approximate string matching
First Claim
1. A method of approximate string matching of a target string to a trie data structure, the trie data structure having a root node and generations of child nodes each node representing at least one character in an alphabet, the method comprising:
- traversing a trie data structure starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string;
determining, at each node, if there is a correction rule for one or more characters in the remainder of the target string from the current node, and, if so, applying the correction rule to the target string to modify the target string to obtain a modified target string, wherein applying the correction rule includes performing a sequence to sequence character substitution on the target string to obtain the modified target string, and continuing to traverse the trie data structure from the current node for both the modified target string and the original target string, wherein no additional modifications of the modified target string are allowed within its modified parts;
adding characters traversed in a branch of the trie data structure to a gathered string;
reaching a node flagged as a node for a word or a word fragment, comparing the length of the target string to the length of the gathered string, and, if the target string is longer than the gathered string, looping back to the root node, and continuing the traverse from the root node; and
providing at least one suggestion from the trie data structure.
1 Assignment
0 Petitions
Accused Products
Abstract
Approximate string matching of a target string to a trie data structure. The trie data structure has a root node and generations of child nodes each node representing at least one character in an alphabet to provide a lexicon of words and word fragments. The trie data structure is traversed starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string and adding characters traversed in a branch of the trie data structure to a gathered string to provide suggestions of approximate matches. If a node is reached that is flagged as a node for a word or a word fragment and, if the target string is longer than the gathered string, the traversal loops back to the root node, and continues to traverse from the root node. This enables the trie data structure to use word fragments for compound words and to split non-delimited words where appropriate. A determination may be made, at each node, as to whether there is a correction rule for one or more characters in the remainder of the target string from the current node, and if so, the correction rule is applied to the target string to obtain a modified target string.
300 Citations
15 Claims
-
1. A method of approximate string matching of a target string to a trie data structure, the trie data structure having a root node and generations of child nodes each node representing at least one character in an alphabet, the method comprising:
-
traversing a trie data structure starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string; determining, at each node, if there is a correction rule for one or more characters in the remainder of the target string from the current node, and, if so, applying the correction rule to the target string to modify the target string to obtain a modified target string, wherein applying the correction rule includes performing a sequence to sequence character substitution on the target string to obtain the modified target string, and continuing to traverse the trie data structure from the current node for both the modified target string and the original target string, wherein no additional modifications of the modified target string are allowed within its modified parts; adding characters traversed in a branch of the trie data structure to a gathered string; reaching a node flagged as a node for a word or a word fragment, comparing the length of the target string to the length of the gathered string, and, if the target string is longer than the gathered string, looping back to the root node, and continuing the traverse from the root node; and providing at least one suggestion from the trie data structure. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A system including at least one processor and a memory storing program code for execution on the processor for approximate string matching of a target string to a trie data structure, comprising:
-
a trie data structure stored in said memory and having a root node and generations of child nodes each node representing at least one character in an alphabet; program code for traversing a trie data structure starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string and adding characters traversed in a branch of the trie data structure to a gathered string; program code for determining, at each node, if there is a correction rule for one or more characters in the remainder of the target string from the current node, and, if so, applying the correction rule to the target string to modify the target string to obtain a modified target string, wherein applying the correction rule includes performing a sequence to sequence character substitution on the target string to obtain the modified target string, for continuing to traverse the trie data structure from the current node for both the modified target string and the original target string, and wherein no additional modifications of the modified target string are allowed within its modified parts; program code for determining if a node is flagged as a node for a word or a word fragment and for comparing the length of the target string to the length of the gathered string; program code for looping back to the root node when a node is flagged as a node for a word or a word fragment and the target string is longer than the gathered string; and program code for providing at least one suggestion from the trie data structure. - View Dependent Claims (10, 12, 13, 14, 15)
-
-
11. A computer program product stored on a computer readable storage medium, comprising program code for performing the steps of:
-
traversing a trie data structure, the trie data structure having a root node and generations of child nodes each node representing at least one character in an alphabet, starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string; determining, at each node, if there is a correction rule for one or more characters in the remainder of the target string from the current node, and, if so, applying the correction rule to the target string to modify the target string to obtain a modified target string, wherein applying the correction rule includes performing a sequence to sequence character substitution on the target string to obtain the modified target string, and continuing to traverse the trie data structure from the current node for both the modified target string and the original target string, wherein no additional modifications of the modified target string are allowed within its modified parts; adding characters traversed in a branch of the trie data structure to a gathered string; reaching a node flagged as a node for a word or a word fragment, comparing the length of the target string to the length of the gathered string, and, if the target string is longer than the gathered string, looping back to the root node, and continuing the traverse from the root node; and providing at least one suggestion from the trie data structure.
-
Specification