Confusion set-base method and apparatus for pruning a predetermined arrangement of indexed identifiers
First Claim
1. A method of matching an input identifier to one of a set of reference identifiers, each of the reference identifiers being associated with one of a plurality of reference index codes, the method comprising the steps of:
- a) providing a recognized identifier, the input identifier being provided by a user;
on the basis of the input identifier;
b) determining an index code for the recognized identifier;
c) determining which of the plurality of reference index codes are within a predetermined distance of the index code for the recognized identifier;
d) accessing the reference identifiers associated with the reference index codes determined in step c), the accessed reference identifiers forming a candidate subset of reference identifiers;
e) searching the candidate set of reference identifiers for a reference identifier that matches the recognized identifier; and
g) selecting a matching reference identifier as corresponding to the input identifier, wherein each reference identifier is a set of alphanumeric characters.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and apparatus for reducing a set of reference identifiers to a candidate subset of reference identifiers. The reference identifiers are associated in memory with a plurality of index codes. A user provides an input identifier, causing a recognizing device of the present invention to produce a recognized identifier on the basis of the input identifier. The present invention determines an index code based on the recognized identifier and on the basis of a plurality of pre-stored confusion sets of characters that group together in individual confusion sets those characters having a relatively high likelihood of being confused with each other by the recognizing device. After matching the determined index code with one of the reference index codes, the present invention determines which reference index codes are within a predetermined distance of the matched reference index code. The reference identifiers that are associated with these reference index codes constitute the candidate subset of reference identifiers.
108 Citations
19 Claims
-
1. A method of matching an input identifier to one of a set of reference identifiers, each of the reference identifiers being associated with one of a plurality of reference index codes, the method comprising the steps of:
-
a) providing a recognized identifier, the input identifier being provided by a user;
on the basis of the input identifier;
b) determining an index code for the recognized identifier;
c) determining which of the plurality of reference index codes are within a predetermined distance of the index code for the recognized identifier;
d) accessing the reference identifiers associated with the reference index codes determined in step c), the accessed reference identifiers forming a candidate subset of reference identifiers;
e) searching the candidate set of reference identifiers for a reference identifier that matches the recognized identifier; and
g) selecting a matching reference identifier as corresponding to the input identifier, wherein each reference identifier is a set of alphanumeric characters. - View Dependent Claims (2, 3, 4, 5, 6, 7)
i) providing a plurality of confusion sets; and
ii) determining the index code for the recognized identifier on the basis of the recognized identifier and the plurality of confusion sets.
-
-
3. The method according to claim 2, wherein each one of the input identifier, the recognized identifier, and the plurality of reference identifiers comprise at least one character selected from a plurality of characters.
-
4. The method according to claim 3, wherein the step i) comprises:
-
iii) obtaining, for each one of the plurality of characters, a plurality of recognition values, each recognition value representing a likelihood that one of the plurality of characters is recognized as another one of the plurality of characters; and
iv) dividing the plurality of characters into a plurality of subsets of characters, each subset of characters including characters having a recognition value of being recognized as another character of the same subset that is higher than a predetermined threshold, wherein each of the confusion sets corresponds to one of the subsets of characters.
-
-
5. The method according to claim 2, wherein each one of the plurality of reference index codes and the index code for the recognized identifier comprises a plurality of segments, each segment comprising a predetermined number of bits, wherein each segment is associated with one of the plurality of confusion sets.
-
6. The method according to claim 5, wherein the step ii) comprises:
-
iii) determining, for each confusion set, an amount of characters of the confusion set appearing in the recognized identifier; and
iv) encoding, for each segment, the amount of characters of the associated confusion set determined in step iii) appearing in the recognized identifier.
-
-
7. The method according to claim 1, wherein the recognized identifier is entered by the user speaking into a voice input device.
-
8. A method of matching an input identifier to one of a set of reference identifiers, each of the reference identifiers being associated with one of a plurality of reference index codes, the method comprising the steps of:
-
a) providing a recognized identifier on the basis of the input identifier;
b) determining an index code for the recognized identifier;
c) determining which of the plurality of reference index codes are within a predetermined distance of the index code for the recognized identifier;
d) accessing the reference identifiers associated with the reference index codes determined in step c), the accessed reference identifiers forming a candidate subset of reference identifiers;
e) searching the candidate set of reference identifiers for a reference identifier that matches the recognized identifier; and
g) selecting a matching reference identifier as corresponding to the input identifier wherein the step c) comprises;
i) arranging the reference index codes into a predetermined data structure, wherein the predetermined data structure comprises a plurality of nodes, each node representing at least a portion of at least one of the plurality of reference index codes;
ii) traversing the predetermined data structure, wherein moving from any one of the nodes to another node along a first predetermined direction represents a first value, and wherein moving from any one of the nodes to another node along a second predetermined direction represents a second value;
iii) calculating a travel path value for each node based on at least one of the first predetermined value and the second predetermined value, each reference index code being associated with a corresponding travel path value;
iv) comparing a value of each bit position of each travel path value with a value of each corresponding bit position of the index code for the reference identifier;
v) determining for each travel path value an amount of bit positions having a different value than those in corresponding bit positions of the index code for the recognized identifier; and
vi) including in a candidate set of reference index codes those reference index codes associated with travel path values for which the associated amount of bit positions having a different value than the corresponding bit positions of the index code for the recognized identifier is no more than the predetermined distance. - View Dependent Claims (9)
-
-
10. An apparatus for matching an input identifier to one of a set of reference identifiers, each of the reference identifiers being associated with one of a plurality of reference index codes, the apparatus comprising:
-
a) first means for providing a recognized identifier on the basis of the input identifier;
b) first means for determining an index code for the recognized identifier, the input identifier being provided by a user;
c) second means for determining which of the plurality of reference index codes are within a predetermined distance of the index code for the recognized identifier;
d) means for accessing the reference identifiers associated with the reference index codes determined by the second means for determining, the accessed reference identifiers forming a candidate subset of reference identifiers;
e) means for searching the candidate set of reference identifiers for a reference identifier that matches the recognized identifier; and
f) means for selecting a matching reference identifier as corresponding to the input identifier, wherein each reference identifier is a set of alphanumeric characters. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17)
i) means for providing a plurality of confusion sets; and
ii) third means for determining the index code of the recognized identifier on the basis of the recognized identifier and the plurality of confusion sets.
-
-
12. The apparatus according to claim 11, wherein each one of the input identifier, the recognized identifier, and the plurality of reference identifiers comprises at least one character selected from a plurality of characters.
-
13. The apparatus according to claim 12, wherein the second means for providing comprises:
-
iii) means for obtaining, for each one of the plurality of characters, a plurality of recognition values, each recognition value representing a likelihood that one of the plurality of characters is recognized as another one of the plurality of characters; and
iv) means for dividing the plurality of characters into a plurality of subsets of characters, each subset of characters including characters having a recognition value of being recognized as another character of the same subset that is higher than a predetermined threshold, wherein each of the confusion sets corresponds to one of the subsets of characters.
-
-
14. The apparatus according to claim 11, wherein each one of the plurality of reference index codes and the index code for the recognized identifier comprises a plurality of segments, each segment comprising a predetermined number of bits, wherein each segment is associated with one of the plurality of confusion sets.
-
15. The apparatus according to claim 14, wherein the third means for determining comprises:
-
iii) fourth means for determining, for each confusion set, an amount of characters of the confusion set appearing in the recognized identifier; and
iv) means for encoding, for each segment, the amount of characters of the associated confusion set determined by the third means for determining to appear in the recognized identifier.
-
-
16. The apparatus according to claim 10, wherein the second means for determining comprises:
-
i) means for arranging the reference index codes into a predetermined data structure, wherein the predetermined data structure comprises a plurality of nodes, each node representing at least a portion of at least one of the plurality of reference index codes;
ii) means for traversing the predetermined data structure, wherein moving from any one of the nodes to another node along a first predetermined direction represents a first value, and wherein moving from any one of the nodes to another node along a second predetermined direction represents a second value;
iii) means for calculating a travel path value for each node based on at least one of the first predetermined value and the second predetermined value, each reference index code being associated with a corresponding travel path value;
iv) means for comparing a value of each bit position of each travel path value with a value of each corresponding bit position of the index code for the reference identifier;
v) third means for determining for each travel path value an amount of bit positions having a different value than those in corresponding bit positions of the index code for the recognized identifier; and
vi) means for including in a candidate set of reference index codes those reference index codes associated with travel path values for which the associated amount of bit positions having a different value than the corresponding bit positions of the index code for the recognized index code is no more than the predetermined distance.
-
-
17. The apparatus according to claim 16, wherein the predetermined data structure comprises a tree data structure.
-
18. An apparatus, comprising:
-
a processing device;
a recognizing device coupled to the processing device, the recognizing device including an input for receiving an input identifier, the input identifier being provided by a user;
a reference identifier database coupled to the processing device, the reference identifier database including at least one reference identifier, each reference identifier being associated with one of a plurality of reference index codes;
an index pruning module coupled to the processing device;
a confusion matrix memory coupled to the processing device; and
a confusion set generating module coupled to the processing device. - View Dependent Claims (19)
a data input device coupled to the processing device; and
a display device coupled to the processing device.
-
Specification