Fuzzy string matcher
First Claim
Patent Images
1. A method for comparing a test character string to a plurality of valid data strings to identify valid data strings which closely match the test character string, comprising the steps of:
- (a) subdividing the test character string into sets of N adjacent characters;
(b) storing in a first count array a frequency of one set of N adjacent characters in the test string;
(c) comparing said frequency to an initial frequency value to obtain an initial holographic distance;
(d) repeating step (b) and (c) for all sets of N adjacent characters formed in step (a);
(e) subdividing a given valid data string into sets of N adjacent characters;
(f) storing in a second count array a frequency of one set of N adjacent characters found in said given valid character string;
(g) comparing said frequency of said set of N adjacent characters found in the test character string to said frequency of said given set of N adjacent characters found in said valid character string to obtain an updated holographic distance;
(h) repeating steps (f) and (g) for all sets of N adjacent characters formed in step (e);
(i) clearing said updated holographic distance to said initial holographic distance and clearing said second count array to an initial value; and
(j) repeating steps (e) through (i) inclusive for each of said plurality of valid character strings.
2 Assignments
0 Petitions
Accused Products
Abstract
An apparatus and method useful in correcting character strings misread by a character recognition engine. The misread string is compared to character strings contained in a lexicon of valid character strings and the similarity between the strings measured. The valid strings can be ranked by degree of similarity should further selection be required to determine the appropriate value of the misread character string.
53 Citations
48 Claims
-
1. A method for comparing a test character string to a plurality of valid data strings to identify valid data strings which closely match the test character string, comprising the steps of:
-
(a) subdividing the test character string into sets of N adjacent characters; (b) storing in a first count array a frequency of one set of N adjacent characters in the test string; (c) comparing said frequency to an initial frequency value to obtain an initial holographic distance; (d) repeating step (b) and (c) for all sets of N adjacent characters formed in step (a); (e) subdividing a given valid data string into sets of N adjacent characters; (f) storing in a second count array a frequency of one set of N adjacent characters found in said given valid character string; (g) comparing said frequency of said set of N adjacent characters found in the test character string to said frequency of said given set of N adjacent characters found in said valid character string to obtain an updated holographic distance; (h) repeating steps (f) and (g) for all sets of N adjacent characters formed in step (e); (i) clearing said updated holographic distance to said initial holographic distance and clearing said second count array to an initial value; and (j) repeating steps (e) through (i) inclusive for each of said plurality of valid character strings. - View Dependent Claims (2, 3, 4, 5, 46, 47, 48)
-
-
6. A device for comparing a test character string to a plurality of valid data strings of a lexicon to identify valid data strings which closely match the test character string comprising:
-
a set of N registers, having an input coupled to said lexicon and to said test character string, for inputting a set of N adjacent characters to form a given N-gram; a set of N registers, having an input coupled to said lexicon and to said test character string, for inputting a set of N adjacent characters to form a given N-gram; a hashing logic circuit coupled to an output of said set of N registers, for asserting a count array address corresponding to the given N-gram; a first count array, having an input coupled to an output of said hashing logic circuit, for storing a frequency of said given N-gram in the test character string; a second count array, having an input coupled to an output of said hashing logic circuit, for storing a frequency of said given N-gram in a given valid character string; means, coupled to said input of said first and second count arrays and to an output of said first and second count arrays, for incrementing said frequencies stored in said first and second count arrays; an arithmetic logic unit for comparing said frequency stored in said first array to said frequency stored in said second array to obtain a distance update and having; (i) a first set of inputs coupled to said first array; (ii) a second set of inputs coupled to said second array; and (iii) an output having m sum bits and a single carry bit; and an accumulator for updating a current holographic distance to obtain an updated holographic distance and having; (i) a lowest order input coupled to an asserted signal; (ii) m inputs coupled to said m sum bits of said arithmetic logic unit; (iii) a most significant bit coupled to said carry bit of said arithmetic logic unit; and (iv) an output. - View Dependent Claims (7, 8, 9, 10, 11, 12, 13)
-
-
14. A device for comparing a test character string to a plurality of valid data strings of a lexicon to identify valid data strings which closely match the test character string comprising:
-
a set of N registers, having an input coupled to said lexicon and to said test character string, for inputting a set of N adjacent characters to form a given N-gram; a hashing logic circuit coupled to an output of said set of N registers, for asserting a count array address corresponding to the given N-gram; a first count array, having an input coupled to an output of said hashing logic circuit, for storing a frequency of said given N-gram in the test character string; a second count array, having an input coupled to an output of said hashing logic circuit, for storing a frequency of said given N-gram in a given valid character string; means, coupled to said input of said first and second count arrays and to an output of said first and second count arrays, for incrementing said frequencies stored in said first and second count arrays; an arithmetic logic unit for comparing said frequency stored in said first array to said frequency stored in said second array to obtain a distance update and having; (i) a first set of inputs coupled to said first array; (ii) a second set of inputs coupled to said second array; and (iii) an output having m sum bits and a single carry bit; and an accumulator for updating a current holographic distance to obtain an updated holographic distance and having; (i) a lowest order input coupled to an asserted signal; (ii) m inputs coupled to said m sum bits of said arithmetic logic unit; (iii) a most significant bit coupled to said carry bit of said arithmetic logic unit; and (iv) an output. - View Dependent Claims (15, 16, 17, 18, 19, 20)
-
-
21. A device for comparing a test character string to a plurality of valid data strings to identify valid data strings which closely match the test character string comprising:
-
a set of N registers, having an input coupled to a lexicon and to the test character string, for inputting a set of N adjacent characters to form a given N-gram; a hashing logic circuit coupled to an output of said set of N registers, for asserting a count array address corresponding to the given N-gram; a first count array, having an input coupled to an output of said hashing logic circuit, for storing a frequency of said given N-gram in a given valid character string read from said lexicon; means, coupled to said input of said first and second count arrays and to an output of said first and second count arrays, for incrementing said frequencies stored in said first and second count arrays; an arithmetic logic unit for comparing said frequency stored in said first array to said frequency stored in said second array to obtain a distance update and having; (i) a first set of inputs coupled to said second array; (ii) a second set of inputs coupled to said second array; and (iii) an output having m sum bits and a single carry bit; a multiplexer, having a first input coupled to said m sum bits of said arithmetic logic unit, a second input coupled to said carry bit of said arithmetic logic unit and m outputs; and an accumulator for updating a current holographic distance to obtain an updated holographic distance and having; (i) a lowest order input coupled to an asserted signal; (ii) m inputs coupled to said m outputs of said multiplexer; (iii) a most significant bit input coupled to said carry bit of said arithmetic logic unit; and (iv) an output. - View Dependent Claims (22, 23, 24, 25, 26, 27)
-
-
28. A device for comparing a test character string to a plurality of valid data strings of a lexicon to identify valid data strings which closely match the test character string comprising:
-
a set of N registers, having an input coupled to said lexicon and to said test character string, for inputting a set of N adjacent characters to form a given N-gram; a first hashing logic circuit, coupled to an output of said set of N registers, for asserting a first count array address corresponding to the given N-gram; a first count array, having an input coupled to an output of said first hashing logic circuit, for storing a frequency of said given N-gram in the test character string; a second count array, having an input coupled to an output of said first hashing logic circuit, for storing a frequency of said given N-gram in a given valid character string read from said lexicon; means, coupled to said input of said first and second count arrays and to an output of said first and second count arrays, for incrementing said frequencies stored in said first and second count arrays; means, having an input coupled to said output of said first and second count arrays, for comparing said frequency stored in said first count array and said frequency stored in second count array to obtain a first distance update; means, coupled to an output of said means for comparing, for adding said first distance update to a current holographic distance to obtain a first updated holographic distance; a second hashing logic circuit, coupled to M outputs of said set of N registers, for asserting a second count array address corresponding to the M outputs wherein M<
N;a third count array, having an input coupled to an output of said second hashing logic circuit, for storing a frequency of a given M-gram in the test character string; a fourth count array, having an input coupled to an output of said second hashing logic circuit, for storing a frequency of said given M-gram in a given valid character string read from said lexicon; means, coupled to said input of said third and fourth count arrays and to an output of said third and fourth count arrays, for incrementing said frequencies stored in said third and fourth count arrays; means, having an input coupled to said output of said third and fourth count arrays, for comparing said frequency stored in said third count array and said frequency stored in said fourth count array to obtain a second distance update; means, coupled to an output of said means for comparing, for adding said second distance update to a second current holographic distance to obtain a second updated holographic distance; and means, coupled to said means for obtaining a first updated holographic distance and to said means for obtaining a second updated holographic distance, for adding said first updated holographic distance to said second updated holographic distance to obtain a total updated holographic distance. - View Dependent Claims (29, 30, 31, 32, 33, 34, 35, 36, 37)
-
-
38. A system for resolving a character recognition error comprising:
-
a set of N registers, having an input coupled to a lexicon and to a test character string, for inputting a set of N adjacent characters to form a given N-gram; a hashing logic circuit, coupled to an output of said set of N registers, for asserting a count array address corresponding to the given N-gram; a first register array, having an input coupled to an output of said hashing logic circuit, for indicating occurrences of said given N-gram in the test character string; a second register array, having an input coupled to an output of said hashing logic circuit, for indicating occurrences of said given N-gram in a given valid character string read from said lexicon; an increment circuit coupled to said input of said first and second register arrays and to an output of said first and second register arrays for incrementing respective values stored in said first and second register arrays, the values corresponding to frequencies of said given N-gram in the respective character strings; means, having an input coupled to said output of said first and second register arrays, for comparing said frequency stored in said first register array and said frequency stored in said second register array to obtain a distance update; and means, coupled to an output of said means for comparing, for adding said distance update to a current holographic distance to obtain an updated holographic distance. - View Dependent Claims (39, 40, 41, 42, 43, 44, 45)
-
Specification