Two step method for correcting spelling of a word or phrase in a document
Abstract
A very fast method for correcting the spelling of a word or phrase in a document proceeds in two steps: first applying a very fast approximate method for eliminating most candidate words from consideration (without computing the exact edit distance between the given word whose spelling is to be corrected and any candidate word), followed by a “slow method” which computes the exact edit distance between the word whose spelling is to be corrected and each of the few remaining candidate words. The combination results in a method that is almost as fast as the fast approximate method and as exact as the slow method.
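The two-step structure described in the abstract can be sketched as follows. This is only an illustrative sketch under stated assumptions: the abstract does not say which edit distance is meant, so the standard Levenshtein distance (insertions, deletions, substitutions) is assumed, and the names `edit_distance`, `length_filter`, and `two_step_candidates` are hypothetical. The `length_filter` here is a deliberately simple stand-in for "some fast approximate method", not the filter recited in the claims.

```python
def edit_distance(a, b):
    """Exact Levenshtein distance via row-by-row dynamic programming (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def length_filter(word, candidate, max_dist):
    """Stand-in cheap filter: words whose lengths differ by more than
    max_dist cannot be within max_dist edits of each other."""
    return abs(len(word) - len(candidate)) <= max_dist


def two_step_candidates(word, dictionary, max_dist, cheap_filter=length_filter):
    """Step 1: discard candidates with the cheap filter.
    Step 2: run the exact distance only on the survivors."""
    survivors = [c for c in dictionary if cheap_filter(word, c, max_dist)]
    return [c for c in survivors if edit_distance(word, c) <= max_dist]
```

Because the cheap filter only ever rejects candidates that cannot be within the allowed distance, the final result matches what the exact method alone would return, while the expensive computation runs on fewer words.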
4 Claims
1. A method for correcting the spelling of a word or phrase in a document comprising the steps of:
applying an approximate method for eliminating some candidate words from consideration, without computing an exact edit distance between a given word whose spelling is to be corrected and any candidate word;
followed by applying an exact method which computes an exact edit distance between the word whose spelling is to be corrected and each of the remaining candidate words, wherein G is a given word whose spelling is to be corrected and the approximate method comprises the steps of;
pre-computing a vector Gpoll whose length is the number of letters in the alphabet from which G is constructed, the value of each component of Gpoll being a number of times a letter corresponding to that component appears in G;
iterating on the letters in the candidate word or phrase C to be processed by defining two integers Cval which counts a number of letters already considered in C which do not appear in G, and Gval which counts a number of letters in G which do not appear among the characters already considered in C;
processing a next letter l in C by checking a count in a corresponding entry of Gpoll, and if that entry is positive, decrementing both Gval and the value of the lth component of Gpoll by one, but if that entry is zero, incrementing Cval by one;
determining if Cval is larger than a maximal edit distance Δ, rejecting C from consideration; and
continuing letter by letter until either C has been rejected or all its letters have been processed, and in the latter case, if Gval is larger than Δ, then rejecting C from consideration, but otherwise, accepting C, provided its actual distance from G is not more than Δ.
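The approximate method recited in claim 1 can be sketched in Python roughly as below. This is a reading of the claim language, not the patentee's implementation: the function name `survives_letter_count_filter` is hypothetical, `max_dist` stands for Δ, and a `Counter` is assumed in place of a fixed-length alphabet vector for Gpoll. The claim decrements Gpoll in place; the sketch works on a per-candidate copy so that the pre-computed counts can be reused across candidates, which is an assumption about how the state is restored.

```python
from collections import Counter


def survives_letter_count_filter(g_poll, g_len, candidate, max_dist):
    """Return False if the candidate can be rejected without computing
    an exact edit distance, True if it must go on to the exact method."""
    poll = g_poll.copy()  # working copy; the pre-computed Gpoll stays intact
    c_val = 0             # letters of C seen so far that are unmatched in G
    g_val = g_len         # letters of G not yet matched by a letter of C
    for letter in candidate:
        if poll[letter] > 0:
            # This letter of C uses up one occurrence of the same letter in G.
            poll[letter] -= 1
            g_val -= 1
        else:
            c_val += 1
            if c_val > max_dist:
                return False  # too many letters of C have no partner in G
    # All letters processed: reject if too many letters of G were left unmatched.
    return g_val <= max_dist


# Pre-compute once per misspelled word G, then reuse for every candidate C.
G = "hello"
G_POLL = Counter(G)
```

With `G = "hello"` and `max_dist = 2`, the candidate `"help"` survives, while `"world"` is rejected early (three unmatched letters) and `"he"` is rejected at the end (three letters of G left over). The anagram `"olleh"` also survives even though its true distance is larger, which is why the claim keeps the final proviso that the exact distance must still be checked.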
3. A computer readable medium containing code implementing a method for correcting the spelling of a word or phrase in a document, the code contained in said computer readable medium comprising:
first code implementing an approximate method for eliminating most candidate words from consideration, without computing an exact edit distance between a given word whose spelling is to be corrected and any candidate word; and
second code implementing an exact method which computes an exact edit distance between the word whose spelling is to be corrected and each of the few remaining candidate words, said second code being called after execution of said first code, wherein G is a given word whose spelling is to be corrected and the first code includes;
code for pre-computing a vector Gpoll whose length is the number of letters in the alphabet from which G is constructed, the value of each component of Gpoll being a number of times a letter corresponding to that component appears in G;
code for iterating on the letters in the candidate word or phrase C to be processed by defining two integers Cval which counts a number of letters already considered in C which do not appear in G, and Gval which counts a number of letters in G which do not appear among the characters already considered in C;
code for processing a next letter l in C by checking a count in a corresponding entry of Gpoll, and if that entry is positive, decrementing both Gval and the value of the lth component of Gpoll by one, but if that entry is zero, incrementing Cval by one;
code for determining if Cval is larger than a maximal edit distance Δ, rejecting C from consideration; and
code for continuing letter by letter until either C has been rejected or all its letters have been processed, and in the latter case, if Gval is larger than Δ, then rejecting C from consideration, but otherwise, accepting C, provided its actual distance from G is not more than Δ.
Specification