Locating digital coded words which are both acceptable misspellings and acceptable inflections of digital coded query words
First Claim
1. A method using a digital data processing means for locating from a plurality of digital coded candidate words at least one candidate word which is both an acceptable misspelling and an acceptable inflection of a digital coded query word, the candidate and query words each comprising plural characters, the method comprising the steps of:
determining characters forming a stem portion and an ending portion of such query word;
forming a suffix class indication for any one of a plurality of classes in which the query word may be included;
comparing the characters forming the stem portion with characters starting at the beginning of each of a plurality of such candidate words for finding candidate words having acceptable misspelling matches and those with nonacceptable misspelling matches;
determining characters forming an ending portion, if any, in each of individual ones of the candidate words;
utilizing the suffix class indication to select, from among other suffixes, a representation of characters forming at least one acceptable suffix for the candidate words; and
comparing, character by character, the characters of said at least one acceptable suffix with the characters in the ending portion of each of individual ones of the candidate words for finding candidate words having acceptable ending portions;
the first and second recited steps of comparing thereby locating the candidate words which are both an acceptable misspelling match and an acceptable inflection of the query word.
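The steps of the first claim can be sketched end to end in a few lines. This is a minimal, non-authoritative illustration under stated assumptions: the suffix classes (`SUFFIX_CLASSES`), the stem lexicon (`STEM_CLASS`), the one-edit tolerance, and the use of plain Levenshtein distance as the "acceptable misspelling" test are all hypothetical choices made here, not the patent's own method.

```python
# Hypothetical suffix classes: class name -> acceptable inflectional suffixes.
SUFFIX_CLASSES = {
    "noun": ("", "s"),
    "verb": ("", "s", "ed", "ing"),
}
# Hypothetical lexicon assigning each known stem to one suffix class.
STEM_CLASS = {"walk": "verb", "book": "noun"}


def analyze_query(query):
    """Split the query into (stem, ending, suffix class), trying the longest stem first."""
    for cut in range(len(query), 0, -1):
        stem, ending = query[:cut], query[cut:]
        cls = STEM_CLASS.get(stem)
        if cls is not None and ending in SUFFIX_CLASSES[cls]:
            return stem, ending, cls
    raise ValueError(f"no known stem in {query!r}")


def edit_distance(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def find_matches(query, candidates, max_edits=1):
    """Return candidates whose beginning is an acceptable misspelling of the
    query stem and whose remaining characters form an acceptable suffix."""
    stem, _ending, cls = analyze_query(query)
    suffixes = SUFFIX_CLASSES[cls]  # the suffix class indication selects the suffixes
    hits = []
    for cand in candidates:
        # A misspelled stem may be up to max_edits characters shorter or longer.
        low = max(0, len(stem) - max_edits)
        high = min(len(cand), len(stem) + max_edits)
        for cut in range(low, high + 1):
            if edit_distance(stem, cand[:cut]) > max_edits:
                continue  # nonacceptable misspelling match at this boundary
            if cand[cut:] in suffixes:  # compare the candidate ending with each suffix
                hits.append(cand)
                break
    return hits
```

For the query "walked", the candidates "walking", "wlking" and "talks" are accepted while "walker" is rejected: its stem matches, but "er" is not in the hypothetical verb class. "talks" passes because "talk" is one substitution away from "walk", which shows that this tolerance is purely orthographic.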
Abstract
A method is disclosed using a digital data processing means for determining, from a plurality of candidate words, at least one which is both an acceptable spelling and an acceptable inflection of a query word. The words are represented by machine-readable coded signals and comprise plural characters. The steps are as follows: Determine a stem portion of such query word. Form a suffix class indication for any one of a plurality of classes in which the query word may be included. Compare the determined query stem with characters at the beginning of such candidate words to find acceptable and nonacceptable spelling matches. Determine an ending portion, if any, in each individual candidate word that is an acceptable spelling match. Utilize the suffix class indication to select a representation of at least one acceptable suffix for the candidate words. Compare a representation of the at least one selected acceptable suffix with the determined ending portions of the individual candidate words that are acceptable spelling matches to determine at least one predetermined acceptable relation therebetween.
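The comparison of the query stem with "characters at the beginning" of a candidate can be done for every possible prefix length in a single pass. The sketch below is one way to do that, assuming Levenshtein distance with a hypothetical `max_edits` threshold as the acceptability test; the function names are illustrative only. The last row of the standard edit-distance table holds the distance from the whole stem to each prefix of the candidate.

```python
def prefix_distances(stem, candidate):
    """d[j] = edit distance between the whole stem and candidate[:j]."""
    prev = list(range(len(candidate) + 1))  # row for the empty stem
    for i, s in enumerate(stem, 1):
        cur = [i]
        for j, c in enumerate(candidate, 1):
            cur.append(min(prev[j] + 1,              # stem character missing from candidate
                           cur[j - 1] + 1,           # extra character in candidate
                           prev[j - 1] + (s != c)))  # match or substitution
        prev = cur
    return prev


def acceptable_boundaries(stem, candidate, max_edits=1):
    """Positions j where candidate[:j] is an acceptable misspelling of stem.
    An empty list means the candidate is a nonacceptable misspelling match."""
    return [j for j, d in enumerate(prefix_distances(stem, candidate)) if d <= max_edits]
```

Each returned position is a place where the candidate's ending could begin, so only acceptable spelling matches go on to the suffix comparison.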
44 Claims
1. A method using a digital data processing means for locating from a plurality of digital coded candidate words at least one candidate word which is both an acceptable misspelling and an acceptable inflection of a digital coded query word, the candidate and query words each comprising plural characters, the method comprising the steps of:
determining characters forming a stem portion and an ending portion of such query word; forming a suffix class indication for any one of a plurality of classes in which the query word may be included; comparing the characters forming the stem portion with characters starting at the beginning of each of a plurality of such candidate words for finding candidate words having acceptable misspelling matches and those with nonacceptable misspelling matches; determining characters forming an ending portion, if any, in each of individual ones of the candidate words; utilizing the suffix class indication to select, from among other suffixes, a representation of characters forming at least one acceptable suffix for the candidate words; and comparing, character by character, the characters of said at least one acceptable suffix with the characters in the ending portion of each of individual ones of the candidate words for finding candidate words having acceptable ending portions; the first and second recited steps of comparing thereby locating the candidate words which are both an acceptable misspelling match and an acceptable inflection of the query word. (Dependent claims: 2 to 17.)
18. Program controlled digital data processing means for locating from a plurality of digital coded candidate words at least one candidate word which is both an acceptable misspelling and an acceptable inflection of a digital coded query word, the query word and each of a plurality of the candidate words comprising plural characters, the data processing means comprising:
(a) programmed digital data processing means for determining characters forming a stem portion and an ending portion of such query word and for determining and forming a suffix class indication of any one of a plurality of classes in which the query word may be included; (b) programmed digital data processing means for comparing the characters forming the stem portion of the query word with characters starting at the beginning of each of a plurality of such candidate words for finding candidate words having beginning portions with acceptable misspelling matches and those with nonacceptable misspelling matches and, for each of individual ones of those candidate words having an acceptable misspelling match, operative for forming an acceptable misspelling class indication representing a value for any one of a plurality of classes in which the acceptable misspelling match for such candidate word may be included; (c) programmed digital data processing means utilizing the acceptable misspelling class indication for each of individual ones of the candidate words to identify an ending portion, if any, in the corresponding candidate word; (d) programmed digital data processing means for utilizing the suffix class indication for the query word to select from among other suffixes a representation of at least one acceptable suffix for the candidate words; and (e) programmed digital data processing means for comparing the characters of said at least one acceptable suffix with the characters of the ending portion in each of individual ones of the candidate words for finding candidate words having acceptable ending portions. (Dependent claim: 19.)
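Elements (b) and (c) use the kind of misspelling found in the candidate's beginning to decide where its ending starts. A sketch of that idea follows, assuming a hypothetical set of single-error classes (exact, substitution, transposition, insertion, deletion) and a hypothetical `BOUNDARY_SHIFT` table; the claim itself does not say which classes are used. Because one prefix can fit several classes (for example "wallking" reads as either a substitution or an insertion), the sketch returns every applicable class and leaves the suffix comparison to decide.

```python
# Hypothetical misspelling classes for a single-error prefix match, and how
# far each shifts the stem/ending boundary relative to the query stem length.
BOUNDARY_SHIFT = {
    "exact": 0,
    "substitution": 0,
    "transposition": 0,
    "insertion": +1,  # candidate has one extra letter in its stem
    "deletion": -1,   # candidate is missing one letter of the stem
}


def prefix_classes(stem, cand):
    """Every class under which cand begins with a single-error misspelling of stem."""
    n = len(stem)
    found = []
    head = cand[:n]
    if head == stem:
        found.append("exact")
    elif len(head) == n:
        diffs = [i for i in range(n) if head[i] != stem[i]]
        if len(diffs) == 1:
            found.append("substitution")
        elif (len(diffs) == 2 and diffs[1] == diffs[0] + 1
              and head[diffs[0]] == stem[diffs[1]]
              and head[diffs[1]] == stem[diffs[0]]):
            found.append("transposition")
    longer = cand[:n + 1]
    if len(longer) == n + 1 and any(longer[:i] + longer[i + 1:] == stem for i in range(n + 1)):
        found.append("insertion")
    if n > 1 and any(stem[:i] + stem[i + 1:] == cand[:n - 1] for i in range(n)):
        found.append("deletion")
    return found


def candidate_endings(stem, cand):
    """Use each misspelling class to locate where the candidate's ending starts."""
    return [(cls, cand[len(stem) + BOUNDARY_SHIFT[cls]:]) for cls in prefix_classes(stem, cand)]
```

For the stem "walk", the candidate "wlaking" is classed as a transposition with ending "ing", while an empty result marks a nonacceptable misspelling match.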
20. Digital data processing means for locating from a plurality of digital coded candidate words at least one which is both an acceptable misspelling and an acceptable inflection of a digital coded query word, the query word and each of a plurality of the candidate words comprising plural characters, the means comprising:
(a) first programmable digital data processing means comprising first control program means, the first programmable digital data processing means, at least in part under control of the first control program means, being operative for processing characters of the query word for thereby determining characters of the query word forming a stem portion and characters forming an ending portion and for additionally determining a suffix class indication for the characters of the query word; (b) second programmable digital data processing means comprising second control program means, the second programmable digital data processing means, at least in part under control of the second control program means, being operative for comparing the stem portion of the query word with characters at the beginning of each of a plurality of said candidate words for determining those candidate words which have acceptable misspelling matches with the stem portion of the query word; and (c) third programmable digital data processing means comprising third control program means, the third programmable digital data processing means, at least in part under control of the third control program means, being operative for using the suffix class indication to select from other suffixes an acceptable suffix composed of one or more characters and for comparing characters forming an ending portion, after said characters at the beginning, of each of individual ones of the candidate words with the selected suffix to thereby determine those candidate words which have an acceptable ending; (d) the second and third programmable digital data processing means thereby determining candidate words having both acceptable misspellings and acceptable inflections of the query word. (Dependent claim: 21.)
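Claim 20 splits the work across separately controlled processors. One way to picture that division, assuming Python threads as stand-ins for the second and third processors and a FIFO queue as the link between them, is the sketch below. The first processor's output (the stem and the selected suffixes) is passed in as arguments, and the stem test (at most one differing letter in the first `len(stem)` positions) is a deliberately simple hypothetical; all function names are illustrative.

```python
import queue
import threading

_DONE = object()  # sentinel telling the next stage that no more items follow


def stem_stage(stem, candidates, out_q):
    """Stand-in for the second processor: forward candidates whose first
    len(stem) letters differ from the stem in at most one position."""
    n = len(stem)
    for cand in candidates:
        head = cand[:n]
        if len(head) == n and sum(a != b for a, b in zip(head, stem)) <= 1:
            out_q.put((cand, cand[n:]))  # pass the candidate and its ending along
    out_q.put(_DONE)


def suffix_stage(suffixes, in_q, results):
    """Stand-in for the third processor: keep candidates with an acceptable ending."""
    while (item := in_q.get()) is not _DONE:
        cand, ending = item
        if ending in suffixes:
            results.append(cand)


def run_pipeline(stem, suffixes, candidates):
    """Run the two comparison stages concurrently, connected by a FIFO queue."""
    link = queue.Queue()
    results = []
    workers = [
        threading.Thread(target=stem_stage, args=(stem, candidates, link)),
        threading.Thread(target=suffix_stage, args=(suffixes, link, results)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

With one producer and one consumer on a FIFO queue, results keep the candidates' original order, so the suffix stage can start on early candidates while the stem stage is still scanning later ones.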
22. Digital data processing means for locating from a plurality of digital coded candidate words at least one which is both an acceptable misspelling and an acceptable inflection of a digital coded query word, the query word and each of a plurality of the candidate words comprising plural characters, the processing means comprising:
data processing means for determining a stem portion and an ending portion of such query word and for forming an indication of the size of one of said portions; data processing means for determining and forming a suffix class indication of at least one of a plurality of classes in which the query word may be included; first memory means for storing representations of a data base comprising said candidate words; means for deriving from the data base in the first memory means representations of said candidate words; means for comparing representations of the stem portion of the query word with representations of the characters at the beginning of each of individual ones of the candidate words which are derived from the data base for finding either an acceptable misspelling match or a nonacceptable misspelling match and, for the acceptable misspelling match, determining and forming an acceptable misspelling match class indication representing any one of a plurality of classes in which the acceptable misspelling match may be included; means for utilizing the acceptable misspelling match class indication for modifying representations of the indication of size to determine the characters forming an ending portion, if any, in the candidate words; second memory means for storing representations of a plurality of acceptable suffixes, each acceptable suffix comprising one or more characters, the acceptable suffixes being arranged in groups and representations of each group being selectable from the other groups in the second memory means in accordance with one of the suffix class indications; means for utilizing the suffix class indication for the query word to select from the second memory means, representations of at least one of the groups of acceptable suffixes; and means for comparing representations of the acceptable suffixes which have been selected with representations of the ending portions of individual ones of the candidate words for acceptable relations therebetween; those candidate words having both the acceptable misspelling match and the acceptable relation to the acceptable suffixes being both the acceptable misspelling and the acceptable inflection of the query word.
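Claim 22 stores the acceptable suffixes in groups in a second memory and lets the suffix class indication select "at least one of the groups". A minimal sketch, assuming the class indication is a bitmask with one bit per group, is shown below; the group contents, `SUFFIX_GROUPS`, and the `VERB` and `ADJECTIVE` masks are hypothetical examples, not taken from the patent.

```python
# Hypothetical "second memory": suffix groups stored at fixed indices.
SUFFIX_GROUPS = (
    ("",),          # group 0: uninflected form
    ("s",),         # group 1: plural / third person singular
    ("ed", "ing"),  # group 2: regular verb forms
    ("er", "est"),  # group 3: comparative and superlative
    ("ly",),        # group 4: adverb
)


def select_suffixes(class_mask):
    """Collect every group whose bit is set in the suffix class indication."""
    selected = set()
    for index, group in enumerate(SUFFIX_GROUPS):
        if class_mask & (1 << index):
            selected.update(group)
    return selected


def has_acceptable_ending(ending, class_mask):
    """True if the candidate's ending is one of the selected acceptable suffixes."""
    return ending in select_suffixes(class_mask)


VERB = 0b00111       # hypothetical class indication for a stem such as "walk"
ADJECTIVE = 0b11001  # hypothetical class indication for a stem such as "quick"
```

A bitmask keeps each class indication to a single small integer while still allowing a stem to draw on several groups at once.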
23. A digital data processing means for locating from a plurality of digital coded candidate words at least one which is both an acceptable misspelling and an acceptable inflection of a digital coded query word, the query word and each of plural ones of the candidate words comprising plural characters, the means comprising:
means for determining the characters forming a stem portion and an ending portion of such query word; means for forming a suffix class indication for any one of a plurality of classes in which the query word may be included; means for comparing the characters of the stem portion of the query word with characters in the beginning of such candidate words for finding candidate words with acceptable misspelling matches and candidate words with nonacceptable misspelling matches; means for determining characters forming an ending portion, if any, in each of individual ones of the candidate words; means for utilizing the suffix class indication to select from among other suffixes a representation of characters forming at least one acceptable suffix for the candidate words; and means for comparing character by character the characters of said at least one selected acceptable suffix with the characters in the ending portion in each of the individual ones of the candidate words for finding acceptable ending portions, the first and second recited means thereby locating candidate words which are both an acceptable misspelling and an acceptable inflection of the query word. (Dependent claims: 24 to 39.)
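The "character by character" comparison of a candidate ending against the selected suffixes can test all of them at once if the suffixes are stored in a trie. This is a swapped-in data structure chosen for illustration, not one the claim names. In the sketch below (function names hypothetical), each character of the ending moves one level down the trie, and a mismatch stops the comparison immediately.

```python
_END = object()  # marks a node where a complete acceptable suffix ends


def build_suffix_trie(suffixes):
    """Store the selected suffixes in a nested-dict trie, one character per level."""
    root = {}
    for suffix in suffixes:
        node = root
        for ch in suffix:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root


def ending_is_acceptable(ending, trie):
    """Compare the candidate ending with all suffixes, one character at a time."""
    node = trie
    for ch in ending:
        node = node.get(ch)
        if node is None:  # no acceptable suffix continues with this character
            return False
    return _END in node
```

An ending such as "in" is rejected even though it is a prefix of "ing", because no complete suffix ends at that node.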
40. A method using a program controlled digital data processing means for locating from a plurality of digital coded candidate words at least one candidate word which is both an acceptable misspelling and an acceptable inflection of a digital coded query word, the query word and each of a plurality of the candidate words comprising plural characters, the method comprising the steps of:
(a) determining characters forming a stem portion and an ending portion of such query word and determining and forming a suffix class indication of any one of a plurality of classes in which the query word may be included; (b) comparing the characters forming the stem portion of the query word with characters starting at the beginning of each of a plurality of such candidate words for finding candidate words having stem portions with acceptable misspelling matches and those with nonacceptable misspelling matches and, for each of individual ones of those candidate words having an acceptable misspelling match, forming an acceptable misspelling class indication representing a value for any one of a plurality of classes in which the acceptable misspelling match for such candidate word may be included; (c) utilizing the acceptable misspelling class indication for each of individual ones of the candidate words to identify an ending portion, if any, in the corresponding candidate word; (d) utilizing the suffix class indication for the query word to select from among other suffixes a representation of at least one acceptable suffix for the candidate words; and (e) comparing the characters of said at least one acceptable suffix, character by character, with the characters of the ending portion in each of individual ones of the candidate words for finding candidate words having acceptable ending portions. (Dependent claim: 41.)
42. A method using a digital data processing means for locating from a plurality of digital coded candidate words at least one candidate word which is both an acceptable misspelling and an acceptable inflection of a digital coded query word, the query word and each of plural ones of the candidate words comprising plural characters, the method comprising the steps of:
determining characters forming a stem portion and an ending portion of such query word; comparing the characters forming the stem portion with characters starting at the beginning of each of a plurality of such candidate words for finding candidate words having acceptable misspelling matches and those with nonacceptable misspelling matches; determining characters forming an ending portion, if any, in each of individual ones of the candidate words; utilizing the characters of at least the ending portion of the query word to select, from among other suffixes, a representation of characters forming at least one acceptable suffix for the candidate words; and comparing, character by character, the characters of said at least one acceptable suffix with the characters in the ending portion of each of individual ones of the candidate words for finding candidate words having acceptable ending portions; the first and second recited steps of comparing thereby locating the candidate words which are both an acceptable misspelling match and an acceptable inflection of the query word. (Dependent claims: 43 and 44.)
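Claim 42 differs from claim 1 in that the acceptable suffixes are chosen from the characters of the query's own ending rather than from a suffix class indication. A rough sketch follows, assuming a hypothetical `ENDING_TABLE` that maps a query ending to the endings acceptable in candidates, with the longest matching ending winning. It is deliberately naive: a word such as "sing" would be split as "s" + "ing".

```python
# Hypothetical table: query ending -> acceptable candidate endings.
ENDING_TABLE = {
    "ing": ("", "s", "ed", "ing"),
    "ed": ("", "s", "ed", "ing"),
    "es": ("", "es"),
    "s": ("", "s"),
    "": ("", "s"),
}


def suffixes_from_query(query):
    """Pick the longest table key that ends the query; return (stem, suffixes)."""
    for key in sorted(ENDING_TABLE, key=len, reverse=True):
        if query.endswith(key) and len(query) > len(key):
            return query[: len(query) - len(key)], ENDING_TABLE[key]
    return query, ENDING_TABLE[""]
```

The returned stem and suffix tuple can then feed the same stem comparison and character-by-character suffix comparison recited in the claim.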
Specification