Morphological/phonetic method for ranking word similarities
Abstract
A computer method is disclosed for ranking word similarities which is applicable to a variety of dictionary applications such as synonym generation, linguistic analysis, and document characterization. The method is based upon transforming an input word string into a key word which is invariant for certain types of errors in the input word, such as the doubling of letters, consonant/vowel transpositions, and consonant/consonant transpositions. The specific mapping technique is a morphological mapping which generates keys whose similarities can be detected during a subsequent ranking procedure. The mapping is defined such that the unique consonants of the input word are listed in their original order, followed by the unique vowels of the input word, also in their original order. The keys thus generated are invariant for consonant/vowel transpositions and doubled letters. The utility of the keys is further improved by arranging the consonants in the keys in alphabetical order, followed by the vowels in alphabetical order. The resultant mapping is insensitive to consonant/consonant transpositions as well as consonant/vowel transpositions and doubled letters. The method then applies a ranking technique which uses a compound measure of similarity to rank the key words.
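The two key mappings described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the vowel set is assumed to be a, e, i, o, u (with "y" treated as a consonant), and input is lowercased with non-letters dropped.

```python
VOWELS = set("aeiou")  # assumption: 'y' counts as a consonant


def _unique(chars):
    """Drop repeated characters, keeping the first occurrence of each."""
    seen = set()
    out = []
    for c in chars:
        if c not in seen:
            seen.add(c)
            out.append(c)
    return out


def ordered_key(word):
    """Unique consonants in original order, then unique vowels in original order."""
    letters = [c for c in word.lower() if c.isalpha()]
    consonants = _unique(c for c in letters if c not in VOWELS)
    vowels = _unique(c for c in letters if c in VOWELS)
    return "".join(consonants + vowels)


def sorted_key(word):
    """Unique consonants in alphabetical order, then unique vowels in alphabetical order."""
    letters = [c for c in word.lower() if c.isalpha()]
    consonants = sorted({c for c in letters if c not in VOWELS})
    vowels = sorted({c for c in letters if c in VOWELS})
    return "".join(consonants + vowels)


# Doubled letters and consonant/vowel transpositions leave the ordered key unchanged.
print(ordered_key("letter"), ordered_key("leter"))  # ltre ltre
print(ordered_key("form"), ordered_key("from"))     # frmo frmo

# A consonant/consonant transposition changes the ordered key but not the sorted key.
print(ordered_key("world"), ordered_key("wolrd"))   # wrldo wlrdo
print(sorted_key("world"), sorted_key("wolrd"))     # dlrwo dlrwo
```

The sorted key trades discriminating power for robustness: more distinct words collapse to the same key, which is why a separate ranking step is still needed.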
134 Citations
3 Claims
1. A computer method for ranking the similarity of an input word from an input word string, to words stored in a dictionary storage, comprising the steps of:
reading a first word from the input word string and writing the consonants of the input word in a first storage location and writing the vowels of the input word in a second storage location;
deleting duplicate consonants in the first storage location and deleting duplicate vowels in said second storage location;
arranging said consonants in said first storage location in alphabetical order and arranging said vowels in said second storage location in alphabetical order;
concatenating said alphabetized consonants in said first storage location with said alphabetized vowels in said second storage location to form an input key word;
reading a dictionary word from a dictionary of stored words and writing the consonants of the dictionary word in a third storage location and the vowels of the dictionary word in a fourth storage location;
deleting duplicate consonants in said third storage location and duplicate vowels in said fourth storage location;
arranging the consonants in said third storage location in alphabetical order and arranging the vowels in said fourth storage location in alphabetical order;
concatenating the alphabetized consonants in said third storage location with the alphabetized vowels in said fourth storage location, to form a dictionary key word;
comparing said input key word with said dictionary key word in a first comparison step by counting the number of change operations in said input key word necessary to make said input key word identically match with said dictionary key word, said count being a first scoring factor;
matching in a second step said input key word with said dictionary key word by measuring the length of identical character segments in said input key word and said dictionary key word, to form a second scoring factor;
combining said first scoring factor and said second scoring factor to obtain a score for ranking the degree of similarity of said input word with said dictionary word.
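The two scoring factors of claim 1 can be illustrated with a short sketch. The claim text does not fix the exact measures or how they are combined, so everything here is an assumption: "change operations" is modeled as Levenshtein edit distance, "identical character segments" as the longest common contiguous segment, and the combination as a simple sum in which a lower score means greater similarity. All function names are hypothetical, and the keys are assumed to have been computed already.

```python
def change_operations(a, b):
    """Levenshtein distance: insertions, deletions, substitutions (assumed measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def longest_common_segment(a, b):
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            run = prev[j - 1] + 1 if ca == cb else 0
            cur.append(run)
            best = max(best, run)
        prev = cur
    return best


def combined_score(input_key, dict_key):
    """Hypothetical combination: edit count plus the unmatched part of the longer key."""
    changes = change_operations(input_key, dict_key)
    segment = longest_common_segment(input_key, dict_key)
    return changes + (max(len(input_key), len(dict_key)) - segment)


def rank(input_key, dictionary_keys):
    """Order dictionary words from most to least similar (lowest score first)."""
    return sorted(dictionary_keys,
                  key=lambda word: combined_score(input_key, dictionary_keys[word]))


# Alphabetized keys for the misspelling "wolrd" and three dictionary words.
keys = {"ward": "drwa", "word": "drwo", "world": "dlrwo"}
print(rank("dlrwo", keys))  # ['world', 'word', 'ward']
```

With this combination an exact key match scores 0, and the score grows both with the number of edits and with how fragmented the shared characters are.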
2. A computer method for ranking the similarity of an input word from an input word string, to words stored in a dictionary storage, comprising the steps of:
reading a first word from the input word string and writing the consonants of the input word in a first storage location and writing the vowels of the input word in a second storage location;
deleting adjacent duplicate consonants in the first storage location and deleting adjacent duplicate vowels in said second storage location;
concatenating said consonants in said first storage location with said vowels in said second storage location to form an input key word;
reading a dictionary word from a dictionary of stored words and writing the consonants of the dictionary word in a third storage location and writing the vowels of the dictionary word in a fourth storage location;
deleting adjacent duplicate consonants in said third storage location and deleting adjacent duplicate vowels in said fourth storage location;
concatenating said consonants in said third storage location with said vowels in said fourth storage location to form a dictionary key word;
comparing said input key word with said dictionary key word in a first comparison step by counting the number of change operations in said input key word necessary to make said input key word identically match with said dictionary key word, said count being a first scoring factor;
matching in a second step said input key word with said dictionary key word by measuring the length of identical character segments in said input key word and said dictionary key word, to form a second scoring factor;
combining said first scoring factor and said second scoring factor to obtain a score for ranking the degree of similarity of said input word with said dictionary word.
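Claim 2 differs from claim 1 in its key: only adjacent duplicates are deleted, and the characters are not alphabetized. A minimal sketch of that key follows, under the assumptions that the vowels are a, e, i, o, u and that adjacency is judged within each storage location after consonants and vowels have been separated, as the order of the claimed steps suggests. The function names are hypothetical.

```python
from itertools import groupby

VOWELS = set("aeiou")  # assumption: 'y' counts as a consonant


def collapse_adjacent(chars):
    """Collapse each run of identical neighbouring characters to a single character."""
    return [c for c, _ in groupby(chars)]


def adjacent_duplicate_key(word):
    """Consonants then vowels, original order kept, adjacent duplicates removed."""
    letters = [c for c in word.lower() if c.isalpha()]
    consonants = collapse_adjacent(c for c in letters if c not in VOWELS)
    vowels = collapse_adjacent(c for c in letters if c in VOWELS)
    return "".join(consonants + vowels)


print(adjacent_duplicate_key("letter"))   # ltre
print(adjacent_duplicate_key("entente"))  # ntnte
```

In "entente" the consonants n, t, n, t survive because the repeats are not neighbours, whereas deleting all duplicates, as in claim 1, would reduce them to n, t.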
3. A computer method for ranking the similarity of an input word from an input word string, to words stored in a dictionary storage, using a combined morphological/phonetic approach comprising the steps of:
reading a first word from the input word string and creating an input key word;
reading a dictionary word from a dictionary of stored words and creating a dictionary key word;
generating a morphological score by combining (1) a first scoring factor consisting of the number of change operations required to make said input key word match said dictionary key word, and (2) a second scoring factor generated by measuring the length of identical character segments in said input key word and said dictionary key word;
creating an input phonetic key word by replacing the characters of the input word with corresponding phonetic characters expressed in a set of rewrite rules;
creating a dictionary phonetic key word by replacing the characters of said dictionary word with corresponding phonetic characters expressed in a set of rewrite rules;
generating a phonetic score by combining (1) a first scoring factor consisting of the number of change operations required to make said input phonetic key word match said dictionary phonetic key word, and (2) a second scoring factor generated by measuring the length of identical character segments in said input phonetic key word and said dictionary phonetic key word;
selecting the lower of the morphological or phonetic score as a measure of the distance between said input word and said dictionary word.
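The combined approach of claim 3 can be sketched as below. The claim does not list the rewrite rules or the scoring formula, so the rules shown are invented purely for illustration, the morphological key is assumed to be the alphabetized consonant/vowel key, and each score is assumed to be edit distance plus the unmatched length beyond the longest shared segment. All names are hypothetical.

```python
from difflib import SequenceMatcher

VOWELS = set("aeiou")  # assumption: 'y' counts as a consonant

# Hypothetical rewrite rules, applied in order; not taken from the patent.
REWRITE_RULES = [("ph", "f"), ("ck", "k"), ("c", "k"), ("z", "s")]


def morphological_key(word):
    """Assumed key: unique consonants, then unique vowels, each alphabetized."""
    letters = [c for c in word.lower() if c.isalpha()]
    consonants = sorted({c for c in letters if c not in VOWELS})
    vowels = sorted({c for c in letters if c in VOWELS})
    return "".join(consonants + vowels)


def phonetic_key(word):
    """Replace spelling patterns with phonetic characters via the rewrite rules."""
    key = word.lower()
    for pattern, replacement in REWRITE_RULES:
        key = key.replace(pattern, replacement)
    return key


def edit_distance(a, b):
    """Levenshtein distance, used here as the count of change operations."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def score(a, b):
    """Assumed combination of the two scoring factors; lower means more similar."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    segment = matcher.find_longest_match(0, len(a), 0, len(b)).size
    return edit_distance(a, b) + max(len(a), len(b)) - segment


def distance(input_word, dictionary_word):
    """Select the lower of the morphological and phonetic scores."""
    morph = score(morphological_key(input_word), morphological_key(dictionary_word))
    phon = score(phonetic_key(input_word), phonetic_key(dictionary_word))
    return min(morph, phon)


print(distance("fone", "phone"))   # 0, via the phonetic key
print(distance("wolrd", "world"))  # 0, via the morphological key
```

Taking the minimum lets either view rescue a match: "fone" and "phone" share no morphological key but rewrite to the same phonetic key, while "wolrd" and "world" differ phonetically but share an alphabetized key.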
Specification