Method for verifying spelling of compound words
First Claim
1. A computer method for parsing a compound word composed of a plurality of word components, comprising the steps of:
- storing a dictionary of stored word components and associating with each of said components a flag indicating whether said word component can be an independent word, a prefix of a word, a middle element of a word, a suffix, or a code specifying the type of morphological transformation it can undergo;
- inputting an input word stream which includes a compound word which is to be parsed;
- selecting all words from the dictionary that match an initial substring of the input word and retaining only the dictionary words which have a component flag indicating that said dictionary word can be a prefix element;
- processing all the remaining portions of the input word, selecting all words from the dictionary that match an initial substring of said remaining portion of the input word and retaining only the dictionary words which have a component flag indicating that the word can be a middle element of a word;
- processing all the remaining portions of the input word, selecting all words from the dictionary that exactly match said remaining portion of the input word and retaining only the dictionary words which have a component flag indicating that the word can be a word suffix element;
- applying morphological rules during the search for middle or suffix elements if no suitable dictionary candidates can be found.
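The prefix, middle, and suffix steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the flag names, the toy `DICTIONARY`, and the functions `matches` and `parse_compound` are assumptions made for illustration, and the final step (the morphological-rule fallback) is left out because the claim does not specify the rules.

```python
from typing import Dict, List, Set

# Hypothetical flag encoding; the claim names the roles but not how they are stored.
INDEPENDENT, PREFIX, MIDDLE, SUFFIX = "independent", "prefix", "middle", "suffix"

# Toy dictionary (assumed entries): component -> roles it may play in a compound.
DICTIONARY: Dict[str, Set[str]] = {
    "dampf": {INDEPENDENT, PREFIX},
    "schiff": {INDEPENDENT, PREFIX, MIDDLE, SUFFIX},
    "fahrt": {INDEPENDENT, SUFFIX},
}


def matches(text: str, role: str, exact: bool) -> List[str]:
    """Dictionary words flagged with `role` that match `text`.

    With exact=True the word must equal `text`; otherwise it must be a
    proper initial substring of `text`.
    """
    if exact:
        return [text] if role in DICTIONARY.get(text, set()) else []
    return [
        text[:i]
        for i in range(1, len(text))
        if role in DICTIONARY.get(text[:i], set())
    ]


def parse_compound(word: str) -> List[List[str]]:
    """Return every decomposition of `word` as prefix, optional middles, suffix."""
    results: List[List[str]] = []

    def extend(rest: str, parts: List[str]) -> None:
        # Suffix step: the whole remaining portion must match exactly.
        for suffix in matches(rest, SUFFIX, exact=True):
            results.append(parts + [suffix])
        # Middle step: match an initial substring, then process what is left.
        for middle in matches(rest, MIDDLE, exact=False):
            extend(rest[len(middle):], parts + [middle])

    # Prefix step: match an initial substring of the input word.
    for prefix in matches(word, PREFIX, exact=False):
        extend(word[len(prefix):], [prefix])
    return results
```

With this toy dictionary, `parse_compound("dampfschifffahrt")` yields the single decomposition `["dampf", "schiff", "fahrt"]`, while `parse_compound("schiffdampf")` yields nothing because `dampf` carries no suffix flag here.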
Abstract
This invention describes a method for automatically verifying spelling of compound words in many natural languages such as German, Danish, Swedish, Norwegian, Dutch, Icelandic, Afrikaans, Swiss German, etc. The basic technology of looking up words in a dictionary is supplemented by the association of component flags with each word and by the application of powerful tree-scanning techniques that isolate the components of compound words and determine their correctness in isolation and in association with each other. The technique can be used in word processing systems to support spelling verification, to hyphenate text, and to unhyphenate text.
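As a rough illustration of the hyphenation use mentioned in the abstract, the following Python sketch breaks a compound at a component boundary so that the first part fits the space left on a line. The component list is assumed to come from some compound parser; the functions `break_points` and `hyphenate_to_fit` and the `room` parameter are hypothetical names, not taken from the patent.

```python
from typing import List, Optional, Tuple


def break_points(components: List[str]) -> List[int]:
    """Character offsets of the boundaries between adjacent components."""
    offsets: List[int] = []
    position = 0
    for part in components[:-1]:
        position += len(part)
        offsets.append(position)
    return offsets


def hyphenate_to_fit(components: List[str], room: int) -> Optional[Tuple[str, str]]:
    """Split at the last boundary whose head plus a hyphen fits in `room` columns.

    Returns (head_with_hyphen, tail), or None when no boundary fits.
    """
    word = "".join(components)
    for cut in reversed(break_points(components)):
        if cut + 1 <= room:  # +1 accounts for the hyphen itself
            return word[:cut] + "-", word[cut:]
    return None
```

For the components `["dampf", "schiff", "fahrt"]`, a `room` of 12 gives `("dampfschiff-", "fahrt")` and a `room` of 8 gives `("dampf-", "schifffahrt")`. Unhyphenation would run the other way: join the two line fragments and accept the joined form only if it parses as a valid compound.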
8 Claims
Specification