Word annotation system
First Claim
1. Apparatus for annotating digitally encoded natural language text words, such apparatus comprising a dictionary database including a plurality of encoded word base forms, wherein a base form is stored together with a first set of data encoding the possible uses or features of words corresponding to the base form, and with a second set of data encoding the synthesis of inflections of the base form, look-up means for identifying a base form of a text word, such look-up means including (i) means for detecting a characteristic inflectional ending occurring in the text word to produce a candidate base form, and (ii) means for determining whether the candidate base form is a word base form in the dictionary data base, and in that event assigning data stored with the base form to the text word, and means for assigning a dummy base form and a set of data codes to a word for which the look-up means retrieves no base form from the dictionary database.
Abstract
A system for annotating digitally encoded text includes a dictionary of base forms. For each base form, a first set of tags represents possible grammatical and syntactic properties of the word, and may encode inflectional paradigms of the base form, or feature agreement behavior and special processing. If a text word is not found in the dictionary, an inflectional analyzer looks up one or more base forms derived from the word and, if they are found, annotates them with their dictionary tags. A morphological analyzer assigns tags to words not retrieved from the dictionary. The morphological analyzer recognizes words formed by prefixation and suffixation, as well as proper nouns, ordinals, idiomatic expressions, and certain classes of character strings. The tagged words of a sentence are then processed to parse the sentence.
14 Claims
-
1. Apparatus for annotating digitally encoded natural language text words, such apparatus comprising
a dictionary database including a plurality of encoded word base forms, wherein a base form is stored together with a first set of data encoding the possible uses or features of words corresponding to the base form, and with a second set of data encoding the synthesis of inflections of the base form,
look-up means for identifying a base form of a text word, such look-up means including
(i) means for detecting a characteristic inflectional ending occurring in the text word to produce a candidate base form, and
(ii) means for determining whether the candidate base form is a word base form in the dictionary data base, and in that event assigning data stored with the base form to the text word, and
means for assigning a dummy base form and a set of data codes to a word for which the look-up means retrieves no base form from the dictionary database.
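The look-up described in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy dictionary, the ending rules, the tag names, the `paradigm` codes, and the names `Entry`, `candidate_bases`, `annotate`, `DUMMY_BASE` and `DUMMY_TAGS` are all hypothetical, chosen only to show ending detection, candidate base form look-up, and the dummy fallback. Trying the unmodified word first is also an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    tags: frozenset   # first data set: possible uses or features of the word
    paradigm: str     # second data set: code describing how inflections are formed


# Hypothetical toy dictionary keyed by base form.
DICTIONARY = {
    "walk": Entry(frozenset({"V", "N"}), "regular"),
    "try": Entry(frozenset({"V", "N"}), "y-ies"),
    "box": Entry(frozenset({"N", "V"}), "es"),
}

# Illustrative (ending, replacement) rules, longest endings first.
ENDING_RULES = [("ies", "y"), ("ied", "y"), ("ing", ""),
                ("es", ""), ("ed", ""), ("s", "")]

# Placeholder base form and tag set for words the look-up cannot resolve.
DUMMY_BASE = "<unknown>"
DUMMY_TAGS = frozenset({"N", "V", "ADJ"})


def candidate_bases(word):
    """Yield the word itself, then candidates made by rewriting a detected ending."""
    yield word
    for ending, replacement in ENDING_RULES:
        if word.endswith(ending) and len(word) > len(ending):
            yield word[: -len(ending)] + replacement


def annotate(word):
    """Return (base form, tags); fall back to the dummy base form and tags."""
    for candidate in candidate_bases(word.lower()):
        entry = DICTIONARY.get(candidate)
        if entry is not None:
            return candidate, entry.tags
    return DUMMY_BASE, DUMMY_TAGS
```

Under these assumptions, `annotate("tries")` resolves to the base form `try` with that entry's tags, while a string with no matching candidate, such as `zorp`, receives the dummy base form and the default tag set.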
-
4. Apparatus for annotating digitally encoded natural language text words, such apparatus comprising
a dictionary database including a plurality of word records, each record including a set of tags indicative of properties of a word,
look up means for looking up a text word in the dictionary database and retrieving its set of tags when the text word is identified in the dictionary database, and
morphological analyzer means, operative on a text word which is not identified in the dictionary database, for determining a set of tags and dummy base form by inspection of the morphology or context of such word.
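As a rough illustration of claim 4, the sketch below looks a word up in a dictionary and, when it is absent, guesses tags and a dummy base form from surface morphology and a single piece of context. The suffix table, the ordinal pattern, the capitalization heuristic, the tag names, and the dummy base form strings are assumptions made for this example; the source does not specify them.

```python
import re

# Hypothetical suffix-to-tag table; real rules would be far richer.
SUFFIX_TAGS = [("ness", {"N"}), ("ly", {"ADV"}), ("able", {"ADJ"}), ("tion", {"N"})]

# Digit strings followed by an ordinal ending, e.g. "21st".
ORDINAL_RE = re.compile(r"^\d+(st|nd|rd|th)$")


def guess_tags(word, sentence_initial=False):
    """Guess (dummy base form, tags) for a word that is not in the dictionary."""
    lower = word.lower()
    if ORDINAL_RE.match(lower):
        return "<ordinal>", {"ADJ"}
    # Context check: a capital letter away from the sentence start suggests a proper noun.
    if word[:1].isupper() and not sentence_initial:
        return "<proper>", {"PROPN"}
    for suffix, tags in SUFFIX_TAGS:
        if lower.endswith(suffix) and len(lower) > len(suffix) + 1:
            return "<" + suffix + ">", set(tags)
    return "<unknown>", {"N", "V", "ADJ"}


def tag_word(word, dictionary, sentence_initial=False):
    """Use dictionary tags when the word is found; otherwise fall back to guessing."""
    tags = dictionary.get(word.lower())
    if tags is not None:
        return word.lower(), set(tags)
    return guess_tags(word, sentence_initial)
```

Here the `sentence_initial` flag stands in for the "context" mentioned in the claim; a fuller analyzer would presumably consult neighboring words as well.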
Specification