METHOD AND SYSTEM FOR GENERATING NEW ENTRIES IN NATURAL LANGUAGE DICTIONARY
Abstract
A method and computer system for analyzing a text corpus in a natural language is provided. An initial morphological description having word inflection rules for various groups of words in the natural language is created by a linguist. A plurality of text corpuses are analyzed to obtain information on the occurrence of a plurality of word forms for each word token in each text corpus. A morphological dictionary which contains information about each base form and word inflection rules for each word token with verified hypothesis is generated.
47 Claims
1-23. (canceled)
24. A computer system to create a new entry in morphological electronic dictionary for a natural language, the computer system comprising:
a processor; and
an electronic memory configured with electronic instructions to cause the computer system to perform steps, the electronic instructions including:
identifying a word token in a text corpus;
applying one or more morphological paradigm rules to the word token to generate one or more hypotheses about a base form of the word token;
generating other word forms for the base form, where the other word forms correspond to the generated one or more hypotheses;
verifying at least one hypothesis of the one or more hypotheses for at least one of the other word forms of the word token;
estimating the at least one hypothesis to get rating scores by checking in the text corpus for the generated other word forms;
identifying a best verified hypothesis, wherein the best verified hypothesis is a verified hypothesis with the highest rating scores;
adding an inflection paradigm and a grammatical value to the base form of the word token based on the best verified hypothesis; and
adding a new entry in a morphological electronic dictionary, the new entry comprising the base form of the word token according to the best verified hypothesis.
Dependent claims: 25-35.
36. A method for creating a new entry in morphological electronic dictionary for a natural language, using a computer system comprising:
one or more processors; and
an electronic memory;
the method comprising:
identifying a word token in a text corpus;
applying one or more morphological paradigm rules to the word token to generate one or more hypotheses about a base form of the word token;
generating other word forms for the base form, where the other word forms correspond to the generated one or more hypotheses;
verifying at least one hypothesis of the one or more hypotheses for at least one of the other word forms of the word token;
estimating the at least one hypothesis to get rating scores by checking in the text corpus for the generated other word forms;
identifying a best verified hypothesis, wherein the best verified hypothesis is a verified hypothesis with the highest rating scores;
adding an inflection paradigm and a grammatical value to the base form of the word token based on the best verified hypothesis; and
adding a new entry in a morphological electronic dictionary, the new entry comprising the base form of the word token according to the best verified hypothesis.
Dependent claims: 37-47.
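For illustration only, the steps recited in claims 24 and 36 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the toy suffix paradigms in PARADIGMS, the Hypothesis record, the function names, and the rating rule (count of distinct other word forms attested in the corpus) are all hypothetical choices made for the example.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical toy paradigms: name -> (grammatical value, {form label: suffix}).
# Assumption for this sketch: the base form equals the stem (empty base suffix).
PARADIGMS = {
    "noun_s": ("noun", {"base": "", "plural": "s"}),
    "verb_regular": ("verb", {"base": "", "3sg": "s", "past": "ed", "gerund": "ing"}),
}


@dataclass(frozen=True)
class Hypothesis:
    base_form: str
    paradigm: str
    grammatical_value: str


def generate_hypotheses(token):
    """Apply each paradigm rule to the token to propose candidate base forms."""
    hypotheses = set()
    for name, (gram, suffixes) in PARADIGMS.items():
        for suffix in suffixes.values():
            if token.endswith(suffix) and len(token) > len(suffix):
                stem = token[: len(token) - len(suffix)]
                hypotheses.add(Hypothesis(stem, name, gram))
    return hypotheses


def rating_score(hyp, token, counts):
    """Generate the other word forms and count how many occur in the corpus."""
    suffixes = PARADIGMS[hyp.paradigm][1].values()
    other_forms = {hyp.base_form + s for s in suffixes} - {token}
    return sum(1 for form in other_forms if counts[form] > 0)


def new_entry(token, corpus_tokens):
    """Return a dictionary entry for the best verified hypothesis, or None."""
    counts = Counter(corpus_tokens)
    scored = [(rating_score(h, token, counts), h) for h in generate_hypotheses(token)]
    # A hypothesis counts as verified here if at least one other form is attested.
    verified = [(s, h) for s, h in scored if s > 0]
    if not verified:
        return None
    # Highest score wins; ties are broken arbitrarily but deterministically by name.
    _, best = max(verified, key=lambda p: (p[0], p[1].paradigm, p[1].base_form))
    return {
        "base_form": best.base_form,
        "paradigm": best.paradigm,
        "grammatical_value": best.grammatical_value,
    }


corpus = "she walked home while he walks and they keep walking".split()
entry = new_entry("walked", corpus)
```

In this toy run, the hypothesis that "walked" is the past form of a regular verb "walk" is the only one whose other generated forms ("walks", "walking") appear in the corpus, so it becomes the new entry. A real system would use linguist-authored paradigms and a more careful rating scheme.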
Specification