Method and system for natural language dictionary generation
Abstract
A method and computer system for analyzing a text corpus in a natural language is provided. An initial morphological description having word inflection rules for various groups of words in the natural language is created by a linguist. A plurality of text corpuses are analyzed to obtain information on the occurrence of a plurality of word forms for each word token in each text corpus. A morphological dictionary which contains information about each base form and word inflection rules for each word token with verified hypothesis is generated.
41 Claims
1-23. (canceled)
24. A computer system to create a morphological electronic dictionary for a natural language, the computer system comprising:
a processor;
an electronic memory configured with electronic instructions to cause the computer system to perform steps, the electronic instructions including;
identify each word token in the text corpus;
apply paradigm rules to each word token in the text corpus;
generate one or more hypotheses about a part of speech for base forms of each word token;
select other word inflected forms corresponding to the base form of each word token;
verify each hypothesis of the one or more hypotheses for each base form of each word token based on ratings;
add grammatical values and inflection paradigms to each base form of each word token for each verified hypothesis;
obtain information about one or more morphological descriptions for each word token with a verified hypothesis; and
add the base form of each word token with the morphological descriptions to the electronic morphological dictionary of the natural language for each verified hypothesis.
Dependent claims: 25-36.
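For illustration only, and not as part of the claims, the sequence of steps recited in claim 24 might be sketched as below. Everything in the sketch is an assumption made for the example: the two toy English-like paradigms, the names `PARADIGMS`, `generate_hypotheses` and `build_dictionary`, and in particular the rating, which is taken here to be the fraction of a paradigm's forms attested in the corpus. The claim does not define how ratings are computed.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy paradigms (not from the patent): each lists the endings
# appended to a base form and the part of speech the paradigm implies.
PARADIGMS = {
    "noun_s": {"pos": "noun", "endings": ["", "s"]},
    "verb_regular": {"pos": "verb", "endings": ["", "s", "ed", "ing"]},
}


def tokenize(text):
    """Identify word tokens (lowercased alphabetic runs)."""
    return re.findall(r"[a-z]+", text.lower())


def generate_hypotheses(token):
    """Apply every paradigm rule to a token, yielding (base, paradigm) guesses."""
    for name, paradigm in PARADIGMS.items():
        for ending in paradigm["endings"]:
            if token.endswith(ending) and len(token) > len(ending):
                yield token[: len(token) - len(ending)], name


def build_dictionary(text, min_rating=0.75):
    """Return {base form: [verified entries]} for one text corpus."""
    counts = Counter(tokenize(text))
    candidates = {h for token in counts for h in generate_hypotheses(token)}
    dictionary = defaultdict(list)
    for base, name in sorted(candidates):
        paradigm = PARADIGMS[name]
        # Select the other inflected forms this hypothesis predicts.
        forms = [base + ending for ending in paradigm["endings"]]
        attested = [form for form in forms if form in counts]
        # Assumed rating: share of predicted forms found in the corpus.
        rating = len(attested) / len(forms)
        if rating >= min_rating:  # hypothesis counts as verified
            dictionary[base].append(
                {"pos": paradigm["pos"], "paradigm": name,
                 "forms": attested, "rating": rating}
            )
    return dict(dictionary)


SAMPLE = ("She walks. He walked. They walk while walking. "
          "The cat sat. Two cats sat.")
entries = build_dictionary(SAMPLE)
```

With this sample, only `walk` and `cat` survive verification. `walk` is kept under both the noun and the verb paradigm, which shows that more than one hypothesis per base form can be verified.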
37. A computer system to generate a morphological electronic dictionary for a natural language, the computer system comprising:
a processor;
an electronic memory configured with electronic instructions to cause the computer system to perform steps, the electronic instructions including;
create an initial morphological description having word inflection rules for groups of words in the natural language;
analyze by the computer system a plurality of text corpuses in the natural language, including;
identifying each word token in each text corpus of the natural language;
applying one or more paradigm rules to each word token in each text corpus;
generating one or more hypotheses about parts of speech for base forms of each word token;
searching for other word inflected forms corresponding to the base form of each word token;
verifying each hypothesis of the one or more hypotheses for each base form of each word token based on ratings to identify verified hypotheses;
adding grammatical values and inflection paradigms to each base form of each word token for each verified hypothesis; and
obtaining information about one or more morphological descriptions for each word token with a verified hypothesis; and
add the base form of each word token with the morphological descriptions to the morphological electronic dictionary for each verified hypothesis.
Dependent claims: 38-41.
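Claim 37 differs from claim 24 mainly in analyzing a plurality of text corpuses. As a rough, non-authoritative sketch of why that can matter, the example below counts word forms per corpus and rates one hypothesis against all corpora together. The function names, the list of verb endings, and the rating (the share of predicted forms attested in at least one corpus) are assumptions made for the example and are not taken from the patent.

```python
import re
from collections import Counter


def form_counts_by_corpus(corpora):
    """Map each corpus name to a Counter of the word forms it contains."""
    return {
        name: Counter(re.findall(r"[a-z]+", text.lower()))
        for name, text in corpora.items()
    }


def rate_hypothesis(base, endings, counts_by_corpus):
    """Return (rating, attested forms) for one base-form hypothesis.

    Assumed rating: fraction of the predicted inflected forms that occur
    in at least one of the supplied corpora.
    """
    forms = [base + ending for ending in endings]
    attested = [
        form for form in forms
        if any(form in counts for counts in counts_by_corpus.values())
    ]
    return len(attested) / len(forms), attested


# Hypothetical regular-verb paradigm and two tiny corpora.
VERB_ENDINGS = ["", "s", "ed", "ing"]
corpora = {
    "news": "The minister talks and talked.",
    "fiction": "They talk; she kept talking.",
}
counts = form_counts_by_corpus(corpora)

combined_rating, combined_forms = rate_hypothesis("talk", VERB_ENDINGS, counts)
news_rating, news_forms = rate_hypothesis(
    "talk", VERB_ENDINGS, {"news": counts["news"]}
)
```

In this toy data each corpus on its own attests only half of the predicted forms of `talk`, while the two corpora together attest all four, so a hypothesis that would fail a threshold on a single corpus can be verified on the combined evidence.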
Specification