METHOD AND SYSTEM FOR NATURAL LANGUAGE DICTIONARY GENERATION
Abstract
A method and computer system for analyzing a text corpus in a natural language are provided. A linguist creates an initial morphological description containing word inflection rules for various groups of words in the natural language. A plurality of text corpuses are analyzed to obtain information on the occurrence of a plurality of word forms for each word token in each text corpus. A morphological dictionary is generated that contains each base form and the word inflection rules for each word token with a verified hypothesis.
23 Claims
1. A method of analyzing a text corpus in a natural language, comprising:
identifying each word token in the text corpus;
applying one or more paradigm rules to each word token in the text corpus;
generating one or more hypotheses for base forms of each word token;
searching for other word inflected forms corresponding to the base form of each word token;
verifying each hypothesis of the one or more hypotheses for each base form of each word token to identify verified hypothesis;
adding grammatical values and inflection paradigms to each base form of each word token for each verified hypothesis; and
obtaining information on its morphological descriptions for each word token with verified hypothesis.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
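The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Paradigm` class, the two toy English paradigms, the function names, and the `min_coverage` threshold are all assumptions made for illustration; in particular, the claim does not say how a hypothesis is verified, so the sketch simply requires that a given fraction of the forms predicted by a paradigm actually occur in the corpus.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Paradigm:
    """A toy paradigm rule (hypothetical): endings[0] builds the base form."""
    name: str
    grammatical_value: str
    endings: tuple


# Assumed example rules; a real description would come from a linguist.
PARADIGMS = (
    Paradigm("noun-s", "noun", ("", "s")),
    Paradigm("verb-regular", "verb", ("", "s", "ed", "ing")),
)


def tokenize(text):
    """Identify word tokens (lowercase ASCII words only, for simplicity)."""
    return re.findall(r"[a-z]+", text.lower())


def hypotheses(token, paradigms):
    """Apply each paradigm rule to a token and yield base-form hypotheses."""
    for paradigm in paradigms:
        for ending in paradigm.endings:
            if len(token) > len(ending) and token.endswith(ending):
                stem = token[:len(token) - len(ending)]
                yield stem + paradigm.endings[0], stem, paradigm


def analyze(text, paradigms=PARADIGMS, min_coverage=0.75):
    """Return verified hypotheses keyed by (base form, paradigm name)."""
    vocabulary = set(tokenize(text))
    entries = {}
    for token in vocabulary:
        for base, stem, paradigm in hypotheses(token, paradigms):
            # Search the corpus for the other inflected forms of this base.
            predicted = {stem + ending for ending in paradigm.endings}
            attested = predicted & vocabulary
            # Assumed verification rule: enough predicted forms are attested.
            if len(attested) / len(predicted) >= min_coverage:
                entries[(base, paradigm.name)] = {
                    "grammatical_value": paradigm.grammatical_value,
                    "paradigm": paradigm.name,
                    "forms": sorted(attested),
                }
    return entries
```

Calling `analyze("She walks. They walked. I walk while walking. A cat sat; two cats sat.")` verifies `walk` under both toy paradigms and `cat` only as a noun, because `cated` and `cating` never occur. A coverage threshold is only one possible verification criterion; frequency-based or context-based checks would fit the same structure.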
15. A method of generating a morphological dictionary for a natural language, comprising:
creating an initial morphological description having word inflection rules for groups of words in the natural language;
analyzing a plurality of text corpuses in the natural language, including:
  identifying each word token in each text corpus of the natural language;
  applying one or more paradigm rules to each word token in each text corpus;
  generating one or more hypotheses for base forms of each word token;
  searching for other word inflected forms corresponding to the base form of each word token;
  verifying each hypothesis of the one or more hypotheses for each base form of each word token to identify verified hypothesis;
  adding grammatical values and inflection paradigms to each base form of each word token for each verified hypothesis; and
  obtaining information on its morphological descriptions for each word token with verified hypothesis; and
adding the base form of each word token with the morphological descriptions to the morphological dictionary for each verified hypothesis.
Dependent claims: 16, 17, 18, 19
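Claim 15 extends the analysis to several corpora and to building the dictionary itself. The following Python sketch shows one way this could look, under stated assumptions: `INITIAL_DESCRIPTION` stands in for the linguist-created morphological description, word-form occurrences are pooled across all corpora before verification, and a hypothesis counts as verified when at least `min_coverage` of the forms its paradigm predicts are attested. None of these names or choices come from the patent text.

```python
import re
from collections import Counter, defaultdict

# Hypothetical initial morphological description:
# paradigm name -> (grammatical value, endings); endings[0] builds the base form.
INITIAL_DESCRIPTION = {
    "noun-s": ("noun", ("", "s")),
    "verb-regular": ("verb", ("", "s", "ed", "ing")),
}


def count_forms(corpora):
    """Count occurrences of every word form across all corpora."""
    counts = Counter()
    for text in corpora:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def build_dictionary(corpora, description=INITIAL_DESCRIPTION, min_coverage=0.75):
    """Map each verified base form to its paradigm entries and form counts."""
    counts = count_forms(corpora)
    dictionary = defaultdict(list)
    checked = set()  # (stem, paradigm name) pairs already evaluated
    for token in counts:
        for name, (value, endings) in description.items():
            for ending in endings:
                if len(token) <= len(ending) or not token.endswith(ending):
                    continue
                stem = token[:len(token) - len(ending)]
                if (stem, name) in checked:
                    continue
                checked.add((stem, name))
                # Occurrence counts for the predicted forms found in the corpora.
                attested = {
                    stem + e: counts[stem + e] for e in endings if stem + e in counts
                }
                # Assumed verification rule: enough predicted forms are attested.
                if len(attested) / len(endings) >= min_coverage:
                    dictionary[stem + endings[0]].append({
                        "paradigm": name,
                        "grammatical_value": value,
                        "occurrences": attested,
                    })
    return dict(dictionary)
```

With the three short corpora `"The dog barks."`, `"Dogs barked all night."`, and `"A barking dog; let it bark."`, the pooled counts verify `dog` as a noun and `bark` as both noun and verb, while the first corpus alone yields an empty dictionary. This illustrates why analyzing a plurality of corpora helps: forms that verify a hypothesis may be scattered across different texts.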
20. A computer readable medium comprising instructions for causing a computing system to carry out steps for analyzing a text corpus in a natural language, the steps comprising:
identifying each word token in the text corpus;
applying one or more paradigm rules to each word token in the text corpus;
generating one or more hypotheses for base forms of each word token;
searching for other word inflected forms corresponding to each base form of each word token;
verifying each hypothesis of the one or more hypotheses for each base form of each word token to identify verified hypothesis;
adding grammatical values and inflection paradigms to each base form of each word token for each verified hypothesis; and
obtaining information on its morphological descriptions for each word token with verified hypothesis.
Dependent claims: 21
22. A computer readable medium comprising instructions for causing a computing system to carry out steps for generating a morphological dictionary for a natural language, the steps comprising:
analyzing a plurality of text corpuses in the natural language, including:
  identifying each word token in each text corpus of the natural language;
  applying one or more paradigm rules to each word token in each text corpus;
  generating one or more hypotheses for base forms of each word token;
  searching for other word inflected forms corresponding to each base form of the word token;
  verifying each hypothesis of the one or more hypotheses for each base form of each word token to identify verified hypothesis;
  adding grammatical values and inflection paradigms to each base form of each word token for each verified hypothesis; and
  obtaining information on its morphological descriptions for each word token with verified hypothesis; and
adding the base form of each word token with the morphological descriptions to the morphological dictionary for each verified hypothesis.
Dependent claims: 23
Specification