Method and apparatus for learning the morphology of a natural language
Abstract
A method and system (200) provide automatic and unsupervised morphological analysis of a corpus of a natural language. An optimal division of each word in the corpus is determined (104) and identified stems and suffixes are combined as signatures (106). Major patterns of stem allomorphy are identified (110) and a morphological analysis is produced (112) for subsequent processing.
Claims
1. A method for determining a morphology of a language, the method comprising:
(a) finding an optimal division of each word of a plurality of words to form stems and suffixes, including dividing each word into all possible combinations, each combination having two elements;
for each possible combination, storing a candidate stem and a candidate suffix; and
for each candidate stem and each candidate suffix, assigning a figure of merit value related to the number of occurrences of the candidate stem and the candidate suffix in the plurality of words;
selecting a division having the optimal figure of merit value as the optimal division;
(b) forming combinations of the stems and the suffixes as signatures; and
(c) producing the morphology of the language using the signatures.
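Step (a) of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The names `candidate_divisions`, `count_candidates` and `optimal_divisions` are hypothetical. Each division is assumed to have a non-empty stem and a possibly empty (null) suffix. Claim 1 only says that the figure of merit is related to occurrence counts, so the sketch takes the figure of merit as a caller-supplied function instead of fixing one.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical figure-of-merit signature: (stem, suffix, stem_counts, suffix_counts) -> score
Merit = Callable[[str, str, Counter, Counter], float]


def candidate_divisions(word: str) -> list[tuple[str, str]]:
    """Every two-element division of `word` into (candidate stem, candidate suffix).

    Assumption: the stem is non-empty and the suffix may be empty (a null suffix).
    """
    return [(word[:i], word[i:]) for i in range(1, len(word) + 1)]


def count_candidates(words: Iterable[str]) -> tuple[Counter, Counter]:
    """Store every candidate stem and suffix and count its occurrences over all divisions."""
    stem_counts: Counter = Counter()
    suffix_counts: Counter = Counter()
    for word in words:
        for stem, suffix in candidate_divisions(word):
            stem_counts[stem] += 1
            suffix_counts[suffix] += 1
    return stem_counts, suffix_counts


def optimal_divisions(words: list[str], merit: Merit) -> dict[str, tuple[str, str]]:
    """For each word, pick the division with the highest figure of merit."""
    stem_counts, suffix_counts = count_candidates(words)
    return {
        word: max(
            candidate_divisions(word),
            key=lambda d: merit(d[0], d[1], stem_counts, suffix_counts),
        )
        for word in words
    }
```

For example, take the arbitrary illustrative merit `len(stem) * stem_counts[stem] + len(suffix) * suffix_counts[suffix]` and the corpus `jump, jumps, jumped, walk, walks, walked`. With these inputs, `jumps` is divided as `jump` + `s`. Claim 2 recites one concrete, logarithmic figure of merit.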
2. A method for determining the morphology of a natural language, the method comprising the steps of:
(a) receiving a corpus including a plurality of words of the natural language;
(b) identifying a parse of a word including a candidate stem and a candidate suffix;
(c) determining a quality of the parse according to the relation V(Stem/Suffix)=|Stem|* log [Stem]+|Suffix|* log [Suffix];
(d) selecting the parse having best quality according to a predetermined criterion;
(e) repeating steps (b) through (d) for all words of the corpus;
(f) combining stems and suffixes to form signatures; and
(g) producing the morphology of the language using the signatures.
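Steps (b) through (e) of claim 2 might look like the sketch below. The claim does not define its notation, so the sketch makes three assumptions. First, |x| is the length of x in letters. Second, [x] is the number of times x occurs as a candidate across all parses of the corpus. Third, the "best" parse is the one with the largest V. The function names are hypothetical.

```python
import math
from collections import Counter


def parses(word: str) -> list[tuple[str, str]]:
    # Assumption: non-empty stem, possibly empty (null) suffix.
    return [(word[:i], word[i:]) for i in range(1, len(word) + 1)]


def count_parses(corpus: list[str]) -> tuple[Counter, Counter]:
    """[x]: how often each candidate stem and suffix occurs over all parses of the corpus."""
    stem_counts, suffix_counts = Counter(), Counter()
    for word in corpus:
        for stem, suffix in parses(word):
            stem_counts[stem] += 1
            suffix_counts[suffix] += 1
    return stem_counts, suffix_counts


def quality(stem: str, suffix: str, stem_counts: Counter, suffix_counts: Counter) -> float:
    """V(Stem/Suffix) = |Stem| * log[Stem] + |Suffix| * log[Suffix] (step (c))."""
    return (len(stem) * math.log(stem_counts[stem])
            + len(suffix) * math.log(suffix_counts[suffix]))


def best_parses(corpus: list[str]) -> dict[str, tuple[str, str]]:
    """Steps (b)-(e): score every parse of every word and keep the highest-V parse."""
    stem_counts, suffix_counts = count_parses(corpus)
    return {
        word: max(parses(word),
                  key=lambda p: quality(p[0], p[1], stem_counts, suffix_counts))
        for word in set(corpus)
    }
```

With the corpus `jump, jumps, jumped, walk, walks, walked`, this selects `jump` + `s`, `jump` + `ed`, and `walk` + null suffix. The length factor rewards long stems and suffixes. The logarithm of the count rewards pieces that recur across many words, so a one-off split such as `jumpe` + `d` scores poorly.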
3. … evaluating all possible parses of each word in the corpus.
4. The method of claim 2 wherein step (f) comprises steps of:
(f1) discarding all signatures associated with only one stem; and
(f2) discarding all signatures associated with only one suffix.
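The pruning of claim 4 reduces to a filter over signatures. This sketch assumes a hypothetical representation: each signature is a tuple of suffixes, mapped to the set of stems that take exactly those suffixes. The function name is illustrative.

```python
# Hypothetical representation: suffix tuple -> stems that take exactly those suffixes.
Signatures = dict[tuple[str, ...], set[str]]


def prune_signatures(signatures: Signatures) -> Signatures:
    """Keep only signatures with at least two stems (f1) and at least two suffixes (f2)."""
    return {
        suffixes: stems
        for suffixes, stems in signatures.items()
        if len(stems) > 1 and len(suffixes) > 1
    }
```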
5. The method of claim 2 wherein step (f) comprises the step of:
(f1) forming a list of each stem and all suffixes appearing with that stem in the corpus.
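Step (f1) of claim 5 amounts to grouping the chosen parses by stem. The sketch assumes its input is the list of (stem, suffix) pairs selected in step (d). It records each stem's suffixes as a sorted tuple, so stems that share the same suffix set produce identical keys. `suffixes_by_stem` is a hypothetical name.

```python
from collections import defaultdict
from typing import Iterable


def suffixes_by_stem(parses: Iterable[tuple[str, str]]) -> dict[str, tuple[str, ...]]:
    """Map each stem to the sorted tuple of all suffixes it appears with."""
    grouped: defaultdict[str, set[str]] = defaultdict(set)
    for stem, suffix in parses:
        grouped[stem].add(suffix)
    # Sorting makes equal suffix sets compare equal, ready to serve as signature keys.
    return {stem: tuple(sorted(suffixes)) for stem, suffixes in grouped.items()}
```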
6. The method of claim 5, wherein step (f) further comprises steps of:
(f2) establishing a data structure for each signature; and
(f3) storing in the data structure all stems associated with that signature.
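Claim 6 then inverts the stem-to-suffixes list into one record per signature. The `Signature` dataclass and `build_signatures` function below are hypothetical names. The sketch assumes the input maps each stem to a sorted tuple of its suffixes, as produced in step (f1).

```python
from dataclasses import dataclass, field


@dataclass
class Signature:
    """One signature: a suffix set plus every stem that takes exactly those suffixes."""
    suffixes: tuple[str, ...]
    stems: set[str] = field(default_factory=set)


def build_signatures(
    suffixes_by_stem: dict[str, tuple[str, ...]]
) -> dict[tuple[str, ...], Signature]:
    signatures: dict[tuple[str, ...], Signature] = {}
    for stem, suffixes in suffixes_by_stem.items():
        if suffixes not in signatures:
            signatures[suffixes] = Signature(suffixes)  # (f2) establish the data structure
        signatures[suffixes].stems.add(stem)  # (f3) store the stem in it
    return signatures
```

Keying the dictionary by the suffix tuple lets each stem find its signature with a single lookup.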