Automatic clustering of tokens from a corpus for grammar acquisition
Abstract
In a method of learning grammar from a corpus, context words are identified from the corpus. For the remaining non-context words, the method counts occurrences of predetermined relationships with the context words and maps the counted occurrences to a multidimensional frequency space. Clusters are grown from the resulting frequency vectors. The clusters represent classes of words; words in the same cluster possess the same lexical significance and provide an indicator of grammatical structure.
21 Claims
1. A grammar learning method from a corpus, comprising:
identifying context tokens within the corpus, for each non-context token in the corpus, counting occurrences of predetermined relationships of the non-context token to a context token, generating frequency vectors for each non-context token based upon the counted occurrences, and clustering non-context tokens based upon the frequency vectors, whereby the clusters of non-context tokens form a grammatical model of the corpus.
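The claim leaves open how context tokens are chosen and which relationships are counted. The following is a minimal sketch under two labeled assumptions: context tokens are taken to be the most frequent tokens (a hypothetical selection rule), and the "predetermined relationships" are taken to be "context token X occurs at relative offset k" for a small, hypothetical set of offsets. Each (context token, offset) pair is one dimension of the frequency space. The function names `identify_context_tokens` and `frequency_vectors` are illustrative only.

```python
from collections import Counter

def identify_context_tokens(corpus, n_context):
    """Hypothetical rule: the n_context most frequent tokens are context tokens."""
    counts = Counter(tok for sentence in corpus for tok in sentence)
    return [tok for tok, _ in counts.most_common(n_context)]

def frequency_vectors(corpus, context_tokens, offsets=(-1, 1)):
    """For each non-context token, count how often each context token occurs
    at each relative offset (the assumed 'predetermined relationships')."""
    context = set(context_tokens)
    # One dimension per (context token, offset) pair.
    dims = [(c, off) for c in context_tokens for off in offsets]
    index = {d: i for i, d in enumerate(dims)}
    vectors = {}
    for sentence in corpus:
        for i, tok in enumerate(sentence):
            if tok in context:
                continue  # only non-context tokens receive vectors
            vec = vectors.setdefault(tok, [0] * len(dims))
            for off in offsets:
                j = i + off
                if 0 <= j < len(sentence) and sentence[j] in context:
                    vec[index[(sentence[j], off)]] += 1
    return dims, vectors
```

For instance, with context tokens "i" and "a", the token "want" in "i want a flight" increments the ("i", -1) and ("a", 1) dimensions of its vector.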
10. A method of phrase grammar learning from a corpus, comprising:
identifying context tokens within the corpus, for each non-context token in the corpus, counting occurrences of predetermined relationships of the non-context token to a context token, generating frequency vectors for each non-context token based upon the counted occurrences, and clustering non-context tokens based upon the frequency vectors into a cluster tree, whereby the cluster tree forms a grammatical model of the corpus.
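The claim does not fix a distance measure or a clustering procedure for building the cluster tree. One plausible reading, sketched here under stated assumptions, is bottom-up agglomerative clustering: start with one cluster per non-context token and repeatedly merge the closest pair. This sketch assumes cosine distance between frequency vectors and average linkage; both are swapped-in choices, not ones confirmed by the text. Leaves are tokens and internal nodes are hypothetical `(left, right, merge_distance)` tuples.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; an all-zero vector is treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def build_cluster_tree(vectors):
    """Average-linkage agglomerative clustering of {token: frequency vector}.
    Leaves are token strings; internal nodes are (left, right, merge_distance)."""
    if not vectors:
        return None
    # Each active cluster pairs its subtree with the list of tokens it contains.
    clusters = [(tok, [tok]) for tok in sorted(vectors)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                # Average linkage: mean pairwise distance between members.
                d = sum(cosine_distance(vectors[a], vectors[b])
                        for a in mi for b in mj) / (len(mi) * len(mj))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = ((clusters[i][0], clusters[j][0], d), clusters[i][1] + clusters[j][1])
        del clusters[j]  # j > i, so deleting j first keeps index i valid
        del clusters[i]
        clusters.append(merged)
    return clusters[0][0]
```

This naive version recomputes all pairwise distances on every merge, which is fine for illustration but scales poorly with vocabulary size.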
16. A method of phrase grammar learning from a corpus, comprising:
identifying context words from a corpus, for each non-context word in the corpus, counting occurrences of the non-context word within a predetermined adjacency of a context word, generating frequency vectors for each non-context word based upon the counted occurrences, clustering non-context words based on the frequency vectors into a cluster tree, and cutting the cluster tree along a cutting line, thereby forming a grammatical model of the corpus.
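The claim does not state where the cutting line is placed. A minimal sketch, assuming a cluster tree of nested `(left, right, merge_distance)` tuples with token strings as leaves, and a hypothetical distance `threshold` as the cutting line: every subtree merged at or below the threshold becomes one word class. The top-down cut is only consistent if merge distances never decrease toward the root, which holds for average-linkage trees but is an assumption for other tree builders.

```python
def collect_leaves(node):
    """Return all tokens under a node, left to right."""
    if not isinstance(node, tuple):
        return [node]
    left, right, _ = node
    return collect_leaves(left) + collect_leaves(right)

def cut_tree(node, threshold):
    """Cut the tree along a horizontal cutting line at `threshold`.
    Returns a list of clusters, each a list of tokens."""
    if not isinstance(node, tuple):
        return [[node]]  # a single token is its own cluster
    left, right, distance = node
    if distance <= threshold:
        return [collect_leaves(node)]  # whole subtree lies below the line
    return cut_tree(left, threshold) + cut_tree(right, threshold)
```

Raising the threshold yields fewer, broader word classes; lowering it below every merge distance returns one class per token.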
Specification