Automatic clustering of tokens from a corpus for grammar acquisition
Abstract
A method of grammar learning from a corpus comprises generating a frequency vector for each non-context token in the corpus, based upon counted occurrences of a predetermined relationship of the non-context tokens to identified context tokens. Clusters are then grown from the frequency vectors according to a lexical correlation among the non-context tokens.
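The frequency-vector step in the abstract can be illustrated with a short Python sketch. It is not the patented implementation. It assumes the caller supplies the context tokens, and it takes the "predetermined relationship" to mean a context token occurring at a fixed offset from the non-context token (here one position to the left or right). The function name `frequency_vectors`, the `offsets` parameter, and the toy sentences are hypothetical.

```python
from collections import Counter, defaultdict

def frequency_vectors(sentences, context_tokens, offsets=(-1, 1)):
    """Build one count vector per non-context token.

    Assumption: the 'relationship' counted is "context token c appears
    at relative offset d from the non-context token".
    """
    context = set(context_tokens)
    # One vector dimension per (offset, context token) pair, in a fixed order.
    dims = [(d, c) for d in offsets for c in sorted(context)]
    counts = defaultdict(Counter)
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok in context:
                continue  # context tokens do not get vectors of their own
            vec = counts[tok]  # ensures every non-context token has an entry
            for d in offsets:
                j = i + d
                if 0 <= j < len(sent) and sent[j] in context:
                    vec[(d, sent[j])] += 1
    return dims, {tok: [c[dim] for dim in dims] for tok, c in counts.items()}

# Toy corpus (illustrative only).
sentences = [
    ["fly", "to", "boston"],
    ["fly", "to", "denver"],
    ["leave", "from", "boston"],
]
dims, vecs = frequency_vectors(sentences, context_tokens={"to", "from"})
```

In this toy corpus, `boston` and `denver` both follow `to`, so their vectors share a non-zero dimension. That shared pattern is the kind of similarity a correlation-based clustering step could then pick up.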
20 Claims
1. A machine-readable medium having stored thereon executable instructions that when executed by a processor, cause the processor to:
generate frequency vectors for each non-context token in a corpus based upon counted occurrences of a predetermined relationship of the non-context tokens to context tokens; and
cluster the non-context tokens into a cluster tree based upon the frequency vectors according to a lexical correlation among the non-context tokens.
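One way to picture the cluster tree recited in claim 1 is a greedy agglomerative procedure that repeatedly merges the two most correlated clusters. The Python sketch below rests on assumptions not found in the claim text: Pearson correlation stands in for the "lexical correlation", and a merged cluster is represented by the element-wise sum of its members' vectors. The names `correlation` and `cluster_tree` and the sample vectors are hypothetical.

```python
import math

def correlation(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)

def cluster_tree(vectors):
    """Merge the most correlated pair until one cluster remains.

    `vectors` maps token -> frequency vector and must be non-empty.
    Returns a binary tree as nested tuples; leaves are token strings.
    """
    # Each node is (subtree, vector); start with one leaf per token.
    nodes = [(tok, list(vec)) for tok, vec in sorted(vectors.items())]
    while len(nodes) > 1:
        # Score every pair of current clusters and pick the best one.
        _, i, j = max(
            ((correlation(nodes[i][1], nodes[j][1]), i, j)
             for i in range(len(nodes))
             for j in range(i + 1, len(nodes))),
            key=lambda t: t[0],
        )
        # Assumption: a merged cluster's vector is the sum of its parts.
        merged_vec = [a + b for a, b in zip(nodes[i][1], nodes[j][1])]
        merged = ((nodes[i][0], nodes[j][0]), merged_vec)
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]

# Illustrative vectors: two place-like tokens and two verb-like tokens.
sample = {
    "boston": [0, 3, 0, 1],
    "denver": [0, 2, 0, 1],
    "fly":    [4, 0, 0, 0],
    "leave":  [3, 0, 1, 0],
}
tree = cluster_tree(sample)
```

Each internal node of the returned tree corresponds to one merge, so cutting the tree at different depths gives coarser or finer token classes.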
2. A method of grammar learning from a corpus, comprising:
generating frequency vectors for each non-context token in a corpus based upon counted occurrences of a predetermined relationship of the non-context tokens to context tokens; and
clustering the non-context tokens based upon the frequency vectors according to a lexical correlation among the non-context tokens.
(Dependent claims: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
13. A file storing a grammar model of a corpus of speech, created according to a method comprising:
generating frequency vectors for each non-context token in a corpus based upon counted occurrences of a predetermined relationship of the non-context tokens to context tokens;
clustering the non-context tokens into a cluster based upon the frequency vectors according to a lexical correlation among the non-context tokens; and
storing the non-context tokens and a representation of the clusters in a file.
(Dependent claims: 14, 15, 16, 17, 18, 19, 20)
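The storing step of claim 13 does not name a file format. As a minimal sketch, assuming JSON is acceptable and that the clusters are held as a nested structure of token strings, the tokens and cluster representation could be written and read back as follows. The function names and the `tokens` and `clusters` keys are hypothetical.

```python
import json

def save_grammar_model(path, tokens, clusters):
    """Write the non-context tokens and a cluster representation to `path`.

    Assumption: `clusters` is JSON-serializable (e.g. nested lists or
    tuples of token strings); tuples are written as JSON arrays.
    """
    model = {"tokens": sorted(tokens), "clusters": clusters}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(model, fh, indent=2)

def load_grammar_model(path):
    """Read a model written by save_grammar_model (tuples come back as lists)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

A plain-text format like this keeps the stored model easy to inspect, though nested tuples do not survive the round trip as tuples.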
Specification