Automatic clustering of tokens from a corpus for grammar acquisition
First Claim
1. A system that recognizes patterns, the system comprising:
a first module configured to control a processor to generate frequency vectors for each non-context token in a corpus based upon counted occurrences of a predetermined relationship of the non-context tokens to context tokens;
a second module configured to control the processor to cluster the non-context tokens into a cluster tree based upon the frequency vectors according to a lexical correlation among the non-context tokens; and
a third module configured to control the processor to use the cluster tree for pattern recognition.
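The first module's counting step can be illustrated with a minimal Python sketch. The claim does not say how context tokens are chosen or what the "predetermined relationship" is, so the sketch assumes a caller-supplied set of context tokens and treats the relationship as "the context token appears at a given offset from the non-context token". The names `frequency_vectors`, `offsets`, and the toy corpus are hypothetical and are not taken from the patent.

```python
from collections import Counter, defaultdict

def frequency_vectors(sentences, context_tokens, offsets=(-1, 1)):
    """Build one count vector per non-context token.

    Assumption: the 'predetermined relationship' is adjacency at the given
    offsets (-1 = context token directly before, +1 = directly after).
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        for i, token in enumerate(sentence):
            if token in context_tokens:
                continue  # only non-context tokens receive a vector
            vec = counts[token]  # ensure the token has an entry even if all zeros
            for off in offsets:
                j = i + off
                if 0 <= j < len(sentence) and sentence[j] in context_tokens:
                    vec[(sentence[j], off)] += 1
    # Fixed dimension order: one slot per (context token, offset) pair.
    dims = [(c, off) for c in sorted(context_tokens) for off in offsets]
    return dims, {tok: [cnt[d] for d in dims] for tok, cnt in counts.items()}

# Hypothetical toy corpus, already tokenized.
corpus = [
    "fly to boston from denver".split(),
    "fly to denver from boston".split(),
    "a ticket to dallas from boston".split(),
]
dims, vectors = frequency_vectors(corpus, context_tokens={"to", "from"})
for token, vec in sorted(vectors.items()):
    print(token, vec)
```

Tokens that occur in similar positions relative to the context tokens (here the city names) end up with similar vectors, which is what the clustering step relies on.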
Abstract
A system for recognizing patterns is disclosed. Grammar learning from a corpus includes generating a frequency vector for each non-context token in the corpus, based upon counted occurrences of a predetermined relationship of the non-context tokens to identified context tokens. The non-context tokens are clustered into a cluster tree from the frequency vectors according to a lexical correlation among the non-context tokens. The cluster tree is used for pattern recognition.
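As a rough illustration of growing a cluster tree from frequency vectors, the following Python sketch performs greedy bottom-up merging. The abstract does not define the "lexical correlation" measure, so cosine similarity is used here as a stand-in, and a merged cluster is represented by the sum of its members' vectors; both are assumptions. The function names and the input vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; stands in for the unspecified lexical correlation."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_tree(vectors):
    """Repeatedly merge the two most similar clusters until one tree remains.

    Leaves are token strings; internal nodes are 2-tuples of subtrees.
    """
    clusters = [(tok, list(vec)) for tok, vec in sorted(vectors.items())]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i][1], clusters[j][1])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        (ti, vi), (tj, vj) = clusters[i], clusters[j]
        # Assumption: a merged cluster's vector is the sum of its parts.
        merged = ((ti, tj), [a + b for a, b in zip(vi, vj)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

def leaves(tree):
    """Flatten a tree back into its list of tokens."""
    if isinstance(tree, str):
        return [tree]
    return [tok for sub in tree for tok in leaves(sub)]

# Hypothetical frequency vectors for five non-context tokens.
toy_vectors = {
    "boston": [2, 1, 1, 0],
    "denver": [1, 1, 1, 0],
    "dallas": [0, 1, 1, 0],
    "fly": [0, 0, 0, 2],
    "ticket": [0, 0, 0, 1],
}
tree = cluster_tree(toy_vectors)
print(tree)
```

With these toy vectors the city names fall into one subtree and the remaining tokens into another; cutting the tree at different depths would yield coarser or finer token classes.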
16 Citations
12 Claims
1. A system that recognizes patterns, the system comprising:
a first module configured to control a processor to generate frequency vectors for each non-context token in a corpus based upon counted occurrences of a predetermined relationship of the non-context tokens to context tokens;
a second module configured to control the processor to cluster the non-context tokens into a cluster tree based upon the frequency vectors according to a lexical correlation among the non-context tokens; and
a third module configured to control the processor to use the cluster tree for pattern recognition.
2. A system that performs grammar learning from a corpus, the system comprising:
a first module configured to control a processor to generate frequency vectors for each non-context token in a corpus based upon counted occurrences of a predetermined relationship of the non-context tokens to context tokens;
a second module configured to control the processor to cluster the non-context tokens based upon the frequency vectors according to a lexical correlation among the non-context tokens; and
a third module configured to control the processor to use a cluster tree for pattern recognition.
Specification