
Automatic clustering of tokens from a corpus for grammar acquisition

  • US 6,751,584 B2
  • Filed: 07/26/2001
  • Issued: 06/15/2004
  • Est. Priority Date: 12/07/1998
  • Status: Expired due to Term
First Claim

1. A method of grammar learning from a corpus, comprising:

  • identifying context tokens within the corpus,
  • for each non-context token in the corpus, counting occurrences of a predetermined relationship of the non-context token to the context tokens,
  • generating frequency vectors for each non-context token based upon the counted occurrences,
  • clustering non-context tokens based upon the frequency vectors into a cluster, and
  • writing a representation of the cluster to a file, wherein the cluster indicates a lexical correlation among the clustered non-context tokens.
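The claimed steps can be illustrated with a small Python sketch. The claim does not say how context tokens are chosen, what the "predetermined relationship" is, or which clustering method is used, so this sketch makes three labeled assumptions: context tokens are the k most frequent tokens, the relationship is "context token immediately to the left or right", and clustering is single-link agglomerative clustering with a cosine-similarity threshold. All function names (`pick_context_tokens`, `frequency_vectors`, `cluster`, `write_clusters`) are hypothetical. This is an illustration, not the patented implementation.

```python
import json
import math
import os
import tempfile
from collections import Counter, defaultdict


def pick_context_tokens(tokens, k):
    """ASSUMPTION: treat the k most frequent tokens as the context tokens."""
    return [tok for tok, _ in Counter(tokens).most_common(k)]


def frequency_vectors(tokens, context):
    """For each non-context token, count how often each context token occurs
    immediately to its left (first half of the vector) and immediately to
    its right (second half). ASSUMPTION: adjacency is the relationship."""
    index = {c: i for i, c in enumerate(context)}
    n = len(context)
    vectors = defaultdict(lambda: [0] * (2 * n))
    for i, tok in enumerate(tokens):
        if tok in index:
            continue  # only non-context tokens get vectors
        vec = vectors[tok]
        if i > 0 and tokens[i - 1] in index:
            vec[index[tokens[i - 1]]] += 1
        if i + 1 < len(tokens) and tokens[i + 1] in index:
            vec[n + index[tokens[i + 1]]] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity; a zero vector is treated as dissimilar to everything."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(vectors, threshold):
    """ASSUMPTION: single-link agglomerative clustering. Repeatedly merge the
    first pair of clusters whose most similar members reach the threshold."""
    clusters = [[w] for w in sorted(vectors)]
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                best = max(cosine(vectors[x], vectors[y])
                           for x in clusters[a] for y in clusters[b])
                if best >= threshold:
                    clusters[a].extend(clusters.pop(b))
                    merged = True
                    break
            if merged:
                break
    return clusters


def write_clusters(clusters, path):
    """Write one sorted token list per cluster to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump([sorted(c) for c in clusters], fh)


if __name__ == "__main__":
    corpus = "i fly to boston . i fly to denver . i go to boston . i go to denver ."
    toks = corpus.split()
    ctx = pick_context_tokens(toks, k=3)  # ['i', 'to', '.']
    vecs = frequency_vectors(toks, ctx)
    groups = cluster(vecs, threshold=0.9)
    write_clusters(groups, os.path.join(tempfile.gettempdir(), "clusters.json"))
    print(groups)  # [['boston', 'denver'], ['fly', 'go']]
```

On this toy corpus, "fly" and "go" share the same left/right context profile ("i" before, "to" after), as do "boston" and "denver" ("to" before, "." after), so each pair ends up in one cluster. That grouping is the kind of lexical correlation the claim describes.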

  • 4 Assignments