
Method and system for automatically extracting new word

  • US 7,478,036 B2
  • Filed: 08/30/2001
  • Issued: 01/13/2009
  • Est. Priority Date: 08/30/2000
  • Status: Active Grant
First Claim

1. A method of extracting new words automatically, said method comprising the steps of:

    segmenting a cleaned corpus in a domain to form a segmented corpus;

    splitting the segmented corpus to form sub strings, and counting the occurrences of each sub string appearing in the corpus; and

    filtering out false candidates to output new words, wherein the new words are words not contained in a base vocabulary;

    wherein the segmenting and the splitting is not dependent upon word boundaries;

    wherein new words are determined based upon the domain of the cleaned corpus;

    wherein the step of splitting and counting is implemented using a GAST (general atom suffix tree) contained in a reduced memory space;

    wherein the GAST is implemented by limiting length of character sub strings.
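
The claimed pipeline (count sub strings of a segmented corpus in a length-limited suffix structure, then filter out false candidates and known words) can be illustrated with the minimal sketch below. It is not the patented implementation. It assumes the corpus is already cleaned and segmented into atoms (characters or segmenter tokens). It uses a plain suffix trie whose depth is capped at `max_len` as a simplified stand-in for the GAST. The names `LengthLimitedSuffixTrie`, `extract_new_words`, `max_len` and `min_count` are hypothetical. Claim 1 does not say how false candidates are filtered, so the frequency threshold and the "fragment of a longer candidate" rule used here are assumed heuristics.

```python
from typing import Dict, Iterator, List, Sequence, Set, Tuple


class _Node:
    """Trie node; `count` = occurrences of the atom path ending at this node."""

    __slots__ = ("children", "count")

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.count = 0


class LengthLimitedSuffixTrie:
    """Hypothetical stand-in for a GAST: a suffix trie over atoms whose depth
    is capped at max_len, so memory does not grow with sentence length."""

    def __init__(self, max_len: int) -> None:
        self.max_len = max_len
        self.root = _Node()

    def add(self, atoms: Sequence[str]) -> None:
        # Insert every suffix, truncated to max_len atoms. Each node on the
        # path counts one occurrence of the sub string it spells. The suffixes
        # ignore word boundaries, so sub strings may span several atoms.
        for start in range(len(atoms)):
            node = self.root
            for atom in atoms[start:start + self.max_len]:
                node = node.children.setdefault(atom, _Node())
                node.count += 1

    def substrings(self, min_len: int) -> Iterator[Tuple[Tuple[str, ...], int]]:
        # Depth-first walk yielding (atom path, count) for paths >= min_len atoms.
        stack: List[Tuple[_Node, Tuple[str, ...]]] = [(self.root, ())]
        while stack:
            node, path = stack.pop()
            if len(path) >= min_len:
                yield path, node.count
            for atom, child in node.children.items():
                stack.append((child, path + (atom,)))


def extract_new_words(
    segmented_corpus: List[List[str]],
    base_vocab: Set[str],
    max_len: int = 4,
    min_count: int = 2,
) -> Dict[str, int]:
    trie = LengthLimitedSuffixTrie(max_len)
    for sentence in segmented_corpus:
        trie.add(sentence)

    # Frequent multi-atom sub strings, keyed by their surface string.
    counts: Dict[str, int] = {}
    for path, count in trie.substrings(min_len=2):
        if count >= min_count:
            word = "".join(path)
            counts[word] = counts.get(word, 0) + count

    new_words: Dict[str, int] = {}
    for word, count in counts.items():
        if word in base_vocab:
            continue  # already in the base vocabulary, so not a new word
        # Assumed heuristic: drop a fragment whose count equals that of a
        # longer frequent sub string containing it (it never occurs alone).
        if any(word != longer and word in longer and counts[longer] == count
               for longer in counts):
            continue
        new_words[word] = count
    return new_words
```

For example, with the segmented sentences `["数","据","挖","掘"]`, `["数","据","挖","掘","技","术"]` and `["文","本","挖","掘"]` and a base vocabulary of `{"数据", "挖掘", "技术", "文本"}`, the sketch returns `{"数据挖掘": 2}`. With `max_len=2` it can only propose two-atom strings and returns the fragment `{"据挖": 2}`. This shows how the length limit bounds memory at the cost of missing longer words.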

  • 1 Assignment