CJK NAME DETECTION

  • US 20100306139A1
  • Filed: 12/06/2007
  • Published: 12/02/2010
  • Est. Priority Date: 12/06/2007
  • Status: Active Grant
First Claim

1. A method comprising:

    generating a raw name detection model using a collection of family names and an annotated corpus including a collection of n-grams, each n-gram having a corresponding probability of occurring as a name in the annotated corpus;

    applying the raw name detection model to a collection of semi-structured data to form annotated semi-structured data, the annotated semi-structured data identifying n-grams identifying names and n-grams not identifying names;

    applying the raw name detection model to a large unannotated corpus to form a large annotated corpus data identifying n-grams of the large unannotated corpus identifying names and n-grams not identifying names; and

    generating a name detection model including:

        deriving a name model using the annotated semi-structured data identifying names and the large annotated corpus data identifying names,
        deriving a not-name model using the semi-structured data not identifying names, and
        deriving a language model using the large annotated corpus.
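
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: every function name, the 0.5 threshold, the 2-to-3 character candidate length, the use of raw counts in place of smoothed probabilities, and the toy data are assumptions made only for illustration.

```python
from collections import Counter

def ngrams(text, min_n, max_n):
    """Return every character n-gram of length min_n..max_n in text."""
    return [text[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(text) - n + 1)]

def candidates(text, family_names):
    """Hypothetical candidate rule: 2- or 3-character n-grams that start with a family name."""
    return [g for g in ngrams(text, 2, 3)
            if any(g.startswith(f) for f in family_names)]

def build_raw_model(family_names, annotated_corpus):
    """Estimate, per candidate n-gram, how often it was labeled as a name."""
    as_name, seen = Counter(), Counter()
    for text, labeled_names in annotated_corpus:
        for g in candidates(text, family_names):
            seen[g] += 1
            if g in labeled_names:
                as_name[g] += 1
    return {g: as_name[g] / seen[g] for g in seen}

def annotate(raw_model, family_names, texts, threshold=0.5):
    """Split candidate n-grams into names and not-names (threshold is an assumption)."""
    names, not_names = Counter(), Counter()
    for text in texts:
        for g in candidates(text, family_names):
            if raw_model.get(g, 0.0) >= threshold:
                names[g] += 1
            else:
                not_names[g] += 1
    return names, not_names

def build_name_detection_model(raw_model, family_names, semi_structured, large_corpus):
    """Derive the three sub-models named in the claim, here as plain count tables."""
    semi_names, semi_not_names = annotate(raw_model, family_names, semi_structured)
    corpus_names, _ = annotate(raw_model, family_names, large_corpus)
    language = Counter()
    for text in large_corpus:
        language.update(ngrams(text, 1, 3))
    return {
        "name": semi_names + corpus_names,   # names from both annotated sources
        "not_name": semi_not_names,          # not-names from semi-structured data only
        "language": language,                # n-gram counts over the large corpus
    }

# Toy data, invented for this sketch.
family_names = {"王", "李"}
annotated_corpus = [
    ("王小明去北京", {"王小明"}),
    ("李华在家", {"李华"}),
    ("王国很大", set()),
]
raw_model = build_raw_model(family_names, annotated_corpus)
model = build_name_detection_model(
    raw_model,
    family_names,
    semi_structured=["王小明", "李华", "王国"],
    large_corpus=["王小明和李华", "王国很大"],
)
```

In this sketch the name model pools evidence from both annotated sources, while the not-name model draws only on the semi-structured data, mirroring the split stated in the claim. A practical system would presumably replace the count tables with smoothed probability estimates.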
