System and method for extracting entities of interest from text using n-gram models

  • US 7,493,293 B2
  • Filed: 05/31/2006
  • Issued: 02/17/2009
  • Est. Priority Date: 05/31/2006
  • Status: Active Grant
First Claim

1. A method of using at least two n-gram models, at least one of which is based on a training set of entities of interest and at least one of which is based on a training set of entities not of interest, the method comprising:

  • tokenizing a document to produce a string of tokens corresponding to terms within the document;

  • for each token, evaluating the token against the n-gram models to determine which model is most likely to be associated with the token;

  • identifying tokens corresponding to at least one n-gram model of interest; and

  • annotating the identified tokens with at least one name for said at least one n-gram model of interest.
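
The four steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not say whether the n-grams are over characters or words, how the models are smoothed, or how tokens are delimited, so the sketch assumes character trigrams, add-one smoothing, and a simple regular-expression tokenizer. All names (`CharNGramModel`, `tokenize`, `annotate`) and both tiny training lists are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


class CharNGramModel:
    """Character n-gram model with add-one smoothing (an assumed design)."""

    def __init__(self, name, training_terms, n=3):
        self.name = name
        self.n = n
        self.ngram_counts = Counter()    # counts of full n-grams
        self.context_counts = Counter()  # counts of (n-1)-character prefixes
        self.alphabet = set()
        for term in training_terms:
            padded = self._pad(term)
            self.alphabet.update(padded)
            for i in range(len(padded) - n + 1):
                gram = padded[i:i + n]
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def _pad(self, term):
        # "^" marks the start of a term and "$" marks its end.
        return "^" * (self.n - 1) + term.lower() + "$"

    def log_prob(self, token, alphabet_size):
        """Smoothed log-probability of the token under this model."""
        padded = self._pad(token)
        total = 0.0
        for i in range(len(padded) - self.n + 1):
            gram = padded[i:i + self.n]
            numerator = self.ngram_counts[gram] + 1
            denominator = self.context_counts[gram[:-1]] + alphabet_size
            total += math.log(numerator / denominator)
        return total


def tokenize(document):
    """Step 1: split the document into tokens (assumed delimiter rule)."""
    return re.findall(r"[A-Za-z0-9-]+", document)


def annotate(document, models, names_of_interest):
    """Steps 2-4: score each token, pick the most likely model, and
    annotate the token with the model name if that model is of interest."""
    # A shared alphabet size keeps the smoothed scores comparable.
    alphabet_size = len(set().union(*(m.alphabet for m in models)))
    annotated = []
    for token in tokenize(document):
        best = max(models, key=lambda m: m.log_prob(token, alphabet_size))
        label = best.name if best.name in names_of_interest else None
        annotated.append((token, label))
    return annotated


# Hypothetical training sets: one of interest, one not of interest.
chemical_model = CharNGramModel("chemical", [
    "ethanol", "methanol", "propanol", "butanol",
    "pentanol", "benzene", "toluene", "acetone",
])
english_model = CharNGramModel("english", [
    "the", "sample", "was", "mixed", "with", "and",
    "then", "heated", "water", "for", "hours",
])
models = [chemical_model, english_model]

if __name__ == "__main__":
    text = "The sample was mixed with ethanol and toluene."
    for token, label in annotate(text, models, {"chemical"}):
        print(token, label)
```

Tokens whose best-scoring model is not in `names_of_interest` receive `None` rather than a name, which mirrors the claim's distinction between models based on entities of interest and models based on entities not of interest. Training sets this small are only enough to demonstrate the control flow; a usable system would need far larger training sets.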
