
SYSTEM AND METHOD FOR EXTRACTING ENTITIES OF INTEREST FROM TEXT USING N-GRAM MODELS

  • US 20080040298A1
  • Filed: 05/31/2006
  • Published: 02/14/2008
  • Est. Priority Date: 05/31/2006
  • Status: Active Grant
First Claim

1. A method of using at least two n-gram models, at least one of which is based on a training set of entities of interest and at least one of which is based on a training set of entities not of interest, the method comprising:

    tokenizing a document to produce a string of tokens corresponding to terms within the document;

    for each token, evaluating the token against the n-gram models to determine which model is most likely to be associated with the token;

    identifying tokens corresponding to at least one n-gram model that is of interest; and

    annotating the identified entities by at least one name for said at least one n-gram model.
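The claimed steps (tokenize, score each token against every n-gram model, keep tokens whose best-scoring model is one of interest, and label them with that model's name) can be illustrated with a minimal Python sketch. The claim does not specify the n-gram unit, the smoothing method, or the tokenizer. This sketch assumes character trigrams with add-one (Laplace) smoothing and simple whitespace tokenization. The names `CharNGramModel`, `tokenize`, and `annotate`, and the toy training lists, are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'"


class CharNGramModel:
    """Hypothetical character n-gram model with add-one smoothing.

    Assumption: the claim says only "n-gram models"; character-level
    n-grams are one possible choice, used here for illustration.
    """

    def __init__(self, name, training_terms, n=3):
        self.name = name              # label used when annotating tokens
        self.n = n
        self.ngrams = Counter()       # counts of full n-grams
        self.contexts = Counter()     # counts of (n-1)-character contexts
        self.alphabet = set()
        for term in training_terms:
            padded = self._pad(term)
            self.alphabet.update(padded)
            for i in range(len(padded) - n + 1):
                gram = padded[i:i + n]
                self.ngrams[gram] += 1
                self.contexts[gram[:-1]] += 1

    def _pad(self, term):
        # Boundary markers let the model learn typical prefixes and suffixes.
        return "^" * (self.n - 1) + term.lower() + "$"

    def log_prob(self, token):
        """Log-probability of the token's characters under this model."""
        padded = self._pad(token)
        vocab = len(self.alphabet) + 1  # +1 reserves mass for unseen characters
        total = 0.0
        for i in range(len(padded) - self.n + 1):
            gram = padded[i:i + self.n]
            count = self.ngrams[gram]
            context = self.contexts[gram[:-1]]
            total += math.log((count + 1) / (context + vocab))
        return total


def tokenize(document):
    """Split on whitespace and strip surrounding punctuation (an assumption)."""
    tokens = (t.strip(PUNCT) for t in document.split())
    return [t for t in tokens if t]


def annotate(document, models, of_interest):
    """Return (token, model name) pairs for tokens best explained by a model of interest."""
    annotations = []
    for token in tokenize(document):
        best = max(models, key=lambda m: m.log_prob(token))
        if best.name in of_interest:
            annotations.append((token, best.name))
    return annotations


if __name__ == "__main__":
    # Toy training sets, invented for illustration only.
    chemical = CharNGramModel("chemical", [
        "methanol", "ethanol", "propanol", "butanol",
        "benzene", "toluene", "acetone"])
    other = CharNGramModel("other", [
        "the", "sample", "contained", "and", "in",
        "water", "was", "heated", "with", "of"])
    print(annotate("The sample contained methanol and water.",
                   [chemical, other], {"chemical"}))
```

Because every model scores the same token, the comparison needs no length normalization. A practical system would train on far larger entity and non-entity lists and might compare against a threshold rather than taking a plain maximum; the sketch keeps only the decision rule the claim describes.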
