Document-specific gazetteers for named entity recognition

  • US 9,836,453 B2
  • Filed: 08/27/2015
  • Issued: 12/05/2017
  • Est. Priority Date: 08/27/2015
  • Status: Active Grant
First Claim

1. An entity recognition method comprising:

  • providing a named entity recognition model which has been trained on features extracted from training samples tagged with document-level entity tags, each training sample comprising at least one text sequence;

    receiving a text document to be labeled, the text document being tagged with at least one document-level entity tag;

    generating a document-specific gazetteer based on the at least one document-level entity tag, the document-specific gazetteer including a set of entries, one entry for each of a set of entity names;

    for a text sequence of the text document, extracting features for tokens of the text sequence, the features including document-specific features for tokens matching at least a part of the entity name of one of the gazetteer entries, the document-specific features comprising at least 12 document-specific features;

    predicting entity labels for tokens in the document text sequence with the named entity recognition model, based on the extracted features, and wherein at least one of the generating, extracting, and predicting is performed with a processor.
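
The gazetteer-generation and feature-extraction steps of the claim can be illustrated with a minimal sketch. The claim only requires "at least 12 document-specific features" and does not say what they are, so the scheme below is an assumption made for illustration: three hypothetical entity types (PER, ORG, LOC) crossed with four position markers (begin, inside, end, unit) give 12 binary features. The names `build_gazetteer` and `extract_doc_features` are likewise hypothetical and do not come from the patent.

```python
from typing import Dict, List, Tuple

# Assumed for illustration only: 3 entity types x 4 position markers = 12 features.
ENTITY_TYPES = ("PER", "ORG", "LOC")
POSITIONS = ("B", "I", "E", "U")  # begin, inside, end, unit (single-token name)
FEATURE_NAMES = [f"{t}_{p}" for t in ENTITY_TYPES for p in POSITIONS]


def build_gazetteer(doc_tags: List[Tuple[str, str]]) -> Dict[Tuple[str, ...], str]:
    """Build one gazetteer entry per entity name from document-level tags.

    Each tag is an (entity name, entity type) pair; names are lowercased
    and split on whitespace so they can be compared token by token.
    """
    gazetteer: Dict[Tuple[str, ...], str] = {}
    for name, entity_type in doc_tags:
        gazetteer[tuple(name.lower().split())] = entity_type
    return gazetteer


def extract_doc_features(
    tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]
) -> List[Dict[str, int]]:
    """Return 12 binary document-specific features for every token.

    A feature fires when the token equals any token of a gazetteer entry,
    i.e. it matches at least a part of that entity name.
    """
    features = [dict.fromkeys(FEATURE_NAMES, 0) for _ in tokens]
    lowered = [tok.lower() for tok in tokens]
    for name_tokens, entity_type in gazetteer.items():
        n = len(name_tokens)
        for i, tok in enumerate(lowered):
            for j, name_tok in enumerate(name_tokens):
                if tok != name_tok:
                    continue
                if n == 1:
                    position = "U"
                elif j == 0:
                    position = "B"
                elif j == n - 1:
                    position = "E"
                else:
                    position = "I"
                features[i][f"{entity_type}_{position}"] = 1
    return features


# Usage: a document tagged with two document-level entity tags.
doc_tags = [("Barack Obama", "PER"), ("Xerox", "ORG")]
gazetteer = build_gazetteer(doc_tags)
token_features = extract_doc_features("Obama visited Xerox".split(), gazetteer)
```

In this sketch the per-token feature dictionaries would be combined with ordinary features and passed to a trained sequence model for the predicting step; the choice of model is left open here because the claim does not name one.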
