System and method for entity extraction from semi-structured text documents

  • US 10,489,439 B2
  • Filed: 04/14/2016
  • Issued: 11/26/2019
  • Est. Priority Date: 04/14/2016
  • Status: Active Grant
First Claim

1. An automated method for extracting entities from a text document comprising:

  • for at least a section of a text document, extracting a first set of entities in predefined classes of entity from the at least a section, the extraction of the first set of entities comprising at least one of a rule-based extraction method and a probabilistic extraction method;

    identifying a location of each of the extracted entities in the at least a section of the document;

    clustering at least a subset of the extracted entities in the first set into clusters, based on the identified locations of the entities in the document;

    identifying complete clusters of entities and incomplete clusters of entities from the clusters, based on correlations observed between sequences of entities in the clusters and a number of the classes of entity within each entity cluster;

    learning patterns for extracting new entities based on the complete clusters; and

    extracting new entities from the incomplete clusters based on the learned patterns, wherein the extracting of the first set of entities, identifying complete clusters, learning patterns, and extracting new entities are performed with a processor device.
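
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the regular-expression rules, the one-cluster-per-line grouping, the "all classes present" completeness test, and the delimiter-based pattern are all simplifying assumptions chosen for illustration, and every name (Entity, RULES, run_pipeline, and so on) is hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    cls: str    # entity class, e.g. "date"
    text: str   # matched text
    line: int   # location: line index in the section
    start: int  # location: character offset within the line

# Hypothetical rule-based extractors, one regex per predefined class.
RULES = {
    "date": re.compile(r"\b\d{4}-\d{4}\b"),
    "title": re.compile(r"\b(?:Engineer|Manager)\b"),
    "org": re.compile(r"\b[A-Z][a-z]+ (?:Inc|Ltd)\b"),
}

def extract_first_set(lines):
    """Rule-based extraction that also records each entity's location."""
    ents = [Entity(cls, m.group(), i, m.start())
            for i, line in enumerate(lines)
            for cls, rx in RULES.items()
            for m in rx.finditer(line)]
    return sorted(ents, key=lambda e: (e.line, e.start))

def cluster_by_location(ents):
    """Assumption: entities on the same line form one cluster."""
    clusters = {}
    for e in ents:
        clusters.setdefault(e.line, []).append(e)
    return clusters

def split_clusters(clusters, n_classes):
    """Assumption: a cluster is complete when every class appears in it."""
    complete, incomplete = {}, {}
    for ln, ents in clusters.items():
        target = complete if len({e.cls for e in ents}) == n_classes else incomplete
        target[ln] = ents
    return complete, incomplete

def learn_pattern(lines, complete):
    """Learn the most common class order and separator from complete clusters."""
    orders, seps = Counter(), Counter()
    for ln, ents in complete.items():
        orders[tuple(e.cls for e in ents)] += 1
        for a, b in zip(ents, ents[1:]):
            seps[lines[ln][a.start + len(a.text):b.start]] += 1
    if not orders or not seps:
        return None  # nothing to learn from
    return orders.most_common(1)[0][0], seps.most_common(1)[0][0]

def extract_new(lines, incomplete, pattern):
    """Fill the missing classes of incomplete clusters by field position."""
    order, sep = pattern
    new = []
    for ln, ents in incomplete.items():
        fields = lines[ln].split(sep)
        if len(fields) != len(order):
            continue  # the line does not fit the learned pattern
        found = {e.cls for e in ents}
        pos = 0
        for cls, field in zip(order, fields):
            if cls not in found:
                offset = pos + len(field) - len(field.lstrip())
                new.append(Entity(cls, field.strip(), ln, offset))
            pos += len(field) + len(sep)
    return new

def run_pipeline(lines):
    """Run all steps and return only the newly extracted entities."""
    clusters = cluster_by_location(extract_first_set(lines))
    complete, incomplete = split_clusters(clusters, len(RULES))
    pattern = learn_pattern(lines, complete)
    return extract_new(lines, incomplete, pattern) if pattern else []
```

For example, given the lines "2010-2014, Engineer, Acme Inc", "2014-2016, Manager, Globex Ltd" and "2016-2018, Architect, Initech Ltd", the rules miss "Architect", so the third line forms an incomplete cluster. The order (date, title, org) and the ", " separator learned from the first two lines let run_pipeline recover "Architect" as a title.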

  • 6 Assignments