Entity analysis system

  • US 9,558,456 B2
  • Filed: 10/29/2015
  • Issued: 01/31/2017
  • Est. Priority Date: 08/08/2011
  • Status: Active Grant
First Claim

1. A computer-implemented method of learning related entities, the method comprising:

    receiving a set of entities, the set of entities including a plurality of entities and each entity in the set of entities relating to a first concept;

    receiving training content that includes textual content that is organized and that includes the plurality of entities of the set of entities; and

    learning additional entities that are related to the first concept by iteratively performing the following steps:

    identifying one or more potential word templates from the training content based on occurrences of one or more words in the training content with an entity of the set of entities, wherein each potential word template is one or more words, and wherein each potential word template is tagged with a part-of-speech tag based on grammatical use of the one or more words in the training content;

    identifying one or more word templates from the one or more potential word templates based on a frequency of occurrence of the one or more potential word templates and based on the part-of-speech tag of the one or more potential word templates compared to part-of-speech tags of word templates of a set of word templates, wherein the one or more identified word templates are added to the set of word templates;

    generating, for each identified word template, a confidence score for the identified word template based on a frequency of occurrence of the identified word template;

    identifying, for each identified word template, one or more part-of-speech tags of the identified word templates;

    adjusting, for each identified word template, the confidence score of the identified word template based on whether the one or more part-of-speech tags of the identified word template is similar to the part-of-speech tags of word templates of the set of word templates;

    comparing, for each identified word template, the confidence score of the identified word template to a threshold value; and

    removing the identified word template from the set of word templates when the confidence score of the identified word template is outside the threshold value.
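
For illustration only, one iteration of the template steps recited above might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `TOY_POS` lookup stands in for a real part-of-speech tagger, a "word template" is taken to be the single word directly before a known entity, tag similarity is treated as exact tag equality, and the names `potential_templates`, `run_iteration`, `min_count`, `adjustment`, and `threshold` are hypothetical. For brevity, selection uses frequency alone and the part-of-speech comparison is applied in the score adjustment.

```python
from collections import Counter

# Toy part-of-speech lookup (an assumption; a real system would use a tagger).
TOY_POS = {"in": "ADP", "at": "ADP", "the": "DET", "visited": "VERB"}


def potential_templates(sentences, entities):
    """Count (word, tag) pairs for the word directly before a known entity."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, tok in zip(tokens, tokens[1:]):
            if tok in entities:
                counts[(prev, TOY_POS.get(prev, "X"))] += 1
    return counts


def run_iteration(sentences, entities, templates,
                  min_count=2, adjustment=0.25, threshold=0.5):
    """One pass: add frequent templates, score them, prune low scorers."""
    counts = potential_templates(sentences, entities)
    total = sum(counts.values())
    known_tags = {tag for _, tag in templates}  # tags in the set before this pass
    updated = set(templates)
    scores = {}
    for (word, tag), n in counts.items():
        if n < min_count:
            continue  # too rare to be identified as a word template
        updated.add((word, tag))  # add the identified template to the set
        score = n / total  # confidence from frequency of occurrence
        # Adjust the confidence by whether the tag matches an existing template tag.
        score += adjustment if tag in known_tags else -adjustment
        scores[(word, tag)] = score
        if score < threshold:
            updated.discard((word, tag))  # outside the threshold: remove
    return updated, scores


sentences = [
    "she lives in paris",
    "he works in london",
    "flights land in paris",
    "we visited paris yesterday",
    "the paris office",
    "the london bridge",
]
templates, scores = run_iteration(sentences, {"paris", "london"}, {("at", "ADP")})
print(sorted(templates))  # [('at', 'ADP'), ('in', 'ADP')]
```

In this toy run, "in" occurs three times before an entity and shares the ADP tag with the seed template, so its score rises to 0.75 and it is kept. "the" is frequent enough to be added but carries a different tag, so its adjusted score falls below the threshold and it is removed.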
