Extracting facts from unstructured information

  • US 10,354,188 B2
  • Filed: 08/02/2016
  • Issued: 07/16/2019
  • Est. Priority Date: 08/02/2016
  • Status: Active Grant
First Claim

1. A system comprising:

  • a processing device; and

    a computer-readable storage medium storing machine-readable instructions which, when executed by the processing device, cause the processing device to:

    identify a plurality of sentences in an entity-tagged corpus that include at least two tagged entities, wherein the entity-tagged corpus is derived from a collection of information items that include unstructured information comprising the plurality of sentences;

    parse the plurality of sentences to obtain parsed sentences representing parts of individual sentences as parse trees;

    identify a plurality of relations in the parsed sentences, wherein respective relations identify a first argument value associated with a first named entity that corresponds to a subject expressed in a respective parsed sentence, a second argument value associated with a second named entity that corresponds to an object expressed in the respective parsed sentence, and a relation value which reflects a corresponding relationship expressed in the respective parsed sentence, the corresponding relationship being between the first named entity and the second named entity;

    form one or more relation clusters based at least on the identified relations, respective relation clusters grouping together relations associated with a same first argument type expressed in the unstructured information, a same second argument type expressed in the unstructured information, and a same relation value expressed in the unstructured information;

    generate confidence score information for the relations in said one or more relation clusters to provide scored relations, wherein the confidence score information reflects relative confidence that individual relations express factually true relationships between individual subjects and individual objects and the confidence score information is based at least on a parsing confidence reflecting confidence in the parsing of the plurality of sentences to obtain the parse trees;

    output final extracted facts by selecting a subset of the scored relations based at least on the confidence score information; and

    store the final extracted facts in a data store, the final extracted facts in the data store being accessible via a user computing device coupled to a computer network.
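
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Relation` record, the function names, the noisy-OR scoring rule, and the 0.5 threshold are all assumptions made for illustration. The claim only requires that the score be based at least on a parsing confidence, and it does not say how that confidence is combined. Parsing and relation extraction are assumed to have already happened, so each relation arrives with a hypothetical `parse_confidence` value.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One relation read off a parsed sentence (hypothetical record layout)."""
    subject: str             # first argument value (subject entity)
    subject_type: str        # first argument type, e.g. "PERSON"
    relation: str            # relation value, e.g. "founded"
    obj: str                 # second argument value (object entity)
    obj_type: str            # second argument type, e.g. "ORG"
    parse_confidence: float  # assumed parser confidence in [0, 1]


def sentences_with_entity_pairs(tagged_sentences):
    """Keep only sentences that carry at least two tagged entities."""
    return [(text, ents) for text, ents in tagged_sentences if len(ents) >= 2]


def cluster_relations(relations):
    """Group relations by (first argument type, relation value, second argument type)."""
    clusters = defaultdict(list)
    for rel in relations:
        clusters[(rel.subject_type, rel.relation, rel.obj_type)].append(rel)
    return dict(clusters)


def score_relations(clusters):
    """Score each distinct triple in each cluster with a noisy-OR over its mentions.

    Noisy-OR is an assumption of this sketch: a triple is treated as wrong only
    if every parse that produced it was wrong, so repeated mentions raise the score.
    """
    scored = {}
    for cluster_key, members in clusters.items():
        all_wrong = defaultdict(lambda: 1.0)
        for rel in members:
            all_wrong[(rel.subject, rel.relation, rel.obj)] *= 1.0 - rel.parse_confidence
        for triple, p_wrong in all_wrong.items():
            scored[(cluster_key, triple)] = 1.0 - p_wrong
    return scored


def select_facts(scored, threshold=0.5):
    """Return the triples whose score meets the (assumed) threshold, sorted."""
    return sorted(triple for (_, triple), score in scored.items() if score >= threshold)


# Illustrative data only; none of it comes from the patent.
relations = [
    Relation("Ada", "PERSON", "founded", "Acme", "ORG", 0.9),
    Relation("Ada", "PERSON", "founded", "Acme", "ORG", 0.8),
    Relation("Bob", "PERSON", "founded", "Paris", "ORG", 0.3),
    Relation("Acme", "ORG", "acquired", "Initech", "ORG", 0.6),
]
facts = select_facts(score_relations(cluster_relations(relations)))
```

In this sketch `facts` plays the role of the final extracted facts; writing them to a network-accessible data store, the last step of the claim, is left out.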
