Extracting facts from unstructured text

  • US 9,424,524 B2
  • Filed: 12/02/2014
  • Issued: 08/23/2016
  • Est. Priority Date: 12/02/2013
  • Status: Active Grant
First Claim

1. A method comprising:

    receiving, by an entity extraction computer, an electronic document having unstructured text, wherein the electronic document is a text file;

    extracting, by the entity extraction computer, an entity identifier from the unstructured text in the electronic document;

    extracting, by a topic extraction computer, a topic identifier from the unstructured text in the electronic document;

    extracting, by a fact extraction computer, a fact identifier from the unstructured text in the electronic document by comparing text string structures in the unstructured text to a fact template database, wherein the fact template database having stored therein a fact template model identifying keywords pertaining to specific fact identifiers and corresponding keyword weights; and

    associating, by a fact relatedness estimator computer, the entity identifier with the topic identifier and the fact identifier to determine a confidence score indicative of a degree of accuracy of extraction of the fact identifier, wherein the confidence score is based at least in part on a spatial distance between a part of the unstructured text in the electronic document from where the fact identifier was extracted and a part of the unstructured text from where at least one of the topic identifier or the entity identifier was extracted.
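The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names FactTemplate, Extraction, find_first, extract_fact and confidence, the fixed token window, the minimum score, and the 1 / (1 + distance / scale) formula are all assumptions chosen for illustration. The claim only requires that keywords and keyword weights be compared against the text and that the confidence score depend at least in part on spatial distance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FactTemplate:
    """Hypothetical fact template: keywords for one fact identifier, with weights."""
    fact_id: str
    keyword_weights: dict[str, float]


@dataclass
class Extraction:
    """An extracted identifier and the token index where it was found."""
    identifier: str
    position: int


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer; a real system would use a proper NLP tokenizer.
    return [w.strip(".,;:!?()\"'").lower() for w in text.split()]


def find_first(tokens: list[str], vocabulary: set[str]) -> Optional[Extraction]:
    # Stand-in for entity or topic extraction: first token found in a known list.
    for i, tok in enumerate(tokens):
        if tok in vocabulary:
            return Extraction(tok, i)
    return None


def extract_fact(tokens, templates, window=4, min_score=1.0) -> Optional[Extraction]:
    # Slide a window over the text and score it against every template by
    # summing the weights of the distinct template keywords it contains.
    best, best_score = None, 0.0
    for start in range(len(tokens)):
        span = tokens[start:start + window]
        for tpl in templates:
            hits = [i for i, tok in enumerate(span) if tok in tpl.keyword_weights]
            score = sum(tpl.keyword_weights[tok] for tok in {span[i] for i in hits})
            if score > best_score:
                best_score = score
                best = Extraction(tpl.fact_id, start + hits[0])
    return best if best_score >= min_score else None


def confidence(fact: Extraction, anchors: list[Extraction], scale: float = 10.0) -> float:
    # Assumed scoring rule: confidence decays with the token distance between
    # the fact and the nearest entity or topic mention.
    distance = min(abs(fact.position - a.position) for a in anchors)
    return 1.0 / (1.0 + distance / scale)


text = "Acme announced quarterly earnings today. The company will acquire Globex for cash."
tokens = tokenize(text)
entity = find_first(tokens, {"acme"})
topic = find_first(tokens, {"earnings"})
fact = extract_fact(tokens, [FactTemplate("acquisition", {"acquire": 1.0, "cash": 0.5})])
if entity and topic and fact:
    print(entity.identifier, topic.identifier, fact.identifier,
          round(confidence(fact, [entity, topic]), 3))
```

In this sketch, a fact found far from any entity or topic mention receives a lower score, which mirrors the claim's use of spatial distance as one input to the confidence score.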
