Identifying the unifying subject of a set of facts

  • US 7,831,545 B1
  • Filed: 05/31/2005
  • Issued: 11/09/2010
  • Est. Priority Date: 05/31/2005
  • Status: Active Grant
First Claim

1. A computer-implemented method of processing a set of documents for generating a facts database, comprising:

  • at a system having one or more processors and memory storing one or more modules to be executed by the one or more processors;

    accessing a source document from a document host;

    extracting one or more facts from the source document, each fact including an attribute-value pair and a list of documents that include the fact;

    identifying a set of linking documents that have one or more links to the source document, wherein a respective link contains anchor text;

    generating a set of candidate labels from the anchor text of the linking documents;

    assigning a score to each candidate label based on a number of linking documents having the candidate label in the anchor text;

    selecting the candidate label with a highest score as a unifying subject of the one or more facts; and

    for the unifying subject, storing in the facts database an object distinct from the source document, wherein the object includes the unifying subject, one or more entries corresponding to the one or more facts extracted from the source document, and information associating the source document with at least one of the facts extracted from the source document and included in the object.
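The scoring and selection steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation. The names `Fact`, `FactObject`, `normalize`, `score_labels`, `select_subject`, and `build_object` are hypothetical, and two simplifications are assumed: each whole anchor text is treated as one candidate label, and labels are compared case-insensitively with whitespace collapsed. Fact extraction and link discovery are taken as already done.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Fact:
    attribute: str
    value: str
    documents: list  # URLs of documents that include this fact


@dataclass
class FactObject:
    subject: str          # unifying subject chosen from anchor text
    facts: list           # entries for the facts extracted from the source
    source_document: str  # associates the source document with the facts


def normalize(anchor_text):
    # Assumption: lowercase and collapse runs of whitespace.
    return " ".join(anchor_text.lower().split())


def score_labels(anchors_by_linking_doc):
    """Score each candidate label by the number of linking documents
    whose anchor text contains it.

    anchors_by_linking_doc maps a linking document's URL to the list of
    anchor texts on its links to the source document.
    """
    scores = Counter()
    for anchors in anchors_by_linking_doc.values():
        # A set ensures each linking document counts once per label.
        labels = {normalize(a) for a in anchors}
        labels.discard("")
        scores.update(labels)
    return scores


def select_subject(scores):
    """Return the highest-scoring label, or None if there are no labels."""
    if not scores:
        return None
    return max(scores.items(), key=lambda item: item[1])[0]


def build_object(source_url, facts, anchors_by_linking_doc):
    """Create an object, distinct from the source document, that holds
    the unifying subject and the extracted facts."""
    subject = select_subject(score_labels(anchors_by_linking_doc))
    return FactObject(subject=subject, facts=list(facts),
                      source_document=source_url)
```

Counting each linking document once per label, rather than counting every link, mirrors the claim language "a number of linking documents having the candidate label in the anchor text". Tie-breaking between equally scored labels is not addressed by the claim; this sketch simply returns the first such label encountered.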
