Identifying the unifying subject of a set of facts

  • US 8,078,573 B2
  • Filed: 11/04/2010
  • Issued: 12/13/2011
  • Est. Priority Date: 05/31/2005
  • Status: Active Grant
First Claim

1. A computer-implemented method of processing a set of documents for generating a facts database, comprising:

  • at a system having one or more processors and memory storing one or more modules to be executed by the one or more processors;

    accessing a source document from a document host;

    extracting one or more facts from the source document, each fact including an attribute-value pair and a list of documents that include the fact;

    identifying a set of linking documents that have one or more links to the source document, wherein a respective link contains anchor text;

    generating a set of candidate labels from the anchor text of the linking documents;

    assigning a score to each candidate label;

    selecting the candidate label with a highest score as a unifying subject of the one or more facts; and

    for the unifying subject, storing in the facts database an information set distinct from the source document, wherein the information set includes the unifying subject, one or more entries corresponding to the one or more facts extracted from the source document, and source document information associating the source document with the information set.
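
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how candidate labels are generated or scored, so the normalization (trim, collapse whitespace, lowercase) and the scoring rule (count of links producing each label) are assumptions made for illustration. The names `Fact`, `InformationSet`, `candidate_labels`, `score_labels`, `select_unifying_subject`, and `build_information_set`, the in-memory `facts_db` dictionary, and the example URL and anchor texts are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Fact:
    attribute: str
    value: str
    documents: List[str]  # documents that include the fact


@dataclass
class InformationSet:
    unifying_subject: str
    facts: List[Fact]
    source_documents: List[str]  # associates source documents with this set


def candidate_labels(anchor_texts: List[str]) -> List[str]:
    """Turn anchor texts into candidate labels (assumed normalization)."""
    labels = []
    for text in anchor_texts:
        label = " ".join(text.split()).lower()
        if label:
            labels.append(label)
    return labels


def score_labels(labels: List[str]) -> Counter:
    """Assumed scoring rule: how many links yield each label."""
    return Counter(labels)


def select_unifying_subject(anchor_texts: List[str]) -> Optional[str]:
    """Pick the highest-scoring label; ties go to the label seen first."""
    scores = score_labels(candidate_labels(anchor_texts))
    if not scores:
        return None
    return max(scores, key=scores.get)


def build_information_set(
    source_url: str,
    facts: List[Fact],
    anchor_texts: List[str],
    facts_db: Dict[str, InformationSet],
) -> Optional[InformationSet]:
    """Store an information set, separate from the source document itself."""
    subject = select_unifying_subject(anchor_texts)
    if subject is None:
        return None
    info = InformationSet(subject, list(facts), [source_url])
    facts_db[subject] = info
    return info


# Hypothetical example data: one extracted fact and four inbound anchor texts.
facts_db: Dict[str, InformationSet] = {}
facts = [Fact("born", "1879", ["http://example.com/einstein"])]
anchors = ["Albert Einstein", "albert einstein ", "click here", "Einstein bio"]
info = build_information_set("http://example.com/einstein", facts, anchors, facts_db)
print(info.unifying_subject)  # albert einstein
```

In this sketch the facts database is keyed by the selected subject, so the stored information set stays distinct from the source document while still recording which document the facts came from. A real system would likely filter generic anchor text such as "click here" and use a richer scoring function, neither of which the claim text specifies.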
