Automatically linking documents with relevant structured information

  • US 7,899,822 B2
  • Filed: 09/08/2006
  • Issued: 03/01/2011
  • Est. Priority Date: 09/08/2006
  • Status: Active Grant
First Claim

1. A computer-implemented method of associating a text document with relevant structured data, said method comprising:

  • receiving, from a user, a text document;

    parsing, by a computer, said text document to identify a set of terms;

    receiving, from said user, an entity template that provides a set of entities stored in a relational database, said entity template corresponding to a rooted tree comprising nodes, including a root node and other nodes, and an edge, wherein:

      an entity comprises information held in said relational database,

      said edge connects two nodes only if said two nodes have a foreign-key relationship in a schema of said relational database,

      said root node is associated with a pivot table, according to said schema, and each row of said pivot table identifies an entity, and

      said other nodes are associated with context tables, according to said schema, and each row of each of said context tables, consisting of associated context information, has a path via one or more edges to an entity of said pivot table associated with said root node; and

    identifying, by said computer, all context tables having at least one of said terms from said set of terms of said text document;

    identifying, by said computer, all terms of said set of terms found in said context tables associated with each of said entities;

    weighting, by said computer, each term of said set of terms based on an inverse document frequency of said text document;

    scoring, by said computer, an annotation of said text document, wherein:

    a sequence of one or more sentences of said text document comprises a segment;

    each segment corresponds to a given entity of said set of entities and is called an annotation; and

    a score of said annotation, corresponding to said given entity, is based on a sum of terms of said set of terms found in said text document, of products of a number of times each term of said set of terms appears in said each segment multiplied by a weight of said each term;

    identifying, by said computer, a maximal annotation score from all annotation scores for said set of entities; and

    outputting, by said computer, said annotation corresponding to said maximal annotation score to said user.
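
The weighting, scoring, and maximal-score steps of the claim can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. Several things are assumptions introduced for illustration: the names `tokenize`, `idf_weights`, `score_segment`, and `best_annotation` are hypothetical; the context tables are flattened into a plain set of terms per entity instead of being read from a relational database; the inverse document frequency is computed over entities, treating each entity's context terms as one "document"; and only terms found in an entity's context contribute to that entity's score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric terms (hypothetical parser)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(entity_context):
    """Weight each context term by a smoothed inverse document frequency.

    Assumption: each entity's set of context terms counts as one document,
    so a term shared by many entities receives a lower weight.
    """
    n = len(entity_context)
    df = Counter()
    for terms in entity_context.values():
        df.update(set(terms))
    return {term: math.log(1 + n / count) for term, count in df.items()}


def score_segment(segment, context_terms, weights):
    """Sum, over the entity's context terms, of (occurrences in segment) * weight."""
    counts = Counter(tokenize(segment))
    return sum(counts[t] * weights[t] for t in context_terms if t in weights)


def best_annotation(segment, entity_context):
    """Score the segment against every entity and return the top (entity, score)."""
    weights = idf_weights(entity_context)
    scores = {
        entity: score_segment(segment, terms, weights)
        for entity, terms in entity_context.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical data: two entities from a pivot table, each with context terms
# gathered from its context tables.
entity_context = {
    "tv": {"sony", "bravia", "lcd"},
    "camera": {"sony", "cybershot", "zoom"},
}
segment = "The Sony Bravia LCD looks great. Bravia rocks."
entity, score = best_annotation(segment, entity_context)
```

In this toy data, "sony" appears in both entities' contexts and so carries less weight than "bravia" or "lcd", which is why the segment is matched to the `tv` entity. The sketch scores a single fixed segment; choosing segment boundaries within the document is outside its scope.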
