
Automatically linking documents with relevant structured information

  • US 8,126,892 B2
  • Filed: 02/01/2011
  • Issued: 02/28/2012
  • Est. Priority Date: 09/08/2006
  • Status: Active Grant
First Claim

1. A computer-implemented method of associating unstructured text data from a text document with structured data from a relational database, said method comprising:

    receiving and parsing, by a computer, said text document to identify terms, each of said terms comprising a noun phrase;

    identifying, by said computer, entities and context information for said entities in a form of entity templates received from said structured data in said relational database, each of said entity templates comprising a rooted tree with each node of said rooted tree labeled with a table, an edge existing between nodes of said rooted tree when tables labeling said nodes have a foreign-key relationship in a relational schema, each tuple labeling a root node of any of said entity templates constituting an entity, and said context information for said entity comprising tuples having a path between said nodes having said foreign-key relationship to said entity;

    finding, by said computer, embeddings corresponding to each of said entities in said text document, said embeddings mapping each of said entities to a nonempty set of segments of said text document, each of said set of segments comprising one or more consecutive sentences in said text document; and

    searching and querying said relational database based on said terms to determine an annotation with a maximum score, said annotation comprising mappings of each of said entities in said partitioning to non-overlapping segments of said text document, said maximum score being based on an inverse document frequency of said terms in each of said segments of said text document and said terms associated with said context information associated with said entity embedded in each of said segments; and

    outputting, by said computer, said partitioning of said annotation with said maximum score.
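
The entity-template step of the claim can be illustrated with a small Python sketch. Everything in it is hypothetical: the in-memory `db` dictionary, the `customer`/`orders`/`product` tables, and the functions `collect_context` and `entities_with_context` are illustrative names, not part of the patent. A template is modeled as a rooted tree whose nodes name tables and whose edges carry a pair of join columns standing in for the foreign-key relationship. Each tuple of the root table is treated as an entity, and its context is every tuple reachable from it along the tree's foreign-key paths.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
# Hypothetical template node: {"table": name, "children": [(child_node, (parent_col, child_col)), ...]}
Node = Dict[str, Any]


def collect_context(db: Dict[str, List[Row]], node: Node, row: Row) -> List[Tuple[str, Row]]:
    """Return (table, tuple) pairs joined to `row` along the FK edges below `node`."""
    context: List[Tuple[str, Row]] = []
    for child, (parent_col, child_col) in node["children"]:
        for child_row in db[child["table"]]:
            if child_row[child_col] == row[parent_col]:  # follow the foreign-key edge
                context.append((child["table"], child_row))
                context.extend(collect_context(db, child, child_row))
    return context


def entities_with_context(db: Dict[str, List[Row]], template: Node) -> List[Tuple[Row, List[Tuple[str, Row]]]]:
    """Each tuple of the root table is an entity; pair it with its context tuples."""
    return [(row, collect_context(db, template, row)) for row in db[template["table"]]]


# Toy schema (hypothetical): orders.cid -> customer.cid, orders.pid -> product.pid
db = {
    "customer": [{"cid": 1, "name": "Acme Corp"}, {"cid": 2, "name": "Globex"}],
    "orders": [
        {"oid": 10, "cid": 1, "pid": 100},
        {"oid": 11, "cid": 2, "pid": 101},
        {"oid": 12, "cid": 1, "pid": 101},
    ],
    "product": [{"pid": 100, "pname": "laptop"}, {"pid": 101, "pname": "router"}],
}
template = {
    "table": "customer",
    "children": [(
        {"table": "orders",
         "children": [({"table": "product", "children": []}, ("pid", "pid"))]},
        ("cid", "cid"),
    )],
}

for entity, context in entities_with_context(db, template):
    print(entity["name"], [table for table, _ in context])
# Acme Corp ['orders', 'product', 'orders', 'product']
# Globex ['orders', 'product']
```

Storing the join as a column pair on the edge lets the same traversal follow a foreign key in either direction (here `orders` references its parent `customer`, while `orders` is itself the referencing side for `product`).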

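The annotation step can likewise be sketched as a dynamic program over sentence boundaries. This is a simplification under several assumptions that the claim does not state: single lowercase words stand in for noun-phrase terms, IDF is computed with the document's own sentences as the collection, each entity's context is already flattened into a set of terms (`entity_terms`), and a segment's score is the summed IDF of the distinct terms it shares with that entity's context. The helper names (`terms`, `sentence_idf`, `segment_score`, `best_annotation`) are hypothetical. The program returns the partition into non-overlapping segments of consecutive sentences, each mapped to one entity, with the maximum total score; sentences may also be left unannotated.

```python
import math
import re
from typing import Dict, List, Optional, Set, Tuple


def terms(text: str) -> Set[str]:
    # Stand-in for noun-phrase extraction: lowercase alphanumeric tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def sentence_idf(sentences: List[str]) -> Dict[str, float]:
    # Assumption: sentences act as the "documents", idf(t) = log(N / df(t)).
    n = len(sentences)
    df: Dict[str, int] = {}
    for s in sentences:
        for t in terms(s):
            df[t] = df.get(t, 0) + 1
    return {t: math.log(n / c) for t, c in df.items()}


def segment_score(sent_terms: List[Set[str]], start: int, end: int,
                  context_terms: Set[str], idf: Dict[str, float]) -> float:
    # Sum IDF over distinct terms in sentences [start, end) that also occur in the entity's context.
    seg = set().union(*sent_terms[start:end])
    return sum(idf[t] for t in seg & context_terms)


def best_annotation(sentences: List[str],
                    entity_terms: Dict[str, Set[str]]) -> Tuple[float, List[Tuple[str, int, int]]]:
    sent_terms = [terms(s) for s in sentences]
    idf = sentence_idf(sentences)
    n = len(sentences)
    best = [0.0] * (n + 1)  # best[j]: max score over the first j sentences
    back: List[Optional[Tuple[str, int]]] = [None] * (n + 1)
    for j in range(1, n + 1):
        best[j] = best[j - 1]  # option: leave sentence j-1 unannotated
        for i in range(j):
            for entity, ctx in entity_terms.items():
                cand = best[i] + segment_score(sent_terms, i, j, ctx, idf)
                if cand > best[j]:
                    best[j], back[j] = cand, (entity, i)
    # Walk the back-pointers to recover the (entity, start, end) segments.
    annotation: List[Tuple[str, int, int]] = []
    j = n
    while j > 0:
        if back[j] is None:
            j -= 1
        else:
            entity, i = back[j]
            annotation.append((entity, i, j))
            j = i
    return best[n], annotation[::-1]


sentences = [
    "Acme Corp ordered a laptop.",
    "The laptop shipped on Monday.",
    "Globex returned a router.",
]
entity_terms = {
    "Acme Corp": {"acme", "corp", "laptop", "router"},
    "Globex": {"globex", "router"},
}
score, annotation = best_annotation(sentences, entity_terms)
print(annotation)
# [('Acme Corp', 0, 1), ('Acme Corp', 1, 2), ('Globex', 2, 3)]
```

Because a segment counts each distinct term only once, splitting a segment never lowers its total in this sketch, so it can return several adjacent segments for the same entity (as with `Acme Corp` above). A fuller scoring function would likely need some way to account for that.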