Automatically linking documents with relevant structured information

US 8,126,892 B2
Filed: 02/01/2011
Issued: 02/28/2012
Est. Priority Date: 09/08/2006
Status: Active Grant

Abstract

A method of associating a given text document with relevant structured data is disclosed. The method receives as inputs a text document, and structured data in the form of a relational database. The method then identifies terms in the text document, and searches and queries the structured data using the terms to identify fragments of the structured data that are relevant to the document. Finally, the text document and the identified fragments of structured data are output to a user.
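
The flow described in the abstract can be illustrated with a minimal sketch. It assumes the terms have already been extracted from the document (noun-phrase parsing is not shown) and uses a hypothetical `product` table in an in-memory SQLite database as a stand-in for the relational database; the table, its columns, and the function name `find_relevant_rows` are illustrative assumptions, not taken from the patent.

```python
import sqlite3

def find_relevant_rows(conn, terms):
    """Return (term, product name) pairs where a product name equals a term.

    Hypothetical helper: a case-insensitive exact match stands in for
    whatever matching the real system performs.
    """
    matches = []
    for term in terms:
        rows = conn.execute(
            "SELECT name FROM product WHERE lower(name) = lower(?)", (term,)
        ).fetchall()
        matches.extend((term, name) for (name,) in rows)
    return matches

# Hypothetical structured data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO product (name) VALUES (?)",
    [("ThinkPad",), ("Laser Printer",)],
)

# Terms assumed to have been identified in the text document.
terms = ["thinkpad", "battery"]
result = find_relevant_rows(conn, terms)
print(result)
```

Only the term that matches a row is returned together with the matching fragment of structured data; unmatched terms such as `battery` yield nothing.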

42 Citations

6 Claims

1. A computer-implemented method of associating unstructured text data from a text document with structured data from a relational database, said method comprising:
- receiving and parsing, by a computer, said text document to identify terms, each of said terms comprising a noun phrase;
  
  identifying, by said computer, entities and context information for said entities in a form of entity templates received from said structured data in said relational database, each of said entity templates comprising a rooted tree with each node of said rooted tree labeled with a table, an edge existing between nodes of said rooted tree when tables labeling said nodes have a foreign-key relationship in a relational schema, each tuple labeling a root node of any of said entity templates constituting an entity, and said context information for said entity comprising tuples having a path between said nodes having said foreign-key relationship to said entity;
  
  finding, by said computer, embeddings corresponding to each of said entities in said text document, said embeddings mapping each of said entities to a nonempty set of segments of said text document, each of said set of segments comprising one or more consecutive sentences in said text document; and
  
  searching and querying said relational database based on said terms to determine an annotation with a maximum score, said annotation comprising mappings of each of said entities in said partitioning to non-overlapping segments of said text document, said maximum score being based on an inverse document frequency of said terms in each of said segments of said text document and said terms associated with said context information associated with said entity embedded in each of said segments; and
  
  outputting, by said computer, said partitioning of said annotation with said maximum score.

2. The method according to claim 1, said annotation being determined by steps comprising:
    - maintaining a context cache comprising containment relationships between said entities and said terms, each containment relationship comprising an entity-term pair, said term of said entity-pair being related to said context information of said entity by said foreign-key relationship;
      
      based on said containment relationships present in said context cache, issuing a query to said relational database based on contents of said context cache to identify further containment relationships between said entities and said terms;
      
      updating said context cache with results from said query;
      
      finding a current best annotation based on said context cache; and
      
      repeatedly issuing queries, updating said context cache, and finding a next current best annotation, until said annotation having said maximum score is found.
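
The entity templates recited in claim 1 (a rooted tree of tables joined by foreign keys, with root tuples as entities and reachable tuples as context) might be modelled roughly as follows. This is a sketch under stated assumptions: tables are plain in-memory lists of dicts, every table has an `id` column, and each child table references its parent through a named foreign-key column. The names `TemplateNode`, `collect_context`, and the `customer`/`orders`/`items` schema are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TemplateNode:
    """One node of an entity template, labeled with a table name."""
    table: str
    # Each child is (foreign-key column in the child table, child node).
    children: list = field(default_factory=list)

def collect_context(node, parent_id, tables):
    """Gather tuples reachable from tuple `parent_id` of `node.table`
    by following foreign-key edges down the template tree."""
    context = []
    for fk_column, child in node.children:
        for row in tables[child.table]:
            if row[fk_column] == parent_id:
                context.append((child.table, row))
                context.extend(collect_context(child, row["id"], tables))
    return context

# Hypothetical schema: items -> orders -> customer via foreign keys.
tables = {
    "customer": [{"id": 1, "name": "Acme"}],
    "orders": [
        {"id": 10, "customer_id": 1, "status": "shipped"},
        {"id": 11, "customer_id": 2, "status": "open"},
    ],
    "items": [
        {"id": 100, "order_id": 10, "product": "widget"},
        {"id": 101, "order_id": 11, "product": "gear"},
    ],
}

template = TemplateNode(
    "customer",
    [("customer_id", TemplateNode("orders", [("order_id", TemplateNode("items"))]))],
)

# The root tuple with id 1 is the entity; its context is everything reachable.
context = collect_context(template, 1, tables)
print([table for table, _ in context])
```

Here the customer tuple is the entity, and its context consists of the one order and the one item linked to it; tuples belonging to other customers are not reached.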

3. A method of associating unstructured text data from a text document with structured data from a relational database comprising:
- receiving and parsing, using a computer, said text document to identify terms, each of said terms comprising a noun phrase;
  
  identifying, using said computer, entities and context information for said entities in a form of entity templates received from said structured data in said relational database, each of said entity templates comprising a rooted tree with each node of said rooted tree labeled with a table, an edge existing between nodes of said rooted tree when tables labeling said nodes have a foreign-key relationship in a relational schema, each tuple labeling a root node of any of said entity templates constituting an entity, and said context information for said entity comprising tuples having a path between said nodes having said foreign-key relationship to said entity;
  
  finding, using said computer, embeddings corresponding to each of said entities in said text document, said embeddings mapping each of said entities to a nonempty set of segments of said text document, each of said set of segments comprising one or more consecutive sentences in said text document;
  
  searching and querying said relational database based on said terms to determine an annotation with a maximum score, said annotation comprising mappings of each of said entities to non-overlapping segments of said text document, said maximum score being based on an inverse document frequency of said terms in each of said segments of said text document and said terms associated with said context information associated with said entity embedded in each of said segments; and
  
  outputting, using said computer, said annotation with said maximum score for said text document.

4. The method according to claim 3, said annotation with said maximum score being determined by steps comprising:
    - maintaining a context cache comprising containment relationships between said entities and said terms, each containment relationship comprising an entity-term pair, said term of said entity-pair being related to said context information of said entity by said foreign-key relationship;
      
      based on said containment relationships present in said context cache, issuing a query to said relational database based on contents of said context cache to identify further containment relationships between said entities and said terms;
      
      updating said context cache with results from said query;
      
      finding a current best annotation based on said context cache; and
      
      repeatedly issuing queries, updating said context cache, and finding a next current best annotation, until said annotation having said maximum score is found.
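
The context-cache loop recited in claims 2 and 4 can be sketched as below. Several simplifications are assumptions made for illustration only: the database is a dict from entity to its set of context terms, the "current best annotation" is reduced to the single entity with the most cached terms, and the loop stops when no further query can add information rather than by proving that the maximum score has been reached. All function and variable names are hypothetical.

```python
def query_by_term(db, term):
    """Simulated query: which entities have `term` in their context?"""
    return {(entity, term) for entity, ctx in db.items() if term in ctx}

def query_by_entity(db, entity, doc_terms):
    """Simulated query: which document terms are in `entity`'s context?"""
    return {(entity, t) for t in db[entity] if t in doc_terms}

def best_entity(cache):
    """Current best: the entity with the most cached terms (ties: alphabetical)."""
    counts = {}
    for entity, _term in cache:
        counts[entity] = counts.get(entity, 0) + 1
    return max(sorted(counts), key=counts.get) if counts else None

def find_annotation(db, doc_terms):
    cache = set()                      # entity-term containment pairs
    expanded, queried_terms = set(), set()
    best = None
    while True:
        # Choose the next query from what the cache already contains.
        known = sorted({e for e, _ in cache} - expanded)
        covered = {t for _, t in cache}
        open_terms = sorted(set(doc_terms) - covered - queried_terms)
        if known:
            expanded.add(known[0])
            result = query_by_entity(db, known[0], doc_terms)
        elif open_terms:
            queried_terms.add(open_terms[0])
            result = query_by_term(db, open_terms[0])
        else:
            return best, cache         # nothing left to ask
        cache |= result                # update the cache with query results
        best = best_entity(cache)      # recompute the current best

# Hypothetical data.
db = {
    "camera_x": {"lens", "zoom", "battery"},
    "phone_y": {"battery", "screen"},
}
doc_terms = ["zoom", "lens", "battery", "price"]
best, cache = find_annotation(db, doc_terms)
print(best, sorted(cache))
```

The point of the cache is that each query is chosen from relationships already known, so the database is probed incrementally instead of being scanned for every term up front.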

5. A computer-implemented method of partitioning a text document including unstructured text data into segments with respect to structured data from a relational database, said method comprising:
- receiving and parsing, by a computer, said text document to identify terms, each of said terms comprising a noun phrase;
  
  identifying, by said computer, entities and context information for said entities in a form of entity templates received from said structured data in said relational database, each of said entity templates comprising a rooted tree with each node of said rooted tree labeled with a table, an edge existing between nodes of said rooted tree when tables labeling said nodes have a foreign-key relationship in a relational schema, each tuple labeling a root node of any of said entity templates constituting an entity, and said context information for said entity comprising tuples having a path between said nodes having said foreign-key relationship to said entity;
  
  finding, by said computer, embeddings corresponding to each of said entities in said text document, said embeddings mapping each of said entities to a partitioning of said text document into a nonempty set of segments, each of said set of segments comprising one or more consecutive sentences in said text document;
  
  based on said containment relationships present in said context cache, issuing a query to said relational database based on contents of said context cache to identify further containment relationships between said entities and said terms;
  
  updating said context cache with results from said query;
  
  finding a current best annotation based on said context cache; and
  
  repeatedly issuing queries, updating said context cache, and finding a next current best annotation, until said annotation having said maximum score is found.

6. The method according to claim 5, said annotation with said maximum score being determined by steps comprising:
    - maintaining a context cache comprising containment relationships between said entities and said terms, each containment relationship comprising an entity-term pair, said term of said entity-pair being related to said context information of said entity by said foreign-key relationship;
      
      associated with said context information associated with said entity embedded in each of said segments.
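
One way to picture the maximum-score annotation over non-overlapping segments of consecutive sentences is a small dynamic program. The sketch below is not the patented procedure; it rests on assumptions that are labeled in the code: sentences are given as lists of already-extracted terms, the inverse document frequency treats each entity's context as one "document", a segment's score is the sum of the IDF weights of its terms that occur in the entity's context, and a hypothetical `segment_cost` penalty discourages needless fragmentation.

```python
import math

def idf(term, entity_contexts):
    """IDF over entity contexts (assumption: one context = one 'document')."""
    df = sum(1 for ctx in entity_contexts.values() if term in ctx)
    return math.log(len(entity_contexts) / df) if df else 0.0

def segment_score(sentences, context, weights):
    """Sum of weights of term occurrences that appear in the entity's context."""
    return sum(weights.get(t, 0.0) for s in sentences for t in s if t in context)

def best_annotation(sentences, entity_contexts, segment_cost=0.1):
    """Return (score, [(start, end, entity), ...]) with half-open sentence ranges.

    segment_cost is a hypothetical penalty per annotated segment.
    """
    terms = {t for s in sentences for t in s}
    weights = {t: idf(t, entity_contexts) for t in terms}
    n = len(sentences)
    best = [0.0] + [float("-inf")] * n   # best[i]: top score for sentences[:i]
    back = [None] * (n + 1)              # back[i]: (start, entity) of last segment
    for i in range(1, n + 1):
        for j in range(i):
            # A segment may stay unannotated (score 0) or take one entity.
            candidates = [(0.0, None)]
            for entity, ctx in entity_contexts.items():
                score = segment_score(sentences[j:i], ctx, weights) - segment_cost
                candidates.append((score, entity))
            score, entity = max(candidates, key=lambda c: c[0])
            if best[j] + score > best[i]:
                best[i], back[i] = best[j] + score, (j, entity)
    segments, i = [], n
    while i > 0:                         # walk the back-pointers
        j, entity = back[i]
        if entity is not None:
            segments.append((j, i, entity))
        i = j
    return best[n], segments[::-1]

# Hypothetical data: three sentences and two entities.
sentences = [["zoom", "lens"], ["lens", "battery"], ["screen"]]
entity_contexts = {
    "camera_x": {"lens", "zoom", "battery"},
    "phone_y": {"battery", "screen"},
}
score, segments = best_annotation(sentences, entity_contexts)
print(score, segments)
```

In this example `battery` carries no weight because it appears in both contexts, so the first two sentences are attributed to `camera_x` on the strength of `zoom` and `lens`, and the last sentence to `phone_y` on the strength of `screen`.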

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Chakravarthy, Venkat; Gupta, Himanshu; Mohania, Mukesh K.; Roy, Prasan
Primary Examiner(s): Stevens, Robert
Application Number: US13/018,547
Publication Number: US 20110131216A1
Time in Patent Office: 392 Days
Field of Search: 707/705, 707/736, 707/802
US Class Current: 707/736
CPC Class Codes:
G06F 16/24573   using data annotations, e.g...
G06F 16/38   Retrieval characterised by ...
G06F 16/907   Retrieval characterised by ...
