Automatically linking documents with relevant structured information

US 7,899,822 B2
Filed: 09/08/2006
Issued: 03/01/2011
Est. Priority Date: 09/08/2006
Status: Active Grant

1 Assignment

0 Petitions

Abstract

A method of associating a given text document with relevant structured data is disclosed. The method receives as inputs a text document, and structured data in the form of a relational database. The method then identifies terms in the text document, and searches and queries the structured data using the terms to identify fragments of the structured data that are relevant to the document. Finally, the text document and the identified fragments of structured data are output to a user.

38 Citations

2 Claims

1. A computer-implemented method of associating a text document with relevant structured data, said method comprising:
- receiving, from a user, a text document;

  parsing, by a computer, said text document to identify a set of terms;

  receiving, from said user, an entity template that provides a set of entities stored in a relational database, said entity template corresponding to a rooted tree comprising nodes, including a root node and other nodes, and an edge, wherein:

  an entity comprises information held in said relational database,

  said edge connects two nodes only if said two nodes have a foreign-key relationship in a schema of said relational database,

  said root node is associated with a pivot table, according to said schema, and each row of said pivot table identifies an entity, and

  said other nodes are associated with context tables, according to said schema, and each row of each of said context tables, consisting of associated context information, has a path via one or more edges to an entity of said pivot table associated with said root node; and

  identifying, by said computer, all context tables having at least one of said terms from said set of terms of said text document;

  identifying, by said computer, all terms of said set of terms found in said context tables associated with each of said entities;

  weighting, by said computer, each term of said set of terms based on an inverse document frequency of said text document;

  scoring, by said computer, an annotation of said text document, wherein:

  a sequence of one or more sentences of said text document comprises a segment;

  each segment corresponds to a given entity of said set of entities and is called an annotation; and

  a score of said annotation, corresponding to said given entity, is based on a sum of terms of said set of terms found in said text document, of products of a number of times each term of said set of terms appears in said each segment multiplied by a weight of said each term;

  identifying, by said computer, a maximal annotation score from all annotation scores for said set of entities; and

  outputting, by said computer, said annotation corresponding to said maximal annotation score to said user.
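
The weighting and scoring steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the tokenizer, and the particular inverse-document-frequency formula (log(1 + N/df), computed over the context terms of N entities) are assumptions made for illustration; the claim only requires that the weight be based on an inverse document frequency and that the score sum, over matching terms, the in-segment count multiplied by the term's weight.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(terms, entity_context):
    """Weight each term by an assumed IDF: log(1 + N / df).

    entity_context maps an entity id to the set of terms found in its
    context tables; df is the number of entities whose context has the term.
    Terms that match no entity receive no weight.
    """
    n = len(entity_context)
    weights = {}
    for term in terms:
        df = sum(1 for ctx in entity_context.values() if term in ctx)
        if df:
            weights[term] = math.log(1 + n / df)
    return weights


def score_segment(segment_tokens, context_terms, weights):
    """Sum, over the entity's context terms, of (count in segment) * (term weight)."""
    counts = Counter(segment_tokens)
    return sum(counts[t] * weights[t] for t in context_terms if t in weights)


def best_annotation(segment, entity_context):
    """Return the (entity, score) pair with the maximal score for one segment."""
    tokens = tokenize(segment)
    weights = idf_weights(set(tokens), entity_context)
    scores = {
        entity: score_segment(tokens, ctx, weights)
        for entity, ctx in entity_context.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

In this sketch a term shared by many entities contributes less to every score than a term that appears in only one entity's context, which is the usual motivation for inverse-document-frequency weighting.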

2. A non-transitory computer readable storage medium, readable by a computer, tangibly embodying a program of instructions executable by said computer to perform a method of associating a text document with relevant structured data, said method comprising:
- receiving a text document;

  parsing said text document to identify a set of terms;

  receiving an entity template that provides a set of entities stored in a relational database, said entity template corresponding to a rooted tree comprising nodes, including a root node and other nodes, and an edge, wherein:

  an entity comprises information held in said relational database,

  said edge connects two nodes only if said two nodes have a foreign-key relationship in a schema of said relational database,

  said root node is associated with a pivot table, according to said schema, and each row of said pivot table identifies an entity, and

  said other nodes are associated with context tables, according to said schema, and each row of each of said context tables, consisting of associated context information, has a path via one or more edges to an entity of said pivot table associated with said root node; and

  identifying all context tables having at least one of said terms from said set of terms of said text document;

  identifying all terms of said set of terms found in said context tables associated with each of said entities;

  weighting each term of said set of terms based on an inverse document frequency of said text document;

  scoring an annotation of said text document, wherein:

  a sequence of one or more sentences of said text document comprises a segment;

  each segment corresponds to a given entity of said set of entities and is called an annotation; and

  a score of said annotation, corresponding to said given entity, is based on a sum of terms of said set of terms found in said text document, of products of a number of times each term of said set of terms appears in said each segment multiplied by a weight of said each term;

  identifying a maximal annotation score from all annotation scores for said set of entities; and

  outputting said annotation corresponding to said maximal annotation score to said user.
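
The entity template recited in both claims (a pivot table at the root, context tables reachable through foreign-key edges) can be pictured with a small Python sketch using the standard `sqlite3` module. Everything specific here is hypothetical: the `customer` and `orders` tables, the `TEMPLATE` dictionary layout, and the helper names are invented for illustration. The sketch handles only a one-level tree, and it also treats the pivot row's own text as context, which is a simplifying assumption rather than something the claims state.

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical in-memory schema with one foreign-key edge."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(id),
            product TEXT
        );
        INSERT INTO customer VALUES (1, 'Acme');
        INSERT INTO customer VALUES (2, 'Globex');
        INSERT INTO orders VALUES (10, 1, 'widgets');
        INSERT INTO orders VALUES (11, 2, 'rockets');
        INSERT INTO orders VALUES (12, 1, 'gears');
        """
    )
    return conn


# Hypothetical template: the root is the pivot table, and each child is a
# context table joined to the pivot by the named foreign-key column.
TEMPLATE = {
    "pivot": "customer",
    "key": "id",
    "text": "name",
    "children": [{"table": "orders", "fk": "customer_id", "text": "product"}],
}


def context_terms(conn, template):
    """Map each pivot-row key (one entity) to the terms in its context rows."""
    ctx = {}
    pivot_sql = f'SELECT {template["key"]}, {template["text"]} FROM {template["pivot"]}'
    for key, text in conn.execute(pivot_sql):
        ctx[key] = set(text.lower().split())
    for child in template["children"]:
        child_sql = f'SELECT {child["fk"]}, {child["text"]} FROM {child["table"]}'
        for fk, text in conn.execute(child_sql):
            if fk in ctx:  # skip rows with no path to a pivot row
                ctx[fk].update(text.lower().split())
    return ctx
```

The resulting mapping from entity to context terms is the kind of structure against which the terms of a text document could then be matched and scored.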

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Chakravarthy, Venkat; Roy, Prasan; Mohania, Mukesh K.; Gupta, Himanshu
Primary Examiner(s): Stevens, Robert
Application Number: US11/530,104
Publication Number: US 20080065655A1
Time in Patent Office: 1,635 Days
Field of Search: 707/4, 707/100, 707/705, 707/999.1, 707/736, 707/802
US Class Current: 707/736
CPC Class Codes:

G06F 16/24573   using data annotations, e.g...

G06F 16/38   Retrieval characterised by ...

G06F 16/907   Retrieval characterised by ...
