System and method for using text analytics to identify a set of related documents from a source document

US 9,495,349 B2
Filed: 11/17/2005
Issued: 11/15/2016
Est. Priority Date: 11/17/2005
Status: Active Grant


Abstract

A system and method for processing a document to generate a set of related documents. A system is provided that includes a textual analytics system that analyzes unstructured data contained in a source document and extracts a set of structured information about the source document; and a compare system that identifies a set of related documents by comparing the set of structured information with metadata indexed from a set of publications.
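
The compare step described in the abstract can be illustrated with a minimal sketch. It assumes the structured information and the indexed metadata are both plain sets of terms, and that a publication "matches" when the two sets share at least one term; the patent text does not specify the matching criterion, so that rule, the function name `find_related`, and the sample index are all hypothetical.

```python
from typing import Dict, List, Set


def find_related(structured_info: Set[str],
                 metadata_index: Dict[str, Set[str]]) -> List[str]:
    """Return IDs of publications whose indexed metadata overlaps the
    structured information extracted from the source document.

    Assumption: "matches" means "shares at least one term".
    """
    return sorted(
        pub_id
        for pub_id, terms in metadata_index.items()
        if structured_info & terms  # non-empty intersection counts as a match
    )


# Hypothetical metadata index: publication ID -> metadata terms.
index = {
    "pub-A": {"polymer", "catalyst"},
    "pub-B": {"semiconductor"},
    "pub-C": {"catalyst", "solvent"},
}

related = find_related({"catalyst", "enzyme"}, index)
print(related)  # ['pub-A', 'pub-C']
```

A real metadata database would likely use an inverted index rather than a scan over every publication; the linear scan here only keeps the sketch short.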

54 Citations

10 Claims

1. A computer system for processing documents, the system comprising:
- a memory including a document processing system stored thereon, and a processor in communication with the memory, wherein the processor executes the document processing system stored in the memory, the document processing system including:
  
  a textual analytics system that analyzes unstructured data contained in a source document to generate a set of structured information about the source document and extracts the set of structured information about the source document;
  
  a compare system that identifies and aggregates a set of documents related to the source document by comparing the set of structured information with metadata stored in a metadata database, wherein the metadata stored in the metadata database is indexed from a set of technical reference publications, wherein a technical reference publication is identified as related to the source document and added to the set of documents related to the source document when the set of structured information extracted from the source document matches an associated metadata of the technical reference publication;
  
  an annotation system for annotating the source document, wherein the annotation system annotates the source document with the structured information extracted from the source document, and wherein the annotation system further annotates the source document with metadata associated with each technical reference publication in the set of related documents; and
  
  a ranking system for ranking the metadata in the annotated source document, wherein in a case in which more than one technical reference in the set of related documents is associated with a piece of metadata, the piece of metadata is assigned a higher rank of importance relative to a piece of metadata which is associated with fewer technical references in the set of related documents.

2. The computer system for processing documents of claim 1, wherein the set of structured information further comprises key words associated with a technology field.

3. The computer system for processing documents of claim 1, wherein the unstructured data comprises one of natural language documents, speech, audio, still images, and video.

4. The computer system for processing documents of claim 1, further comprising:
- a database of annotated documents; and
  
  a data mining system for mining the database of annotated documents.
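
The ranking rule in claim 1 (a piece of metadata associated with more related publications ranks above one associated with fewer) can be sketched as a simple count. This is an illustration under stated assumptions, not the patented implementation: the function name `rank_metadata`, the input shape, and the alphabetical tie-break are hypothetical choices that the claim does not specify.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def rank_metadata(related: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Rank metadata terms by how many related publications carry them.

    `related` maps each related publication ID to its metadata terms.
    Terms shared by more publications come first; ties are broken
    alphabetically (an assumption made only for deterministic output).
    """
    counts: Counter = Counter()
    for terms in related.values():
        counts.update(terms)  # each publication contributes once per term
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical set of related documents and their metadata.
related_docs = {
    "pub-A": {"polymer", "catalyst"},
    "pub-C": {"catalyst", "solvent"},
}

ranked = rank_metadata(related_docs)
print(ranked)  # [('catalyst', 2), ('polymer', 1), ('solvent', 1)]
```

Here "catalyst" ranks first because two related publications carry it, while "polymer" and "solvent" each appear in only one.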

5. A non-transitory computer readable storage medium storing computer instructions, which when executed, enable a computer hardware system to process a content source, the processing comprising:
- analyzing unstructured data contained in the content source to generate a set of structured information about the content source;
  
  extracting the set of structured information about the content source;
  
  identifying and aggregating a set of documents related to the content source by comparing the set of structured information with metadata stored in a metadata database, wherein the metadata stored in the metadata database is indexed from a set of technical reference publications, wherein a technical reference publication is identified as related to the content source and added to the set of documents related to the content source when the set of structured information extracted from the content source matches an associated metadata of the technical reference publication document;
  
  annotating the content source with the structured information extracted from the content source and with metadata associated with each technical reference publication in the set of related documents; and
  
  ranking the metadata in the annotated content source, wherein in a case in which more than one technical reference publication in the set of related documents is associated with a piece of metadata, the piece of metadata is assigned a higher rank of importance relative to a piece of metadata which is associated with fewer technical reference publications in the set of related documents.

6. The non-transitory computer readable storage medium of claim 5, wherein the set of structured information further comprises key words associated with a technology field.

7. The non-transitory computer readable storage medium of claim 5, wherein the unstructured data comprises one of: natural language documents, speech, audio, still images, and video.

8. The non-transitory computer readable storage medium of claim 5, the processing further comprising:
- storing an annotated content source in a database of annotated documents; and
  
  data mining the database of annotated content sources.

9. A method of processing a source document on a computer system, comprising:
- analyzing unstructured data contained in the source document using a processor;
  
  generating a set of structured information about the source document, and storing the set of structured information in a memory;
  
  extracting the set of structured information about the source document;
  
  identifying and aggregating a set of documents related to the source document by comparing the set of structured information with metadata stored in a metadata database, wherein the metadata stored in the metadata database is indexed from a set of technical reference publications, wherein a technical reference publication is identified as related to the source document and added to the set of documents related to the source document when the set of structured information extracted from the source document matches an associated metadata of the technical reference publication, wherein the set of structured information comprises a list of chemical abstract numbers;
  
  annotating the source document with the structured information extracted from the source document, and with metadata associated with each technical reference publication in the set of related documents; and
  
  ranking the metadata in the annotated source document, wherein in a case in which more than one technical reference publication in the set of related documents is associated with a piece of metadata, the piece of metadata is assigned a higher rank of importance relative to a piece of metadata which is associated with fewer technical reference publications in the set of related documents.
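
Claim 9 requires the structured information to include a list of chemical abstract numbers. As one possible illustration, assuming these are CAS Registry Numbers in the usual `NNNNNNN-NN-N` layout, the sketch below pulls candidates out of free text and keeps only those whose check digit is valid. The regular expression, the function names, and the sample sentence are hypothetical; the claim does not say how the numbers are extracted.

```python
import re
from typing import List

# Assumed CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def is_valid_cas(candidate: str) -> bool:
    """Verify the CAS check digit.

    The digits before the check digit are weighted 1, 2, 3, ... from
    right to left; the weighted sum modulo 10 must equal the check digit.
    """
    match = CAS_PATTERN.fullmatch(candidate)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


def extract_cas_numbers(text: str) -> List[str]:
    """Return valid CAS numbers in order of first appearance, without repeats."""
    found: List[str] = []
    for match in CAS_PATTERN.finditer(text):
        candidate = match.group(0)
        if is_valid_cas(candidate) and candidate not in found:
            found.append(candidate)
    return found


# Hypothetical source text: two real CAS numbers and one with a bad check digit.
sample = "Water (7732-18-5) dissolves caffeine (58-08-2); 1234-56-7 is not valid."
print(extract_cas_numbers(sample))  # ['7732-18-5', '58-08-2']
```

Validating the check digit filters out look-alike strings such as part numbers, which matters because a false match here would pull unrelated publications into the related set.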

10. A method for deploying an application for processing a document on a computer system, comprising:
- providing a computer infrastructure being operable to:
  
  analyze unstructured data contained in the content source using a processor to generate a set of structured information about the content source, extract the set of structured information about the content source and store the set of structured information in a memory;
  
  identify and aggregate a set of documents related to the content source by comparing the set of structured information with metadata stored in a metadata database, wherein the metadata stored in the metadata database is indexed from a set of technical reference publications, wherein a technical reference publication is identified as related to the content source and added to the set of documents related to the content source when the set of structured information extracted from the source document matches an associated metadata of the technical reference publication, annotate the source document with metadata associated with the structured information extracted from the source document, and with metadata associated with each technical reference publication in the set of related documents; and
  
  rank the metadata in the annotated source document, wherein in a case in which more than one technical reference publication in the set of related documents is associated with a piece of metadata, the piece of metadata is assigned a higher rank of importance relative to a piece of metadata which is associated with fewer technical reference publications in the set of related documents.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Angell, Robert L., Boyer, Stephen K., Cooper, James W., Hennessy, Richard A., Kanungo, Tapas, Kreulen, Jeffrey T., Martin, David C., Rhodes, James J., Spangler, W. Scott, Weintraub, Herschel J. R.
Primary Examiner(s): Vy, Hung T
Application Number: US11/281,291
Publication Number: US 20070112748A1
Time in Patent Office: 4,016 Days
Field of Search: 707/3, 707/708, 707/736, 707/758
US Class Current: 1/1
CPC Class Codes: G06F 40/237 Lexical tools
