Computer system, method, and program product for generating a data structure for information retrieval, and an associated graphical user interface

US 7,428,541 B2
Filed: 12/15/2003
Issued: 09/23/2008
Est. Priority Date: 12/19/2002
Status: Expired due to Fees

Abstract

A computer system for generating data structures for information retrieval of documents stored in a database. The computer system includes a neighborhood patch generation subsystem for defining patches of nodes having predetermined similarities in a hierarchy structure. The neighborhood patch generation subsystem includes a hierarchy generation subsystem for generating a hierarchy structure upon the document-keyword vectors and a patch definition subsystem. The computer system also comprises a cluster estimation subsystem for generating cluster data of the document-keyword vectors using the similarities of patches.
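As a rough illustration of the terms used in the abstract, the sketch below builds document-keyword vectors from a keyword list and computes a neighborhood patch for each document. It is a minimal sketch under stated assumptions, not the patented method: the keyword list, the sample documents, the use of raw keyword counts, the Euclidean distance, and the function names are all hypothetical, and a brute-force scan stands in for the hierarchy structure the abstract describes.

```python
import math
from collections import Counter

def keyword_vector(text, keywords):
    """Count how often each listed keyword occurs in the text (assumed weighting)."""
    counts = Counter(text.lower().split())
    return [float(counts[k]) for k in keywords]

def neighborhood_patch(index, vectors, k):
    """Return the indices of the k vectors nearest to vectors[index], itself included.

    Brute-force scan with Euclidean distance; this replaces the hierarchy-based
    search of the patent purely for illustration.
    """
    order = sorted(
        range(len(vectors)),
        key=lambda j: (math.dist(vectors[index], vectors[j]), j),
    )
    return order[:k]

# Hypothetical keyword list and documents.
KEYWORDS = ["patch", "cluster", "query", "index"]
DOCS = [
    "patch cluster patch",
    "cluster patch cluster",
    "query index query",
    "index query index",
]

VECTORS = [keyword_vector(d, KEYWORDS) for d in DOCS]
PATCHES = [neighborhood_patch(i, VECTORS, 2) for i in range(len(VECTORS))]
```

With these four documents, each patch of size 2 pairs a document with the one other document that shares its keywords.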

299 Citations

7 Claims

1. A computer system for generating data structures for information retrieval of documents stored in a database, said documents being stored as document-keyword vectors generated from a predetermined keyword list, and said document-keyword vectors forming nodes of a hierarchical structure imposed upon said documents, said computer system comprising:
- a processor having access to the database;
  
  a document-keyword matrix generation subsystem;
  
  a neighborhood patch generation subsystem for generating groups of nodes having similarities as determined using a search structure, said neighborhood patch generation subsystem including a subsystem for generating a spatial approximation sample hierarchy structure upon said document-keyword vectors and a patch defining subsystem for creating patch relationships among said nodes with respect to a metric distance between nodes;
  
  a query vector generation subsystem accepting search conditions and query keywords, generating a corresponding query vector, and storing the generated query vector;
  
  an intra-patch confidence and inter-patch confidence determination subsystem for every element of the database, the spatial approximation sample hierarchy structure computing a neighborhood patch consisting of a list of those database elements most similar to it for computing inter-patch confidence values between patches and intra-patch confidence values;
  
  a self confidence determining subsystem for (a) computing a list of self confidence values, for every stored patch, (b) computing relative self confidence values, and (c) thereafter using the relative self confidence values to determine a size of a best subset of each patch to serve as a cluster candidate;
  
  a cluster estimation subsystem for generating cluster data of said document-keyword vectors using said similarities of patches, wherein the cluster estimation subsystem selects said patches depending on intra-patch confidence values to represent clusters of said document-keyword vectors, estimates the sizes of said patches, and generates cluster data of document-keyword vectors using similarities of the patches;
  
  a redundant cluster elimination subsystem for using inner patch confidence values to eliminate redundant cluster candidates; and
  
  a display subsystem for displaying on screen said estimated clusters together with confidence relations between said clusters and hierarchical information pertaining to cluster size.

2. The computer system of claim 1, wherein said cluster estimation subsystem selects said patches depending on said inner-patch confidence values to represent clusters of said document-keyword vectors.

3. The computer system of claim 1, wherein said cluster estimation subsystem estimates sizes of said clusters depending on said intra-patch confidence values.
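
The claims refer to confidence values between patches and to self confidence values without defining them. The sketch below shows one plausible reading, offered only as an assumption: confidence between two patches is taken as the fraction of the first patch that also appears in the second, and the self confidence of a size-k patch is the average confidence between it and the size-k patches of its own members. The size selection here simply picks the size with the highest self confidence; the relative self confidence step named in claim 1 is not reproduced because the claims do not specify it. All function names and the sample neighbor lists are hypothetical.

```python
def confidence(a, b):
    """Fraction of patch a that also appears in patch b (assumed definition)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a) if a else 0.0

def self_confidence(ranked, k, ranked_neighbors):
    """Average confidence between the size-k patch and the size-k patches of its members."""
    patch = ranked[:k]
    return sum(confidence(patch, ranked_neighbors[v][:k]) for v in patch) / k

def best_patch_size(ranked, ranked_neighbors, sizes):
    """Pick the size with the highest self confidence; ties go to the larger size.

    Simplified stand-in for the relative self confidence step of the claims.
    """
    return max(sizes, key=lambda k: (self_confidence(ranked, k, ranked_neighbors), k))

# Hypothetical ranked neighbor lists: items 0-2 and 3-5 form two tight groups.
NEIGHBORS = {
    0: [0, 1, 2, 3, 4, 5],
    1: [1, 0, 2, 3, 4, 5],
    2: [2, 1, 0, 3, 4, 5],
    3: [3, 4, 5, 2, 1, 0],
    4: [4, 3, 5, 2, 1, 0],
    5: [5, 4, 3, 2, 1, 0],
}

BEST_SIZE = best_patch_size(NEIGHBORS[0], NEIGHBORS, range(2, 5))
```

For item 0, the patch stays fully self-consistent up to size 3 and loses agreement once item 3 from the other group is admitted, so size 3 is selected as the cluster candidate.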

4. A method for generating data structures for information retrieval of documents stored in a database, said documents being stored as document-keyword vectors generated from a predetermined keyword list, and said document-keyword vectors forming nodes of a hierarchical structure imposed upon said documents, said method comprising the steps of:
- generating a hierarchical structure upon said document-keyword vectors and storing hierarchy data in an adequate storage area;
  
  generating neighborhood patches of nodes having similarities as determined using levels of the hierarchical structure, and storing said patches in an adequate storage area;
  
  generating groups of nodes having similarities as determined using a search structure, including generating a spatial approximation sample hierarchy structure upon said document-keyword vectors and creating patch relationships among said nodes with respect to a metric distance between nodes;
  
  determining inter-patch confidence values between patches and intra-patch confidence values;
  
  determining an intra-patch confidence and inter-patch confidence for every element of the database, comprising utilizing the spatial approximation sample hierarchy structure to compute a neighborhood patch consisting of a list of those database elements most similar to it and computing inter-patch confidence values between patches and intra-patch confidence values;
  
  determining self confidence values to determine a size of a best subset of each patch to serve as a cluster candidate by the steps of (a) computing a list of self confidence values, for every stored patch, (b) computing relative self confidence values, and (c) thereafter using the relative self confidence values to determine the size of a best subset of each patch to serve as a cluster candidate;
  
  invoking said hierarchy data and said patches to compute inter-patch confidence values between said patches and intra-patch confidence values, and storing said values as corresponding lists in an adequate storage area;
  
  estimating the sizes of said patches, and generating cluster data of document-keyword vectors using similarities of the patches, selecting said patches depending on said inter-patch confidence values and said intra-patch confidence values to represent clusters of said document-keyword vectors; and
  
  using inner patch confidence values to eliminate redundant cluster candidates and displaying on screen said estimated clusters together with confidence relations between said clusters and hierarchical information pertaining to cluster size.

5. The method according to claim 4 further comprising the step of estimating sizes of said clusters depending on said intra-patch confidence values.
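
The final steps of the method eliminate redundant cluster candidates using confidence values. One way this could work is sketched below, as an illustration only: candidates are visited from highest to lowest score, and a candidate is dropped when its overlap with an already kept candidate reaches a threshold. The overlap measure, the greedy order, the threshold of 0.5, and the function names are assumptions and are not taken from the claims.

```python
def confidence(a, b):
    """Fraction of candidate a that also appears in candidate b (assumed definition)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a) if a else 0.0

def eliminate_redundant(candidates, threshold=0.5):
    """Greedily keep high-scoring candidates that do not heavily overlap kept ones.

    candidates: list of (score, members) pairs.
    threshold: hypothetical overlap level at which a candidate counts as redundant.
    """
    kept = []
    for score, members in sorted(candidates, key=lambda c: -c[0]):
        redundant = any(
            max(confidence(members, other), confidence(other, members)) >= threshold
            for _, other in kept
        )
        if not redundant:
            kept.append((score, members))
    return kept

# Hypothetical cluster candidates with their scores.
CANDIDATES = [
    (0.8, {1, 2, 3}),
    (0.9, {0, 1, 2}),
    (0.7, {4, 5}),
]

CLUSTERS = eliminate_redundant(CANDIDATES)
```

Here the candidate {1, 2, 3} shares two of its three members with the higher-scoring {0, 1, 2} and is dropped, while {4, 5} is kept because it overlaps nothing.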

6. A computer-readable storage medium storing a program for making a computer system execute a method for generating data structures for information retrieval of documents stored in a database, said documents being stored as document-keyword vectors generated from a predetermined keyword list, and said document-keyword vectors forming nodes of a hierarchical structure imposed upon said documents, said program making said computer system execute the steps of:
- accepting search conditions and query keywords, generating a corresponding query vector, and storing the generated query vector;
  
  generating a hierarchical structure upon said document-keyword vectors and storing hierarchy data in an adequate storage area;
  
  generating neighborhood patches consisting of nodes having similarities as determined using levels of the hierarchical structure, and storing said patch list in an adequate storage area;
  
  generating groups of nodes having similarities as determined using a search structure, including generating a spatial approximation sample hierarchy structure upon said document-keyword vectors and creating patch relationships among said nodes with respect to a metric distance between nodes;
  
  determining an intra-patch confidence and inter-patch confidence for every element of the database, comprising utilizing the spatial approximation sample hierarchy structure to compute a neighborhood patch consisting of a list of those database elements most similar to it and computing inter-patch confidence values between patches and intra-patch confidence values;
  
  determining self confidence values to determine a size of a best subset of each patch to serve as a cluster candidate by the steps of (a) computing a list of self confidence values, for every stored patch, (b) computing relative self confidence values, and (c) thereafter using the relative self confidence values to determine the size of a best subset of each patch to serve as a cluster candidate;
  
  invoking said hierarchy data and said patches to compute inter-patch confidence values between said patches and intra-patch confidence values, and storing said values as corresponding lists in an adequate storage area;
  
  selecting said patches depending on said inter-patch confidence values and said intra-patch confidence values to represent clusters of said document-keyword vectors;
  
  using inner patch confidence values to eliminate redundant cluster candidates; and
  
  displaying on screen said estimated clusters together with confidence relations between said clusters and hierarchical information pertaining to cluster size.

7. The computer-readable storage medium according to claim 6, further comprising the step of estimating sizes of said clusters depending on said intra-patch confidence values.
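
Claim 6 opens with accepting query keywords and generating a query vector. The sketch below shows one simple way such a vector might be formed over the same keyword list used for the documents, and then compared against stored document-keyword vectors. The binary weighting, the Euclidean distance, the document names, and the function names are hypothetical choices made for illustration.

```python
import math

def query_vector(query_keywords, keyword_list):
    """Mark each listed keyword that the query mentions; unknown keywords are ignored."""
    wanted = {k.lower() for k in query_keywords}
    return [1.0 if k in wanted else 0.0 for k in keyword_list]

def nearest_documents(query, doc_vectors, k):
    """Return the names of the k documents closest to the query vector."""
    ranked = sorted(
        doc_vectors,
        key=lambda name: (math.dist(query, doc_vectors[name]), name),
    )
    return ranked[:k]

# Hypothetical keyword list and stored document-keyword vectors.
KEYWORDS = ["patch", "cluster", "query", "index"]
DOC_VECTORS = {
    "d0": [2.0, 1.0, 0.0, 0.0],
    "d1": [0.0, 0.0, 2.0, 1.0],
    "d2": [0.0, 1.0, 0.0, 0.0],
}

QUERY = query_vector(["Cluster", "unknown"], KEYWORDS)
HITS = nearest_documents(QUERY, DOC_VECTORS, 2)
```

Because the query and the documents share one keyword list, the query vector lives in the same space as the document vectors and can reuse whatever distance and search structure the documents use.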

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Houle, Michael Edward
Primary Examiner(s): Le, Miranda
Application Number: US10/736,273
Publication Number: US 20040139067A1
Time in Patent Office: 1,744 Days
Field of Search: 707/1-10, 707/100-102, 707/104.1, 707/200-201, 704/245, 706/46, 708/5
US Class Current: 1/1
CPC Class Codes:
- G06F 16/35   Clustering; Classification
- G06F 16/93   Document management systems
- Y10S 707/99933   Query processing, i.e. sear...
- Y10S 707/99943   Generating database or data...
