Generating a data structure for information retrieval
Abstract
A computer system for generating data structures for information retrieval of documents stored in a database. The computer system includes a neighborhood patch generation subsystem for defining patches of nodes having predetermined similarities in a hierarchy structure. The neighborhood patch generation subsystem includes a hierarchy generation subsystem, for generating a hierarchy structure upon document-keyword vectors generated for the documents, and a patch definition subsystem. The computer system also comprises a cluster estimation subsystem for generating cluster data of the document-keyword vectors using the similarities of the patches.
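As an illustration only, a document-keyword vector can be pictured as a list of per-keyword counts for one document. The sketch below assumes plain whitespace tokenization and raw term counts; the function name `document_keyword_vectors`, the toy documents, and the keyword list are hypothetical and are not taken from the patent, which may weight or normalize the vectors differently.

```python
from collections import Counter


def document_keyword_vectors(documents, keywords):
    """Return one count vector per document, with one component per keyword.

    Assumption: lowercase whitespace tokenization and raw counts.
    """
    vectors = []
    for text in documents:
        counts = Counter(text.lower().split())
        # Counter returns 0 for keywords that do not occur in the document.
        vectors.append([counts[keyword] for keyword in keywords])
    return vectors


# Hypothetical toy corpus and keyword list.
docs = [
    "patch cluster patch",
    "cluster database",
    "database query database",
]
keywords = ["patch", "cluster", "database"]
vectors = document_keyword_vectors(docs, keywords)
```

Each row of `vectors` then stands in for one document in the later similarity and clustering steps.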
8 Claims
1. A computer system comprising:
a computer processor configured to store documents in a database;
a cluster subsystem configured to convert documents in a database into vectors;
a construction subsystem configured to construct a hierarchical structure for the vectors by randomly assigning the vectors to nodes;
a comparison subsystem configured to generate for each one of a plurality of documents in the database a patch comprising a list of the documents in the database most similar to the respective one of a plurality of documents in the database;
a confidence subsystem configured to generate self-confidence values for each of the generated patches such that the generated self-confidence values comprise the proportion of documents of a first one of the generated patches that are also in a second one of the generated patches, the confidence subsystem being configured to use weighted self-confidence values to compute relative self-confidence values for each of the generated patches;
a cluster estimation subsystem configured to determine best size of a cluster of each of the generated patches; and
a graphical subsystem for displaying the generated patches.
Dependent claims: 2, 3, 4, 5
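The patch and proportion elements of claim 1 can be pictured with a small sketch. It is a simplification under stated assumptions, not the claimed implementation: similarity is taken to be cosine similarity, the patch is found by brute-force ranking rather than through the randomly built hierarchical structure, and the weighting and relative self-confidence steps are left out. The names `cosine`, `patch`, and `overlap_proportion`, the patch size `k`, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def patch(vectors, i, k):
    """Indices of the k documents most similar to document i (document i included).

    Assumption: brute-force ranking over all documents.
    """
    ranked = sorted(
        range(len(vectors)),
        key=lambda j: cosine(vectors[i], vectors[j]),
        reverse=True,
    )
    return ranked[:k]


def overlap_proportion(first, second):
    """Proportion of the documents in `first` that also appear in `second`."""
    second_set = set(second)
    return sum(1 for doc in first if doc in second_set) / len(first)


# Hypothetical toy document vectors: two similar pairs.
vectors = [[2, 1, 0], [2, 2, 0], [0, 1, 2], [0, 0, 2]]
patch_0 = patch(vectors, 0, k=2)
patch_1 = patch(vectors, 1, k=2)
patch_2 = patch(vectors, 2, k=2)
```

In this toy data, `overlap_proportion(patch_0, patch_1)` is high because documents 0 and 1 list each other as nearest neighbors, while `overlap_proportion(patch_0, patch_2)` is zero because the two patches share no documents.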
6. A graphical user interface system for graphically presenting estimated clusters on a display device in response to a user input query, said graphical user interface system comprising:
a database for storing documents;
a computer for generating document-keyword vectors for said documents stored in said database and for estimating clusters of documents in response to said user input query; and
a display for displaying on screen said estimated clusters together with confidence relations between said clusters and hierarchical information pertaining to cluster size.
7. A computer system comprising:
a neighborhood patch generation subsystem configured to generate groups of nodes having similarities as determined using a search structure, said neighborhood patch generation subsystem including a subsystem configured to generate a hierarchical structure upon said document-keyword vectors;
a patch defining subsystem configured to create patch relationships among said nodes with respect to a metric distance between nodes, wherein a size of one of the patches is based on a cost of patch boundary sharpness;
a cluster estimation subsystem configured to generate cluster data of said document-keyword vectors using said similarities of patches; and
a cluster defining subsystem configured to increase cluster size and reduce the number of clusters of a smallest size.
Dependent claims: 8
Specification