Computer system, method, and program product for generating a data structure for information retrieval, and an associated graphical user interface
Abstract
A computer system for generating data structures for information retrieval of documents stored in a database. The computer system includes a neighborhood patch generation subsystem for defining patches of nodes having predetermined similarities in a hierarchy structure. The neighborhood patch generation subsystem includes a hierarchy generation subsystem for generating a hierarchy structure upon the document-keyword vectors and a patch definition subsystem. The computer system also comprises a cluster estimation subsystem for generating cluster data of the document-keyword vectors using the similarities of patches.
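The abstract refers to document-keyword vectors built from a predetermined keyword list and to neighborhood patches of similar nodes, but it does not fix a weighting scheme or a similarity metric. The following Python sketch is illustrative only. It assumes raw keyword counts scaled to unit length, cosine similarity as the metric, and a brute-force linear scan in place of the spatial approximation sample hierarchy (SASH) search. `document_keyword_vector`, `cosine`, and `neighborhood_patch` are hypothetical names.

```python
import math
from typing import List, Sequence


def document_keyword_vector(text: str, keywords: Sequence[str]) -> List[float]:
    """Count each keyword of the predetermined list in the text, then scale to unit length.

    Raw counts with L2 normalization are an assumed weighting, not taken from the patent.
    """
    tokens = text.lower().split()
    counts = [float(tokens.count(k.lower())) for k in keywords]
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm > 0 else counts


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # Inputs are unit length (or all zero), so the dot product equals the cosine.
    return sum(a * b for a, b in zip(u, v))


def neighborhood_patch(query_idx: int, vectors: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k items most similar to vectors[query_idx], most similar first.

    A brute-force linear scan stands in for the SASH search (assumption).
    """
    scored = [(cosine(vectors[query_idx], v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # descending similarity, ties broken by index
    return [i for _, i in scored[:k]]


# Usage: rows of a small document-keyword matrix and one neighborhood patch.
keywords = ["patent", "cluster", "query", "database"]
docs = ["cluster cluster query", "cluster query query", "patent database"]
matrix = [document_keyword_vector(d, keywords) for d in docs]
print(neighborhood_patch(0, matrix, 2))  # [0, 1]
```

Here a patch is simply the ranked list of the k most similar items; a SASH is intended to return an approximation of the same list without scanning the whole collection.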
299 Citations
7 Claims
1. A computer system for generating data structures for information retrieval of documents stored in a database, said documents being stored as document-keyword vectors generated from a predetermined keyword list, and said document-keyword vectors forming nodes of a hierarchical structure imposed upon said documents, said computer system comprising:
a processor having access to the database;
a document-keyword matrix generation subsystem;
a neighborhood patch generation subsystem for generating groups of nodes having similarities as determined using a search structure, said neighborhood patch generation subsystem including a subsystem for generating a spatial approximation sample hierarchy structure upon said document-keyword vectors and a patch defining subsystem for creating patch relationships among said nodes with respect to a metric distance between nodes;
a query vector generation subsystem accepting search conditions and query keywords, generating a corresponding query vector, and storing the generated query vector;
an intra-patch confidence and inter-patch confidence determination subsystem that, for every element of the database, uses the spatial approximation sample hierarchy structure to compute a neighborhood patch consisting of a list of those database elements most similar to it, and computes inter-patch confidence values between patches and intra-patch confidence values;
a self confidence determining subsystem for (a) computing a list of self confidence values for every stored patch, (b) computing relative self confidence values, and (c) thereafter using the relative self confidence values to determine a size of a best subset of each patch to serve as a cluster candidate;
a cluster estimation subsystem for generating cluster data of said document-keyword vectors using said similarities of patches, wherein the cluster estimation subsystem selects said patches depending on intra-patch confidence values to represent clusters of said document-keyword vectors, estimates the sizes of said patches, and generates cluster data of document-keyword vectors using similarities of the patches;
a redundant cluster elimination subsystem for using inner patch confidence values to eliminate redundant cluster candidates; and
a display subsystem for displaying on screen said estimated clusters together with confidence relations between said clusters and hierarchical information pertaining to cluster size.
Dependent claims: 2, 3
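Claim 1 recites a query vector generation subsystem and inter-patch confidence values without defining either. One plausible reading, assumed here for illustration and not taken from the specification, is a binary query vector over the same keyword list, and a confidence of patch A in patch B equal to the fraction of A's members that also belong to B. The sketch ignores search conditions, and `query_vector`, `confidence`, and `inter_patch_confidences` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def query_vector(query_keywords: Sequence[str], keywords: Sequence[str]) -> List[float]:
    """Binary indicator over the predetermined keyword list, scaled to unit length (assumption)."""
    wanted = {q.lower() for q in query_keywords}
    raw = [1.0 if k.lower() in wanted else 0.0 for k in keywords]
    norm = math.sqrt(sum(raw))  # entries are 0 or 1, so sum(raw) equals the sum of squares
    return [x / norm for x in raw] if norm > 0 else raw


def confidence(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed confidence of patch a in patch b: |A & B| / |A|."""
    if not a:
        return 0.0
    return len(set(a) & set(b)) / len(a)


def inter_patch_confidences(patches: Dict[int, List[int]]) -> Dict[Tuple[int, int], float]:
    """Confidence for every ordered pair of distinct patches that share at least one member."""
    result: Dict[Tuple[int, int], float] = {}
    for p, members_p in patches.items():
        for q, members_q in patches.items():
            if p == q:
                continue
            c = confidence(members_p, members_q)
            if c > 0.0:  # keep the stored list sparse
                result[(p, q)] = c
    return result


# Usage: two overlapping patches and one unrelated patch, keyed by their seed element.
patches = {0: [0, 1, 2], 1: [1, 0, 3], 5: [5, 6, 7]}
print(inter_patch_confidences(patches))
```

Because this measure divides by |A|, it is asymmetric whenever the two patches differ in size, which is why ordered pairs are stored.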
4. A method for generating data structures for information retrieval of documents stored in a database, said documents being stored as document-keyword vectors generated from a predetermined keyword list, and said document-keyword vectors forming nodes of a hierarchical structure imposed upon said documents, said method comprising the steps of:
generating a hierarchical structure upon said document-keyword vectors and storing hierarchy data in an adequate storage area;
generating neighborhood patches of nodes having similarities as determined using levels of the hierarchical structure, and storing said patches in an adequate storage area;
generating groups of nodes having similarities as determined using a search structure, including generating a spatial approximation sample hierarchy structure upon said document-keyword vectors and creating patch relationships among said nodes with respect to a metric distance between nodes;
determining inter-patch confidence values between patches and intra-patch confidence values;
determining an intra-patch confidence and inter-patch confidence for every element of the database, comprising utilizing the spatial approximation sample hierarchy structure to compute a neighborhood patch consisting of a list of those database elements most similar to it and computing inter-patch confidence values between patches and intra-patch confidence values;
determining self confidence values to determine a size of a best subset of each patch to serve as a cluster candidate by the steps of (a) computing a list of self confidence values for every stored patch, (b) computing relative self confidence values, and (c) thereafter using the relative self confidence values to determine the size of a best subset of each patch to serve as a cluster candidate;
invoking said hierarchy data and said patches to compute inter-patch confidence values between said patches and intra-patch confidence values, and storing said values as corresponding lists in an adequate storage area;
estimating the sizes of said patches and generating cluster data of document-keyword vectors using similarities of the patches;
selecting said patches depending on said inter-patch confidence values and said intra-patch confidence values to represent clusters of said document-keyword vectors; and
using inner patch confidence values to eliminate redundant cluster candidates and displaying on screen said estimated clusters together with confidence relations between said clusters and hierarchical information pertaining to cluster size.
Dependent claims: 5
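Claim 4 recites self confidence values, relative self confidence values, and a best subset size without giving formulas. The sketch below assumes a formulation chosen for illustration: the self confidence of the top-k subset of a patch is the average fraction of that subset shared with the top-k patches of its own members, and the relative value rescales it against the k/n overlap expected of a random subset drawn from n items. Both formulas, the tie-breaking toward larger k, and the names `self_confidence`, `relative_self_confidence`, and `best_subset_size` are assumptions.

```python
from typing import Dict, List, Tuple


def self_confidence(q: int, k: int, patches: Dict[int, List[int]]) -> float:
    """Assumed SC: mean over members v of q's top-k subset of |top_k(q) & top_k(v)| / k.

    Every member v must itself have a stored patch of length >= k.
    """
    head = set(patches[q][:k])
    total = sum(len(head & set(patches[v][:k])) / k for v in patches[q][:k])
    return total / k


def relative_self_confidence(sc: float, k: int, n: int) -> float:
    """Rescale SC so a random k-subset of n items scores about 0 and a perfect one scores 1 (assumption)."""
    baseline = k / n  # expected overlap fraction for a random subset of size k
    return (sc - baseline) / (1.0 - baseline)


def best_subset_size(q: int, patches: Dict[int, List[int]], n: int, k_min: int = 2) -> Tuple[int, float]:
    """Subset size k of q's patch with the highest relative self confidence; ties go to larger k."""
    best_k, best_rsc = k_min, float("-inf")
    for k in range(k_min, min(len(patches[q]), n - 1) + 1):  # k < n avoids dividing by zero
        rsc = relative_self_confidence(self_confidence(q, k, patches), k, n)
        if rsc >= best_rsc:
            best_k, best_rsc = k, rsc
    return best_k, best_rsc


# Usage: six items forming two tight groups {0, 1, 2} and {3, 4, 5}; each stored
# patch lists the seed first, then its neighbors, with one item from the other group last.
patches = {
    0: [0, 1, 2, 3], 1: [1, 0, 2, 4], 2: [2, 1, 0, 5],
    3: [3, 4, 5, 0], 4: [4, 3, 5, 1], 5: [5, 4, 3, 2],
}
print(best_subset_size(0, patches, n=6))  # (3, 1.0)
```

Subtracting the k/n baseline keeps large subsets from scoring well merely because any large subset overlaps heavily with other large subsets.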
6. A computer-readable storage medium storing a program for making a computer system execute a method for generating data structures for information retrieval of documents stored in a database, said documents being stored as document-keyword vectors generated from a predetermined keyword list, and said document-keyword vectors forming nodes of a hierarchical structure imposed upon said documents, said program making said computer system execute the steps of:
accepting search conditions and query keywords, generating a corresponding query vector, and storing the generated query vector;
generating a hierarchical structure upon said document-keyword vectors and storing hierarchy data in an adequate storage area;
generating neighborhood patches consisting of nodes having similarities as determined using levels of the hierarchical structure, and storing said patch list in an adequate storage area;
generating groups of nodes having similarities as determined using a search structure, including generating a spatial approximation sample hierarchy structure upon said document-keyword vectors and creating patch relationships among said nodes with respect to a metric distance between nodes;
determining an intra-patch confidence and inter-patch confidence for every element of the database, comprising utilizing the spatial approximation sample hierarchy structure to compute a neighborhood patch consisting of a list of those database elements most similar to it and computing inter-patch confidence values between patches and intra-patch confidence values;
determining self confidence values to determine a size of a best subset of each patch to serve as a cluster candidate by the steps of (a) computing a list of self confidence values for every stored patch, (b) computing relative self confidence values, and (c) thereafter using the relative self confidence values to determine the size of a best subset of each patch to serve as a cluster candidate;
invoking said hierarchy data and said patches to compute inter-patch confidence values between said patches and intra-patch confidence values, and storing said values as corresponding lists in an adequate storage area;
selecting said patches depending on said inter-patch confidence values and said intra-patch confidence values to represent clusters of said document-keyword vectors;
using inner patch confidence values to eliminate redundant cluster candidates; and
displaying on screen said estimated clusters together with confidence relations between said clusters and hierarchical information pertaining to cluster size.
Dependent claims: 7
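The final steps of claim 6 eliminate redundant cluster candidates and display the surviving clusters with their confidence relations and sizes. Below is a minimal greedy sketch. It assumes candidates are visited in decreasing relative self confidence and are discarded when more than half of their members are already covered by a kept cluster. The 0.5 threshold, the `Candidate` record, the `overlap` measure, and the plain-text summary standing in for the on-screen display are all hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Candidate(NamedTuple):
    seed: int            # element whose patch produced this candidate
    members: List[int]   # best subset of that patch
    rsc: float           # relative self confidence of the subset


def overlap(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of cluster a already contained in cluster b: |A & B| / |A| (assumption)."""
    return len(set(a) & set(b)) / len(a) if a else 0.0


def eliminate_redundant(cands: List[Candidate], max_overlap: float = 0.5) -> List[Candidate]:
    """Greedy pass in decreasing RSC order; drop a candidate mostly covered by a kept one."""
    kept: List[Candidate] = []
    for cand in sorted(cands, key=lambda c: (-c.rsc, c.seed)):
        if all(overlap(cand.members, other.members) <= max_overlap for other in kept):
            kept.append(cand)
    return kept


def summarize(clusters: List[Candidate]) -> List[str]:
    """Plain-text stand-in for the display: size, score, and confidence toward each other cluster."""
    lines = []
    for i, c in enumerate(clusters):
        links = ", ".join(
            f"{j}:{overlap(c.members, o.members):.2f}" for j, o in enumerate(clusters) if j != i
        )
        lines.append(f"cluster {i} (seed {c.seed}) size={len(c.members)} rsc={c.rsc:.2f} links=[{links}]")
    return lines


# Usage: the candidates seeded at 1 and 2 are mostly covered by the one seeded at 0.
cands = [
    Candidate(0, [0, 1, 2], 1.0),
    Candidate(1, [1, 0, 2], 1.0),
    Candidate(3, [3, 4, 5], 1.0),
    Candidate(2, [2, 0, 3], 0.4),
]
for line in summarize(eliminate_redundant(cands)):
    print(line)
```

Visiting candidates in score order means that when two candidates describe the same group, the one with the stronger relative self confidence is the one kept.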
Specification