Computer system, method, and program product for generating a data structure for information retrieval, and an associated graphical user interface

  • US 7,428,541 B2
  • Filed: 12/15/2003
  • Issued: 09/23/2008
  • Est. Priority Date: 12/19/2002
  • Status: Expired due to Fees
First Claim

1. A computer system for generating data structures for information retrieval of documents stored in a database, said documents being stored as document-keyword vectors generated from a predetermined keyword list, and said document-keyword vectors forming nodes of a hierarchical structure imposed upon said documents, said computer system comprising:

  • a processor having access to the database;

    a document-keyword matrix generation subsystem;

    a neighborhood patch generation subsystem for generating groups of nodes having similarities as determined using a search structure, said neighborhood patch generation subsystem including a subsystem for generating a spatial approximation sample hierarchy structure upon said document-keyword vectors and a patch defining subsystem for creating patch relationships among said nodes with respect to a metric distance between nodes;

    a query vector generation subsystem accepting search conditions and query keywords, generating a corresponding query vector, and storing the generated query vector;

    an intra-patch confidence and inter-patch confidence determination subsystem that, for every element of the database, uses the spatial approximation sample hierarchy structure to compute a neighborhood patch consisting of a list of the database elements most similar to that element, and that computes inter-patch confidence values between patches and intra-patch confidence values;

    a self-confidence determining subsystem for (a) computing, for every stored patch, a list of self-confidence values, (b) computing relative self-confidence values, and (c) thereafter using the relative self-confidence values to determine the size of the best subset of each patch to serve as a cluster candidate;

    a cluster estimation subsystem for generating cluster data of said document-keyword vectors using said similarities of patches, wherein the cluster estimation subsystem selects said patches depending on intra-patch confidence values to represent clusters of said document-keyword vectors, estimates the sizes of said patches, and generates cluster data of document-keyword vectors using similarities of the patches;

    a redundant cluster elimination subsystem for using inner patch confidence values to eliminate redundant cluster candidates; and

    a display subsystem for displaying on screen said estimated clusters together with confidence relations between said clusters and hierarchical information pertaining to cluster size.
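As an illustration of the document-keyword vectors, query vector, and metric distance named in the claim, here is a minimal Python sketch. The keyword list `KEYWORDS`, the raw term-count weighting, and the choice of angular (arccosine) distance are assumptions for illustration; the claim itself does not fix a weighting scheme or a particular metric.

```python
import math

# Hypothetical predetermined keyword list; the claim only says such a list exists.
KEYWORDS = ["cluster", "patch", "query", "vector", "index"]


def keyword_vector(text, keywords=KEYWORDS):
    """Document-keyword (or query) vector: raw counts of each keyword in `text`."""
    tokens = text.lower().split()
    return [tokens.count(k) for k in keywords]


def angular_distance(u, v):
    """Angle in radians between two nonzero vectors (a metric on their directions)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("vector contains none of the keywords")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


def rank_by_query(doc_vectors, query_vector):
    """Document indices ordered from nearest to farthest from the query vector."""
    return sorted(range(len(doc_vectors)),
                  key=lambda i: angular_distance(doc_vectors[i], query_vector))


docs = ["cluster patch patch", "query vector index", "cluster cluster vector"]
doc_vectors = [keyword_vector(d) for d in docs]
query = keyword_vector(" ".join(["patch", "cluster"]))
print(rank_by_query(doc_vectors, query))  # [0, 2, 1]
```

The query is turned into a vector over the same keyword list as the documents, so the same distance function serves both for comparing documents with each other and for answering queries.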
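The neighborhood patches and inter-patch confidence values in the claim can be sketched as follows. The claim builds patches with a spatial approximation sample hierarchy (SASH), which returns approximate nearest neighbors; this sketch swaps in exact brute-force search for clarity. The confidence formula used here, the fraction of one patch's members that also belong to another patch, is an assumption chosen for illustration, not a definition taken from the claim.

```python
def neighborhood_patches(points, size, dist):
    """For each element, the ids of its `size` nearest elements, including itself.

    Exact brute-force search stands in for the SASH's approximate search.
    Ties in distance are broken by element id so results are deterministic.
    """
    n = len(points)
    return [
        sorted(range(n), key=lambda j: (dist(points[i], points[j]), j))[:size]
        for i in range(n)
    ]


def inter_patch_confidence(patch_a, patch_b):
    """Assumed definition: fraction of patch_a's members that also lie in patch_b."""
    return len(set(patch_a) & set(patch_b)) / len(patch_a)


# Six points on a line forming two well-separated groups.
points = [0, 1, 2, 10, 11, 12]
patches = neighborhood_patches(points, 3, lambda a, b: abs(a - b))
print(patches[0], patches[3])                           # [0, 1, 2] [3, 4, 5]
print(inter_patch_confidence(patches[0], patches[1]))   # 1.0
print(inter_patch_confidence(patches[0], patches[3]))   # 0.0
```

Note that this confidence is not symmetric when patches differ in size: a small patch nested inside a large one has full confidence in it, but not the other way round.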
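The self-confidence step, which picks the size of the best subset of each patch, can likewise be sketched under assumed definitions that the claim does not spell out. Here the self-confidence of the size-k prefix of element q's patch is the average, over its members v, of the fraction of that prefix shared with v's own size-k prefix. The relative value subtracts the baseline k/n expected of a random subset and rescales so that a perfectly self-consistent patch scores 1; the best subset size is the prefix length with the highest relative value. The function names and the `min_size` parameter are hypothetical.

```python
def self_confidence(q, patches, k):
    """Assumed definition: mean overlap (divided by k) between q's size-k prefix
    and the size-k prefixes of the patches of its own members."""
    prefix = set(patches[q][:k])
    return sum(len(prefix & set(patches[v][:k])) / k for v in patches[q][:k]) / k


def relative_self_confidence(q, patches, k, n):
    """Rescale so a random size-k subset scores about 0 and a perfect one scores 1.
    Assumes k < n; k / n is the overlap fraction expected by chance."""
    baseline = k / n
    return (self_confidence(q, patches, k) - baseline) / (1 - baseline)


def best_subset_size(q, patches, n, min_size=2):
    """Prefix size of q's patch with the highest relative self-confidence.
    Ties go to the larger size."""
    sizes = range(min_size, len(patches[q]) + 1)
    return max(sizes, key=lambda k: (relative_self_confidence(q, patches, k, n), k))


# Size-5 patches for the points 0, 1, 2, 10, 11, 12 on a line (precomputed,
# nearest first, ties broken by element id).
patches = [[0, 1, 2, 3, 4], [1, 0, 2, 3, 4], [2, 1, 0, 3, 4],
           [3, 4, 5, 2, 1], [4, 3, 5, 2, 1], [5, 4, 3, 2, 1]]
print(best_subset_size(0, patches, n=6))  # 3, i.e. the tight group {0, 1, 2}
```

Growing the prefix of element 0's patch past three members pulls in points from the far group, whose own patches disagree with it, so the relative self-confidence drops (to 0.625 at size 4) and size 3 is chosen.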
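Cluster selection and redundant-candidate elimination can be sketched as a single greedy pass. The rule below keeps candidates in order of score and discards any candidate that shares more than a hypothetical `max_overlap` fraction of its members with a cluster already kept. This rule is an assumption; the claim states only that patch confidence values are used to select patches and to eliminate redundant cluster candidates.

```python
def select_clusters(candidates, max_overlap=0.5):
    """Greedy redundant-candidate elimination (hypothetical rule).

    candidates: list of (score, members) pairs, members being a set of element ids.
    A candidate is kept only if, for every cluster already kept, the fraction of
    its own members shared with that cluster is at most max_overlap.
    """
    kept = []
    # Highest score first; larger clusters win ties; sort is stable otherwise.
    for score, members in sorted(candidates, key=lambda c: (-c[0], -len(c[1]))):
        if all(len(members & other) / len(members) <= max_overlap
               for _, other in kept):
            kept.append((score, members))
    return kept


candidates = [(1.0, {0, 1, 2}), (1.0, {0, 1, 2}), (1.0, {3, 4, 5}),
              (0.625, {0, 1, 2, 3})]
print([sorted(m) for _, m in select_clusters(candidates)])  # [[0, 1, 2], [3, 4, 5]]
```

Visiting candidates best-first means a weaker candidate can only be dropped in favor of a stronger one, never the reverse.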
