
Information retrieval and text mining using distributed latent semantic indexing

  • US 20040220944A1
  • Filed: 05/01/2003
  • Published: 11/04/2004
  • Est. Priority Date: 05/01/2003
  • Status: Active Grant
First Claim

1. A method for processing a collection of data objects for use in information retrieval and data mining operations comprising the steps of:

  • generating a frequency count for each term in each data object in the collection;

  • partitioning the collection of data objects into a plurality of sub-collections using the term-by-data object information, wherein each sub-collection is based on the conceptual dependence of the data objects within;

  • generating a term-by-data object matrix for each sub-collection;

  • decomposing the term-by-data object matrix into a reduced singular value representation;

  • determining the centroid vectors of each sub-collection;

  • finding a predetermined number of terms in each sub-collection closest to the centroid vector; and

  • developing a similarity graph network to establish similarity between sub-collections.
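The claim lists the steps but leaves open how sub-collections are formed, how many dimensions the reduced representation keeps, how terms are compared with a centroid, and how the graph is weighted. The Python sketch below walks through the seven steps on a hypothetical toy corpus and fills those gaps with assumed choices that do not come from the patent: single-pass leader clustering with a cosine threshold for the partitioning step, a rank-`k` SVD computed by power iteration with deflation, term and document vectors both scaled by the square root of the singular values, and the Jaccard overlap of representative term sets as the edge weight. All function names and parameters (`distributed_lsi`, `k`, `n_terms`, `threshold`, and so on) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: documents 0-2 are about finance, 3-5 about soccer.
DOCS = [
    "stock market prices fall", "market prices rise on stock news",
    "stock news moves market prices", "soccer team wins the match",
    "the team plays a soccer match", "match ends as soccer team loses",
]


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def partition(counts, threshold=0.2):
    """Step 2 (assumed method): single-pass leader clustering. A document joins the
    group whose leader it is most similar to if that cosine reaches `threshold`."""
    vocab = sorted({t for c in counts for t in c})
    vecs = [[c[t] for t in vocab] for c in counts]
    leaders, groups = [], []
    for i, vec in enumerate(vecs):
        sims = [cosine(vec, vecs[lead]) for lead in leaders]
        if sims and max(sims) >= threshold:
            groups[sims.index(max(sims))].append(i)
        else:  # not similar enough to any leader: start a new sub-collection
            leaders.append(i)
            groups.append([i])
    return groups


def term_doc_matrix(counts, group):
    """Step 3: rows are the sub-collection's terms, columns are its documents."""
    terms = sorted({t for i in group for t in counts[i]})
    return terms, [[counts[i][t] for i in group] for t in terms]


def truncated_svd(a, k, iters=200):
    """Step 4: rank-k SVD A ~ U S V^T via power iteration with deflation on A^T A."""
    m, n = len(a), len(a[0])
    ata = [[sum(a[r][i] * a[r][j] for r in range(m)) for j in range(n)] for i in range(n)]
    us, sigmas, vs = [], [], []
    for _ in range(min(k, n)):
        v, lam = [1.0 + 0.1 * j for j in range(n)], 0.0  # deterministic start vector
        for _ in range(iters):
            w = [sum(ata[i][j] * v[j] for j in range(n)) for i in range(n)]
            lam = math.sqrt(sum(x * x for x in w))  # converges to the current top eigenvalue
            if lam <= 1e-12:
                break
            v = [x / lam for x in w]
        if lam <= 1e-12:  # no nonzero singular values remain
            break
        sigmas.append(math.sqrt(lam))
        us.append([sum(a[r][j] * v[j] for j in range(n)) / sigmas[-1] for r in range(m)])
        vs.append(v)
        # Deflate: remove the component just found so the next pass finds the next one.
        ata = [[ata[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return us, sigmas, vs


def representative_terms(terms, us, sigmas, vs, n_terms):
    """Steps 5-6: centroid of the documents in the reduced space, then the n_terms
    terms closest to it by cosine. Both sides are scaled by S^(1/2) (an assumption)."""
    if not sigmas:
        return terms[:n_terms]
    k, scale = len(sigmas), [math.sqrt(s) for s in sigmas]
    docs = [[scale[c] * vs[c][j] for c in range(k)] for j in range(len(vs[0]))]
    centroid = [sum(d[c] for d in docs) / len(docs) for c in range(k)]
    term_vecs = [[scale[c] * us[c][r] for c in range(k)] for r in range(len(terms))]
    ranked = sorted(range(len(terms)), key=lambda r: (-cosine(term_vecs[r], centroid), terms[r]))
    return [terms[r] for r in ranked[:n_terms]]


def similarity_graph(rep_terms):
    """Step 7 (assumed criterion): link sub-collections whose representative terms
    overlap, weighting each edge by the Jaccard index of the two term sets."""
    sets = [set(r) for r in rep_terms]
    return {(i, j): len(sets[i] & sets[j]) / len(sets[i] | sets[j])
            for i, j in combinations(range(len(sets)), 2) if sets[i] & sets[j]}


def distributed_lsi(docs, k=2, n_terms=3, threshold=0.2):
    """Run the seven steps; returns (sub-collections, representative terms, graph)."""
    counts = [Counter(d.lower().split()) for d in docs]  # Step 1: term frequencies
    groups = partition(counts, threshold)
    reps = []
    for group in groups:
        terms, matrix = term_doc_matrix(counts, group)
        us, sigmas, vs = truncated_svd(matrix, k)
        reps.append(representative_terms(terms, us, sigmas, vs, n_terms))
    return groups, reps, similarity_graph(reps)
```

With the default parameters, `distributed_lsi(DOCS)` places documents 0 to 2 and documents 3 to 5 in separate sub-collections. Because those two topics share no vocabulary, their representative term sets cannot overlap and the similarity graph has no edges; sub-collections that share terms would be linked by weighted edges. The hand-rolled power iteration only keeps the sketch dependency-free; a real system would more likely call a library sparse SVD solver.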

  • 10 Assignments