
UNSUPERVISED DOCUMENT CLUSTERING USING LATENT SEMANTIC DENSITY ANALYSIS

  • US 20120011124A1
  • Filed: 07/07/2010
  • Published: 01/12/2012
  • Est. Priority Date: 07/07/2010
  • Status: Active Grant
First Claim

1. A computer-implemented method for clustering documents, comprising:

  • generating a latent semantic mapping (LSM) space from a collection of a plurality of documents, the LSM space includes a plurality of document vectors, each representing one of the documents in the collection;

  • for each of the document vectors considered as a centroid document vector, identifying a group of document vectors in the LSM space that are within a predetermined hypersphere diameter from the centroid document vector, forming a plurality of groups of document vectors, wherein the predetermined hypersphere diameter represents a predetermined closeness measure among the document vectors in the LSM space; and

  • designating a group from the plurality of groups as a cluster of document vectors, wherein the designated group contains a maximum number of document vectors among the plurality of groups.
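
The three steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that the claim does not state: the LSM space is built by truncated singular value decomposition of a term-document count matrix, closeness is measured by Euclidean distance, and the "predetermined hypersphere diameter" is used directly as the distance threshold from the centroid. The function names `lsm_document_vectors` and `densest_group` and the parameters `rank` and `diameter` are hypothetical.

```python
import numpy as np


def lsm_document_vectors(term_doc, rank):
    """Map documents into a latent space (assumed here: truncated SVD).

    term_doc: array of shape (n_terms, n_docs) holding term counts.
    Returns an array of shape (n_docs, rank), one row per document vector.
    """
    u, s, vt = np.linalg.svd(np.asarray(term_doc, dtype=float),
                             full_matrices=False)
    # Scale the right singular vectors by the singular values and keep
    # the leading `rank` dimensions.
    return vt[:rank].T * s[:rank]


def densest_group(doc_vectors, diameter):
    """Treat every document vector as a candidate centroid and return the
    centroid whose group (all vectors within `diameter`) is largest.

    Assumption: closeness is Euclidean distance; the claim only requires
    some predetermined closeness measure.
    """
    doc_vectors = np.asarray(doc_vectors, dtype=float)
    # Pairwise distances between all document vectors.
    diffs = doc_vectors[:, None, :] - doc_vectors[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Row i marks the members of the group centred on document i.
    within = dists <= diameter
    sizes = within.sum(axis=1)
    # The group with the maximum number of members is designated the cluster.
    centroid = int(np.argmax(sizes))
    members = np.flatnonzero(within[centroid])
    return centroid, members
```

Calling `densest_group(lsm_document_vectors(counts, rank=2), diameter=0.5)` on a hypothetical count matrix `counts` returns the index of the centroid document and the indices of the documents in its group. When several groups tie for the maximum size, this sketch picks the one with the lowest centroid index, which is a design choice of the sketch rather than something the claim specifies.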
