Latent semantic clustering

  • US 7,844,566 B2
  • Filed: 05/11/2006
  • Issued: 11/30/2010
  • Est. Priority Date: 04/26/2005
  • Status: Active Grant
First Claim

1. A computer-based method for automatically identifying clusters of conceptually-related documents in a collection of documents, comprising:

    (a) generating a document-representation of each document in an abstract mathematical space;

    (b) identifying a plurality of document clusters in the collection of documents based on a conceptual similarity between respective pairs of the document-representations, wherein each document cluster is associated with an exemplary document and a plurality of other documents; and

    (c) identifying a non-intersecting document cluster from among the plurality of document clusters based on (i) a conceptual similarity between the document-representation of the exemplary document and the document-representation of each document in the non-intersecting cluster and (ii) a conceptual dissimilarity between a cluster-representation of the non-intersecting document cluster and a cluster-representation of each other document cluster, wherein step (c) comprises,

        (c1) identifying a non-intersecting document cluster from among the plurality of document clusters if (i) a conceptual similarity between the document-representation of the exemplary document and the document-representation of each document in the non-intersecting cluster is above a predefined similarity threshold and (ii) a conceptual dissimilarity between a cluster-representation of the non-intersecting document cluster and a cluster-representation of each other document cluster is above a predefined dissimilarity threshold; and

    (d) iteratively adjusting the predefined similarity threshold from a maximum similarity level to a minimum similarity level via a predefined similarity increment;

    (e) iteratively adjusting the predefined dissimilarity threshold from a minimum dissimilarity level to a maximum dissimilarity level via a predefined dissimilarity increment; and

    (f) repeating step (c1) for each similarity level and each dissimilarity level.
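
The threshold sweep in steps (c1) through (f) can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions that the claim does not state: cosine similarity stands in for "conceptual similarity", a cluster's mean vector stands in for its "cluster-representation", dissimilarity is taken as one minus the cosine of two mean vectors, "plurality of other documents" is read as two or more, and candidate clusters with identical membership are treated as a single cluster. The function names (`candidate_clusters`, `non_intersecting`, `levels`, `sweep`) and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; an assumed stand-in for 'conceptual similarity'."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors, members):
    """Mean vector of a cluster; an assumed cluster-representation."""
    dim = len(vectors[0])
    return [sum(vectors[i][k] for i in members) / len(members) for k in range(dim)]


def candidate_clusters(vectors, sim_threshold):
    """Step (b) with test (c1)(i): one candidate cluster per exemplary document."""
    found = {}
    for exemplar, ev in enumerate(vectors):
        others = [i for i, v in enumerate(vectors)
                  if i != exemplar and cosine(ev, v) > sim_threshold]
        if len(others) >= 2:  # assumption: "plurality" means two or more
            # Assumption: candidates with identical membership count as one cluster.
            found.setdefault(frozenset([exemplar, *others]), exemplar)
    return [sorted(members) for members in found]


def non_intersecting(vectors, clusters, dis_threshold):
    """Test (c1)(ii): keep clusters dissimilar from every other candidate."""
    reps = [centroid(vectors, c) for c in clusters]
    return [c for i, c in enumerate(clusters)
            if all(1.0 - cosine(reps[i], reps[j]) > dis_threshold
                   for j in range(len(clusters)) if j != i)]


def levels(start, stop, step):
    """Evenly spaced threshold levels from start to stop, inclusive."""
    n = int(round(abs(stop - start) / step))
    sign = 1 if stop >= start else -1
    return [round(start + sign * i * step, 10) for i in range(n + 1)]


def sweep(vectors, sim_levels, dis_levels):
    """Steps (d)-(f): repeat (c1) for every similarity/dissimilarity level pair."""
    results = {}
    for s in sim_levels:                      # (d) maximum -> minimum similarity
        candidates = candidate_clusters(vectors, s)
        for d in dis_levels:                  # (e) minimum -> maximum dissimilarity
            results[(s, d)] = non_intersecting(vectors, candidates, d)
    return results


# Toy document-representations (step (a) is assumed to have produced these).
docs = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
        [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]]
result = sweep(docs, levels(0.9, 0.5, 0.2), levels(0.2, 0.8, 0.3))
for (s, d), clusters in result.items():
    print(s, d, clusters)
```

With these toy vectors, documents 0-2 and documents 3-5 form two well-separated groups, so both groups pass at every level pair in the sweep. Raising the dissimilarity threshold to 0.9 rejects both, because their mean vectors differ by only about 0.895 under this measure. The claim does not say how results from different level pairs are combined, so the sketch simply records the outcome for each pair.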
