
Document and pattern clustering method and apparatus

  • US 20040230577A1
  • Filed: 03/04/2004
  • Published: 11/18/2004
  • Est. Priority Date: 03/05/2003
  • Status: Active Grant
First Claim

1. A method of clustering documents (or patterns) each having one or plural document (or pattern) segments in an input document (or pattern) set, based on a relation among them, comprising:

    (a) obtaining a document (or pattern) frequency matrix for the set of input documents (or patterns), based on occurrence frequencies of terms appearing in each document (or pattern);

    (b) selecting a seed document (or pattern) from remaining documents (or patterns) that are not included in any cluster existing at that moment and constructing a current cluster of the initial state using the seed document (or pattern);

    (c) obtaining the document (or pattern) commonality to the current cluster for each document (or pattern) in the input document (or pattern) set by using information based on the document (or pattern) frequency matrix for the input document (or pattern) set, information based on the document (or pattern) frequency matrix for documents (or patterns) in the current cluster and information based on the common co-occurrence matrix of the current cluster, and making documents (or patterns) having the document commonality higher than a threshold belong temporarily to the current cluster;

    (d) repeating step (c) until the number of documents (or patterns) temporarily belonging to the current cluster becomes the same as that in the previous repetition;

    (e) repeating steps (b) through (d) until a given convergence condition is satisfied; and

    (f) deciding, on the basis of the document (or pattern) commonality of each document (or pattern) to each cluster, a cluster to which each document (or pattern) belongs.
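The claim fixes the order of steps (a) through (f) but not the exact formulas. What follows is a minimal sketch of that control flow, under stated assumptions. Each document is a list of segments, and each segment is a list of terms. A document's frequency matrix is stored as one term-count vector per segment, and its co-occurrence matrix is the sum over segments of f_s * f_t. Several parts are hypothetical choices, not the patented definitions:

- The idf-style term weight stands in for "information based on the frequency matrix of the input set".
- The cluster's common co-occurrence matrix is taken as the element-wise minimum over its members.
- Commonality is a weighted cosine.
- The seed is the lowest-indexed unassigned document.
- Outer convergence means every document has joined some cluster.
- The threshold of 0.5 and the `max_iter` guard are assumptions.

All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def segment_counts(doc):
    """Step (a): one term-count vector (Counter) per segment; together they form the
    document's frequency matrix (terms x segments), stored sparsely."""
    return [Counter(segment) for segment in doc]


def cooccurrence(seg_counts):
    """Sparse co-occurrence matrix S[(s, t)] = sum over segments of f_s * f_t."""
    S = Counter()
    for f in seg_counts:
        for s, t in product(f, f):
            S[(s, t)] += f[s] * f[t]
    return S


def term_weights(all_counts):
    """HYPOTHETICAL use of the input-set frequency matrix: an idf-style weight per term."""
    n_docs = len(all_counts)
    df = Counter()
    for segs in all_counts:
        df.update(set().union(*segs))  # each term counted once per document
    return {t: math.log(1 + n_docs / df[t]) for t in df}


def common_cooccurrence(S_members):
    """HYPOTHETICAL common co-occurrence matrix: element-wise minimum over the cluster's
    members, so only term pairs that co-occur in every member survive."""
    common = dict(S_members[0])
    for S in S_members[1:]:
        common = {k: min(v, S[k]) for k, v in common.items() if S[k] > 0}
    return common


def commonality(S, T, w):
    """HYPOTHETICAL commonality: weighted cosine between a document's matrix S and T."""
    def pw(k):
        return w[k[0]] * w[k[1]]
    dot = sum(pw(k) * S[k] * v for k, v in T.items() if k in S)
    ns = math.sqrt(sum(pw(k) * v * v for k, v in S.items()))
    nt = math.sqrt(sum(pw(k) * v * v for k, v in T.items()))
    return dot / (ns * nt) if ns and nt else 0.0


def cluster(docs, threshold=0.5, max_iter=50):
    """docs: list of documents; each document is a list of segments (lists of terms)."""
    counts = [segment_counts(d) for d in docs]                  # step (a)
    S_all = [cooccurrence(c) for c in counts]
    w = term_weights(counts)
    clusters, assigned = [], set()                              # (members, common matrix)
    while len(assigned) < len(docs):                            # step (e): assumed condition
        seed = min(set(range(len(docs))) - assigned)            # step (b): assumed seed rule
        members = {seed}
        for _ in range(max_iter):                               # steps (c)-(d)
            T = common_cooccurrence([S_all[i] for i in sorted(members)])
            new = {i for i, S in enumerate(S_all) if commonality(S, T, w) > threshold}
            new.add(seed)                                       # assumption: seed never drops out
            stable = len(new) == len(members)                   # step (d) stopping test
            members = new
            if stable:
                break
        T = common_cooccurrence([S_all[i] for i in sorted(members)])
        clusters.append((members, T))
        assigned |= members
    # Step (f): each document goes to the cluster with the highest commonality.
    labels = [max(range(len(clusters)), key=lambda c: commonality(S, clusters[c][1], w))
              for S in S_all]
    return labels, [m for m, _ in clusters]


if __name__ == "__main__":
    docs = [
        [["apple", "banana"], ["apple", "cherry"]],
        [["apple", "banana"], ["apple", "cherry", "date"]],
        [["car", "engine"], ["car", "wheel"]],
        [["car", "engine"], ["engine", "wheel"]],
    ]
    labels, members = cluster(docs)
    print(labels)   # [0, 0, 1, 1]
```

Documents can belong temporarily to more than one cluster during steps (c) and (d). Only step (f) fixes a single final assignment, which is why the sketch recomputes commonality against every cluster at the end. The seed is always added to the current cluster, so each pass of the outer loop assigns at least one new document. This guarantees that the assumed convergence condition is eventually met.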

  • 2 Assignments