Model selection for cluster data analysis

  • US 7,890,445 B2
  • Filed: 10/30/2007
  • Issued: 02/15/2011
  • Est. Priority Date: 05/18/2001
  • Status: Expired due to Fees
First Claim

1. A computer implemented method for clustering data comprising a plurality of letters within text or speech, the method comprising:

  • (a) inputting the data into a computer system having a memory and a processor for executing a clustering algorithm;

    (b) selecting a clustering algorithm based on a dissimilarity measure between pairs of the letters' principal components;

    (c) randomly assigning class labels to the letters;

    (d) defining a plurality of clusters of letters within each labeled class;

    (e) measuring dissimilarity between each cluster of letters by measuring a residual of a fit of one cluster onto another cluster, wherein the residual fit comprises using a fit that is invariant with respect to affine transformations, wherein the affine transformations comprise a combination of translation, scaling and rotation;

    (f) reassigning letters to the labeled class with the most similar cluster;

    (g) repeating steps (d) through (f) until assignment of letters to the labeled classes remains constant; and

    (h) displaying a graph showing the letters clustered into the labeled classes.
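
Steps (c) through (g) of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices are assumptions made only for illustration: each letter is modeled as a 2-D shape given as a list of complex points, each cluster is represented by its medoid, and the residual is a similarity-Procrustes residual (invariant to translation, scaling and rotation). The names `normalize`, `fit_residual` and `cluster`, and the parameters `seed` and `max_iter`, are hypothetical. Step (h), the graph display, is omitted.

```python
import cmath
import math
import random


def normalize(shape):
    """Center a shape (list of complex points) and scale it to unit norm."""
    mean = sum(shape) / len(shape)
    centered = [p - mean for p in shape]
    norm = math.sqrt(sum(abs(p) ** 2 for p in centered))
    if norm == 0:
        raise ValueError("degenerate shape: all points coincide")
    return [p / norm for p in centered]


def fit_residual(a, b):
    """Residual of the best translation + scaling + rotation fit of b onto a.

    After centering and unit scaling, the best complex multiplier c for
    min ||a - c*b||^2 leaves the residual 1 - |<a, b>|^2.
    """
    na, nb = normalize(a), normalize(b)
    inner = sum(p * q.conjugate() for p, q in zip(na, nb))
    return max(0.0, 1.0 - abs(inner) ** 2)  # clamp tiny negative rounding error


def cluster(shapes, k, seed=0, max_iter=100):
    """Assumed k-medoids-style loop following steps (c) to (g)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in shapes]  # (c) random class labels
    for _ in range(max_iter):
        # (d) represent each non-empty class by its medoid (an assumption)
        medoids = {}
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:
                medoids[c] = min(
                    members,
                    key=lambda i: sum(
                        fit_residual(shapes[i], shapes[j]) for j in members
                    ),
                )
        # (e)-(f) move each letter to the class whose medoid fits it best;
        # a letter keeps its class unless another class is strictly better
        new_labels = []
        for i, old in enumerate(labels):
            best, best_r = old, fit_residual(shapes[medoids[old]], shapes[i])
            for c, m in medoids.items():
                r = fit_residual(shapes[m], shapes[i])
                if r < best_r:
                    best, best_r = c, r
            new_labels.append(best)
        if new_labels == labels:  # (g) stop once assignments are constant
            break
        labels = new_labels
    return labels


# Usage: a square, a rotated/scaled/shifted copy of it, and a 3x1 rectangle.
square = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
moved = [p * cmath.rect(2.5, 0.7) + (4 - 3j) for p in square]
rectangle = [0 + 0j, 3 + 0j, 3 + 1j, 0 + 1j]
print(fit_residual(square, moved))      # close to 0: same shape up to similarity
print(fit_residual(square, rectangle))  # about 0.2: genuinely different shape
print(cluster([square, moved, rectangle], k=2))
```

Because the labels start from a random assignment, the loop can settle in a poor local optimum (for example, with one class left empty), so in practice one might rerun it with several seeds and keep the partition with the lowest total residual.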
