
Unsupervised identification of nonlinear data cluster in multidimensional data

  • US 6,226,408 B1
  • Filed: 01/29/1999
  • Issued: 05/01/2001
  • Est. Priority Date: 01/29/1999
  • Status: Expired due to Term
First Claim

1. A computer implemented method for unsupervised identification of associated sets of data items in a collection of data items without using labels encoded in the data items, the method comprising the operations of:

    representing each data item by an input vector having a plurality of components representing portions of the data of the data item;

    quantizing the plurality of input vectors by associating each input vector with a closest one of a number of cluster centers, the number of cluster centers less than the plurality of data items, each of the input vectors contributing to each cluster center;

    linking the cluster centers with edges to form a graph, each edge between two cluster centers weighted according to a density of the input vectors between the two cluster centers;

    encoding each input vector as an encoded vector having a coded vector component for each cluster center, each vector component determined as a function of a distance between the input vector and the respective cluster center, and a distance between the respective cluster center and a cluster center nearest the input vector;

    repeating the operations of quantizing, linking, and encoding using the encoded vectors as the input vectors until a termination condition is satisfied; and

    for each encoded vector remaining after the termination condition is satisfied, labeling the data item associated with the encoded vector with a label associated with nearest cluster center.
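The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every concrete choice in it is an assumption made for illustration: the soft k-means-style update in `quantize`, the "two nearest centers" count used as the edge density in `link`, the exponential encoding function and its scale in `encode`, the fixed round count used as the termination condition, and the extra quantization pass used to assign labels. All function and parameter names are hypothetical. The sketch also uses straight-line distances between cluster centers and returns the graph only for inspection.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(v, centers):
    """Index of the cluster center closest to v."""
    return min(range(len(centers)), key=lambda j: dist(v, centers[j]))

def quantize(vectors, k, sweeps=10):
    """Soft k-means-style update: every vector contributes to every center,
    weighted by a softmax over negative squared distances (assumption)."""
    n, dim = len(vectors), len(vectors[0])
    step = max(1, n // k)
    # Deterministic initialization from evenly spaced inputs (assumption).
    centers = [list(vectors[(i * step) % n]) for i in range(k)]
    mean = [sum(v[t] for v in vectors) / n for t in range(dim)]
    # Softness tied to the spread of the data (assumption).
    temp = max(1e-12, 0.05 * sum(dist(v, mean) ** 2 for v in vectors) / n)
    for _ in range(sweeps):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for v in vectors:
            d2 = [dist(v, c) ** 2 for c in centers]
            lo = min(d2)
            w = [math.exp(-(d - lo) / temp) for d in d2]
            z = sum(w)
            for j in range(k):
                r = w[j] / z
                totals[j] += r
                for t in range(dim):
                    sums[j][t] += r * v[t]
        centers = [[s / totals[j] for s in sums[j]] if totals[j] > 0 else centers[j]
                   for j in range(k)]
    return centers

def link(vectors, centers):
    """Edge weight = number of vectors whose two nearest centers are the
    edge's endpoints, a stand-in for 'density between centers' (assumption)."""
    edges = {}
    if len(centers) < 2:
        return edges
    for v in vectors:
        order = sorted(range(len(centers)), key=lambda j: dist(v, centers[j]))
        pair = (min(order[0], order[1]), max(order[0], order[1]))
        edges[pair] = edges.get(pair, 0) + 1
    return edges

def encode(vectors, centers):
    """One component per center, computed from the vector-to-center distance
    and the distance from that center to the vector's nearest center.
    The exponential form and the scale are assumptions."""
    scale = max(dist(a, b) for a in centers for b in centers) or 1.0
    out = []
    for v in vectors:
        n = nearest(v, centers)
        out.append([math.exp(-(dist(v, c) + dist(c, centers[n])) / scale)
                    for c in centers])
    return out

def cluster(items, k=2, rounds=3):
    """Repeat quantize, link, encode for a fixed number of rounds (assumed
    termination condition), then label each item by its nearest center."""
    vectors = [[float(x) for x in item] for item in items]
    edges = {}
    for _ in range(rounds):
        centers = quantize(vectors, k)
        edges = link(vectors, centers)
        vectors = encode(vectors, centers)
    # Extra quantization pass so the centers live in the encoded space (assumption).
    centers = quantize(vectors, k)
    return [nearest(v, centers) for v in vectors], edges
```

Calling `cluster(items, k=2)` on a list of numeric tuples returns one integer label per item together with the edge weights from the last round. Fed two well-separated groups of points, the sketch gives all items in one group the same label and gives the two groups different labels.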
