
Method and system for data clustering for very large databases

  • US 5,832,182 A
  • Filed: 04/24/1996
  • Issued: 11/03/1998
  • Est. Priority Date: 04/24/1996
  • Status: Expired due to Term
First Claim

1. A method of clustering data, provided by a data source, in a computer processor having a main memory with a limited capacity, comprising the steps of:

    (a) receiving data points from the data source;

    (b) determining clusters of the data points that are within a selected threshold and determining a clustering feature for each such cluster, the clustering feature comprising the number of data points in the cluster, the linear sum of the data points in the cluster, and the square sum of the data points in the cluster, and storing the clustering feature for each cluster in the main memory; and

    (c) forming a clustering feature tree comprised of leaf nodes including leaf entries and at least one level of nodes joined to the leaf nodes, the leaf entries of the tree comprising the clustering features of the clusters, the next highest nodes in the tree above the leaves comprising nonleaf nodes that are each joined to a selected number of different leaves, the selected number comprising a branch number, each nonleaf node distinguished by identifiers stored in the main memory comprising the clustering features of each leaf to which the nonleaf node is joined and pointers indicating the leaves to which the node is joined, and further comprising, as necessary, higher level nodes joined to the branch number of lower level nodes, each higher level node distinguished by identifiers that are stored to main memory which comprise the clustering features for each lower node to which the higher node is joined and pointers indicating the lower nodes to which the higher node is joined, the tree terminating at a root node.
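The clustering feature recited in step (b) can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the "square sum" is the scalar sum of squared norms of the points, it uses cluster radius as the threshold test, and it keeps leaf entries in a flat list rather than a tree with a branch number. The names `ClusteringFeature`, `insert_point`, and `summarize` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """Summary of a cluster: point count, linear sum, and square sum."""
    n: int            # number of data points in the cluster
    ls: List[float]   # per-dimension linear sum of the points
    ss: float         # sum of squared norms of the points (assumed scalar)

    @classmethod
    def from_point(cls, point: Sequence[float]) -> "ClusteringFeature":
        return cls(1, list(point), sum(x * x for x in point))

    def merged(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # Clustering features are additive, so merging is component-wise addition.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of the points from the centroid:
        # sqrt(SS/N - ||LS/N||^2), clamped at zero against rounding error.
        variance = self.ss / self.n - sum(c * c for c in self.centroid())
        return sqrt(max(variance, 0.0))


def insert_point(entries: List[ClusteringFeature],
                 point: Sequence[float],
                 threshold: float) -> None:
    """Absorb `point` into the closest entry if the merged radius stays
    within `threshold`; otherwise start a new entry (assumed criterion)."""
    cf = ClusteringFeature.from_point(point)
    if entries:
        def distance(entry: ClusteringFeature) -> float:
            return sqrt(sum((a - b) ** 2
                            for a, b in zip(entry.centroid(), point)))

        closest = min(range(len(entries)), key=lambda i: distance(entries[i]))
        candidate = entries[closest].merged(cf)
        if candidate.radius() <= threshold:
            entries[closest] = candidate
            return
    entries.append(cf)


def summarize(entries: List[ClusteringFeature]) -> ClusteringFeature:
    """Clustering feature a parent node would hold for a non-empty group
    of child entries."""
    total = entries[0]
    for entry in entries[1:]:
        total = total.merged(entry)
    return total
```

Because the three components are additive, a node higher in the tree could describe everything beneath it by summing the clustering features of its children, which is what `summarize` does here. A full clustering feature tree would also split a node once it holds more entries than the branch number allows; that step is omitted from this sketch.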
