
Scalable system for K-means clustering of large databases

  • US 6,012,058 A
  • Filed: 03/17/1998
  • Issued: 01/04/2000
  • Est. Priority Date: 03/17/1998
  • Status: Expired due to Term
First Claim

1. In a computer data processing system, a method for clustering data in a database comprising the steps of:

    a) choosing a cluster number K for use in categorizing the data in the database into K different clusters;

    b) accessing data records from a database and bringing a data portion into a rapid access memory;

    c) assigning data records from the data portion to one of the K different clusters and determining a mean of the data records assigned to a given cluster;

    d) summarizing at least some of the data assigned to the clusters, storing a summarization of the data within the rapid access memory;

    e) accessing an additional portion of the data records in the database and bringing said additional portion into the rapid access memory;

    f) again assigning data from the database to a cluster and determining an updated mean from the summarized data and the additional portion of data records; and

    g) evaluating a criteria to determine if further data should be accessed from the database to continue clustering of data from the database.
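Steps (a) through (g) of the claim can be sketched as follows. This is a minimal pure-Python illustration built on several assumptions of our own, not the patented implementation. The "summarization" is taken to be per-cluster sufficient statistics (a count and a per-dimension sum, which is all that is needed to recompute a mean). A record is summarized when it lies within a hypothetical `compress_radius` of its cluster mean. The initial means are simply the first K records. The stopping criterion is either negligible movement of the means (`tol`) or exhaustion of the data. The names `scalable_kmeans`, `compress_radius`, `tol`, and `inner_iters` are illustrative only.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(p: Sequence[float], means: List[Point]) -> int:
    """Index of the mean closest to p (Euclidean distance)."""
    return min(range(len(means)), key=lambda j: math.dist(p, means[j]))


def _update_means(buffer, labels, counts, sums, means):
    """Mean per cluster = (summary sum + buffered points) / (summary count + buffered count)."""
    new_means = []
    for j, old in enumerate(means):
        n, s = counts[j], list(sums[j])
        for p, lab in zip(buffer, labels):
            if lab == j:
                n += 1
                s = [a + b for a, b in zip(s, p)]
        new_means.append(tuple(v / n for v in s) if n else old)  # empty cluster keeps old mean
    return new_means


def scalable_kmeans(chunks: Iterator[List[Point]], k: int,
                    compress_radius: float = 1.0, tol: float = 1e-9,
                    inner_iters: int = 10) -> List[Point]:
    # Hypothetical parameters: compress_radius, tol, inner_iters (not from the patent).
    first = list(next(chunks))              # (b) first portion into memory
    dim = len(first[0])
    means: List[Point] = [tuple(p) for p in first[:k]]  # (a) naive init; assumes >= k records
    counts = [0] * k                        # (d) summary: summarized-record count per cluster
    sums = [[0.0] * dim for _ in range(k)]  # (d) summary: per-dimension sums per cluster
    buffer = first
    while True:
        prev = means
        for _ in range(inner_iters):        # (c)/(f) Lloyd passes over buffer plus summaries
            labels = [_nearest(p, means) for p in buffer]
            means = _update_means(buffer, labels, counts, sums, means)
        labels = [_nearest(p, means) for p in buffer]
        kept = []                           # (d) fold records near their mean into the summary
        for p, lab in zip(buffer, labels):
            if math.dist(p, means[lab]) <= compress_radius:
                counts[lab] += 1
                sums[lab] = [a + b for a, b in zip(sums[lab], p)]
            else:
                kept.append(p)              # records far from their mean stay in memory
        # (g) stopping criterion: means have stopped moving, or the data is exhausted
        if max(math.dist(a, b) for a, b in zip(prev, means)) < tol:
            return means
        nxt = next(chunks, None)            # (e) next portion into memory
        if nxt is None:
            return means
        buffer = kept + list(nxt)
```

For example, two well-separated groups delivered in two portions:

```python
chunks = [[(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)],
          [(1.0, 0.0), (11.0, 10.0), (1.0, 1.0), (11.0, 11.0)]]
print(scalable_kmeans(iter(chunks), k=2, compress_radius=2.0))
# → [(0.5, 0.5), (10.5, 10.5)]
```

Because the summaries carry only a count and a sum per cluster, the summarized records of the first portion can be released from memory and the second portion's means still equal those of clustering all eight records at once. The claim itself leaves open which records are summarized and what criterion ends the scan.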

  • 3 Assignments