Scalable system for clustering of large databases

  • US 6,374,251 B1
  • Filed: 03/17/1998
  • Issued: 04/16/2002
  • Est. Priority Date: 03/17/1998
  • Status: Expired due to Term
First Claim

1. A method for clustering data in a database that is stored on a storage medium comprising the steps of:

    a) obtaining a portion of the data in the database from a storage medium;

    b) clustering data from the portion of data obtained from the database based upon a clustering criteria to produce a clustering model;

    c) compressing at least some of the data contained within the portion of data by evaluating a data compression criteria based on the clustering model and producing sufficient statistics for the data satisfying the compression criteria;

    d) storing the sufficient statistics for the data satisfying the compression criteria separate from the clustering model for use in subsequent refinement of said clustering model;

    e) continuing to obtain portions of data from the database and refining the clustering model that characterizes data in the database from newly sampled data and the stored sufficient statistics for the data satisfying the compression criteria until a specified stopping criteria has been satisfied; and

    f) displaying progress of the characterization of the clustering of the database on a user interface and providing a user controller input for stopping or suspending further building of a database clustering model.
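
The flow of steps a) through f) can be illustrated with a small sketch. This is not the patented implementation: the claim does not name a clustering algorithm or a compression test, so the sketch assumes a k-means style refinement and a simple distance threshold (`radius`) as the compression criterion. All names (`SuffStats`, `refine`, `scalable_cluster`, `on_progress`) are hypothetical, and the progress callback merely stands in for the user interface of step f).

```python
import math
from dataclasses import dataclass, field


@dataclass
class SuffStats:
    """Count, per-dimension sum and sum of squares of compressed points."""
    n: int = 0
    s: list = field(default_factory=list)
    ss: list = field(default_factory=list)

    def add(self, p):
        if not self.s:
            self.s = [0.0] * len(p)
            self.ss = [0.0] * len(p)
        self.n += 1
        for d, x in enumerate(p):
            self.s[d] += x
            self.ss[d] += x * x


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))


def refine(centroids, points, stats, iters=10):
    """Assumed k-means update over raw points plus the compressed summaries."""
    k, dim = len(centroids), len(centroids[0])
    for _ in range(iters):
        n = [st.n for st in stats]
        sums = [list(st.s) if st.n else [0.0] * dim for st in stats]
        for p in points:
            j = nearest(p, centroids)
            n[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        centroids = [
            [sums[j][d] / n[j] for d in range(dim)] if n[j] else centroids[j]
            for j in range(k)
        ]
    return centroids


def scalable_cluster(chunks, init_centroids, radius, tol=1e-3, on_progress=None):
    centroids = [list(c) for c in init_centroids]
    stats = [SuffStats() for _ in centroids]  # (d) kept separate from the model
    retained = []                             # points not yet compressed
    for i, chunk in enumerate(chunks):        # (a), (e) obtain the next portion
        retained.extend(chunk)
        new = refine(centroids, retained, stats)  # (b), (e) build or refine model
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        keep = []
        for p in retained:                    # (c) assumed compression criterion
            j = nearest(p, centroids)
            if math.dist(p, centroids[j]) <= radius:
                stats[j].add(p)               # fold the point into the statistics
            else:
                keep.append(p)
        retained = keep
        # (f) report progress; a callback returning False stops the build
        if on_progress is not None and on_progress(i, centroids, shift) is False:
            break
        if i > 0 and shift < tol:             # assumed stopping criterion
            break
    return centroids, stats, retained
```

In this sketch only the retained points and one `SuffStats` record per cluster stay in memory between portions, which is the reason the claim stores the compressed statistics apart from the model: later refinements can still account for data that has been discarded.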

  • 3 Assignments