×

Method for clustering data items through distance-merging and density-merging techniques

  • US 8,473,215 B2
  • Filed: 04/26/2004
  • Issued: 06/25/2013
  • Est. Priority Date: 04/25/2003
  • Status: Active Grant
First Claim
Patent Images

1. A computer-implemented method for identifying clusters of observed data items in a data space comprising:

  • a) defining an N-dimensional normalized lattice comprising a plurality of lattice points used as data reduction points in a data space of said data items, wherein said lattice points have a regular placement and distance relationship to each other;

    b) obtaining a set of n observed data items and parameter values from an information system, wherein the parameter values are associated with the observed data items;

    c) reducing the observed data items to lattice points and lattice point weights by performing steps of;

    identifying one or more data items within a predetermined distance of each lattice point, and calculating lattice point weights by applying a weighting rule that averages the parameter values associated with the observed data items within a predetermined distance of each lattice point;

    d) determining paths between lattice points by performing steps of;

    defining an estimated density surface for comparing lattice points within a predetermined distance, and creating directional associations between lattice points and at least one other lattice point within a predetermined distance by applying an association rule based on said estimated density surface;

    e) determining a lattice point path by performing steps of;

    sequentially connecting, based on said directional associations, a first lattice point and one or more subsequent lattice points within a predetermined distance until a terminal lattice point is reached, wherein said terminal lattice point is not connectable to any other lattice point based on said association rule;

    f) assigning lattice points in the data space to clusters or background noise by performing steps of;

    comparing the terminal lattice points of a said lattice point paths to a density threshold and defining the terminal lattice point as belonging to a new cluster, an existing cluster, or background noise;

    labeling the lattice points in lattice point path as points belonging to said new cluster, existing cluster, or background points based on said comparing;

    repeating said comparing and labeling steps until the remaining lattice points are labeled as belonging to a previously identified cluster or to background noise;

    g) distance merging clusters of lattice points in the data space by performing steps of;

    associating each cluster of lattice points with a root indicating the maximum terminal lattice point for that cluster;

    determining distances between roots, and merging clusters associated with said roots if the distance between the roots is below a distance threshold;

    h) density merging roots of clusters of lattice points in the data space by performing steps of;

    examining the roots of clusters, determining paths along consecutive lattice points to other roots, and merging roots of clusters when the path of consecutive lattice points densities falls below a minimum density threshold;

    repeating said density merging until no pairs of roots meet a density merging criteria;

    i) assigning the observed data items to clusters of lattice points in the data space by performing steps of;

    applying an assignment rule that assigns each observed data item to the closest lattice point within a predetermined distance; and

    assigning each observed data item to the same cluster to which its associated lattice point is assigned; and

    j) displaying the clusters of observed data items to a user graphical format, wherein steps a) through j) are performed on a suitably programmed computer.

View all claims
  • 4 Assignments
Timeline View
Assignment View
    ×
    ×