Method for clustering data items through distance-merging and density-merging techniques
First Claim
Patent Images
1. A computer-implemented method for identifying clusters of observed data items in a data space comprising:
- a) defining an N-dimensional normalized lattice comprising a plurality of lattice points used as data reduction points in a data space of said data items, wherein said lattice points have a regular placement and distance relationship to each other;
b) obtaining a set of n observed data items and parameter values from an information system, wherein the parameter values are associated with the observed data items;
c) reducing the observed data items to lattice points and lattice point weights by performing steps of;
identifying one or more data items within a predetermined distance of each lattice point, and calculating lattice point weights by applying a weighting rule that averages the parameter values associated with the observed data items within a predetermined distance of each lattice point;
d) determining paths between lattice points by performing steps of;
defining an estimated density surface for comparing lattice points within a predetermined distance, and creating directional associations between lattice points and at least one other lattice point within a predetermined distance by applying an association rule based on said estimated density surface;
e) determining a lattice point path by performing steps of;
sequentially connecting, based on said directional associations, a first lattice point and one or more subsequent lattice points within a predetermined distance until a terminal lattice point is reached, wherein said terminal lattice point is not connectable to any other lattice point based on said association rule;
f) assigning lattice points in the data space to clusters or background noise by performing steps of;
comparing the terminal lattice points of a said lattice point paths to a density threshold and defining the terminal lattice point as belonging to a new cluster, an existing cluster, or background noise;
labeling the lattice points in lattice point path as points belonging to said new cluster, existing cluster, or background points based on said comparing;
repeating said comparing and labeling steps until the remaining lattice points are labeled as belonging to a previously identified cluster or to background noise;
g) distance merging clusters of lattice points in the data space by performing steps of;
associating each cluster of lattice points with a root indicating the maximum terminal lattice point for that cluster;
determining distances between roots, and merging clusters associated with said roots if the distance between the roots is below a distance threshold;
h) density merging roots of clusters of lattice points in the data space by performing steps of;
examining the roots of clusters, determining paths along consecutive lattice points to other roots, and merging roots of clusters when the path of consecutive lattice points densities falls below a minimum density threshold;
repeating said density merging until no pairs of roots meet a density merging criteria;
i) assigning the observed data items to clusters of lattice points in the data space by performing steps of;
applying an assignment rule that assigns each observed data item to the closest lattice point within a predetermined distance; and
assigning each observed data item to the same cluster to which its associated lattice point is assigned; and
j) displaying the clusters of observed data items to a user graphical format, wherein steps a) through j) are performed on a suitably programmed computer.
4 Assignments
0 Petitions
Accused Products
Abstract
A method and/or system for analyzing data using population clustering through density based merging.
15 Citations
7 Claims
-
1. A computer-implemented method for identifying clusters of observed data items in a data space comprising:
-
a) defining an N-dimensional normalized lattice comprising a plurality of lattice points used as data reduction points in a data space of said data items, wherein said lattice points have a regular placement and distance relationship to each other; b) obtaining a set of n observed data items and parameter values from an information system, wherein the parameter values are associated with the observed data items; c) reducing the observed data items to lattice points and lattice point weights by performing steps of;
identifying one or more data items within a predetermined distance of each lattice point, and calculating lattice point weights by applying a weighting rule that averages the parameter values associated with the observed data items within a predetermined distance of each lattice point;d) determining paths between lattice points by performing steps of;
defining an estimated density surface for comparing lattice points within a predetermined distance, and creating directional associations between lattice points and at least one other lattice point within a predetermined distance by applying an association rule based on said estimated density surface;e) determining a lattice point path by performing steps of;
sequentially connecting, based on said directional associations, a first lattice point and one or more subsequent lattice points within a predetermined distance until a terminal lattice point is reached, wherein said terminal lattice point is not connectable to any other lattice point based on said association rule;f) assigning lattice points in the data space to clusters or background noise by performing steps of;
comparing the terminal lattice points of a said lattice point paths to a density threshold and defining the terminal lattice point as belonging to a new cluster, an existing cluster, or background noise;
labeling the lattice points in lattice point path as points belonging to said new cluster, existing cluster, or background points based on said comparing;
repeating said comparing and labeling steps until the remaining lattice points are labeled as belonging to a previously identified cluster or to background noise;g) distance merging clusters of lattice points in the data space by performing steps of;
associating each cluster of lattice points with a root indicating the maximum terminal lattice point for that cluster;
determining distances between roots, and merging clusters associated with said roots if the distance between the roots is below a distance threshold;h) density merging roots of clusters of lattice points in the data space by performing steps of;
examining the roots of clusters, determining paths along consecutive lattice points to other roots, and merging roots of clusters when the path of consecutive lattice points densities falls below a minimum density threshold;
repeating said density merging until no pairs of roots meet a density merging criteria;i) assigning the observed data items to clusters of lattice points in the data space by performing steps of;
applying an assignment rule that assigns each observed data item to the closest lattice point within a predetermined distance; and
assigning each observed data item to the same cluster to which its associated lattice point is assigned; andj) displaying the clusters of observed data items to a user graphical format, wherein steps a) through j) are performed on a suitably programmed computer. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
Specification