Population clustering through density-based merging
First Claim
Patent Images
1. A method of identifying clusters of data items, wherein a data item is associated with one or more values, said method using an information processing system with a logic processor comprising:
- defining using a logic processor a plurality of lattice points in a data space able to contain some of said data items of interest, said lattice points being data reduction points, said lattice points spaced according to a lattice rule, wherein said lattice points provide a rules-based ordering of values and storing said plurality of lattice points in a memory of said information processing system;
calculating weights for said lattice points using one or more of said data items according to a weighting rule by retrieving said plurality of said lattice points from said memory and performing said calculating using said processor;
determining a density at said lattice points relating a weight at a lattice point to weights at nearby lattice points using a density function on said processor;
creating, for each of a plurality of said lattice points, a directional association with at least one other of said lattice points using an association rule using said processor and storing said direction association in said memory;
sequentially following directional associations between lattice points starting at a starting lattice point of said lattice points, thereby defining a lattice point path until a terminal lattice point for said lattice point path is reached wherein a terminal lattice point is identified when there is no surrounding point from which a lattice point can be connected using said processor;
comparing a density of said terminal lattice point of said lattice point path to a density threshold thereby determining if said terminal lattice point indicates a new cluster, an existing cluster, or a default background cluster using said processor; and
labeling lattice points in said lattice point path as background points, points belonging to said existing cluster;
or points belonging to said new cluster based on said determining using said processor and storing said labeling in said memory;
repeating said sequentially following, said comparing, and said labeling until a majority of lattice points are labeled using said processor;
distance merging lattice point paths by comparing pairs of terminal lattice points and if a distance between two terminal lattice points is below a distance threshold, merging points in each lattice point path connected to said terminal lattice points into a single cluster, wherein said distance threshold is expressed as a distance between lattice points;
density merging lattice point paths by comparing pairs of terminal lattice points and if there is a path of consecutive lattice points along which values of said lattice point'"'"'s densities do not fall more than a density threshold below the minimum values of the two roots;
repeating said density merging on initial lattice point paths and merged lattice point paths until no pairs of lattice points meet a density merging criteria;
assigning lattice points to a cluster or to background noise;
assigning data points to clusters by relating data points to lattice points and then using the cluster assignment of the lattice point; and
displaying one or more identified clusters to a user using a graphical user interface display at said system.
4 Assignments
0 Petitions
Accused Products
Abstract
A method and/or system for analyzing data using population clustering through density based merging.
-
Citations
21 Claims
-
1. A method of identifying clusters of data items, wherein a data item is associated with one or more values, said method using an information processing system with a logic processor comprising:
-
defining using a logic processor a plurality of lattice points in a data space able to contain some of said data items of interest, said lattice points being data reduction points, said lattice points spaced according to a lattice rule, wherein said lattice points provide a rules-based ordering of values and storing said plurality of lattice points in a memory of said information processing system; calculating weights for said lattice points using one or more of said data items according to a weighting rule by retrieving said plurality of said lattice points from said memory and performing said calculating using said processor; determining a density at said lattice points relating a weight at a lattice point to weights at nearby lattice points using a density function on said processor; creating, for each of a plurality of said lattice points, a directional association with at least one other of said lattice points using an association rule using said processor and storing said direction association in said memory; sequentially following directional associations between lattice points starting at a starting lattice point of said lattice points, thereby defining a lattice point path until a terminal lattice point for said lattice point path is reached wherein a terminal lattice point is identified when there is no surrounding point from which a lattice point can be connected using said processor; comparing a density of said terminal lattice point of said lattice point path to a density threshold thereby determining if said terminal lattice point indicates a new cluster, an existing cluster, or a default background cluster using said processor; and labeling lattice points in said lattice point path as background points, points belonging to said existing cluster;
or points belonging to said new cluster based on said determining using said processor and storing said labeling in said memory;repeating said sequentially following, said comparing, and said labeling until a majority of lattice points are labeled using said processor; distance merging lattice point paths by comparing pairs of terminal lattice points and if a distance between two terminal lattice points is below a distance threshold, merging points in each lattice point path connected to said terminal lattice points into a single cluster, wherein said distance threshold is expressed as a distance between lattice points; density merging lattice point paths by comparing pairs of terminal lattice points and if there is a path of consecutive lattice points along which values of said lattice point'"'"'s densities do not fall more than a density threshold below the minimum values of the two roots; repeating said density merging on initial lattice point paths and merged lattice point paths until no pairs of lattice points meet a density merging criteria; assigning lattice points to a cluster or to background noise; assigning data points to clusters by relating data points to lattice points and then using the cluster assignment of the lattice point; and displaying one or more identified clusters to a user using a graphical user interface display at said system. - View Dependent Claims (3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
-
-
2. (canceled)
-
4. (canceled)
-
16. An apparatus for creating groupings from data comprising:
-
logic processing means for assigning each piece of data from said set of data to a point on a lattice; logic processing means for assigning weights to each lattice point based on the data near said lattice point; logic processing means for determining for each of said lattice points if each of said lattice points should be associated with one of its surrounding lattice points and if so creating a pointer from the individual lattice point to the surrounding lattice point it is associated with; and logic processing means for creating clusterings of the lattice points.
-
-
17-20. -20. (canceled)
-
21. A method of identifying clusters of data items using an information processing system with a logic processor comprising:
-
defining using a logic processor a plurality of data reduction lattice points in a data space able to contain some of said data items, said lattice points spaced according to a lattice rule providing a rules-based ordering of values; storing said plurality of lattice points in a memory of said information processing system; calculating weights for said lattice points using one or more of said data items according to a weighting rule by retrieving said plurality of said lattice points from said memory and performing said calculating using said processor; wherein said weighting rule determines a weight at a lattice point by a weighted summing of nearby data values up to all data values; determining a density at said lattice points relating a weight at a lattice point to weights at nearby lattice points using a density function on said processor; creating, for each of a plurality of said lattice points, a directional association with at least one other of said lattice points using an association rule using said processor and storing said direction association in said memory; sequentially following directional associations between lattice points starting at a starting lattice point of said lattice points, thereby defining a lattice point path until a terminal lattice point for said lattice point path is reached wherein a terminal lattice point is identified when there is no surrounding point from which a lattice point can be connected using said processor; comparing a density of said terminal lattice point of said lattice point path to a density threshold thereby determining if said terminal lattice point indicates a new cluster, an existing cluster, or a default background cluster using said processor; and labeling lattice points in said lattice point path as background points, points belonging to said existing cluster;
or points belonging to said new cluster based on said determining using said processor and storing said labeling in said memory;repeating said sequentially following, said comparing, and said labeling until a majority of lattice points are labeled using said processor; distance merging lattice point paths by comparing pairs of terminal lattice points and if a distance between two terminal lattice points is below a distance threshold, merging points in each lattice point path connected to said terminal lattice points into a single cluster, wherein said distance threshold is expressed as a distance between lattice points; density merging lattice point paths by comparing pairs of terminal lattice points and if there is a path of consecutive lattice points along which values of said lattice point'"'"'s densities do not fall more than a density threshold below the minimum values of the two roots; repeating said density merging on initial lattice point paths and merged lattice point paths until no pairs of lattice points meet a density merging criteria; assigning lattice points to a cluster or to background noise; assigning data points to clusters by relating data points to lattice points and then using the cluster assignment of the lattice point; and displaying one or more identified clusters to a user using a graphical user interface display at said system.
-
Specification