System for identifying clusters in scatter plots using smoothed polygons with optimal boundaries
First Claim
1. A method for identifying clusters in two-dimensional data comprising:
- generating a two-dimensional histogram characterized by a grid having an x-axis and a y-axis and a selected number of bins in the x-direction and a selected number of bins in the y-direction, said data comprising n pairs of points (xi, yi), i=1, . . . ,n, said histogram comprising fewer bins than said points;
determining a density estimate based on said bins; and
identifying at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria.
1 Assignment
0 Petitions
Accused Products
Abstract
An apparatus and method for identifying clusters in two-dimensional data by generating a two-dimensional histogram characterized by a grid of bins, determining a density estimate based on the bins, and identifying at least one cluster in the data. A smoothed density estimate is generated using a Gaussian kernel estimator algorithm. Clusters are identified by locating peaks and valleys in the density estimate (e.g., by comparing slope of adjacent bins). Boundaries (e.g., polygons) around clusters are identified using bins after bins are identified as being associated with a cluster. Boundaries can be simplified (e.g., by reducing the number of vertices in a polygon) to facilitate data manipulation.
-
Citations
33 Claims
-
1. A method for identifying clusters in two-dimensional data comprising:
-
generating a two-dimensional histogram characterized by a grid having an x-axis and a y-axis and a selected number of bins in the x-direction and a selected number of bins in the y-direction, said data comprising n pairs of points (xi, yi), i=1, . . . ,n, said histogram comprising fewer bins than said points;
determining a density estimate based on said bins; and
identifying at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33)
-
-
12. A method for identifying clusters in two-dimensional data comprising a plurality of points, the method comprising:
-
generating a density estimate based on said data;
identifying at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria; and
determining a boundary around said at least one cluster.
-
-
17. An apparatus for identifying clusters in two-dimensional data comprising:
-
a processing device; and
a memory device coupled to said processing device for storing a cluster finder algorithm, said processing device being programmable in accordance with said cluster finder algorithm to generate a two-dimensional histogram characterized by a grid having an x-axis and a y-axis and a selected number of bins in the x-direction and a selected number of bins in the y-direction, said data comprising n pairs of points (xi, yi),i=1, . . . ,n, said histogram comprising fewer bins than said points, to determine a density estimate based on said bins, and to identify at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria.
-
-
25. An apparatus for identifying clusters in two-dimensional data comprising:
-
a processing device; and
a memory device coupled to said processing device for storing a cluster finder algorithm, said processing device being programmable in accordance with said cluster finder algorithm to generate a density estimate based on said data, identify at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria, and determine a boundary around said at least one cluster.
-
-
29. A computer program product for identifying clusters in two-dimensional data comprising a plurality of points, the computer program product comprising:
-
a computer-readable medium; and
a cluster finder module stored on said computer-readable medium that generates a density estimate based on said data, identifies at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria, and determines a boundary around said at least one cluster.
-
Specification