System for identifying clusters in scatter plots using smoothed polygons with optimal boundaries
First Claim
1. A method for identifying clusters in two-dimensional data, wherein said data comprises a plurality of clusters, comprising:
- generating a two-dimensional histogram characterized by a grid having an x-axis and a y-axis and a selected number of bins in the x-direction and a selected number of bins in the y-direction, said data comprising n pairs of points (xi, yi), i=1, . . . ,n, said histogram comprising fewer bins than said points;
determining a density estimate based on said bins, wherein said density estimate is characterized by a three-dimensional plot depicting peaks and valleys; and
identifying at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria.
1 Assignment
0 Petitions
Accused Products
Abstract
An apparatus and method for identifying clusters in two-dimensional data by generating a two-dimensional histogram characterized by a grid of bins, determining a density estimate based on the bins, and identifying at least one cluster in the data. A smoothed density estimate is generated using a Gaussian kernel estimator algorithm. Clusters are identified by locating peaks and valleys in the density estimate (e.g., by comparing slope of adjacent bins). Boundaries (e.g., polygons) around clusters are identified using bins after bins are identified as being associated with a cluster. Boundaries can be simplified (e.g., by reducing the number of vertices in a polygon) to facilitate data manipulation.
-
Citations
33 Claims
-
1. A method for identifying clusters in two-dimensional data, wherein said data comprises a plurality of clusters, comprising:
-
generating a two-dimensional histogram characterized by a grid having an x-axis and a y-axis and a selected number of bins in the x-direction and a selected number of bins in the y-direction, said data comprising n pairs of points (xi, yi), i=1, . . . ,n, said histogram comprising fewer bins than said points;
determining a density estimate based on said bins, wherein said density estimate is characterized by a three-dimensional plot depicting peaks and valleys; and
identifying at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A method for identifying clusters in two-dimensional data, wherein said data comprises a plurality of clusters, comprising a plurality of points, the method comprising:
-
generating a density estimate based on said data, wherein said density estimate is characterized by a three-dimensional plot depicting peaks and valleys;
identifying at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria; and
determining a boundary around said at least one cluster. - View Dependent Claims (13, 14, 15, 16)
-
-
17. An apparatus for identifying clusters in two-dimensional data, wherein said data comprises a plurality of clusters, comprising:
-
a processing device; and
a memory device coupled to said processing device for storing a cluster finder algorithm, said processing device being programmable in accordance with said cluster finder algorithm to generate a two-dimensional histogram characterized by a grid having an x-axis and a y-axis and a selected number of bins in the x-direction and a selected number of bins in the y-direction, said data comprising n pairs of points (xi, yi), i=1, . . . ,n, said histogram comprising fewer bins than said points, to determine a density estimate based on said bins, wherein said density estimate is characterized by a three-dimensional plot depicting peaks and valleys, and to identify at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24)
-
-
25. An apparatus for identifying clusters in two-dimensional data, wherein said data comprises a plurality of clusters, comprising:
-
a processing device; and
a memory device coupled to said processing device for storing a cluster finder algorithm, said processing device being programmable in accordance with said cluster finder algorithm to generate a density estimate based on said data, wherein said density estimate is characterized by a three-dimensional clot depicting peaks and valleys, identify at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria, and determine a boundary around said at least one cluster. - View Dependent Claims (26, 27, 28)
-
-
29. A computer program product for identifying clusters in two-dimensional data comprising a plurality of points, wherein said data comprises a plurality of clusters, the computer program product comprising:
-
a computer-readable medium; and
a cluster finder module stored on said computer-readable medium that generates a density estimate based on said date, wherein said density estimate is characterized by a three-dimensional plot depicting peaks and valleys, identifies at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria, and determines a boundary around said at least one cluster. - View Dependent Claims (30, 31, 32, 33)
-
Specification