Method and apparatus for morphological clustering having multiple dilation and erosion of switchable grid data cells
First Claim
1. A method for automatically identifying clusters in samples of multivariate statistical data, comprising the steps of:
- storing the data in the form of a grid of data cells, wherein each of the data cells is switchable to a state representing a data point, and wherein the grid has as many dimensions as the multivariate statistical data; and
performing multiple morphological operations on the grid of data cells to identify one or more clusters of data, wherein the step of performing multiple morphological operations includes performing a combination of multiple erosion and dilation steps to determine boundaries of well-defined clusters of data points and to eliminate smaller isolated clusters, wherein data points falling within the boundaries of a cluster are said to belong to that cluster.
4 Assignments
0 Petitions
Accused Products
Abstract
A technique for automatically identifying clusters of data from a set of data samples, by employing multiple morphological operations of a grid of data cells representative of the data samples. The data samples are stored (12) in the data cell grid (56) as binary quantities of which the position in the grid represents each multivariate data point. Potential cluster regions are identified by first performing a series of dilation steps (22) on the data grid, to expand contiguous regions covered by the data points until smaller regions merge into larger ones that are identified as the potential cluster regions. Then a series of erosion operations (26) is performed, shrinking the contiguous regions until smaller isolated regions are completely eliminated and thin linkages between other potential cluster regions are removed. Next, a further series of dilation operations (30) expands the contiguous regions again, to reconnect any potential cluster regions that were fragmented in the erosion operations. Each remaining contiguous region of data cells is defined as a cluster. Data points falling within a cluster are said to belong to that cluster (36) and may share at least some attributes because their defining parameters are so similar.
61 Citations
16 Claims
-
1. A method for automatically identifying clusters in samples of multivariate statistical data, comprising the steps of:
-
storing the data in the form of a grid of data cells, wherein each of the data cells is switchable to a state representing a data point, and wherein the grid has as many dimensions as the multivariate statistical data; and
performing multiple morphological operations on the grid of data cells to identify one or more clusters of data, wherein the step of performing multiple morphological operations includes performing a combination of multiple erosion and dilation steps to determine boundaries of well-defined clusters of data points and to eliminate smaller isolated clusters, wherein data points falling within the boundaries of a cluster are said to belong to that cluster.
-
-
2. A method for automatically identifying clusters in samples of multivariate statistical data, comprising the steps of:
-
storing the data in the form of a grid of data cells, wherein each of the data cells is switchable to a state representing a data point, and wherein the grid has as many dimensions as the multivariate statistical data; and
performing multiple morphological operations on the grid of data cells to identify one or more clusters of data, wherein the step of performing multiple morphological operations includes;
performing multiple dilation steps to expand contiguous regions of the grid covered by data points until adjacent sub-regions merge into larger regions identified as potential clusters;
then performing multiple erosion steps to shrink the potential clusters until isolated smaller ones are eliminated and thin linkages between other potential clusters are removed; and
then performing further multiple dilation steps to reconnect any potential clusters that were fragmented by the previous step, to leave clusters of data cells, wherein data points falling within the boundaries of a cluster are said to belong to that cluster. - View Dependent Claims (3, 4, 5, 6, 7)
each dilation step includes turning each data cell to an “
on”
condition if any immediately adjacent neighbor cell is already in the “
on”
condition; and
each erosion step includes turning each data cell to an “
off”
condition if any immediately adjacent neighbor cell is already in the “
off”
condition.
-
-
4. A method as defined in claim 3, wherein:
-
the step of performing multiple dilation steps includes performing M such steps;
the step of performing multiple erosion steps includes performing M+1 such steps; and
the step of performing further multiple dilation steps includes performing N such steps.
-
-
5. A method as defined in claim 4, wherein M is 3 and N is 4.
-
6. A method as defined in claim 2, and further comprising the steps of:
-
identifying the data cells in each cluster; and
identifying data points that fall within the boundaries of each cluster.
-
-
7. A method as defined in claim 6, and further comprising the step of:
assigning data points pertaining to newly acquired data to clusters.
-
8. A method for automatically categorizing samples of multivariate statistical data based on whether the samples fall into clusters, the method comprising the steps of:
-
storing data samples in the form of a multidimensional grid of data cells, wherein each of the data cells is switchable to a state representing a data point by its position in the grid;
performing multiple morphological dilation steps to expand contiguous regions of the grid covered by data points, until adjacent regions merge into larger regions identified as potential cluster regions;
then performing multiple morphological erosion steps to shrink the potential cluster regions until smaller ones are eliminated and thin linkages between other potential cluster regions are removed; and
then performing further multiple morphological dilation steps to reconnect any potential cluster regions that were fragmented by the previous step, leaving clusters of data cells, wherein data points falling within the boundaries of a cluster are said to belong to that cluster and to share at least some common attributes. - View Dependent Claims (9, 10, 11, 12)
labeling the data cells in each cluster as belonging to that cluster; and
labeling data points that fall within the boundaries of a cluster as belonging to that cluster.
-
-
10. A method as defined in claim 8, wherein:
-
each data point is represented by a data cell in an “
on”
condition;
each dilation step includes turning each data cell to the “
on”
condition if any immediately adjacent neighbor data cell is already in the “
on”
condition; and
each erosion step includes turning each data cell to an “
off”
condition if any immediately adjacent neighbor data cell is already in the “
off”
condition.
-
-
11. A method as defined in claim 10, wherein:
-
the step of performing multiple dilation steps includes performing M such steps;
the step of performing multiple erosion steps includes performing M+1 such steps; and
the step of performing further multiple dilation steps includes performing N such steps.
-
-
12. A method as defined in claim 11, wherein M is 3 and N is 4.
-
13. A system for automatically identifying clusters in samples of multivariate statistical data, comprising:
-
a memory for storing multivariate statistical data in the form of a logical multidimensional grid of data cells, wherein each of the data cells is switchable to a state representing a sample data point, and wherein the position of the data cell in the multidimensional grid represents multiple dimensions of the data point; and
a processor programmed to perform multiple morphological operations on the grid of data cells and to thereby identify one or more clusters of data, wherein the processor programmed to perform multiple morphological operations includes a morphological processing module for performing a combination of multiple erosion and dilation steps to determine boundaries of well-defined clusters of data points and to eliminate smaller isolated clusters, wherein data points falling within the boundaries of a cluster are said to belong to that cluster.
-
-
14. A system for automatically identifying clusters in samples of multivariate statistical data, comprising:
-
a memory for storing multivariate statistical data in the form of a logical multidimensional grid of data cells, wherein each of the data cells is switchable to a state representing a sample data point, and wherein the position of the data cell in the multidimensional and represents multiple dimensions of the data point; and
a processor programmed to perform multiple morphological operations on the grid of data cells and to thereby identify one or more clusters of data, wherein the processor programmed to perform multiple morphological operations includes;
a first morphological processing module, for performing multiple dilation steps to expand contiguous regions of the grid encompassed by data points until adjacent sub-regions merge into larger regions identified as potential cluster regions;
a second morphological processing module, for performing multiple erosion steps to shrink the potential cluster regions until isolated smaller regions are eliminated and thin linkages between other potential cluster regions are removed; and
a third morphological processing module, for performing further multiple dilation steps to reconnect any potential cluster regions that were fragmented by the second morphological processing module, to leave clusters of data cells, wherein data points falling within the boundaries of a cluster are said to belong to that cluster. - View Dependent Claims (15, 16)
a processing module for locating each data cell that falls within the boundary of each cluster, and labeling appropriate data cells as falling within the clusters; and
another processing module, for identifying each data point within the boundaries of a cluster as belonging to that cluster.
-
-
16. A system as defined in claim 15, and further comprising:
yet another processing module, for determining whether data cells not belonging to a cluster should be eliminated from consideration or assigned to a nearby cluster.
Specification