Adaptive fast fuzzy clustering system
First Claim
1. A parallel processing computer system for clustering N data points in real numerical M-dimensional feature space by adaptively separating classes of patterns, said computer system comprising:
- a plurality of processors;
means for decomposing said M-dimensional feature space into M 1-dimensional feature spaces, with each said 1-dimensional feature space having a range of feature values;
means for numerically ordering each said feature value in each of said 1-dimensional feature spaces in ascending sort sequence;
means for calculating the gap lengths for said ordered feature values;
means for partially-ordering said gap lengths within each of said 1-dimensional feature spaces;
means for selecting a plurality of tentative split-gaps from said partially-ordered gap lengths, and means for further selecting a split-gap from said plurality of tentative split-gaps;
means for splitting a portion of said N data points corresponding to said split-gap on its associated feature; and
means for iteratively repeating said calculating, partially-ordering, selecting and splitting until said classes of patterns are separated.
0 Assignments
0 Petitions
Accused Products
Abstract
A parallel processing computer system for clustering data points in continuous feature space by adaptively separating classes of patterns. The preferred embodiment for this massively parallel system includes preferably one computer processor per feature and requires a single a priori assumption of central tendency in the distributions defining the pattern classes. It advantageously exploits the presence of noise inherent in the data gathering to not only classify data points into clusters, but also measure the certainty of the classification for each data point, thereby identifying outliers and spurious data points. The system taught by the present invention is based upon the gaps between successive data values within single features. This single feature discrimination aspect is achieved by applying a minimax comparison involving gap lengths and locations of the largest and smallest gaps. Clustering may be performed in near-real-time on huge data spaces having unlimited numbers of features.
121 Citations
14 Claims
-
1. A parallel processing computer system for clustering N data points in real numerical M-dimensional feature space by adaptively separating classes of patterns, said computer system comprising:
-
a plurality of processors; means for decomposing said M-dimensional feature space into M 1-dimensional feature spaces, with each said 1-dimensional feature space having a range of feature values; means for numerically ordering each said feature value in each of said 1-dimensional feature spaces in ascending sort sequence; means for calculating the gap lengths for said ordered feature values; means for partially-ordering said gap lengths within each of said 1-dimensional feature spaces; means for selecting a plurality of tentative split-gaps from said partially-ordered gap lengths, and means for further selecting a split-gap from said plurality of tentative split-gaps; means for splitting a portion of said N data points corresponding to said split-gap on its associated feature; and means for iteratively repeating said calculating, partially-ordering, selecting and splitting until said classes of patterns are separated. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A parallel processing computer system for clustering N data points in real numerical M-dimensional feature space corresponding to a plurality of M features by adaptively separating classes of patterns, with M and N being positive integer values, said computer system comprising:
-
a plurality of processors; means for decomposing said M-dimensional feature space into a plurality of M one-dimensional feature spaces, with each said one-dimensional feature space having a range of feature values for each of said plurality of M features, each of said one-dimensional feature spaces comprising all of the values of a single feature of said N data points; said means for decomposing comprising means for linearly scaling said range of feature values in each said one-dimensional feature space between the range of integers expressible on said parallel processing computer system, and then assigning one of said integers to each said feature value in each said feature space; means for numerically ordering each said feature value in each of said one-dimensional feature spaces in ascending sort sequence; means for calculating gap lengths for said ordered feature values comprising means for subtracting each said ordered feature value from its successive feature value, to obtain a sequence of N-1 said gap lengths for each said M one-dimensional feature space; means for partially-ordering said gap lengths within each of said one-dimensional feature spaces comprising means for segregating M first portions from M sequences of gap lengths, each of said M first portions consisting of all of the smallest of said gap lengths from one of said M sequences of gap lengths, and means for further segregating M second portions from said M sequences of gap lengths, each of said M second portions consisting of all of the largest of said gap lengths from said one of said M sequences of gap lengths; means for selecting a plurality of tentative split-gaps from said partially-ordered gap lengths, with each said selected tentative split-gap selected for each said one-dimensional feature space, comprising means for searching for an extreme left mode and the extreme right mode among all of said gaps lengths within each of said M first portions of the smallest of said gap lengths, and means for searching said M second portions of the largest of said gap lengths sequentially from the largest to the smallest thereof, until a gap length representing a said tentative split-gap is obtained which is disposed medially of said extreme left mode and said extreme right mode; means for choosing a split-gap from said plurality of tentative split-gaps comprising means for picking a split-gap corresponding to the largest of an aggregation of said plurality of tentative split-gaps obtained by said iterative repetitions of said means for partially-ordering and of said means for selecting; means for splitting a portion of said N data points corresponding to said split-gap on its associated feature; and means for iteratively repeating said calculating, partially-ordering, selecting, choosing and splitting until said classes of patterns are separated.
-
-
9. A parallel processing computer system for clustering N data points in real numerical M-dimensional feature space by adaptively separating classes of patterns, said computer system comprising:
-
a plurality of processors; means for decomposing said M-dimensional feature space into M 1-dimensional feature spaces, with each said 1-dimensional feature space having a range of feature values; said plurality of processors having a minimum of one processor assigned to each said 1-dimensional feature space; means for numerically ordering each said feature value in each of said 1-dimensional feature spaces in ascending sort sequence; means for calculating the gap lengths for said ordered feature values; means for partially-ordering said gap lengths within each of said 1-dimensional feature spaces; means for selecting a plurality of tentative split-gaps from said partially-ordered gap lengths, and means for further selecting a split-gap from said plurality of tentative split-gaps; means for splitting a portion of said N data points corresponding to said split-gap on its associated feature; and means for iteratively repeating said calculating, partially-ordering, selecting and splitting until said classes of patterns are separated. - View Dependent Claims (10, 11, 12, 13, 14)
-
Specification