Adaptive fast fuzzy clustering system

US 5,263,120 A
Filed: 04/29/1991
Issued: 11/16/1993
Est. Priority Date: 04/29/1991
Status: Expired due to Term

Abstract

A parallel processing computer system for clustering data points in continuous feature space by adaptively separating classes of patterns. The preferred embodiment for this massively parallel system includes preferably one computer processor per feature and requires a single a priori assumption of central tendency in the distributions defining the pattern classes. It advantageously exploits the presence of noise inherent in the data gathering to not only classify data points into clusters, but also measure the certainty of the classification for each data point, thereby identifying outliers and spurious data points. The system taught by the present invention is based upon the gaps between successive data values within single features. This single feature discrimination aspect is achieved by applying a minimax comparison involving gap lengths and locations of the largest and smallest gaps. Clustering may be performed in near-real-time on huge data spaces having unlimited numbers of features.

121 Citations

14 Claims

1. A parallel processing computer system for clustering N data points in real numerical M-dimensional feature space by adaptively separating classes of patterns, said computer system comprising:
- a plurality of processors;
  
  means for decomposing said M-dimensional feature space into M 1-dimensional feature spaces, with each said 1-dimensional feature space having a range of feature values;
  
  means for numerically ordering each said feature value in each of said 1-dimensional feature spaces in ascending sort sequence;
  
  means for calculating the gap lengths for said ordered feature values;
  
  means for partially-ordering said gap lengths within each of said 1-dimensional feature spaces;
  
  means for selecting a plurality of tentative split-gaps from said partially-ordered gap lengths, and means for further selecting a split-gap from said plurality of tentative split-gaps;
  
  means for splitting a portion of said N data points corresponding to said split-gap on its associated feature; and
  
  means for iteratively repeating said calculating, partially-ordering, selecting and splitting until said classes of patterns are separated.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The system recited in claim 1 wherein said means for decomposing comprises:
    - means for linearly scaling said range of feature values in each said one-dimensional feature space between the range of integers expressible on said computer system, and then assigning one of said integers to each said feature value in each said feature space.
  - 3. The system recited in claim 2 wherein said means for calculating comprises:
    - means for subtracting each said ordered feature value from its successive feature value, to obtain a sequence of N-1 said gap lengths for each said M 1-dimensional feature space.
  - 4. The system recited in claim 3 wherein said means for partially-ordering comprises:
    - means for segregating M first portions from M sequences of gap lengths, each of said M first portions consisting of all of the smallest of said gap lengths from one of said M sequences of gap lengths, and means for further segregating M second portions from said M sequences of gap lengths, each of said M second portions consisting of all of the largest of said gap lengths from said one of said M sequences of gap lengths.
  - 5. The system recited in claim 4 wherein said means for selecting comprises:
    - means for searching said first portion of the smallest of said gap lengths as a Gestalt for the extreme left mode and the extreme right mode thereof; and
      
      means for searching said second portion of the largest of said gap lengths sequentially from the largest to the smallest thereof, until a gap length corresponding to a tentative said split-gap is obtained which is disposed medially of said extreme left mode and said extreme right mode.
  - 6. The system recited in claim 1 wherein said plurality of processors has a minimum of one processor assigned to each said 1-dimensional feature space.
  - 7. The system recited in claim 1 wherein said plurality of processors includes an integrating processor.
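
The per-feature steps recited in claims 2 through 5 can be illustrated with a minimal Python sketch. It is one possible reading, not the patented implementation. The function names, the parameter `k` (how many gaps count as "smallest" or "largest"), and the default integer range are all assumptions. In particular, the "extreme left mode" and "extreme right mode" are approximated here as the leftmost and rightmost positions at which a small gap occurs, which the claims do not spell out.

```python
from typing import List, Optional, Tuple


def scale_to_integers(values: List[float], int_max: int = 32767) -> List[int]:
    """Claim 2 (sketch): linearly scale one feature's range onto 0..int_max.

    int_max is an assumed stand-in for "the range of integers expressible
    on said computer system".
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [round((v - lo) * int_max / (hi - lo)) for v in values]


def gap_lengths(ordered: List[int]) -> List[int]:
    """Claim 3 (sketch): subtract each ordered value from its successor."""
    return [b - a for a, b in zip(ordered, ordered[1:])]


def partial_order(gaps: List[int], k: int) -> Tuple[List[int], List[int]]:
    """Claim 4 (sketch): positions of the smallest and of the largest gaps.

    Ties with the k-th smallest (or k-th largest) gap are included, so each
    portion may hold more than k positions. Largest gaps come first.
    """
    if not gaps:
        return [], []
    ranked = sorted(gaps)
    k = max(1, min(k, len(ranked)))
    small_cut, large_cut = ranked[k - 1], ranked[-k]
    smallest = [i for i, g in enumerate(gaps) if g <= small_cut]
    largest = sorted((i for i, g in enumerate(gaps) if g >= large_cut),
                     key=lambda i: -gaps[i])
    return smallest, largest


def tentative_split_gap(gaps: List[int], k: int) -> Optional[int]:
    """Claim 5 (sketch): first large gap lying between the two extreme modes.

    Assumption: the modes are taken to be the leftmost and rightmost
    positions of a small gap. Returns a gap position, or None if no large
    gap lies strictly between them.
    """
    smallest, largest = partial_order(gaps, k)
    if not smallest:
        return None
    left_mode, right_mode = min(smallest), max(smallest)
    for i in largest:  # scan from the largest gap downward
        if left_mode < i < right_mode:
            return i
    return None
```

Under these assumptions, for the ordered values 0, 50, 51, 52, 53, 70, 71, 72, 73 the gap of 50 at the left edge is the largest but is passed over because it does not lie between the two modes; the gap of 17 between 53 and 70 is returned instead. This is how the medial requirement keeps an isolated outlier from driving the split.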

8. A parallel processing computer system for clustering N data points in real numerical M-dimensional feature space corresponding to a plurality of M features by adaptively separating classes of patterns, with M and N being positive integer values, said computer system comprising:
- a plurality of processors;
  
  means for decomposing said M-dimensional feature space into a plurality of M one-dimensional feature spaces, with each said one-dimensional feature space having a range of feature values for each of said plurality of M features, each of said one-dimensional feature spaces comprising all of the values of a single feature of said N data points;
  
  said means for decomposing comprising means for linearly scaling said range of feature values in each said one-dimensional feature space between the range of integers expressible on said parallel processing computer system, and then assigning one of said integers to each said feature value in each said feature space;
  
  means for numerically ordering each said feature value in each of said one-dimensional feature spaces in ascending sort sequence;
  
  means for calculating gap lengths for said ordered feature values comprising means for subtracting each said ordered feature value from its successive feature value, to obtain a sequence of N-1 said gap lengths for each said M one-dimensional feature space;
  
  means for partially-ordering said gap lengths within each of said one-dimensional feature spaces comprising means for segregating M first portions from M sequences of gap lengths, each of said M first portions consisting of all of the smallest of said gap lengths from one of said M sequences of gap lengths, and means for further segregating M second portions from said M sequences of gap lengths, each of said M second portions consisting of all of the largest of said gap lengths from said one of said M sequences of gap lengths;
  
  means for selecting a plurality of tentative split-gaps from said partially-ordered gap lengths, with each said selected tentative split-gap selected for each said one-dimensional feature space, comprising means for searching for an extreme left mode and the extreme right mode among all of said gap lengths within each of said M first portions of the smallest of said gap lengths, and means for searching said M second portions of the largest of said gap lengths sequentially from the largest to the smallest thereof, until a gap length representing a said tentative split-gap is obtained which is disposed medially of said extreme left mode and said extreme right mode;
  
  means for choosing a split-gap from said plurality of tentative split-gaps comprising means for picking a split-gap corresponding to the largest of an aggregation of said plurality of tentative split-gaps obtained by said iterative repetitions of said means for partially-ordering and of said means for selecting;
  
  means for splitting a portion of said N data points corresponding to said split-gap on its associated feature; and
  
  means for iteratively repeating said calculating, partially-ordering, selecting, choosing and splitting until said classes of patterns are separated.
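
The choosing, splitting and iterating elements of claim 8 can be sketched as a loop in Python. This is an illustration under stated assumptions, not the claimed system. The helper `largest_gap` is a deliberately simplified stand-in for the tentative split-gap search (it takes the largest gap in each feature and ignores the mode test). The stopping threshold `min_gap` is hypothetical, since the claim only says to repeat "until said classes of patterns are separated". The features are assumed to be already scaled to a common range, so that gap lengths are comparable across features.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def largest_gap(points: List[Point], feature: int) -> Optional[Tuple[float, float]]:
    """Simplified stand-in: (gap length, cut value) of one feature's largest gap."""
    vals = sorted(p[feature] for p in points)
    best = None
    for a, b in zip(vals, vals[1:]):
        if best is None or b - a > best[0]:
            best = (b - a, (a + b) / 2)  # cut midway across the gap
    return best


def cluster(points: List[Point], min_gap: float) -> List[List[Point]]:
    """Repeatedly split on the largest tentative gap over all features.

    min_gap is an assumed stopping rule: a group is final once its best
    gap is shorter than min_gap (or zero).
    """
    pending, done = [list(points)], []
    while pending:
        group = pending.pop()
        candidates = []
        for f in range(len(group[0])):
            g = largest_gap(group, f)
            if g is not None:
                candidates.append((g[0], f, g[1]))  # (gap, feature, cut)
        best = max(candidates, default=None)  # largest of the aggregation
        if best is None or best[0] < min_gap or best[0] == 0:
            done.append(group)
            continue
        _, f, cut = best
        # Split this portion of the data on the winning gap's feature.
        pending.append([p for p in group if p[f] < cut])
        pending.append([p for p in group if p[f] >= cut])
    return done
```

Because each split is made on one feature at a time, a group that is separable only in its second feature is still found on a later pass, once an earlier split has isolated it.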

9. A parallel processing computer system for clustering N data points in real numerical M-dimensional feature space by adaptively separating classes of patterns, said computer system comprising:
- a plurality of processors;
  
  means for decomposing said M-dimensional feature space into M 1-dimensional feature spaces, with each said 1-dimensional feature space having a range of feature values;
  
  said plurality of processors having a minimum of one processor assigned to each said 1-dimensional feature space;
  
  means for numerically ordering each said feature value in each of said 1-dimensional feature spaces in ascending sort sequence;
  
  means for calculating the gap lengths for said ordered feature values;
  
  means for partially-ordering said gap lengths within each of said 1-dimensional feature spaces;
  
  means for selecting a plurality of tentative split-gaps from said partially-ordered gap lengths, and means for further selecting a split-gap from said plurality of tentative split-gaps;
  
  means for splitting a portion of said N data points corresponding to said split-gap on its associated feature; and
  
  means for iteratively repeating said calculating, partially-ordering, selecting and splitting until said classes of patterns are separated.
- Dependent claims (10, 11, 12, 13, 14):
  - 10. The system recited in claim 9 wherein said means for decomposing comprises:
    - means for linearly scaling said range of feature values in each said one-dimensional feature space between the range of integers expressible on said computer system, and then assigning one of said integers to each said feature value in each said feature space.
  - 11. The system recited in claim 10 wherein said means for calculating comprises:
    - means for subtracting each said ordered feature value from its successive feature value, to obtain a sequence of N-1 said gap lengths for each said M 1-dimensional feature space.
  - 12. The system recited in claim 11 wherein said means for partially-ordering comprises:
    - means for segregating M first portions from M sequences of gap lengths, each of said M first portions consisting of all of the smallest of said gap lengths from one of said M sequences of gap lengths, and means for further segregating M second portions from said M sequences of gap lengths, each of said M second portions consisting of all of the largest of said gap lengths from said one of said M sequences of gap lengths.
  - 13. The system recited in claim 12 wherein said means for selecting comprises:
    - means for searching said first portion of the smallest of said gap lengths as a Gestalt for the extreme left mode and an extreme right mode thereof; and
      
      means for searching said second portion of the largest of said gap lengths sequentially from the largest to the smallest thereof, until a gap length corresponding to a tentative said split-gap is obtained which is disposed medially of said extreme left mode and said extreme right mode.
  - 14. The system recited in claim 9 wherein said plurality of processors includes an integrating processor.
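
The processor arrangement in claims 9 and 14 (at least one processor per 1-dimensional feature space, plus an integrating processor) might be mimicked in Python as follows. Everything here is an assumption made for illustration: worker threads from the standard library's `concurrent.futures` stand in for the per-feature processors, the calling thread plays the integrating processor, and `feature_worker` reports only the largest gap of its feature rather than performing the full tentative split-gap search.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def feature_worker(column: Sequence[float]) -> Tuple[float, float]:
    """Per-feature "processor": sort one feature, return (largest gap, cut)."""
    vals = sorted(column)
    gaps = [(b - a, (a + b) / 2) for a, b in zip(vals, vals[1:])]
    # A single-point column has no gaps; report a zero-length gap.
    return max(gaps, default=(0.0, float(vals[0])))


def integrate(points: List[Sequence[float]]) -> Tuple[int, float, float]:
    """"Integrating processor": gather per-feature results, pick the winner.

    Returns (feature index, gap length, cut value).
    """
    if not points:
        raise ValueError("need at least one data point")
    # Decompose the M-dimensional space into M one-dimensional columns.
    columns = list(zip(*points))
    # One worker per feature, mirroring "one processor assigned to each".
    with ThreadPoolExecutor(max_workers=len(columns)) as pool:
        results = list(pool.map(feature_worker, columns))
    feature = max(range(len(results)), key=lambda f: results[f][0])
    gap, cut = results[feature]
    return feature, gap, cut
```

The per-feature work shares no state, which is why it can be spread over as many workers as there are features; only the final comparison has to be done in one place.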

Current Assignee: Michael A. Bickel
Original Assignee: Michael A. Bickel
Inventors: Bickel, Michael A.
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Davis, George B
Application Number: US07/692,735
Time in Patent Office: 932 Days
Field of Search: 395/10, 395/11, 395/2, 395/900
US Class Current: 706/62
CPC Class Codes:
G06F 18/23   Clustering techniques
G06F 18/2433   Single-class perspective, e...
Y10S 706/90   Fuzzy logic
