System for identifying clusters in scatter plots using smoothed polygons with optimal boundaries

US 6,944,338 B2
Filed: 05/11/2001
Issued: 09/13/2005
Est. Priority Date: 05/11/2000
Status: Expired due to Term

Abstract

An apparatus and method for identifying clusters in two-dimensional data by generating a two-dimensional histogram characterized by a grid of bins, determining a density estimate based on the bins, and identifying at least one cluster in the data. A smoothed density estimate is generated using a Gaussian kernel estimator algorithm. Clusters are identified by locating peaks and valleys in the density estimate (e.g., by comparing slope of adjacent bins). Boundaries (e.g., polygons) around clusters are identified using bins after bins are identified as being associated with a cluster. Boundaries can be simplified (e.g., by reducing the number of vertices in a polygon) to facilitate data manipulation.
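The first two stages described in the abstract (binning the points, then smoothing the bin counts with a Gaussian kernel) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `histogram_2d` and `smooth`, the bin counts, the kernel width `sigma`, the truncation `radius`, and the synthetic data are all assumptions made for the example.

```python
import math
import random

def histogram_2d(points, nx, ny):
    """Count points into an nx-by-ny grid spanning the data range."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    # Guard against a zero-width range so the bin width is never zero.
    wx = (x1 - x0) / nx or 1.0
    wy = (y1 - y0) / ny or 1.0
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        # Clamp so the maximum value falls into the last bin.
        i = min(int((x - x0) / wx), nx - 1)
        j = min(int((y - y0) / wy), ny - 1)
        grid[j][i] += 1
    return grid

def gaussian_weights(sigma, radius):
    """Normalized 1-D Gaussian weights for bin offsets -radius..radius."""
    w = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def smooth(grid, sigma=1.0, radius=3):
    """Separable Gaussian smoothing of bin counts (mass past the edges is dropped)."""
    ny, nx = len(grid), len(grid[0])
    w = gaussian_weights(sigma, radius)
    # Pass 1: smooth each row along the x-direction.
    rows = [[sum(w[k + radius] * row[i + k]
                 for k in range(-radius, radius + 1) if 0 <= i + k < nx)
             for i in range(nx)] for row in grid]
    # Pass 2: smooth the result along the y-direction.
    return [[sum(w[k + radius] * rows[j + k][i]
                 for k in range(-radius, radius + 1) if 0 <= j + k < ny)
             for i in range(nx)] for j in range(ny)]

# Synthetic data (assumed): two Gaussian blobs of 100 points each.
rng = random.Random(0)
points = [(rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)) for _ in range(100)]
points += [(rng.gauss(6.0, 0.5), rng.gauss(5.0, 0.5)) for _ in range(100)]

counts = histogram_2d(points, nx=10, ny=10)   # 100 bins, fewer than 200 points
density = smooth(counts, sigma=1.0, radius=2)
```

Working on bin counts rather than on the raw points keeps the smoothing cost tied to the grid size instead of to n, which is presumably why the claims require fewer bins than points.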

33 Claims

1. A method for identifying clusters in two-dimensional data, wherein said data comprises a plurality of clusters, comprising:
- generating a two-dimensional histogram characterized by a grid having an x-axis and a y-axis and a selected number of bins in the x-direction and a selected number of bins in the y-direction, said data comprising n pairs of points (x_i, y_i), i=1, . . . ,n, said histogram comprising fewer bins than said points;
  
  determining a density estimate based on said bins, wherein said density estimate is characterized by a three-dimensional plot depicting peaks and valleys; and
  
  identifying at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
- - 2. A method as claimed in claim 1, wherein said determining step comprises generating a smoothed density estimate.
  - 3. A method as claimed in claim 2, wherein said smoothed density estimate is generated using a Gaussian kernel estimator algorithm.
  - 4. A method as claimed in claim 1, further comprising determining a boundary around said at least one cluster.
  - 5. A method as claimed in claim 4, wherein said boundary is a polygon characterized by a plurality of vertices, and further comprising processing said boundary to reduce the number of said vertices while enclosing approximately the same area within said boundary.
  - 6. A method as claimed in claim 1, wherein said identifying step comprises locating valleys in said density estimate and identifying each of said plurality of clusters as being separated from the others by at least one of said valleys.
  - 7. A method as claimed in claim 6, wherein said identifying step further comprises comparing the slope of each of said bins with that of adjacent ones of said bins.
  - 8. A method as claimed in claim 7, wherein said identifying step further comprises:
    - determining which of said bins correspond to respective peaks of said plurality of clusters using said slope;
      
      assigning said bins that correspond to said peaks with respective cluster identification codes; and
      
      assigning each of said bins associated with one of said peaks with the corresponding one of said cluster identification codes.
  - 9. A method as claimed in claim 8, further comprising determining a boundary around said at least one cluster.
  - 10. A method as claimed in claim 9, wherein said step of determining a boundary comprises analyzing each of said bins to determine if adjacent ones of said bins have the same one of said cluster identification codes, said bins being labeled as exterior points if they have no adjacent said bins with said data or the same one of said cluster identification codes.
  - 11. A method as claimed in claim 10, wherein said boundary is a polygon characterized by a plurality of vertices, and further comprising processing said boundary to reduce the number of said vertices while enclosing approximately the same area within said boundary.
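Claims 6 through 10 can be illustrated with a small sketch that compares each bin with its neighbours, climbs uphill to a peak, gives every bin the identification code of the peak it reaches, and then flags the bins on the edge of each cluster. This is one possible reading of the claims, not the patented procedure: the 8-neighbour steepest-ascent rule, the 4-neighbour exterior test, the names `label_bins` and `exterior_bins`, and the sample grid are assumptions. In particular, claim 10 is read here as "a bin is exterior if any neighbour is missing, has no data, or carries a different code".

```python
def label_bins(density):
    """Give each non-empty bin the ID of the peak reached by steepest ascent."""
    ny, nx = len(density), len(density[0])

    def uphill(j, i):
        # Return the highest of the 8 neighbours, or the bin itself
        # when no neighbour is strictly higher (the bin is a peak).
        best = (j, i)
        for dj in (-1, 0, 1):
            for di in (-1, 0, 1):
                a, b = j + dj, i + di
                if (0 <= a < ny and 0 <= b < nx
                        and density[a][b] > density[best[0]][best[1]]):
                    best = (a, b)
        return best

    peak_ids = {}                                # peak bin -> cluster code
    labels = [[0] * nx for _ in range(ny)]       # 0 means "no data"
    for j in range(ny):
        for i in range(nx):
            if density[j][i] <= 0:
                continue
            cur = (j, i)
            while True:                          # strict increase, so this ends
                nxt = uphill(*cur)
                if nxt == cur:
                    break
                cur = nxt
            labels[j][i] = peak_ids.setdefault(cur, len(peak_ids) + 1)
    return labels

def exterior_bins(labels):
    """List (row, col, code) for bins on the edge of their cluster."""
    ny, nx = len(labels), len(labels[0])
    out = []
    for j in range(ny):
        for i in range(nx):
            c = labels[j][i]
            if c == 0:
                continue
            for a, b in ((j - 1, i), (j + 1, i), (j, i - 1), (j, i + 1)):
                # Off-grid, empty, or differently coded neighbour: exterior.
                if not (0 <= a < ny and 0 <= b < nx) or labels[a][b] != c:
                    out.append((j, i, c))
                    break
    return out

# Hypothetical smoothed density with two peaks (values 5 and 7).
density = [
    [1, 2, 1, 0, 0],
    [2, 5, 2, 0, 1],
    [1, 2, 1, 1, 3],
    [0, 0, 1, 3, 7],
]
labels = label_bins(density)
edge = exterior_bins(labels)
```

With this rule a flat plateau yields one peak per bin, so a practical version would need a tie-breaking step; the sketch leaves that out.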

12. A method for identifying clusters in two-dimensional data, wherein said data comprises a plurality of clusters, comprising a plurality of points, the method comprising:
- generating a density estimate based on said data, wherein said density estimate is characterized by a three-dimensional plot depicting peaks and valleys;
  
  identifying at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria; and
  
  determining a boundary around said at least one cluster.
- Dependent Claims (13, 14, 15, 16)
- - 13. A method as claimed in claim 12, wherein said generating step comprises generating a smoothed density estimate.
  - 14. A method as claimed in claim 13, wherein said smoothed density estimate is generated using a Gaussian kernel estimator algorithm.
  - 15. A method as claimed in claim 12, wherein said boundary is a polygon characterized by a plurality of vertices, and further comprising processing said boundary to reduce the number of said vertices while enclosing approximately the same area within said boundary.
  - 16. A method as claimed in claim 12, wherein said data comprises n pairs of points (x_i, y_i), i=1, . . . ,n, and said generating step comprises:
    - generating a two-dimensional histogram, said histogram comprising fewer bins than said points; and
      
      determining said density estimate based on said bins.
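The vertex-reduction step of claim 15 (fewer vertices, approximately the same enclosed area) can be sketched with a greedy rule that repeatedly drops the vertex whose removal changes the area least. The claims do not name a specific algorithm; this example swaps in a Visvalingam-Whyatt-style heuristic as an assumption, and the names `polygon_area`, `triangle_area`, `simplify`, and the sample polygon are hypothetical.

```python
def polygon_area(poly):
    """Absolute area of a simple polygon via the shoelace formula."""
    n = len(poly)
    s = sum(poly[k][0] * poly[(k + 1) % n][1] - poly[(k + 1) % n][0] * poly[k][1]
            for k in range(n))
    return abs(s) / 2.0

def triangle_area(a, b, c):
    """Area of triangle abc, i.e. the area change if vertex b is removed."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def simplify(poly, max_vertices):
    """Greedily remove the vertex whose removal alters the area least."""
    pts = list(poly)
    while len(pts) > max(max_vertices, 3):
        n = len(pts)
        # pts[k - 1] wraps to the last vertex when k == 0.
        k = min(range(n),
                key=lambda k: triangle_area(pts[k - 1], pts[k], pts[(k + 1) % n]))
        del pts[k]
    return pts

# A square with three collinear vertices and one slight bump on the top edge.
boundary = [(0, 0), (2, 0), (4, 0), (4, 2), (4, 4), (2, 4.1), (0, 4), (0, 2)]
reduced = simplify(boundary, max_vertices=4)

area_before = polygon_area(boundary)   # about 16.2
area_after = polygon_area(reduced)     # about 16.0
```

Stopping at a fixed vertex count is only one possible criterion; an area-change tolerance would serve the same purpose.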

17. An apparatus for identifying clusters in two-dimensional data, wherein said data comprises a plurality of clusters, comprising:
- a processing device; and
  
  a memory device coupled to said processing device for storing a cluster finder algorithm, said processing device being programmable in accordance with said cluster finder algorithm to generate a two-dimensional histogram characterized by a grid having an x-axis and a y-axis and a selected number of bins in the x-direction and a selected number of bins in the y-direction, said data comprising n pairs of points (x_i, y_i), i=1, . . . ,n, said histogram comprising fewer bins than said points, to determine a density estimate based on said bins, wherein said density estimate is characterized by a three-dimensional plot depicting peaks and valleys, and to identify at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria.
- Dependent Claims (18, 19, 20, 21, 22, 23, 24)
- - 18. An apparatus as claimed in claim 17, wherein said processing device is programmable to generate a smoothed density estimate.
  - 19. An apparatus as claimed in claim 18, wherein said processing device is programmable to implement a Gaussian kernel estimator algorithm to generate said smoothed density estimate.
  - 20. An apparatus as claimed in claim 17, wherein said processing device is programmable to determine a boundary around said at least one cluster.
  - 21. An apparatus as claimed in claim 20, wherein said boundary is a polygon characterized by a plurality of vertices, and said processing device is programmable to process said boundary to reduce the number of said vertices while enclosing approximately the same area within said boundary.
  - 22. An apparatus as claimed in claim 21, wherein said apparatus further comprises a user input device and a display connected to said processing device, said display providing a visual indication of said plurality of clusters, said processing device being operable to provide a user with said boundary of one of said plurality of clusters when selected via said user input device.
  - 23. An apparatus as claimed in claim 22, said processing device being operable to alter said boundary of at least one of said plurality of clusters in response to user commands generated via said user input device.
  - 24. An apparatus as claimed in claim 17, wherein said two-dimensional data represents a first data set and said processing device is operable to perform batch processing of a second data set, said processing device storing a template in said memory device corresponding to said at least one cluster in said first data set and using said template to facilitate location of clusters in said second data set.

25. An apparatus for identifying clusters in two-dimensional data, wherein said data comprises a plurality of clusters, comprising:
- a processing device; and
  
  a memory device coupled to said processing device for storing a cluster finder algorithm, said processing device being programmable in accordance with said cluster finder algorithm to generate a density estimate based on said data, wherein said density estimate is characterized by a three-dimensional plot depicting peaks and valleys, identify at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria, and determine a boundary around said at least one cluster.
- Dependent Claims (26, 27, 28)
- - 26. An apparatus as claimed in claim 25, wherein said processing device is programmable to generate a smoothed density estimate.
  - 27. An apparatus as claimed in claim 26, wherein said processing device is programmable to implement a Gaussian kernel estimator algorithm to generate said smoothed density estimate.
  - 28. An apparatus as claimed in claim 25, wherein said data comprises n pairs of points (x_i, y_i), i=1, . . . ,n, and said processing device is programmable to generate a two-dimensional histogram, said histogram comprising fewer bins than said points, and determine said density estimate based on said bins.

29. A computer program product for identifying clusters in two-dimensional data comprising a plurality of points, wherein said data comprises a plurality of clusters, the computer program product comprising:
- a computer-readable medium; and
  
  a cluster finder module stored on said computer-readable medium that generates a density estimate based on said data, wherein said density estimate is characterized by a three-dimensional plot depicting peaks and valleys, identifies at least one cluster in said data, said at least one cluster comprising a plurality of points which satisfy a selected density criteria, and determines a boundary around said at least one cluster.
- Dependent Claims (30, 31, 32, 33)
- - 30. A computer program product as claimed in claim 29, wherein said cluster finder module generates a smoothed density estimate.
  - 31. A computer program product as claimed in claim 30, wherein said smoothed density estimate is generated using a Gaussian kernel estimator algorithm.
  - 32. A computer program product as claimed in claim 29, wherein said boundary is a polygon characterized by a plurality of vertices, said cluster finder module being operable to process said boundary to reduce the number of said vertices while enclosing approximately the same area within said boundary.
  - 33. A computer program product as claimed in claim 29, wherein said data comprises n pairs of points (x_i, y_i), i=1, . . . ,n, said cluster finder module being operable to generate a two-dimensional histogram, said histogram comprising fewer bins than said points, and to determine said density estimate based on said bins.

Current Assignee: Becton, Dickinson & Co
Original Assignee: Becton, Dickinson & Co
Inventors: Gluhovsky, Ilya; Lock, Michael D.; Dalal, Sunil S.
Primary Examiner(s): Johns, Andrew W.
Assistant Examiner(s): Alavi, Amir
Application Number: US09/853,037
Publication Number: US 20020029235A1
Time in Patent Office: 1,586 Days
Field of Search: 382/128, 382/133, 382/168, 382/170, 382/171, 382/173, 382/189, 382/199, 382/224, 382/225, 382/266, 436/63
US Class Current: 382/168
CPC Class Codes:
G01N 15/1456   without spatial resolution ...
G01N 2015/1006   for cytology
G01N 2015/1413   Hydrodynamic focussing
G01N 2015/1477   Multiparameters
G06F 18/2321   using statistics or functio...
G06V 20/698   Matching; Classification
