System and method for dynamic data clustering

US 20030093411A1
Filed: 11/09/2001
Published: 05/15/2003
Est. Priority Date: 11/09/2001
Status: Abandoned Application

Abstract

A system and method for dynamically identifying clusters of related data in a database uses a probe to identify the clusters. These clusters, also known as density patterns, are identified by launching the probe from an initial position in a data space associated with the data comprised of a plurality of data points. Each of the data points attracts the probe to itself. Distant data points attract the probe to a lesser extent than do proximate data points. In this manner, the probe is drawn along a trajectory toward an equilibrium point. Once the equilibrium point is reached, a cluster is identified and its location optionally stored. Additional probes are launched from different initial positions in the data space to identify other clusters that may exist in the data space until no unique clusters are identified. The collection of identified clusters is representative of a number and, in some embodiments of the present invention, a general location of related data within the data space.
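
The probe behaviour described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of the application: the Gaussian distance weighting, the `sigma` width, the tolerance, and the names `next_position` and `run_probe` are all hypothetical choices made here. Each data point pulls the probe with a weight that shrinks with distance, and the probe keeps moving until two successive positions nearly coincide.

```python
import math

def next_position(probe, points, sigma):
    """Move the probe to a distance-weighted mean of the data points.

    Assumption: a Gaussian weight, so near points attract more than far ones.
    """
    weights = [math.exp(-math.dist(probe, p) ** 2 / (2.0 * sigma ** 2))
               for p in points]
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed: the probe feels no attraction, so it stays.
        return tuple(probe)
    return tuple(sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(len(probe)))

def run_probe(start, points, sigma=1.0, tol=1e-6, max_iter=1000):
    """Iterate until the new and current positions are approximately the same."""
    current = tuple(start)
    for _ in range(max_iter):
        new = next_position(current, points, sigma)
        if math.dist(new, current) < tol:
            return new  # equilibrium point: a cluster is identified here
        current = new
    return current

# Two small, well-separated groups of points (illustrative data only).
points = [(-0.1, 0.0), (0.1, 0.0), (0.0, 0.1), (0.0, -0.1),
          (4.9, 5.0), (5.1, 5.0), (5.0, 5.1), (5.0, 4.9)]
equilibrium = run_probe((0.4, 0.3), points, sigma=0.5)
```

In this sketch a probe launched near the first group settles close to that group's centre, because the second group is too far away to contribute a meaningful weight.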

25 Claims

1. A method for dynamically identifying clusters of related data comprising:
- launching a probe from a first position in an M-dimensional space, said M-dimensional space having a plurality of data points, each of said plurality of data points associated with a data record, each data record having at least M number of data fields;
  
  determining a new position for said probe in said M-dimensional space based on a current position of said probe relative to at least a portion of said plurality of data points in said M-dimensional space;
  
  moving said probe from said current position to said new position;
  
  repeating said determining a new position for said probe until said new position and said current position are approximately a same position;
  
  dynamically identifying a cluster upon determining said same position in said M-dimensional space.
- Dependent claims (2 to 16):
  - 2. The method of claim 1, further comprising launching another probe from another position in said M-dimensional space to initiate identification of another cluster in said M-dimensional space.
  - 3. The method of claim 1, wherein said launching another probe from another position comprises randomly determining said another position in said M-dimensional space.
  - 4. The method of claim 1, wherein said launching another probe from another position comprises determining said another position in said M-dimensional space outside a predetermined proximity from a previous probe trajectory.
  - 5. The method of claim 1, wherein said launching another probe from another position comprises determining said another position in said M-dimensional space beyond a predetermined proximity from said identified cluster.
  - 6. The method of claim 1, wherein said launching another probe from another position comprises determining said another position as one of said plurality of data points.
  - 7. The method of claim 1, wherein said launching another probe from another position comprises determining said another position in said M-dimensional space as one of said plurality of data points that is outside of a predetermined proximity from a previous probe trajectory and beyond a predetermined proximity from said identified cluster.
  - 8. The method of claim 1, wherein said determining a new position for said probe comprises applying a localized force function to said probe.
  - 9. The method of claim 8, wherein said determining a new position for said probe comprises applying a localized coulomb force function to said probe.
  - 10. The method of claim 8, wherein said determining a new position for said probe comprises applying a localized force function to said probe, said force function based on a radial distance between said probe and each of said plurality of data points.
  - 11. The method of claim 8, wherein said determining a new position for said probe comprises applying a potential function to said probe.
  - 12. The method of claim 8, wherein said determining a new position for said probe comprises applying a potential function to said probe, said potential function based on at least one of a weight function and a quadratic function.
  - 13. The method of claim 12, wherein said applying a potential function to said probe comprises applying a product of a weight function and a quadratic function.
  - 14. The method of claim 13, wherein said applying a potential function to said probe comprises applying a potential function of the form V = R² * exp(−R²/2σ²), where “V” is the potential between said probe and one of said plurality of data points, “R” is the distance in said M-dimensional space between said probe and said one of said plurality of data points, and “σ²” is an estimate of noise variance associated with said plurality of data points.
  - 15. The method of claim 11, further comprising minimizing a sum of said potential functions applied to each of said at least a portion of said plurality of data points.
  - 16. The method of claim 15, wherein said minimizing a sum of said potential functions comprises minimizing a sum of said potential functions applied to each of said plurality of data points.
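
The potential of claim 14 and the minimization of claims 15 and 16 can be illustrated with a short Python sketch. It is one plausible reading, not the implementation of the application: plain gradient descent, the step size `rate`, the stopping tolerance, and the function names are assumptions introduced here. The potential is taken as V = R² * exp(−R²/(2σ²)), which is zero at R = 0 and peaks at R = σ√2.

```python
import math

def potential(r, sigma):
    """Potential between the probe and one data point at distance r."""
    return r * r * math.exp(-r * r / (2.0 * sigma * sigma))

def total_potential(probe, points, sigma):
    """Sum of the potentials over all data points."""
    return sum(potential(math.dist(probe, p), sigma) for p in points)

def gradient(probe, points, sigma):
    """Analytic gradient of total_potential with respect to the probe position."""
    grad = [0.0] * len(probe)
    for p in points:
        r2 = sum((a - b) ** 2 for a, b in zip(probe, p))
        u = r2 / (2.0 * sigma * sigma)
        scale = 2.0 * math.exp(-u) * (1.0 - u)  # d(V)/d(R^2), times 2
        for d in range(len(probe)):
            grad[d] += scale * (probe[d] - p[d])
    return grad

def minimize(start, points, sigma, rate=0.05, tol=1e-9, max_iter=10000):
    """Assumed optimizer: fixed-step gradient descent to a local minimum."""
    current = list(start)
    for _ in range(max_iter):
        g = gradient(current, points, sigma)
        new = [c - rate * gi for c, gi in zip(current, g)]
        if math.dist(new, current) < tol:
            return tuple(new)
        current = new
    return tuple(current)

# Illustrative data only: two tight groups far apart relative to sigma.
points = [(-0.1, 0.0), (0.1, 0.0), (0.0, 0.1), (0.0, -0.1),
          (4.9, 5.0), (5.1, 5.0), (5.0, 5.1), (5.0, 4.9)]
start = (0.3, 0.2)
minimum = minimize(start, points, sigma=0.5)
```

Because the exponential weight suppresses distant points, the sum has a separate local minimum near each tight group, so the minimum reached depends on where the probe starts.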

17. A method for dynamically identifying a number of clusters of related data from a plurality of data records each having a plurality of data fields, the data represented as N data points in an M-dimensional space where M is less than or equal to a number of the plurality of data fields and N is less than or equal to a number of the plurality of data records, the method comprising:
- initializing a current position of a data probe as a first position in the M-dimensional space;
  
  determining a new position for said data probe in the M-dimensional space based on a similarity between said data probe as indicated by said current position and at least a portion of the N data points in the M-dimensional space;
  
  adjusting said current position of said data probe to said new position;
  
  repeating said determining a new position and said adjusting said current position until said new position and said current position are approximately a same position; and
  
  once said new position and said current position are approximately said same position, incrementing a count of the number of clusters of related data.
- Dependent claims (18 to 25):
  - 18. The method of claim 17, further comprising:
    - reinitializing a current position of said data probe as a second position in the M-dimensional space, said second position different from said first position.
  - 19. The method of claim 18, further comprising:
    - repeating said determining a new position and said adjusting said current position until said new position and said current position are approximately a second same position.
  - 20. The method of claim 19, further comprising:
    - if said second same position is a unique same position, then incrementing said count of the number of clusters of related data.
  - 21. The method of claim 18, wherein said reinitializing a current position of said data probe as a second position comprises selecting said second position as one of said plurality of data points.
  - 22. The method of claim 18, wherein said reinitializing a current position of said data probe as a second position comprises selecting said second position from one of said N data points.
  - 23. The method of claim 18, wherein said reinitializing a current position of said data probe as a second position comprises selecting said second position from outside a previous probe trajectory.
  - 24. The method of claim 17, wherein said determining a new position for said data probe in said M-dimensional space comprises determining a relative distance between said data probe and one of the N data points.
  - 25. The method of claim 17, wherein said determining a new position for said data probe in said M-dimensional space comprises determining a relative distance between said data probe and each of said at least a portion of the N data points.
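
The counting procedure of claims 17 to 22 can be sketched as follows. This is a hedged illustration only: the Gaussian-weighted update in `settle`, the `merge_radius` test used to decide whether an equilibrium is unique, and both function names are assumptions introduced here, not details taken from the application. The probe is reinitialized at each data point in turn, and the count is incremented only when it comes to rest somewhere not already recorded.

```python
import math

def settle(start, points, sigma, tol=1e-6, max_iter=1000):
    """Move a probe until its new and current positions approximately agree."""
    current = tuple(start)
    for _ in range(max_iter):
        weights = [math.exp(-math.dist(current, p) ** 2 / (2.0 * sigma ** 2))
                   for p in points]
        total = sum(weights)
        if total == 0.0:
            return current  # no measurable attraction anywhere
        new = tuple(sum(w * p[d] for w, p in zip(weights, points)) / total
                    for d in range(len(current)))
        if math.dist(new, current) < tol:
            return new
        current = new
    return current

def count_clusters(points, sigma, merge_radius):
    """Launch one probe per data point and count the unique resting positions."""
    equilibria = []
    for start in points:
        rest = settle(start, points, sigma)
        # Assumed uniqueness test: farther than merge_radius from every
        # equilibrium found so far.
        if all(math.dist(rest, e) > merge_radius for e in equilibria):
            equilibria.append(rest)  # a new cluster: increment the count
    return len(equilibria), equilibria

# Illustrative data only.
points = [(-0.1, 0.0), (0.1, 0.0), (0.0, 0.1), (0.0, -0.1),
          (4.9, 5.0), (5.1, 5.0), (5.0, 5.1), (5.0, 4.9)]
count, centres = count_clusters(points, sigma=0.5, merge_radius=0.25)
```

Starting probes at data points guarantees that each launch begins where there is at least one point to attract it. The `merge_radius` value trades off merging distinct clusters against double-counting one cluster reached from different starts.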

Current Assignee: Agilent Technologies, Inc.
Original Assignee: Agilent Technologies, Inc.
Inventors: Minor, James M.
Application Number: US09/986,746
Publication Number: US 20030093411A1
US Class Current: 1/1
CPC Class Codes:
G06F 16/35 Clustering; Classification
G06F 18/2321 using statistics or functio...
