System and method for evolutionary clustering of sequential data sets

US 8,930,365 B2
Filed: 04/29/2006
Issued: 01/06/2015
Est. Priority Date: 04/29/2006
Status: Active Grant

Abstract

An improved system and method for evolutionary clustering of sequential data sets is provided. A snapshot cost may be determined for representing the data set for a particular clustering method used and may determine the cost of clustering the data set independently of a series of clusterings of the data sets in the sequence. A history cost may also be determined for measuring the distance between corresponding clusters of the data set and the previous data set in the sequence of data sets to determine a cost of clustering the data set as part of a series of clusterings of the data sets in the sequence. An overall cost may be determined for clustering the data set by minimizing the combination of the snapshot cost and the history cost. Any clustering method may be used, including flat clustering and hierarchical clustering.
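The relationship between the snapshot cost, the history cost, and the overall cost can be illustrated with a minimal Python sketch. The abstract does not fix concrete formulas, so the choices here are assumptions made for illustration only: flat clustering with centroids, a snapshot cost equal to the sum of squared distances from each point to its nearest centroid, and a history cost equal to the sum of distances from each current centroid to its nearest centroid from the previous timestep. The function names are hypothetical.

```python
import math

def snapshot_cost(points, centroids):
    """Assumed snapshot cost: sum of squared distances from each point
    to its nearest centroid, ignoring all earlier clusterings."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)

def history_cost(centroids, prev_centroids):
    """Assumed history cost: sum of distances from each current centroid
    to its nearest centroid in the previous timestep's clustering."""
    return sum(min(math.dist(c, q) for q in prev_centroids) for c in centroids)

def overall_cost(points, centroids, prev_centroids):
    """Overall cost as the sum of the snapshot cost and the history cost."""
    return snapshot_cost(points, centroids) + history_cost(centroids, prev_centroids)

# Illustrative data: three 2-D points, two current and two previous centroids.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
centroids = [(0.0, 1.0), (10.0, 0.0)]
prev_centroids = [(0.0, 0.0), (10.0, 0.0)]
total = overall_cost(points, centroids, prev_centroids)
```

A clustering that fits the current data slightly worse but stays close to the previous centroids can achieve a lower overall cost than one that fits the current data best, which is the trade-off the abstract describes.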

15 Claims

1. A computer system for clustering a data set in a sequence of data sets, comprising:
- a processor device performing computer-executable instructions comprising:
  
  receiving a data set as part of a sequence of data sets in a series of clusterings, said data set having a plurality of data elements and each of the data sets in the sequence being acquired at different timesteps;
  
  determining a first cost of clustering the data set;
  
  wherein the first cost comprises a cost of clustering the data set independently of the series of clusterings of the data sets in the sequence, each of the data sets being acquired at different timesteps;
  
  determining a second cost of clustering the data set;
  
  wherein the second cost comprises a cost of clustering the data set as part of the series of clusterings of the data sets in the sequence;
  
  combining the first cost with the second cost at each timestep;
  
  determining an overall cost of clustering the data set as a sum of the first cost and the second cost, using a selected clustering method;
  
  minimizing the overall cost; and
  
  clustering the data set using the selected clustering method according to the minimized overall cost, such that the clustering at any time has high accuracy while also ensuring that said clustering does not change dramatically from one timestep to a next timestep.
  - 2. The system of claim 1 wherein clustering the data set comprises using a flat clustering of points in a vector space.
  - 3. The system of claim 1 wherein clustering the data set comprises using hierarchical clustering.
  - 4. The system of claim 2 wherein determining the first cost comprises determining a cost using flat clustering to cluster the data set independently of the series of clusterings of the data sets in the sequence.
  - 5. The system of claim 2 wherein determining the second cost comprises determining a cost of using flat clustering to cluster the data set as part of the series of clusterings of the data sets in the sequence.
  - 6. The system of claim 3 wherein determining the first cost comprises determining a cost of using hierarchical clustering to cluster the data set independently of the series of clusterings of the data sets in the sequence.
  - 7. The system of claim 3 wherein determining the second cost comprises determining a cost using hierarchical clustering to cluster the data set as part of the series of clusterings of the data sets in the sequence.
  - 8. The system of claim 1 wherein determining the overall cost comprises determining a cost of clustering the data set by minimizing both the first cost of using flat clustering to cluster the data set independently of the series of clusterings of the data sets in the sequence and the second cost of using flat clustering to cluster the data set as part of the series of clusterings of the data sets in the sequence.
  - 9. The system of claim 1 wherein determining the overall cost comprises determining a cost of clustering the data set using hierarchical clustering to cluster the data set.
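
For the flat-clustering case (claims 2, 4, 5, and 8), one way to minimize the summed cost is a k-means-style pass whose centroid update accounts for the previous timestep. The sketch below is an assumption, not the claimed implementation: it takes the first cost to be the sum of squared point-to-centroid distances and the second cost to be `weight` times the squared distance from each centroid to its matched previous centroid. Under those assumptions, and with assignments and matches held fixed, the minimizing centroid is `(sum of assigned points + weight * previous centroid) / (count + weight)`. All names, and the `weight` parameter itself (1.0 gives a plain sum), are hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def evolutionary_kmeans_step(points, centroids, prev_centroids, weight=1.0):
    """One assignment-and-update pass (assumes weight > 0).

    Each centroid moves to the minimizer of
        sum of squared distances to its assigned points
        + weight * squared distance to its matched previous centroid.
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    # Assignment step: each point joins its nearest current centroid.
    for p in points:
        j = nearest(p, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    # Update step: blend the cluster's points with the matched previous centroid.
    updated = []
    for j, c in enumerate(centroids):
        q = prev_centroids[nearest(c, prev_centroids)]
        denom = counts[j] + weight
        updated.append(tuple((sums[j][d] + weight * q[d]) / denom for d in range(dim)))
    return updated

# Illustrative 1-D data.
points = [(0.0,), (2.0,), (10.0,)]
centroids = [(1.0,), (9.0,)]
prev_centroids = [(0.0,), (10.0,)]
new_centroids = evolutionary_kmeans_step(points, centroids, prev_centroids)
```

Repeating the pass until the centroids stop moving would give a local minimum of the assumed cost; a larger `weight` keeps the clustering closer to the previous timestep, and a smaller one lets it track the current data more closely.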

10. A computer-implemented method for clustering a data set, comprising:
- determining a first cost of clustering a data set;
  
  wherein the first cost comprises a cost of clustering the data set in a sequence of data sets independently of a series of clusterings of the data sets in the sequence, the data set having a plurality of data elements and each of the data sets in the sequence of data sets being acquired at different timesteps;
  
  determining a second cost of clustering the data set;
  
  wherein the second cost comprises a cost of clustering the data set as part of the sequence of clustered data sets;
  
  combining the first cost with the second cost at each timestep;
  
  determining an overall cost of clustering the data set in the sequence of data sets as a sum of the first cost and the second cost, using a selected clustering method;
  
  minimizing the overall cost; and
  
  clustering the sequence of data sets using the selected clustering method according to the minimized overall cost, such that the clustering at any time has high accuracy while also ensuring that said clustering does not change dramatically from one timestep to a next timestep.
  - 11. The method of claim 10 wherein providing the optimal clustering sequence further comprises applying a greedy heuristic algorithm to minimize a distance between corresponding clusters of the data set and a previous data set in the sequence of data sets.
  - 12. The method of claim 10 wherein determining the first cost of clustering the data set comprises determining a cost of representing the data set for the clustering method used.
  - 13. The method of claim 10 wherein determining the second cost of clustering the data set comprises determining a measure of a distance between corresponding clusters of the data set and a previous data set in the sequence of data sets.
  - 14. The method of claim 10 wherein determining the overall cost of clustering the data set further comprises minimizing a combination of a cost of representing the data set for a particular clustering method used and a measure of a distance between corresponding clusters of the data set and a previous data set in the sequence of data sets.
  - 15. A computer-readable storage medium having computer-executable instructions for performing the method of claim 10.
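
Claims 11 and 13 refer to a greedy heuristic and to a distance between corresponding clusters of consecutive data sets, without specifying either in the claim text. As one possible reading, the sketch below represents each cluster by a centroid, pairs current and previous centroids greedily in order of increasing distance, and sums the matched distances as the history cost. The representation, the matching rule, and the function names are all assumptions for illustration.

```python
import math

def greedy_match(centroids, prev_centroids):
    """Greedily pair current and previous centroids, closest pairs first.

    Returns a list of (current_index, previous_index, distance) tuples.
    Each centroid on either side is used at most once.
    """
    pairs = sorted(
        (math.dist(c, q), i, j)
        for i, c in enumerate(centroids)
        for j, q in enumerate(prev_centroids)
    )
    used_cur, used_prev, matches = set(), set(), []
    for dist, i, j in pairs:
        if i not in used_cur and j not in used_prev:
            used_cur.add(i)
            used_prev.add(j)
            matches.append((i, j, dist))
    return matches

def matched_history_cost(centroids, prev_centroids):
    """Assumed history cost: total distance over the greedy matching."""
    return sum(dist for _, _, dist in greedy_match(centroids, prev_centroids))

# Illustrative data: the previous centroids are listed in a different order.
centroids = [(0.0, 0.0), (5.0, 5.0)]
prev_centroids = [(5.0, 4.0), (0.0, 1.0)]
matches = greedy_match(centroids, prev_centroids)
cost = matched_history_cost(centroids, prev_centroids)
```

A greedy matching is not guaranteed to be the minimum-cost one-to-one assignment, but it is simple and avoids letting several current clusters map to the same previous cluster.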

Current Assignee: R2 Solutions LLC (Acacia Research Corporation)
Original Assignee: Yahoo! Inc. (Apollo Global Management, Inc.)
Inventors: Chakrabarti, Deepayan; Ravikumar, Shanmugasundaram; Tomkins, Andrew
Primary Examiner(s): Lewis, Cheryl
Assistant Examiner(s): Wong, Huen
Application Number: US11/414,448
Publication Number: US 20070255737A1
Time in Patent Office: 3,174 Days
Field of Search: 707/737
US Class Current: 707/737
CPC Class Codes: G06F 16/35 (Clustering; Classification); G06F 18/23 (Clustering techniques)
