System and method for evolutionary clustering of sequential data sets

US 20070255737A1
Filed: 04/29/2006
Published: 11/01/2007
Est. Priority Date: 04/29/2006
Status: Active Grant


9 Assignments

0 Petitions

Abstract

An improved system and method for evolutionary clustering of sequential data sets is provided. A snapshot cost may be determined that measures the cost of representing the data set under the particular clustering method used, that is, the cost of clustering the data set independently of a series of clusterings of the data sets in the sequence. A history cost may also be determined that measures the distance between corresponding clusters of the data set and the previous data set in the sequence, that is, the cost of clustering the data set as part of a series of clusterings of the data sets in the sequence. An overall cost for clustering the data set may be determined by minimizing the combination of the snapshot cost and the history cost. Any clustering method may be used, including flat clustering and hierarchical clustering.
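To make the cost decomposition concrete, here is a minimal sketch for a flat, k-means-style clustering. It assumes (illustrative choices, not taken from the abstract) that the snapshot cost is the sum of squared Euclidean distances from each point to its assigned centroid, that the history cost is the sum of distances from each current centroid to its nearest centroid in the previous clustering, and that the two are combined with a hypothetical trade-off weight `cp`. The function names are likewise hypothetical.

```python
from math import dist


def snapshot_cost(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))


def history_cost(centroids, prev_centroids):
    """Sum over current centroids of the distance to the nearest previous centroid."""
    return sum(min(dist(c, p) for p in prev_centroids) for c in centroids)


def overall_cost(points, labels, centroids, prev_centroids, cp=0.5):
    """Snapshot cost plus cp-weighted history cost (cp is an assumed weight)."""
    return snapshot_cost(points, labels, centroids) + cp * history_cost(
        centroids, prev_centroids
    )


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    labels = [0, 0, 1, 1]
    centroids = [(0.0, 1.0), (10.0, 1.0)]
    prev_centroids = [(0.0, 0.0), (10.0, 4.0)]
    print(overall_cost(points, labels, centroids, prev_centroids, cp=0.5))  # 6.0
```

Here the snapshot cost is 4.0 and the history cost is 1.0 + 3.0 = 4.0, so the overall cost is 4.0 + 0.5 × 4.0 = 6.0. Raising `cp` makes the same clustering more expensive the further its centroids have drifted from the previous ones.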


20 Claims

1. A computer system for clustering a data set, comprising:
- a clustering engine for clustering at least one data set in a sequence of data sets as part of a series of clusterings of the data sets in the sequence;
- a snapshot cost evaluator for determining a snapshot cost of clustering the at least one data set independently of the series of clusterings of the data sets in the sequence;
- a history cost evaluator for determining a history cost of clustering the at least one data set as part of the series of clusterings of the data sets in the sequence; and
- an overall cost evaluator for determining a cost of clustering the at least one data set by minimizing both the snapshot cost of clustering at least one data set independently of the series of clusterings of the data sets in the sequence and the history cost of clustering the at least one data set as part of the series of clusterings of the data sets in the sequence.

2. The system of claim 1 further comprising a flat clustering engine operably coupled to the clustering engine for clustering the at least one data set using a flat clustering of points in a vector space.
3. The system of claim 1 further comprising a hierarchical clustering engine operably coupled to the clustering engine for clustering the at least one data set using hierarchical clustering.
4. The system of claim 1 wherein the snapshot cost evaluator comprises a snapshot cost evaluator for determining a snapshot cost of using flat clustering to cluster the at least one data set independently of the series of clusterings of the data sets in the sequence.
5. The system of claim 1 wherein the history cost evaluator comprises a history cost evaluator for determining a history cost of using flat clustering to cluster the at least one data set as part of the series of clusterings of the data sets in the sequence.
6. The system of claim 1 wherein the snapshot cost evaluator comprises a snapshot cost evaluator for determining a snapshot cost of using hierarchical clustering to cluster the at least one data set independently of the series of clusterings of the data sets in the sequence.
7. The system of claim 1 wherein the history cost evaluator comprises a history cost evaluator for determining a history cost of using hierarchical clustering to cluster the at least one data set as part of the series of clusterings of the data sets in the sequence.
8. The system of claim 1 wherein the overall cost evaluator comprises an overall cost evaluator for determining a cost of clustering the at least one data set by minimizing both a snapshot cost of using flat clustering to cluster the at least one data set independently of the series of clusterings of the data sets in the sequence and a history cost of using flat clustering to cluster the at least one data set as part of the series of clusterings of the data sets in the sequence.
9. The system of claim 1 wherein the overall cost evaluator comprises an overall cost evaluator for determining a cost of clustering the at least one data set by minimizing both a snapshot cost of using hierarchical clustering to cluster the at least one data set independently of the series of clusterings of the data sets in the sequence and a history cost of using hierarchical clustering to cluster the at least one data set as part of the series of clusterings of the data sets in the sequence.
10. A computer-readable medium having computer-executable components comprising the system of claim 1.
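For the hierarchical variants (claims 3, 6, 7 and 9), one way to compare two consecutive hierarchies is through the pairwise distances each tree induces between objects. The sketch below assumes single-linkage clustering over the same objects at both time steps. It relies on the fact that the height at which single linkage joins two points equals their minimax path distance, and it takes the mean squared change in those heights as a stand-in history cost. The linkage choice, the cost formula, and the function names are all illustrative assumptions, not the claimed method.

```python
from math import dist


def single_linkage_heights(points):
    """Height at which single linkage first joins each pair of points.

    For single linkage this equals the minimax path distance: over all paths
    between two points, the smallest possible longest edge.
    """
    n = len(points)
    h = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Floyd-Warshall with (min, max) in place of the usual (min, +).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                h[i][j] = min(h[i][j], max(h[i][k], h[k][j]))
    return h


def hierarchical_history_cost(points, prev_points):
    """Mean squared change in pairwise merge heights between two time steps.

    Assumes index i refers to the same object in both lists (at least 2 objects).
    """
    now = single_linkage_heights(points)
    prev = single_linkage_heights(prev_points)
    n = len(points)
    diffs = [(now[i][j] - prev[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    prev_points = [(0.0,), (1.0,), (5.0,)]
    points = [(0.0,), (1.0,), (3.0,)]
    print(hierarchical_history_cost(points, prev_points))
```

In this example the third object moves closer to the other two, so its merge height drops from 4 to 2 for both pairs it belongs to, giving a cost of (0 + 4 + 4) / 3. An unchanged hierarchy costs 0.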

11. A computer-implemented method for clustering a data set, comprising:
- determining an overall cost of clustering at least one data set in a sequence of data sets by minimizing the combination of a snapshot cost of clustering the at least one data set independently of the series of clusterings of the data sets in the sequence and a history cost of clustering the at least one data set as part of the series of clusterings of the data sets in the sequence; and
- clustering the at least one data set in the sequence of data sets according to the overall cost determined by minimizing the combination of both the snapshot cost of clustering the at least one data set independently of the series of clusterings of the data sets in the sequence and the history cost of clustering the at least one data set as part of the series of clusterings of the data sets in the sequence.

12. The method of claim 11 further comprising determining the snapshot cost of clustering the at least one data set in the sequence of data sets independently of a series of clusterings of the data sets in the sequence.
13. The method of claim 11 further comprising determining the history cost of clustering the at least one data set in the sequence of data sets as part of the series of clusterings of the data sets in the sequence.
14. The method of claim 11 wherein the clustering the at least one data set in the sequence of data sets according to the overall cost determined by minimizing the combination of both the snapshot cost and the history cost comprises applying a greedy heuristic to minimize the distance between corresponding clusters of the at least one data set and the previous data set in the sequence of data sets.
15. The method of claim 11 wherein clustering the at least one data set in the sequence of data sets according to the overall cost determined by minimizing both the snapshot cost and the history cost comprises applying a greedy heuristic to minimize the distance between corresponding clusters of the at least one data set and the previous data set in the sequence of data sets.
16. The method of claim 12 wherein determining the snapshot cost of clustering the at least one data set comprises determining a cost of representing the at least one data set for a particular clustering method used.
17. The method of claim 13 wherein determining the history cost of clustering the at least one data set comprises determining a measure of the distance between corresponding clusters of the at least one data set and the previous data set in the sequence of data sets.
18. The method of claim 11 wherein determining the overall cost of clustering the at least one data set comprises minimizing a combination of a cost of representing the at least one data set for a particular clustering method used and a measure of the distance between corresponding clusters of the at least one data set and the previous data set in the sequence of data sets.
19. A computer-readable medium having computer-executable instructions for performing the method of claim 11.
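Claims 14, 15 and 17 refer to a greedy heuristic over the distance between corresponding clusters of consecutive data sets. A minimal sketch of one such heuristic, assuming clusters are summarized by centroids and that the correspondence is one-to-one: repeatedly pair the closest still-unmatched current and previous centroids, then sum the matched distances as the history cost. The function name `greedy_match` is hypothetical. Greedy pairing is not guaranteed to find the minimum-cost one-to-one matching (the Hungarian algorithm would); it trades optimality for simplicity.

```python
from math import dist


def greedy_match(centroids, prev_centroids):
    """Greedily pair current and previous centroids, closest pairs first.

    Returns (pairs, cost): pairs maps a current cluster index to the index of
    its matched previous cluster, and cost is the sum of matched distances.
    """
    candidates = sorted(
        (dist(c, p), i, j)
        for i, c in enumerate(centroids)
        for j, p in enumerate(prev_centroids)
    )
    pairs, used_prev, cost = {}, set(), 0.0
    for d, i, j in candidates:
        if i in pairs or j in used_prev:
            continue  # one-to-one: each cluster is matched at most once
        pairs[i] = j
        used_prev.add(j)
        cost += d
    return pairs, cost


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (5.0, 0.0)]
    prev_centroids = [(4.0, 0.0), (0.0, 1.0)]
    print(greedy_match(centroids, prev_centroids))  # ({0: 1, 1: 0}, 2.0)
```

Note that the current cluster indices need not line up with the previous ones: here cluster 0 corresponds to previous cluster 1 and vice versa, which is why a matching step is needed before distances can be compared.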

20. A computer system for clustering a data set, comprising:
- means for determining a snapshot cost of clustering at least one data set in a sequence of data sets independently of a series of clusterings of the data sets in the sequence;
- means for determining a history cost of clustering the at least one data set in the sequence of data sets as part of the series of clusterings of the data sets in the sequence;
- means for determining an overall cost of clustering the at least one data set in the sequence of data sets by minimizing both the snapshot cost of clustering the at least one data set independently of the series of clusterings of the data sets in the sequence and the history cost of clustering the at least one data set as part of the series of clusterings of the data sets in the sequence; and
- means for clustering the at least one data set in the sequence of data sets according to the overall cost determined by minimizing both the snapshot cost of clustering the at least one data set independently of the series of clusterings of the data sets in the sequence and the history cost of clustering the at least one data set as part of the series of clusterings of the data sets in the sequence.
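The final element, clustering the data set according to the overall cost, can be illustrated with a k-means-style loop that takes the previous clustering into account. This is a minimal sketch under stated assumptions, not the patent's claimed procedure: it seeds the centroids with the previous time step's centroids, and after each assignment step it blends every cluster mean with the nearest previous centroid using a hypothetical weight `cp` in [0, 1]. The blend pulls each centroid toward its predecessor; with `cp = 0` the loop reduces to ordinary k-means (Lloyd's algorithm). The names `nearest` and `evolutionary_kmeans` are hypothetical.

```python
from math import dist


def nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda k: dist(x, centers[k]))


def evolutionary_kmeans(points, prev_centroids, cp=0.3, iters=20):
    """k-means seeded with, and pulled toward, the previous centroids.

    cp = 0 gives ordinary k-means; larger cp trades snapshot quality for
    stability relative to the previous clustering.
    """
    centroids = [tuple(c) for c in prev_centroids]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:
                updated.append(c)  # leave an empty cluster's centroid as is
                continue
            mean = tuple(sum(xs) / len(members) for xs in zip(*members))
            prev = prev_centroids[nearest(mean, prev_centroids)]
            # Blend the cluster mean with its predecessor centroid.
            updated.append(tuple((1 - cp) * m + cp * q for m, q in zip(mean, prev)))
        if updated == centroids:
            break  # converged: the update no longer changes any centroid
        centroids = updated
    return [nearest(p, centroids) for p in points], centroids


if __name__ == "__main__":
    points = [(0, 0), (0, 2), (10, 0), (10, 2)]
    prev_centroids = [(0, 0), (10, 0)]
    print(evolutionary_kmeans(points, prev_centroids, cp=0.5))
```

With `cp = 0.5` the centroids settle halfway between the current cluster means (y = 1) and the previous centroids (y = 0); with `cp = 0` they sit exactly on the means.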


Current Assignee
R2 Solutions LLC (Acacia Research Corporation)
Original Assignee
Yahoo! Inc. (Apollo Global Management, Inc.)
Inventors
Ravikumar, Shanmugasundaram; Tomkins, Andrew; Chakrabarti, Deepayan

Granted Patent

US 8,930,365 B2
US Class Current

1/1
CPC Class Codes

G06F 16/35 Clustering; Classification

G06F 18/23 Clustering techniques
