Data clustering system and methods

US 9,183,285 B1
Filed: 08/27/2014
Issued: 11/10/2015
Est. Priority Date: 08/27/2014
Status: Active Grant

3 Assignments

0 Petitions

Abstract

Data having some similarities and some dissimilarities may be clustered or grouped according to the similarities and dissimilarities. The data may be clustered using agglomerative clustering techniques. The clusters may be used as suggestions for generating groups where a user may demonstrate certain criteria for grouping. The system may learn from the criteria and extrapolate the groupings to readily sort data into appropriate groups. The system may be easily refined as the user gains an understanding of the data.
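
The agglomerative clustering mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the patent's method: the word-set feature, the Jaccard similarity, the average-linkage rule, the `threshold` value, and the function names are all hypothetical.

```python
from itertools import combinations


def tokens(text):
    """Lowercased word set for one input (an assumed, very simple feature)."""
    return frozenset(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_similarity(c1, c2, feats):
    """Average-linkage similarity between two clusters of input indices."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(jaccard(feats[i], feats[j]) for i, j in pairs) / len(pairs)


def agglomerate(texts, threshold=0.3):
    """Start with one cluster per input and repeatedly merge the two most
    similar clusters until no pair reaches the (assumed) threshold."""
    feats = [tokens(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]
    while len(clusters) > 1:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_similarity(clusters[p[0]], clusters[p[1]], feats),
        )
        if cluster_similarity(clusters[a], clusters[b], feats) < threshold:
            break
        # a < b always holds for combinations(), so deleting b is safe.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [[texts[i] for i in c] for c in clusters]


groups = agglomerate([
    "reset my password",
    "password reset help",
    "track my order",
    "where is my order",
])
```

The resulting groups play the role of the suggested groupings a user would review before demonstrating grouping criteria.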

27 Citations

16 Claims

1. One or more computer-readable storage media device storing computer-readable instructions that, when executed, instruct one or more processors to perform operations comprising:
- providing recommended groupings of clustered data based at least in part on clustering data of a first data set;
  
  receiving an indication from a user that a first portion of the first data set is associated with a bucket, the indication based at least in part on an evaluation by the user of at least one of the recommended groupings;
  
  generating a classification model based at least in part on the indication, one or more data signatures based at least in part on one or more of units of data, input patterns of data, order and proximity of terms, or combinations thereof, and one or more bucket patterns based at least in part on one or more cluster patterns, cluster signatures, input data patterns, or a combination thereof;
  
  generating classified data based at least in part on applying the classification model to a second data set based at least in part on comparing a data signature of the one or more data signatures to a bucket pattern of the one or more bucket patterns;
  
  identifying a subset of data of the first data set, of the second data set, or a combination thereof;
  
  providing another recommended groupings of clustered data based at least in part on clustering data of the subset of data;
  
  receiving another indication from a user that a first portion of the subset of data is associated with another bucket, the another indication based at least in part on an another evaluation by the user of at least one of the another recommended groupings;
  
  generating another classification model based at least in part on the another indication; and
  
  generating another classified data based at least in part on applying the another classification model to a third data set.
- Dependent claims (2, 3, 4, 5):
  - 2. The one or more computer-readable storage media device of claim 1, wherein at least a portion of the classified data is associated with the bucket.
  - 3. The one or more computer-readable storage media device of claim 1, wherein the indication comprises a selection of one or more inputs affirmatively associated with the bucket.
  - 4. The one or more computer-readable storage media device of claim 1, wherein the indication comprises a selection of one or more subunits of one or more inputs affirmatively associated with the bucket.
  - 5. The one or more computer-readable storage media device of claim 4, wherein the selection of one or more subunits of the one or more inputs comprises a pattern identified in the one or more inputs.
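
The signature-to-pattern comparison recited in claim 1 might be sketched as follows. This is a hypothetical reading, not the claimed implementation: the `signature`, `build_model`, and `classify` names, the unigram-plus-bigram signature (standing in for "order and proximity of terms"), and the `min_overlap` cutoff are all assumptions.

```python
from collections import defaultdict


def signature(text):
    """Assumed data signature: the input's words plus its adjacent word
    pairs, so that term order and proximity influence the match."""
    words = text.lower().split()
    return set(words) | set(zip(words, words[1:]))


def build_model(indications):
    """Build bucket patterns from user indications.

    indications: iterable of (input_text, bucket) pairs the user chose
    after reviewing the recommended groupings.
    """
    patterns = defaultdict(set)
    for text, bucket in indications:
        patterns[bucket] |= signature(text)
    return dict(patterns)


def classify(model, text, min_overlap=0.5):
    """Compare the input's signature to each bucket pattern and return the
    best-matching bucket, or None when no pattern overlaps enough."""
    sig = signature(text)
    if not sig:
        return None
    best_bucket, best_score = None, 0.0
    for bucket, pattern in model.items():
        score = len(sig & pattern) / len(sig)
        if score > best_score:
            best_bucket, best_score = bucket, score
    return best_bucket if best_score >= min_overlap else None


model = build_model([
    ("reset my password", "account"),
    ("track my order", "orders"),
])
first = classify(model, "please reset my password")
second = classify(model, "cancel subscription")
```

Under these assumptions, inputs that come back as `None` would be natural candidates for the subset that is clustered again in the later steps of the claim.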

6. One or more computer-readable storage media device storing computer-readable instructions that, when executed, instruct one or more processors to perform operations comprising:
- loading a plurality of classification models;
  
  applying the plurality of classification models to a data set;
  
  comparing a classification recommendation, the classification recommendation based at least in part on the plurality of classification models;
  
  displaying the classification recommendation, the classification recommendation comprising an input, a first suggested classification, based at least in part on a first result from a first model of the plurality of classification models, and a second suggested classification, based at least in part on a second result from a second model of the plurality of classification models;
  
  receiving an indication from a user that the first suggested classification is a correct classification of the input, the indication comprising a selection of one or more subunits of one or more inputs affirmatively associated with the classification recommendation, the selection of one or more subunits of the one or more inputs comprising a pattern identified in the one or more inputs; and
  
  generating a classification model based at least in part on the indication from the user.
- Dependent claims (7, 8):
  - 7. The one or more computer-readable storage media device of claim 6, the classification recommendation further comprising a first confidence associated with the first suggested classification, and a second confidence associated with the second suggested classification.
  - 8. The one or more computer-readable storage media device of claim 6, the indication based at least in part on an evaluation by the user of at least a portion of the classification recommendation.
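
The multi-model recommendation flow of claims 6 through 8 could look roughly like the sketch below. It is only an illustration under stated assumptions: the keyword-based models, the confidence formula, and the `recommend`, `keyword_model`, and `refine` names are hypothetical and do not come from the patent.

```python
def keyword_model(keywords, label):
    """Hypothetical model: confidence is the fraction of its keywords
    that appear in the input."""
    def model(text):
        words = set(text.lower().split())
        hits = sum(1 for k in keywords if k in words)
        return label, hits / len(keywords)
    return model


def recommend(models, text):
    """Apply every loaded model to the input and return one recommendation
    holding each suggested classification with its confidence."""
    suggestions = []
    for name, model in models.items():
        label, confidence = model(text)
        suggestions.append(
            {"model": name, "classification": label, "confidence": confidence}
        )
    suggestions.sort(key=lambda s: s["confidence"], reverse=True)
    return {"input": text, "suggestions": suggestions}


def refine(recommendation, chosen_index, selected_terms):
    """Generate a new model from the user's indication: the suggestion they
    confirmed plus the subunits (terms) they selected in the input."""
    chosen = recommendation["suggestions"][chosen_index]
    return keyword_model(list(selected_terms), chosen["classification"])


models = {
    "billing": keyword_model(["refund", "charge"], "billing"),
    "shipping": keyword_model(["order", "delivery", "late"], "shipping"),
}
rec = recommend(models, "my order delivery is late")
new_model = refine(rec, 0, ["late", "delivery"])
result = new_model("delivery running late")
```

Here the user confirms the first suggestion and highlights two terms; the refined model then keys on that selected pattern alone.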

9. A method comprising:
- providing recommended groupings of clustered data based at least in part on clustering data of a first data set;
  
  receiving an indication from a user that a first portion of the first data set is associated with a bucket, the indication based at least in part on an evaluation by the user of at least one of the recommended groupings;
  
  generating a classification model based at least in part on the indication, one or more data signatures based at least in part on one or more of units of data, input patterns of data, order and proximity of terms, or combinations thereof, and one or more bucket patterns based at least in part on one or more cluster patterns, cluster signatures, input data patterns, or a combination thereof;
  
  generating classified data based at least in part on applying the classification model to a second data set based at least in part on comparing a data signature of the one or more data signatures to a bucket pattern of the one or more bucket patterns;
  
  identifying a subset of data of the first data set, of the second data set, or a combination thereof;
  
  providing another recommended groupings of clustered data based at least in part on clustering data of the subset of data;
  
  receiving another indication from a user that a first portion of the subset of data is associated with another bucket, the another indication based at least in part on an another evaluation by the user of at least one of the another recommended groupings;
  
  generating another classification model based at least in part on the another indication; and
  
  generating another classified data based at least in part on applying the another classification model to a third data set.
- Dependent claims (10, 11, 12, 13):
  - 10. The method of claim 9, wherein at least a portion of the classified data is associated with the bucket.
  - 11. The method of claim 9, wherein the indication comprises a selection of one or more inputs affirmatively associated with the bucket.
  - 12. The method of claim 9, wherein the indication comprises a selection of one or more subunits of one or more inputs affirmatively associated with the bucket.
  - 13. The method of claim 12, wherein the selection of one or more subunits of the one or more inputs comprises a pattern identified in the one or more inputs.

14. A method comprising:
- loading a plurality of classification models;
  
  applying the plurality of classification models to a data set;
  
  comparing a classification recommendation, the classification recommendation based at least in part on the plurality of classification models;
  
  displaying the classification recommendation, the classification recommendation comprising an input, a first suggested classification, based at least in part on a first result from a first model of the plurality of classification models, and a second suggested classification, based at least in part on a second result from a second model of the plurality of classification models;
  
  receiving an indication from a user that the first suggested classification is a correct classification of the input, the indication comprising a selection of one or more subunits of one or more inputs affirmatively associated with the classification recommendation, the selection of one or more subunits of the one or more inputs comprising a pattern identified in the one or more inputs; and
  
  generating a classification model based at least in part on the indication from the user.
- Dependent claims (15, 16):
  - 15. The method of claim 14, the classification recommendation further comprising a first confidence associated with the first suggested classification, and a second confidence associated with the second suggested classification.
  - 16. The method of claim 14, the indication based at least in part on an evaluation by the user of at least a portion of the classification recommendation.

Current Assignee: Verint Americas Incorporated (Verint Systems Incorporated)
Original Assignee: Next IT Corporation (Verint Systems Incorporated)
Inventors: Miller, Tanya M; Brown, Megan; Brown, Fred A; Wooters, Charles C; Brown, Molly Q
Primary Examiner(s): Perveen, Rehana
Assistant Examiner(s): Khong, Alexander

Application Number: US14/470,856
Time in Patent Office: 440 Days
Field of Search: 707/740
US Class Current: 1/1
CPC Class Codes:
- G06F 16/285   Clustering or classification
- G06F 16/35   Clustering; Classification
- G06F 16/355   Class or cluster creation o...
