Data clustering system and methods
First Claim
1. One or more computer-readable storage media device storing computer-readable instructions that, when executed, instruct one or more processors to perform operations comprising:
providing recommended groupings of clustered data based at least in part on clustering data of a first data set;
receiving an indication from a user that a first portion of the first data set is associated with a bucket, the indication based at least in part on an evaluation by the user of at least one of the recommended groupings;
generating a classification model based at least in part on the indication, one or more data signatures based at least in part on one or more of units of data, input patterns of data, order and proximity of terms, or combinations thereof, and one or more bucket patterns based at least in part on one or more cluster patterns, cluster signatures, input data patterns, or a combination thereof;
generating classified data based at least in part on applying the classification model to a second data set based at least in part on comparing a data signature of the one or more data signatures to a bucket pattern of the one or more bucket patterns;
identifying a subset of data of the first data set, of the second data set, or a combination thereof;
providing another recommended groupings of clustered data based at least in part on clustering data of the subset of data;
receiving another indication from a user that a first portion of the subset of data is associated with another bucket, the another indication based at least in part on an another evaluation by the user of at least one of the another recommended groupings;
generating another classification model based at least in part on the another indication; and
generating another classified data based at least in part on applying the another classification model to a third data set.
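The claim does not say how a data signature or a bucket pattern is computed. As a minimal sketch only, the Python below assumes a signature is the set of a record's terms plus its adjacent ordered term pairs (a crude stand-in for "order and proximity of terms"), and a bucket pattern is whatever is common to the signatures of the records a user assigned to that bucket. The names `signature`, `build_model`, `classify`, and `min_score` are hypothetical and do not come from the patent.

```python
def signature(text):
    """Assumed signature: terms plus adjacent ordered term pairs."""
    terms = text.lower().split()
    return set(terms) | set(zip(terms, terms[1:]))


def build_model(indications):
    """Build one pattern per bucket from user-assigned example records.

    indications maps a bucket name to the records the user placed in it.
    The pattern is the intersection of the examples' signatures (an assumption).
    """
    model = {}
    for bucket, examples in indications.items():
        if not examples:
            continue  # nothing to learn from an empty bucket
        model[bucket] = set.intersection(*(signature(e) for e in examples))
    return model


def classify(model, record, min_score=0.75):
    """Return the bucket whose pattern is best covered by the record's signature.

    The score is the fraction of the bucket pattern found in the signature.
    Returns None when no bucket reaches min_score.
    """
    sig = signature(record)
    best_bucket, best_score = None, 0.0
    for bucket, pattern in model.items():
        if not pattern:
            continue
        score = len(sig & pattern) / len(pattern)
        if score > best_score:
            best_bucket, best_score = bucket, score
    return best_bucket if best_score >= min_score else None


# First data set: the user's indications after reviewing recommended groupings.
model = build_model({
    "storage": ["disk error on node 1", "disk error on node 2"],
    "auth": ["login failed for alice", "login failed for bob"],
})

# Second data set: records classified by comparing signatures to bucket patterns.
second = ["disk error on node 7", "login failed for carol", "network cable unplugged"]
classified = {record: classify(model, record) for record in second}
```

Under this sketch, records that come back as `None` are one plausible candidate for the "subset of data" that is clustered again in the second round, although the claim leaves the selection of that subset open.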
Abstract
Data having some similarities and some dissimilarities may be clustered or grouped according to the similarities and dissimilarities. The data may be clustered using agglomerative clustering techniques. The clusters may be used as suggestions for generating groups where a user may demonstrate certain criteria for grouping. The system may learn from the criteria and extrapolate the groupings to readily sort data into appropriate groups. The system may be easily refined as the user gains an understanding of the data.
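The abstract names agglomerative clustering without fixing a similarity measure or a stopping rule. The sketch below is one possible reading, not the patented method: it assumes records are short text strings, measures similarity with Jaccard overlap of terms, uses average linkage, and stops merging when the best pair falls below a threshold. The function names and the `threshold` parameter are illustrative assumptions.

```python
from itertools import combinations


def tokens(text):
    """Split a record into a set of lowercase terms."""
    return frozenset(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two term sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def linkage(c1, c2):
    """Average pairwise similarity between two clusters (average linkage)."""
    scores = [jaccard(tokens(x), tokens(y)) for x in c1 for y in c2]
    return sum(scores) / len(scores)


def agglomerate(records, threshold=0.5):
    """Start with one cluster per record and repeatedly merge the most
    similar pair until no pair reaches the threshold."""
    clusters = [[r] for r in records]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


groups = agglomerate([
    "disk error on node 1",
    "disk error on node 2",
    "login failed for alice",
    "login failed for bob",
])
```

Each resulting cluster could then be shown to a user as a suggested grouping to accept, split, or rename.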
16 Claims
1. One or more computer-readable storage media device storing computer-readable instructions that, when executed, instruct one or more processors to perform operations comprising:
providing recommended groupings of clustered data based at least in part on clustering data of a first data set;
receiving an indication from a user that a first portion of the first data set is associated with a bucket, the indication based at least in part on an evaluation by the user of at least one of the recommended groupings;
generating a classification model based at least in part on the indication, one or more data signatures based at least in part on one or more of units of data, input patterns of data, order and proximity of terms, or combinations thereof, and one or more bucket patterns based at least in part on one or more cluster patterns, cluster signatures, input data patterns, or a combination thereof;
generating classified data based at least in part on applying the classification model to a second data set based at least in part on comparing a data signature of the one or more data signatures to a bucket pattern of the one or more bucket patterns;
identifying a subset of data of the first data set, of the second data set, or a combination thereof;
providing another recommended groupings of clustered data based at least in part on clustering data of the subset of data;
receiving another indication from a user that a first portion of the subset of data is associated with another bucket, the another indication based at least in part on an another evaluation by the user of at least one of the another recommended groupings;
generating another classification model based at least in part on the another indication; and
generating another classified data based at least in part on applying the another classification model to a third data set.
Dependent Claims (2, 3, 4, 5)
6. One or more computer-readable storage media device storing computer-readable instructions that, when executed, instruct one or more processors to perform operations comprising:
loading a plurality of classification models;
applying the plurality of classification models to a data set;
comparing a classification recommendation, the classification recommendation based at least in part on the plurality of classification models;
displaying the classification recommendation, the classification recommendation comprising an input, a first suggested classification, based at least in part on a first result from a first model of the plurality of classification models, and a second suggested classification, based at least in part on a second result from a second model of the plurality of classification models;
receiving an indication from a user that the first suggested classification is a correct classification of the input, the indication comprising a selection of one or more subunits of one or more inputs affirmatively associated with the classification recommendation, the selection of one or more subunits of the one or more inputs comprising a pattern identified in the one or more inputs; and
generating a classification model based at least in part on the indication from the user.
Dependent Claims (7, 8)
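The flow recited in claim 6 (several models each suggest a classification, the user confirms one and selects the subunits that justify it, and a new model is generated) can be sketched roughly as follows. Everything here is an assumption made for illustration: a model is represented as a mapping from bucket name to a set of required terms, subunits are whitespace-separated terms, and the names `suggest`, `recommend`, and `learn_from_confirmation` are hypothetical.

```python
def suggest(model, record):
    """Return the first bucket whose pattern terms all occur in the record."""
    terms = set(record.lower().split())
    for bucket, pattern in model.items():
        if pattern and pattern <= terms:
            return bucket
    return None


def recommend(models, record):
    """Apply every loaded model and bundle the input with one suggestion per model."""
    return {"input": record, "suggestions": [suggest(m, record) for m in models]}


def learn_from_confirmation(recommendation, chosen_index, selected_subunits):
    """Generate a new model from the user's confirmation.

    The user marks one suggestion as correct and selects the terms of the
    input that identify it; those terms become the new bucket pattern.
    """
    bucket = recommendation["suggestions"][chosen_index]
    if bucket is None:
        raise ValueError("the chosen model made no suggestion")
    pattern = {s.lower() for s in selected_subunits}
    if not pattern <= set(recommendation["input"].lower().split()):
        raise ValueError("selected subunits must occur in the input")
    return {bucket: pattern}


# Two previously generated models with different opinions about the same input.
model_a = {"storage": {"disk"}}
model_b = {"hardware": {"error"}}

rec = recommend([model_a, model_b], "Disk error on node 3")

# The user confirms the first suggestion and selects "disk" and "error" as the pattern.
new_model = learn_from_confirmation(rec, 0, ["disk", "error"])
```

In this sketch the new model is stricter than `model_a`: it requires both selected terms, so a record such as "disk full" no longer lands in the storage bucket.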
9. A method comprising:
providing recommended groupings of clustered data based at least in part on clustering data of a first data set;
receiving an indication from a user that a first portion of the first data set is associated with a bucket, the indication based at least in part on an evaluation by the user of at least one of the recommended groupings;
generating a classification model based at least in part on the indication, one or more data signatures based at least in part on one or more of units of data, input patterns of data, order and proximity of terms, or combinations thereof, and one or more bucket patterns based at least in part on one or more cluster patterns, cluster signatures, input data patterns, or a combination thereof;
generating classified data based at least in part on applying the classification model to a second data set based at least in part on comparing a data signature of the one or more data signatures to a bucket pattern of the one or more bucket patterns;
identifying a subset of data of the first data set, of the second data set, or a combination thereof;
providing another recommended groupings of clustered data based at least in part on clustering data of the subset of data;
receiving another indication from a user that a first portion of the subset of data is associated with another bucket, the another indication based at least in part on an another evaluation by the user of at least one of the another recommended groupings;
generating another classification model based at least in part on the another indication; and
generating another classified data based at least in part on applying the another classification model to a third data set.
Dependent Claims (10, 11, 12, 13)
14. A method comprising:
loading a plurality of classification models;
applying the plurality of classification models to a data set;
comparing a classification recommendation, the classification recommendation based at least in part on the plurality of classification models;
displaying the classification recommendation, the classification recommendation comprising an input, a first suggested classification, based at least in part on a first result from a first model of the plurality of classification models, and a second suggested classification, based at least in part on a second result from a second model of the plurality of classification models;
receiving an indication from a user that the first suggested classification is a correct classification of the input, the indication comprising a selection of one or more subunits of one or more inputs affirmatively associated with the classification recommendation, the selection of one or more subunits of the one or more inputs comprising a pattern identified in the one or more inputs; and
generating a classification model based at least in part on the indication from the user.
Dependent Claims (15, 16)
Specification