Data classification and hierarchical clustering
Abstract
Apparatus, systems, and methods can operate to provide efficient data clustering, data classification, and data compression. In one method, a training set of training instances is processed to select a subset of size-1 patterns, initialize a weight of each size-1 pattern, include the size-1 patterns in classes in a model associated with the training set, and then include a set of top-k size-2 patterns in a way that provides an effective balance between local, class, and global significance of patterns. Another method comprises processing a dataset to compute an overall significance value of each size-2 pattern in each instance in the dataset, sorting the size-2 patterns, and selecting the top-k size-2 patterns to be represented in clusters, which can be refined into a clustered hierarchy. A further method comprises creating an uncompressed bitmap, reordering the bitmap, and compressing the bitmap. Additional apparatus, systems, and methods are disclosed.
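The bitmap method in the abstract (create, reorder, compress) can be illustrated with a minimal Python sketch. The abstract does not say how the bitmap is reordered or which compression scheme is used, so the lexicographic row sort and the per-column run-length encoding below are assumptions chosen only to show why reordering can help compression. The function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(bits):
    """Encode a sequence of bits as (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def reorder_and_compress(rows):
    """Reorder an uncompressed bitmap, then compress it column by column.

    Assumption: rows are sorted lexicographically so that similar rows
    become adjacent, which tends to lengthen runs within each column.
    Assumption: each column is compressed with run-length encoding.
    """
    ordered = sorted(rows)
    columns = zip(*ordered)
    return ordered, [run_length_encode(column) for column in columns]


# Uncompressed bitmap: one row per instance, one column per pattern.
bitmap = [[1, 0], [0, 1], [1, 0], [0, 1]]
ordered, compressed = reorder_and_compress(bitmap)
```

In this toy bitmap each column alternates between 0 and 1 before sorting, giving four runs per column; after sorting, each column collapses to two runs.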
30 Claims
1. A computer-implemented method comprising:
  using a computer comprising a processor to perform:
    initializing a model, the model including a plurality of classes;
    selecting subsets of patterns from a set of available patterns in a training instance selected from a training set of training instances, the selecting subsets including selecting a subset of size-1 patterns and selecting a subset of size-2 patterns;
    initializing a weight of each size-1 pattern in the subset of size-1 patterns;
    including each size-1 pattern in the subset of size-1 patterns in each class in the plurality of classes in the model;
    calculating an overall significance value of each size-2 pattern in the training instance;
    sorting the size-2 patterns using the overall significance;
    selecting the highest k sorted size-2 patterns;
    initializing a weight of each selected highest k size-2 pattern;
    adjusting the weights on the size-1 and size-2 patterns; and
    presenting the model organized with the plurality of classes, each class including the size-1 patterns, the highest k size-2 patterns, and the weights of the size-1 and size-2 patterns.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
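The steps recited in claim 1 can be sketched in Python for a single training instance. This is an illustrative sketch, not the patented implementation: the claim does not define the overall significance measure, the initial weight, or the weight adjustment, so the `significance` callable, `initial_weight`, and the normalization step are all assumptions, and the function name `build_model` is hypothetical. The sketch also treats every distinct item as a size-1 pattern and every pair of items as a candidate size-2 pattern.

```python
from itertools import combinations


def build_model(items, class_labels, significance, k, initial_weight=1.0):
    """Build a per-class pattern model from one training instance.

    significance: caller-supplied function scoring a size-2 pattern
    (assumption; the claim leaves the measure unspecified).
    """
    size1 = sorted(set(items))                # size-1 patterns
    size2 = list(combinations(size1, 2))      # candidate size-2 patterns

    # Sort size-2 patterns by overall significance and keep the highest k.
    top_k = sorted(size2, key=significance, reverse=True)[:k]

    model = {}
    for label in class_labels:
        # Every size-1 pattern is included in every class.
        weights = {(item,): initial_weight for item in size1}
        weights.update({pair: initial_weight for pair in top_k})
        # Assumption: "adjusting the weights" is shown as normalization.
        total = sum(weights.values())
        model[label] = {pattern: w / total for pattern, w in weights.items()}
    return model


# Hypothetical significance scores for the pairs of one instance.
scores = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "c"): 0.5}
model = build_model(["a", "b", "c"], ["x", "y"], scores.get, k=2)
```

Here each class ends up with the three size-1 patterns plus the two most significant pairs, `("a", "b")` and `("b", "c")`; the pair `("a", "c")` is dropped because it ranks below the top k.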
12. A system comprising:
  a memory; and
  a control module coupled to the memory, the control module comprising:
    a first initialization module to initialize a model, the model including a plurality of classes;
    a first selection module to select subsets of patterns from a set of available patterns in a training instance selected from a training set of training instances, the selecting subsets including selecting a subset of size-1 patterns and selecting a subset of size-2 patterns;
    a second initialization module to initialize a weight of each size-1 pattern in the subset of size-1 patterns;
    an organization module to include each size-1 pattern in the subset of size-1 patterns in each class in the plurality of classes in the model;
    a first calculation module to calculate an overall significance value of each size-2 pattern in the training instance;
    a sorting module to sort the size-2 patterns using the overall significance;
    a second selection module to select the highest k sorted size-2 patterns;
    a third initialization module to initialize a weight of each selected highest k size-2 pattern;
    an adjustment module to adjust the weights on the size-1 and size-2 patterns; and
    a presentation module to present the model organized with the plurality of classes, each class including the size-1 patterns, the highest k size-2 patterns, and the weights of the size-1 and size-2 patterns.
Dependent claims: 13, 14
15. A non-transitory machine-readable medium including instructions, which when executed by a machine, cause the machine to perform operations comprising:
  initializing a model, the model including a plurality of classes;
  selecting subsets of patterns from a set of available patterns in a training instance selected from a training set of training instances, the selecting subsets including selecting a subset of size-1 patterns and selecting a subset of size-2 patterns;
  initializing a weight of each size-1 pattern in the subset of size-1 patterns;
  including each size-1 pattern in the subset of size-1 patterns in each class in the plurality of classes in the model;
  calculating an overall significance value of each size-2 pattern in the training instance;
  sorting the size-2 patterns using the overall significance;
  selecting the highest k sorted size-2 patterns;
  initializing a weight of each selected highest k size-2 pattern;
  adjusting the weights on the size-1 and size-2 patterns; and
  presenting the model organized with the plurality of classes, each class including the size-1 patterns, the highest k size-2 patterns, and the weights of the size-1 and size-2 patterns.
16. A computer-implemented method comprising:
  using a computer comprising a processor to perform:
    receiving a dataset comprising a plurality of instances, each instance including a plurality of size-2 patterns;
    for each instance in the dataset:
      computing an overall significance value of each size-2 pattern in the instance;
      sorting the plurality of size-2 patterns in the instance based on the overall significance value;
      selecting the top-k size-2 patterns, the k value being specified; and
      including the top-k size-2 patterns in a cluster in a set of clusters; and
    presenting the set of clusters of top-k size-2 patterns.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
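The per-instance loop of claim 16 can be sketched as follows. This is a hedged illustration only: the claim does not say how a cluster is chosen for a pattern, so the sketch assumes one cluster per distinct size-2 pattern, with each instance joining the clusters of its top-k patterns. The `significance` callable and the name `cluster_by_top_k` are hypothetical.

```python
from collections import defaultdict


def cluster_by_top_k(dataset, significance, k):
    """Group instances by their k most significant size-2 patterns.

    dataset: mapping of instance id -> list of size-2 patterns.
    significance: caller-supplied function (instance_id, pattern) -> score
    (assumption; the claim leaves the measure unspecified).
    """
    clusters = defaultdict(list)
    for instance_id, patterns in dataset.items():
        # Sort this instance's patterns by overall significance.
        ranked = sorted(
            patterns,
            key=lambda pattern: significance(instance_id, pattern),
            reverse=True,
        )
        # Assumption: each top-k pattern names the cluster the instance joins.
        for pattern in ranked[:k]:
            clusters[pattern].append(instance_id)
    return dict(clusters)


# Hypothetical dataset and significance scores.
dataset = {
    "d1": [("a", "b"), ("a", "c"), ("b", "c")],
    "d2": [("a", "b"), ("b", "d")],
}
scores = {
    ("d1", ("a", "b")): 0.9, ("d1", ("a", "c")): 0.2, ("d1", ("b", "c")): 0.4,
    ("d2", ("a", "b")): 0.3, ("d2", ("b", "d")): 0.8,
}
clusters = cluster_by_top_k(dataset, lambda i, p: scores[(i, p)], k=1)
# clusters == {("a", "b"): ["d1"], ("b", "d"): ["d2"]}
```

With a larger k the clusters overlap: at k=2 both instances land in the `("a", "b")` cluster, which is the kind of shared membership a later refinement into a hierarchy could work from.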
27. A system comprising:
  a memory; and
  a control module coupled to the memory, the control module comprising:
    a receiving module to receive a dataset comprising a plurality of instances, each instance including a plurality of size-2 patterns;
    a looping module to loop for each instance in the dataset;
    a computation module to compute an overall significance value of each size-2 pattern in the instance;
    a sorting module to sort the plurality of size-2 patterns in the instance based on the overall significance value;
    a selection module to select the top-k size-2 patterns, the k value being specified; and
    an organization module to include the top-k size-2 patterns in a cluster in a set of clusters; and
    a presentation module to present the set of clusters of top-k size-2 patterns.
Dependent claims: 28, 29
30. A non-transitory machine-readable medium including instructions, which when executed by a machine, cause the machine to perform operations comprising:
  receiving a dataset comprising a plurality of instances, each instance including a plurality of size-2 patterns;
  for each instance in the dataset:
    computing an overall significance value of each size-2 pattern in the instance;
    sorting the plurality of size-2 patterns in the instance based on the overall significance value;
    selecting the top-k size-2 patterns, the k value being specified; and
    including the top-k size-2 patterns in a cluster in a set of clusters; and
  presenting the set of clusters of top-k size-2 patterns.
Specification