DATA CLASSIFICATION AND HIERARCHICAL CLUSTERING
Abstract
Apparatus, systems, and methods can operate to provide efficient data clustering, data classification, and data compression. In one method, a training set of training instances is processed to select a subset of size-1 patterns, initialize a weight of each size-1 pattern, include the size-1 patterns in classes in a model associated with the training set, and then include a set of top-k size-2 patterns in a way that provides an effective balance between the local, class, and global significance of patterns. Another method comprises processing a dataset to compute an overall significance value of each size-2 pattern in each instance in the dataset, sorting the size-2 patterns, and selecting the top-k size-2 patterns to be represented in clusters, which can be refined into a clustered hierarchy. A further method comprises creating an uncompressed bitmap, reordering the bitmap, and compressing the bitmap. Additional apparatus, systems, and methods are disclosed.
38 Claims
1. A computer-implemented method comprising:
using a computer comprising a processor to perform:
initializing a model, the model including a plurality of classes;
selecting subsets of patterns from a set of available patterns in a training instance selected from a training set of training instances, the selecting subsets including selecting a subset of size-1 patterns and selecting a subset of size-2 patterns;
initializing a weight of each size-1 pattern in the subset of size-1 patterns;
including each size-1 pattern in the subset of size-1 patterns in each class in the plurality of classes in the model;
calculating an overall significance value of each size-2 pattern in the training instance;
sorting the size-2 patterns using the overall significance;
selecting the highest k sorted size-2 patterns;
initializing a weight of each selected highest k size-2 pattern;
adjusting the weights on the size-1 and size-2 patterns; and
presenting the model organized with the plurality of classes, each class including the size-1 patterns, the highest k size-2 patterns, and the weights of the size-1 and size-2 patterns.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 10, 11, 12
9. (canceled)
13. A system comprising:
a memory; and
a control module coupled to the memory, the control module comprising:
a first initialization module to initialize a model, the model including a plurality of classes;
a first selection module to select subsets of patterns from a set of available patterns in a training instance selected from a training set of training instances, the selecting subsets including selecting a subset of size-1 patterns and selecting a subset of size-2 patterns;
a second initialization module to initialize a weight of each size-1 pattern in the subset of size-1 patterns;
an organization module to include each size-1 pattern in the subset of size-1 patterns in each class in the plurality of classes in the model;
a first calculation module to calculate an overall significance value of each size-2 pattern in the training instance;
a sorting module to sort the size-2 patterns using the overall significance;
a second selection module to select the highest k sorted size-2 patterns;
a third initialization module to initialize a weight of each selected highest k size-2 pattern;
an adjustment module to adjust the weights on the size-1 and size-2 patterns; and
a presentation module to present the model organized with the plurality of classes, each class including the size-1 patterns, the highest k size-2 patterns, and the weights of the size-1 and size-2 patterns.
Dependent claims: 14, 15
16. A machine-readable medium including instructions, which when executed by a machine, cause the machine to perform operations comprising:
initializing a model, the model including a plurality of classes;
selecting subsets of patterns from a set of available patterns in a training instance selected from a training set of training instances, the selecting subsets including selecting a subset of size-1 patterns and selecting a subset of size-2 patterns;
initializing a weight of each size-1 pattern in the subset of size-1 patterns;
including each size-1 pattern in the subset of size-1 patterns in each class in the plurality of classes in the model;
calculating an overall significance value of each size-2 pattern in the training instance;
sorting the size-2 patterns using the overall significance;
selecting the highest k sorted size-2 patterns;
initializing a weight of each selected highest k size-2 pattern;
adjusting the weights on the size-1 and size-2 patterns; and
presenting the model organized with the plurality of classes, each class including the size-1 patterns, the highest k size-2 patterns, and the weights of the size-1 and size-2 patterns.
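
The steps recited in the classification claims (1, 13, and 16) can be illustrated with a minimal Python sketch. The claims do not define how the overall significance value is computed, how weights are initialized, or how they are adjusted, so everything specific here is an assumption made for illustration only: the function name `train_model`, the placeholder significance measure (the share of a pair's occurrences that fall in the instance's class), the initial weight of 1.0, and the fixed adjustment step. Size-1 patterns are modeled as 1-tuples of items and size-2 patterns as sorted item pairs.

```python
from collections import Counter
from itertools import combinations


def train_model(training_set, classes, k, initial_weight=1.0, step=0.5):
    """Illustrative sketch only; training_set is a list of (items, label)."""
    # Initialize a model that includes a plurality of classes.
    model = {c: {} for c in classes}

    # Counts used by the placeholder significance measure (an assumption).
    pair_total = Counter()
    pair_in_class = Counter()
    for items, label in training_set:
        for pair in combinations(sorted(items), 2):
            pair_total[pair] += 1
            pair_in_class[(label, pair)] += 1

    def overall_significance(pair, label):
        # Hypothetical measure: fraction of the pair's occurrences that
        # appear in instances carrying this label.
        return pair_in_class[(label, pair)] / pair_total[pair]

    for items, label in training_set:
        # Select the size-1 and size-2 patterns available in this instance.
        size1 = [(item,) for item in sorted(items)]
        size2 = list(combinations(sorted(items), 2))

        # Sort size-2 patterns by significance and keep the highest k.
        ranked = sorted(
            size2, key=lambda p: overall_significance(p, label), reverse=True
        )
        selected = size1 + ranked[:k]

        # Include each selected pattern in every class with an initial weight.
        for c in classes:
            for p in selected:
                model[c].setdefault(p, initial_weight)

        # Placeholder weight adjustment: reinforce the instance's own class.
        for p in selected:
            model[label][p] += step

    return model


if __name__ == "__main__":
    data = [({"a", "b", "c"}, "x"), ({"a", "b"}, "x"), ({"b", "c"}, "y")]
    for cls, patterns in train_model(data, ["x", "y"], k=1).items():
        print(cls, patterns)
```

The returned dictionary is organized by class, with each class mapping patterns to weights, which is one plausible reading of "presenting the model organized with the plurality of classes".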
17. A computer-implemented method comprising:
using a computer comprising a processor to perform:
receiving a dataset comprising a plurality of instances, each instance including a plurality of size-2 patterns;
for each instance in the dataset:
computing an overall significance value of each size-2 pattern in the instance;
sorting the plurality of size-2 patterns in the instance based on the overall significance value;
selecting the top-k size-2 patterns, the k value being specified; and
including the top-k size-2 patterns in a cluster in a set of clusters; and
presenting the set of clusters of top-k size-2 patterns.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
28. A system comprising:
a memory; and
a control module coupled to the memory, the control module comprising:
a receiving module to receive a dataset comprising a plurality of instances, each instance including a plurality of size-2 patterns;
a looping module to loop for each instance in the dataset;
a computation module to compute an overall significance value of each size-2 pattern in the instance;
a sorting module to sort the plurality of size-2 patterns in the instance based on the overall significance value;
a selection module to select the top-k size-2 patterns, the k value being specified; and
an organization module to include the top-k size-2 patterns in a cluster in a set of clusters; and
a presentation module to present the set of clusters of top-k size-2 patterns.
Dependent claims: 29, 30
31. A machine-readable medium including instructions, which when executed by a machine, cause the machine to perform operations comprising:
receiving a dataset comprising a plurality of instances, each instance including a plurality of size-2 patterns;
for each instance in the dataset:
computing an overall significance value of each size-2 pattern in the instance;
sorting the plurality of size-2 patterns in the instance based on the overall significance value;
selecting the top-k size-2 patterns, the k value being specified; and
including the top-k size-2 patterns in a cluster in a set of clusters; and
presenting the set of clusters of top-k size-2 patterns.
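
The per-instance loop recited in the clustering claims (17, 28, and 31) can be sketched in a few lines of Python. The claims leave the significance measure and the cluster representation open, so this sketch makes two assumptions: the hypothetical function `cluster_by_top_k_pairs` scores a size-2 pattern by its global support (the number of instances containing it), and each cluster is keyed by a size-2 pattern and holds the indices of the instances that selected it.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cluster_by_top_k_pairs(dataset, k):
    """Illustrative sketch only; dataset is a list of item sets."""
    # Placeholder significance (an assumption): global support of each pair.
    support = Counter(
        pair for items in dataset for pair in combinations(sorted(items), 2)
    )

    clusters = defaultdict(list)
    for index, items in enumerate(dataset):
        # Compute and sort the size-2 patterns of this instance.
        pairs = list(combinations(sorted(items), 2))
        pairs.sort(key=lambda p: support[p], reverse=True)

        # Select the top-k patterns and include each one in a cluster.
        for pair in pairs[:k]:
            clusters[pair].append(index)

    # Present the set of clusters as a plain dictionary.
    return dict(clusters)


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"b", "c", "d"}]
    print(cluster_by_top_k_pairs(data, k=1))
```

Because Python's sort is stable, ties in significance are broken by the lexicographic order of the pairs; a real implementation would need its own tie-breaking rule.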
32. A computer-implemented method comprising:
using a computer comprising a processor to perform:
creating an uncompressed bitmap representation of each instance in a dataset, each bitmap representation including a plurality of n ordered bits, each bit indicating whether a corresponding item selected from a set of n items is present in the corresponding instance;
reordering the bitmap representations of the instances into an order that reduces or minimizes at least approximate Hamming-distances;
compressing the reordered bitmap representations; and
presenting the compressed reordered bitmap representations.
Dependent claims: 33, 34
35. A system comprising:
a memory; and
a control module coupled to the memory, the control module comprising:
a creation module to create an uncompressed bitmap representation of each instance in a dataset, each bitmap representation including a plurality of n ordered bits, each bit indicating whether a corresponding item selected from a set of n items is present in the corresponding instance;
an organization module to reorder the bitmap representations of the instances into an order that reduces or minimizes at least approximate Hamming-distances;
a data compression module to compress the reordered bitmap representations; and
a presentation module to present the compressed reordered bitmap representations.
Dependent claims: 36, 37
38. A machine-readable medium including instructions, which when executed by a machine, cause the machine to perform operations comprising:
creating an uncompressed bitmap representation of each instance in a dataset, each bitmap representation including a plurality of n ordered bits, each bit indicating whether a corresponding item selected from a set of n items is present in the corresponding instance;
reordering the bitmap representations of the instances into an order that reduces or minimizes Hamming-distances;
compressing the reordered bitmap representations; and
presenting the compressed reordered bitmap representations.
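
The bitmap claims (32, 35, and 38) can likewise be sketched in Python. The claims name neither a reordering algorithm nor a compression scheme, so both choices below are assumptions: a greedy nearest-neighbor pass stands in for "an order that reduces Hamming-distances", and the standard-library `zlib` module stands in for the compressor. For simplicity each bit is stored as one byte before compression. The helper names `to_bitmap`, `hamming`, `greedy_reorder`, and `compress_bitmaps` are hypothetical.

```python
import zlib


def to_bitmap(instance, items):
    """Build n ordered bits: 1 if the corresponding item is present."""
    return tuple(1 if item in instance else 0 for item in items)


def hamming(a, b):
    """Number of bit positions in which two equal-length bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_reorder(bitmaps):
    """Greedy heuristic (an assumption): always append the nearest row."""
    remaining = list(bitmaps)
    if not remaining:
        return []
    ordered = [remaining.pop(0)]
    while remaining:
        last = ordered[-1]
        nearest = min(
            range(len(remaining)), key=lambda i: hamming(last, remaining[i])
        )
        ordered.append(remaining.pop(nearest))
    return ordered


def compress_bitmaps(bitmaps):
    """Flatten rows to bytes (one byte per bit) and compress with zlib."""
    raw = bytes(bit for row in bitmaps for bit in row)
    return zlib.compress(raw)


if __name__ == "__main__":
    items = ["a", "b", "c", "d"]
    dataset = [{"a", "b"}, {"c", "d"}, {"a", "b", "c"}, {"c"}]
    rows = greedy_reorder([to_bitmap(inst, items) for inst in dataset])
    print(rows)
    print(len(compress_bitmaps(rows)), "compressed bytes")
```

Placing similar rows next to each other tends to produce longer runs of identical bits, which is what run-based and dictionary-based compressors exploit; the greedy pass is only an approximation and does not guarantee a minimal total distance.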
Specification