Method and System for Seed Based Clustering of Categorical Data
Abstract
A computerized method of representing a dataset with a taxonomy includes augmenting a dataset containing a plurality of records with a plurality of predetermined exemplars; representing the plurality of records and predetermined exemplars within the augmented dataset as a plurality of clusters in an initial taxonomy layer; generating a truncated hierarchy of cluster sets based on clusters within the initial taxonomy layer, wherein clusters within the truncated hierarchy contain no more than a predetermined number of exemplars; and labeling clusters within the truncated hierarchy.
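The steps in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the patent text. Records and exemplars are Python dicts that map attribute names to categorical values. Similarity is the simple matching coefficient combined with average linkage. The "truncated hierarchy" is produced by greedy agglomerative merging that rejects any merge whose result would contain more than `max_exemplars` exemplars. All function names (`matching_similarity`, `average_linkage`, `build_truncated_hierarchy`) are hypothetical.

```python
from itertools import combinations


def matching_similarity(a, b):
    """Simple matching coefficient (assumed measure): fraction of attributes
    whose categorical values are equal in both records."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)


def average_linkage(c1, c2, items):
    """Average pairwise similarity between two clusters of item indices."""
    total = sum(matching_similarity(items[i], items[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def build_truncated_hierarchy(records, exemplars, max_exemplars):
    """Hypothetical sketch: returns the augmented items and a list of layers,
    where each layer is a list of clusters (lists of item indices)."""
    # Step 1: augment the dataset with the predetermined exemplars.
    items = list(records) + list(exemplars)
    is_exemplar = [False] * len(records) + [True] * len(exemplars)

    def exemplar_count(cluster):
        return sum(is_exemplar[i] for i in cluster)

    # Step 2: initial taxonomy layer, one singleton cluster per item.
    layer = [[i] for i in range(len(items))]
    hierarchy = [layer]

    # Step 3: repeatedly merge the most similar pair of clusters, skipping any
    # merge that would exceed the exemplar cap; stop when none is admissible.
    while len(layer) > 1:
        candidates = [
            (average_linkage(layer[a], layer[b], items), a, b)
            for a, b in combinations(range(len(layer)), 2)
            if exemplar_count(layer[a]) + exemplar_count(layer[b]) <= max_exemplars
        ]
        if not candidates:
            break
        _, a, b = max(candidates)  # ties broken by the larger index pair
        merged = layer[a] + layer[b]
        layer = [c for k, c in enumerate(layer) if k not in (a, b)] + [merged]
        hierarchy.append(layer)
    return items, hierarchy
```

Enforcing the cap during merging ensures that every layer already satisfies the exemplar limit. Another plausible reading of "truncated" is to build the full hierarchy first and then discard the upper levels whose clusters exceed the limit.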
18 Claims
1. A computerized method of representing a dataset with a taxonomy, comprising:
augmenting a dataset containing a plurality of records with a plurality of predetermined exemplars;
representing the plurality of records and predetermined exemplars within the augmented dataset as a plurality of clusters in an initial taxonomy layer;
generating a truncated hierarchy of cluster sets based on clusters within the initial taxonomy layer, wherein clusters within the truncated hierarchy contain no more than a predetermined number of exemplars; and
labeling clusters within the truncated hierarchy.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. A computer program product comprising a computer usable medium having computer readable code embodied therein for causing a computer to effect:
augmenting a dataset containing a plurality of records with a plurality of predetermined exemplars;
representing the plurality of records and predetermined exemplars within the augmented dataset as a plurality of clusters in an initial taxonomy layer;
generating a truncated hierarchy of cluster sets based on clusters within the initial taxonomy layer, wherein clusters within the truncated hierarchy contain no more than a predetermined number of exemplars; and
labeling clusters within the truncated hierarchy.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18.
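Both independent claims end with labeling clusters within the truncated hierarchy, but the text above does not state a labeling rule. The following sketch assumes one hypothetical rule. A cluster that contains predetermined exemplars (the "seeds") takes their labels. Any other cluster is described by its most frequent attribute=value pairs. Clusters are assumed to be lists of indices into `items`, the augmented dataset. `exemplar_labels` is an assumed mapping from each exemplar's index to its predetermined label. The names `label_cluster` and `label_hierarchy` are illustrative only.

```python
from collections import Counter


def label_cluster(cluster, items, exemplar_labels, top_k=2):
    """Hypothetical labeling rule for one cluster (a list of item indices)."""
    # Prefer the labels of any predetermined exemplars (seeds) in the cluster.
    seed_labels = sorted({exemplar_labels[i] for i in cluster if i in exemplar_labels})
    if seed_labels:
        return " / ".join(seed_labels)
    # Otherwise describe the cluster by its most frequent attribute=value pairs;
    # Counter.most_common keeps first-seen order among equal counts.
    counts = Counter(
        f"{attr}={value}" for i in cluster for attr, value in items[i].items()
    )
    return ", ".join(pair for pair, _ in counts.most_common(top_k))


def label_hierarchy(hierarchy, items, exemplar_labels):
    """Return, for each layer, a list of (cluster, label) pairs."""
    return [
        [(cluster, label_cluster(cluster, items, exemplar_labels)) for cluster in layer]
        for layer in hierarchy
    ]
```

When the exemplar cap is 1, every seeded cluster carries at most one exemplar label. Under that cap, the joined form "A / B" appears only if larger caps are allowed.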
Specification