Optimal taxonomy layer selection method
First Claim
1. A computerized method of representing a dataset with an optimal layer of a taxonomy, comprising:
obtaining a taxonomy including a hierarchical arrangement of layers, wherein each layer represents a cluster set containing at least one cluster, wherein each cluster represents at least one record within a dataset;
identifying a range of taxonomy layers based on a measure of intra-cluster homogeneity of each cluster within the taxonomy;
selecting a taxonomy layer within the identified range as an optimal layer of the taxonomy, the selecting being based on a measure of inter-cluster heterogeneity between clusters of a taxonomy layer within the identified range; and
labeling clusters within the optimal layer of the taxonomy;
wherein selecting a taxonomy layer within the identified range as an optimal layer of the taxonomy comprises:
marking all taxonomy layers within the identified range having a measure of inter-cluster heterogeneity that satisfies a first predetermined condition;
selecting a marked taxonomy layer closest to a lower bound of the identified range; and
wherein the computerized method of representing a dataset with a taxonomy occurs within a physical computer.
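The marking and selecting steps recited in the claim can be illustrated with a minimal sketch. The claim does not fix how layers are indexed, how inter-cluster heterogeneity is measured, or what the first predetermined condition is, so everything named here is an assumption made for illustration: layers are indexed by integers, `heterogeneity[i]` is a precomputed score for layer `i`, `lower` and `upper` are the bounds of the identified range, and `condition` stands in for the first predetermined condition.

```python
from typing import Callable, Optional, Sequence


def select_optimal_layer(
    heterogeneity: Sequence[float],
    lower: int,
    upper: int,
    condition: Callable[[float], bool],
) -> Optional[int]:
    """Return the marked layer closest to `lower`, or None if no layer is marked.

    Hypothetical helper: the integer layer indexing and the form of
    `condition` are assumptions, not taken from the claim text.
    """
    # Mark every layer in the identified range [lower, upper] whose
    # heterogeneity score satisfies the (assumed) predetermined condition.
    marked = [i for i in range(lower, upper + 1) if condition(heterogeneity[i])]
    if not marked:
        return None
    # Among the marked layers, pick the one nearest the lower bound.
    return min(marked, key=lambda i: abs(i - lower))


# Illustrative scores for six layers; the threshold 0.5 is an assumed condition.
scores = [0.1, 0.4, 0.7, 0.5, 0.9, 0.8]
chosen = select_optimal_layer(scores, lower=1, upper=4, condition=lambda h: h >= 0.5)
```

With these illustrative inputs, layers 2, 3 and 4 are marked and layer 2 is chosen because it lies nearest the lower bound of the range.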
Abstract
A computerized method of representing a dataset with an optimal layer of a taxonomy includes obtaining a taxonomy including a hierarchical arrangement of layers, wherein each layer represents a cluster set containing at least one cluster, wherein each cluster represents at least one record within a dataset; identifying a range of taxonomy layers based on a measure of intra-cluster homogeneity of each cluster within the taxonomy; selecting a taxonomy layer within the identified range as an optimal layer of the taxonomy, the selecting being based on a measure of inter-cluster heterogeneity between clusters of a taxonomy layer within the identified range; and labeling clusters within the optimal layer of the taxonomy.
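The range-identification step can be sketched in the same spirit. The abstract says only that the range is based on a measure of intra-cluster homogeneity of each cluster; it does not say how that measure is computed or how it bounds the range. The sketch below therefore assumes, purely for illustration, that each layer carries a list of per-cluster homogeneity scores, that a layer qualifies when its least homogeneous cluster meets a `threshold`, and that the range spans the first through last qualifying layer.

```python
from typing import Optional, Sequence, Tuple


def identify_layer_range(
    layer_homogeneity: Sequence[Sequence[float]],
    threshold: float,
) -> Optional[Tuple[int, int]]:
    """Return (lower, upper) layer indices, or None if no layer qualifies.

    Hypothetical helper: the per-cluster scores, the threshold test, and
    the bounding-span rule are all assumptions for illustration.
    """
    # A layer qualifies when every cluster in it is at least `threshold`
    # homogeneous, i.e. when its minimum cluster score meets the threshold.
    qualifying = [
        i
        for i, clusters in enumerate(layer_homogeneity)
        if clusters and min(clusters) >= threshold
    ]
    if not qualifying:
        return None
    # Report the span from the first to the last qualifying layer.
    return qualifying[0], qualifying[-1]


# Four layers, from one coarse cluster to four finer clusters (made-up scores).
layers = [
    [0.2],
    [0.5, 0.6],
    [0.8, 0.7, 0.9],
    [0.9, 0.95, 0.9, 1.0],
]
layer_range = identify_layer_range(layers, threshold=0.6)
```

Qualifying layers need not be contiguous; this sketch simply returns the span that bounds them, which is one possible reading among several.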
15 Citations
18 Claims
1. A computerized method of representing a dataset with an optimal layer of a taxonomy, comprising:
obtaining a taxonomy including a hierarchical arrangement of layers, wherein each layer represents a cluster set containing at least one cluster, wherein each cluster represents at least one record within a dataset;
identifying a range of taxonomy layers based on a measure of intra-cluster homogeneity of each cluster within the taxonomy;
selecting a taxonomy layer within the identified range as an optimal layer of the taxonomy, the selecting being based on a measure of inter-cluster heterogeneity between clusters of a taxonomy layer within the identified range; and
labeling clusters within the optimal layer of the taxonomy;
wherein selecting a taxonomy layer within the identified range as an optimal layer of the taxonomy comprises:
marking all taxonomy layers within the identified range having a measure of inter-cluster heterogeneity that satisfies a first predetermined condition;
selecting a marked taxonomy layer closest to a lower bound of the identified range; and
wherein the computerized method of representing a dataset with a taxonomy occurs within a physical computer.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. A computer program product comprising a physical computer usable medium having computer readable code embodied therein for causing a physical computer to effect:
obtaining a taxonomy including a hierarchical arrangement of layers, wherein each layer represents a cluster set containing at least one cluster, wherein each cluster represents at least one record within a dataset;
identifying a range of taxonomy layers based on a measure of intra-cluster homogeneity of each cluster within the taxonomy;
selecting a taxonomy layer within the identified range as an optimal layer of the taxonomy, the selecting being based on a measure of inter-cluster heterogeneity between clusters of a taxonomy layer within the identified range; and
labeling clusters within the optimal layer of the taxonomy;
wherein the computer usable medium has computer readable code embodied therein for causing a computer to effect selecting a taxonomy layer within the identified range as an optimal layer of the taxonomy by:
marking all taxonomy layers within the identified range having a measure of inter-cluster heterogeneity that satisfies a first predetermined condition; and
selecting a marked taxonomy layer closest to a lower bound of the identified range.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18.
Specification