CROSS-DOMAIN CLUSTERABILITY EVALUATION FOR CROSS-GUIDED DATA CLUSTERING BASED ON ALIGNMENT BETWEEN DATA DOMAINS
First Claim
1. A method for evaluating cross-domain clusterability upon a target domain and a source domain, said method comprising:
a processor of a computer system receiving the source domain and the target domain, wherein the source domain comprises at least one source data item and the target domain comprises at least one target data item;
said processor calculating target clusterability as an average of a respective clusterability of said at least one target data item such that the target clusterability quantifies how clusterable the target domain is, wherein the respective clusterability of a target data item of said at least one target data item quantifies how unambiguously the target data item can be assigned to a respective true target centroid associated with the target data item;
said processor calculating target-side matchability as an average of a respective matchability of each target centroid of the target domain to source centroids of the source domain such that the target-side matchability quantifies how well target centroids of the target domain are aligned with the source centroids;
said processor calculating source-side matchability as an average of a respective matchability of each source centroid of said source centroids to the target centroids such that the source-side matchability quantifies how well the source centroids are aligned with the target centroids;
said processor calculating source-target pair matchability as an average of the target-side matchability and the source-side matchability;
said processor calculating cross-domain clusterability between the target domain and the source domain as a linear combination of the calculated target clusterability and the calculated source-target pair matchability by use of a trade-off parameter that indicates relative contribution of the target clusterability and the source-target pair matchability to the cross-domain clusterability; and
said processor transferring the calculated cross-domain clusterability to a device selected from an output device of the computer system, a storage device of the computer system, a remote computer system coupled to the computer system, and a combination thereof.
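The steps of the claim can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented formulas: the claim does not say how the per-item clusterability or the per-centroid matchability is computed, so the hypothetical helper `unambiguity` uses one minus the ratio of the nearest to the second-nearest centroid distance for both. The linear combination is assumed to be convex, with a hypothetical `trade_off` weight in [0, 1], and the centroids are assumed to be supplied already computed.

```python
from math import dist
from statistics import fmean


def unambiguity(point, centroids):
    """Hypothetical score in [0, 1]: 1 - d_nearest / d_second_nearest.

    Assumption for illustration only; the claim does not fix a formula.
    """
    distances = sorted(dist(point, c) for c in centroids)
    if len(distances) < 2:
        return 1.0  # a single candidate centroid leaves nothing to confuse
    if distances[1] == 0.0:
        return 0.0  # two centroids coincide with the point: fully ambiguous
    return 1.0 - distances[0] / distances[1]


def cross_domain_clusterability(target_items, target_centroids,
                                source_centroids, trade_off=0.5):
    """Combine target clusterability and source-target pair matchability."""
    if not 0.0 <= trade_off <= 1.0:
        raise ValueError("trade_off is assumed to lie in [0, 1]")

    # Target clusterability: average of the per-item scores.
    target_clusterability = fmean(
        unambiguity(x, target_centroids) for x in target_items)

    # Target-side matchability: each target centroid against the source centroids.
    target_side = fmean(
        unambiguity(c, source_centroids) for c in target_centroids)

    # Source-side matchability: each source centroid against the target centroids.
    source_side = fmean(
        unambiguity(c, target_centroids) for c in source_centroids)

    # Source-target pair matchability: average of the two sides.
    pair_matchability = (target_side + source_side) / 2.0

    # Assumed convex form of the linear combination.
    return (trade_off * target_clusterability
            + (1.0 - trade_off) * pair_matchability)


if __name__ == "__main__":
    # Made-up two-dimensional data, for illustration only.
    items = [(0.0, 0.5), (1.0, 0.0), (9.0, 0.0), (10.0, 0.5)]
    t_centroids = [(0.5, 0.0), (9.5, 0.0)]
    s_centroids = [(0.0, 0.0), (10.0, 0.0)]
    print(cross_domain_clusterability(items, t_centroids, s_centroids, 0.7))
```

With this convention a `trade_off` close to 1 lets the target clusterability dominate the score, while a value close to 0 lets the centroid alignment dominate. The final transferring step (output device, storage, or remote system) is left to the caller.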
Abstract
A system and associated method for evaluating cross-domain clusterability upon a target domain and a source domain. The cross-domain clusterability is calculated as a linear combination of a target clusterability and a source-target pair matchability, by use of a trade-off parameter that determines the relative contribution of the target clusterability and the source-target pair matchability. The target clusterability quantifies how clusterable the target domain is. The source-target pair matchability is calculated as an average of a target-side matchability and a source-side matchability, which quantify, respectively, how well target centroids of the target domain are aligned with source centroids of the source domain and how well the source centroids are aligned with the target centroids.
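The quantities named in the abstract can be written out as a short worked equation. The symbols below are assumptions introduced only for illustration: C_T for the target clusterability, M_T and M_S for the target-side and source-side matchability, M_ST for the source-target pair matchability, and \lambda for the trade-off parameter. The convex form with weights \lambda and 1 - \lambda is one possible linear combination; the abstract does not fix the weights.

```latex
\begin{align}
M_{ST} &= \tfrac{1}{2}\,\bigl(M_T + M_S\bigr), \\
\mathrm{CDC}(T, S) &= \lambda\, C_T + (1 - \lambda)\, M_{ST},
\qquad 0 \le \lambda \le 1 .
\end{align}
% Worked example with made-up values:
% C_T = 0.8, M_T = 0.6, M_S = 0.4, \lambda = 0.5
% M_{ST} = (0.6 + 0.4)/2 = 0.5
% CDC    = 0.5 \cdot 0.8 + 0.5 \cdot 0.5 = 0.65
```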
52 Citations
20 Claims
1. A method for evaluating cross-domain clusterability upon a target domain and a source domain, said method comprising:
a processor of a computer system receiving the source domain and the target domain, wherein the source domain comprises at least one source data item and the target domain comprises at least one target data item;
said processor calculating target clusterability as an average of a respective clusterability of said at least one target data item such that the target clusterability quantifies how clusterable the target domain is, wherein the respective clusterability of a target data item of said at least one target data item quantifies how unambiguously the target data item can be assigned to a respective true target centroid associated with the target data item;
said processor calculating target-side matchability as an average of a respective matchability of each target centroid of the target domain to source centroids of the source domain such that the target-side matchability quantifies how well target centroids of the target domain are aligned with the source centroids;
said processor calculating source-side matchability as an average of a respective matchability of each source centroid of said source centroids to the target centroids such that the source-side matchability quantifies how well the source centroids are aligned with the target centroids;
said processor calculating source-target pair matchability as an average of the target-side matchability and the source-side matchability;
said processor calculating cross-domain clusterability between the target domain and the source domain as a linear combination of the calculated target clusterability and the calculated source-target pair matchability by use of a trade-off parameter that indicates relative contribution of the target clusterability and the source-target pair matchability to the cross-domain clusterability; and
said processor transferring the calculated cross-domain clusterability to a device selected from an output device of the computer system, a storage device of the computer system, a remote computer system coupled to the computer system, and a combination thereof.
Dependent claims: 2, 3, 4, 5
6. A computer program product comprising:
a computer readable storage medium having a computer readable program code embodied therein, said computer readable program code containing instructions that perform a method for evaluating cross-domain clusterability upon a target domain and a source domain, said method comprising:
receiving the source domain and the target domain, wherein the source domain comprises at least one source data item and the target domain comprises at least one target data item;
calculating target clusterability as an average of a respective clusterability of said at least one target data item such that the target clusterability quantifies how clusterable the target domain is, wherein the respective clusterability of a target data item of said at least one target data item quantifies how unambiguously the target data item can be assigned to a respective true target centroid associated with the target data item;
calculating target-side matchability as an average of a respective matchability of each target centroid of the target domain to source centroids of the source domain such that the target-side matchability quantifies how well target centroids of the target domain are aligned with the source centroids;
calculating source-side matchability as an average of a respective matchability of each source centroid of said source centroids to the target centroids such that the source-side matchability quantifies how well the source centroids are aligned with the target centroids;
calculating source-target pair matchability as an average of the target-side matchability and the source-side matchability;
calculating cross-domain clusterability between the target domain and the source domain as a linear combination of the calculated target clusterability and the calculated source-target pair matchability by use of a trade-off parameter that indicates relative contribution of the target clusterability and the source-target pair matchability to the cross-domain clusterability; and
transferring the calculated cross-domain clusterability to a device selected from an output device of a computer system, a storage device of the computer system, a remote computer system coupled to the computer system, and a combination thereof.
Dependent claims: 7, 8, 9, 10
11. A computer system comprising a processor and a computer readable memory unit coupled to the processor, said computer readable memory unit containing instructions that when run by the processor implement a method for evaluating cross-domain clusterability upon a target domain and a source domain, said method comprising:
receiving the source domain and the target domain, wherein the source domain comprises at least one source data item and the target domain comprises at least one target data item;
calculating target clusterability as an average of a respective clusterability of said at least one target data item such that the target clusterability quantifies how clusterable the target domain is, wherein the respective clusterability of a target data item of said at least one target data item quantifies how unambiguously the target data item can be assigned to a respective true target centroid associated with the target data item;
calculating target-side matchability as an average of a respective matchability of each target centroid of the target domain to source centroids of the source domain such that the target-side matchability quantifies how well target centroids of the target domain are aligned with the source centroids;
calculating source-side matchability as an average of a respective matchability of each source centroid of said source centroids to the target centroids such that the source-side matchability quantifies how well the source centroids are aligned with the target centroids;
calculating source-target pair matchability as an average of the target-side matchability and the source-side matchability;
calculating cross-domain clusterability between the target domain and the source domain as a linear combination of the calculated target clusterability and the calculated source-target pair matchability by use of a trade-off parameter that indicates relative contribution of the target clusterability and the source-target pair matchability to the cross-domain clusterability; and
transferring the calculated cross-domain clusterability to a device selected from an output device of the computer system, a storage device of the computer system, a remote computer system coupled to the computer system, and a combination thereof.
Dependent claims: 12, 13, 14, 15
16. A process for supporting computer infrastructure, said process comprising providing at least one support service for at least one of creating, integrating, hosting, maintaining, and deploying computer-readable code in a computing system, wherein the code in combination with the computing system is capable of performing a method for evaluating cross-domain clusterability upon a target domain and a source domain, said method comprising:
receiving the source domain and the target domain, wherein the source domain comprises at least one source data item and the target domain comprises at least one target data item;
calculating target clusterability as an average of a respective clusterability of said at least one target data item such that the target clusterability quantifies how clusterable the target domain is, wherein the respective clusterability of a target data item of said at least one target data item quantifies how unambiguously the target data item can be assigned to a respective true target centroid associated with the target data item;
calculating target-side matchability as an average of a respective matchability of each target centroid of the target domain to source centroids of the source domain such that the target-side matchability quantifies how well target centroids of the target domain are aligned with the source centroids;
calculating source-side matchability as an average of a respective matchability of each source centroid of said source centroids to the target centroids such that the source-side matchability quantifies how well the source centroids are aligned with the target centroids;
calculating source-target pair matchability as an average of the target-side matchability and the source-side matchability;
calculating cross-domain clusterability between the target domain and the source domain as a linear combination of the calculated target clusterability and the calculated source-target pair matchability by use of a trade-off parameter that indicates relative contribution of the target clusterability and the source-target pair matchability to the cross-domain clusterability; and
transferring the calculated cross-domain clusterability to a device selected from an output device of the computer system, a storage device of the computer system, a remote computer system coupled to the computer system, and a combination thereof.
Dependent claims: 17, 18, 19, 20
Specification