ACCELERATED DISCRETE DISTRIBUTION CLUSTERING UNDER WASSERSTEIN DISTANCE
First Claim
1. A method of clustering complex data objects, comprising the steps of:
- a) performing an initial segmentation of the data objects;
b) performing a series of discrete distribution (D2) clustering operations on the data objects using a scalable method to optimize a set of Wassersrtein centroids within each segment;
c) combining the centroids determined in step b) into one data set and performing a segmentation of this data set;
d) iteratively repeating steps b) and c) at higher levels in a hierarchy, if necessary, until a single segmentation is achieved, the number of centroids is reduced to an acceptable level, or another stopping criterion is satisfied; and
wherein the D2 clustering operations are performed by parallel processors or a single processor in sequence.
1 Assignment
0 Petitions
Accused Products
Abstract
Computationally efficient accelerated D2-clustering algorithms are disclosed for clustering discrete distributions under the Wasserstein distance with improved scalability. Three first-order methods include subgradient descent method with re-parametrization, alternating direction method of multipliers (ADMM), and a modified version of Bregman ADMM. The effects of the hyper-parameters on robustness, convergence, and speed of optimization are thoroughly examined. A parallel algorithm for the modified Bregman ADMM method is tested in a multi-core environment with adequate scaling efficiency subject to hundreds of CPUs, demonstrating the effectiveness of AD2-clustering.
49 Citations
13 Claims
-
1. A method of clustering complex data objects, comprising the steps of:
-
a) performing an initial segmentation of the data objects; b) performing a series of discrete distribution (D2) clustering operations on the data objects using a scalable method to optimize a set of Wassersrtein centroids within each segment; c) combining the centroids determined in step b) into one data set and performing a segmentation of this data set; d) iteratively repeating steps b) and c) at higher levels in a hierarchy, if necessary, until a single segmentation is achieved, the number of centroids is reduced to an acceptable level, or another stopping criterion is satisfied; and wherein the D2 clustering operations are performed by parallel processors or a single processor in sequence. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
-
Specification