Accelerated discrete distribution clustering under wasserstein distance
First Claim
1. A method of clustering data objects representative of images, videos, biological processes, genetic sequences, or documents using a scalable optimization technique for scaling computational cost of the clustering to a reduced level, comprising the steps of:
- a) providing, at a computer processor including parallel processors, a set of data objects, each object being representative of an image, a video, a biological process, genetic sequence, or a document;
b) performing an initial data segmentation on the set of data objects to divide the data into trunks before performing a discrete distribution (D2) clustering operation, wherein one of the parallel processors performs the initial data segmentation step and distributes the data trunks to each of the parallel processors;
c) performing, by all the parallel processors, a discrete distribution (D2) clustering operation, including;
i) optimizing Wasserstein centroids by using a scalable and parallel optimization technique, each data object being handled on one of the parallel processors that contains the data object, local aggregated results based on each data trunk being communicated to all of the parallel processors that uses the local aggregated results to compute the Wasserstein centroids for all the data objects,ii) assigning each data object to the nearest Wasserstein centroid, the nearest Wasserstein centroid being a label of the data object, andiii) iterating i) and ii) until a single segmentation is achieved, the number of Wasserstein centroids is reduced to a predefined number, a predefined threshold based on the distances of the data objects to the Wasserstein centroids is satisfied, the number of changed labels of objects is less than a predefined number, or the number of iterations reaches a predefined number;
wherein the scalable optimization technique is a Bregman alternating direction method of multiplier (B-ADMM) or a ADMM method or a subgradient descent method, such that the computational cost of the clustering is reduced to a level which can be handled by each of the parallel processors,the sub gradient descent method being a method that optimizes for Wasserstein centroids using subgradients in optimal transport problems solved via linear programming,the ADMM method optimizing for Wasserstein centroids using updates of ADMM by decoupling one set of constraint in optimal transport into a distributed formulation, such that a number of submodules are solved from quadratic programming,the B-ADMM approach optimizing for Wasserstein centroids using updates of B-ADMM, which is a variant of ADMM, by decoupling two sets of constraints in an optimal transport into a distributed formulation, such that a number of submodules are solved in a closed formula; and
d) outputting information regarding a way in which the objects are clustered in terms of similarity.
1 Assignment
0 Petitions
Accused Products
Abstract
Computationally efficient accelerated D2-clustering algorithms are disclosed for clustering discrete distributions under the Wasserstein distance with improved scalability. Three first-order methods include subgradient descent method with re-parametrization, alternating direction method of multipliers (ADMM), and a modified version of Bregman ADMM. The effects of the hyper-parameters on robustness, convergence, and speed of optimization are thoroughly examined. A parallel algorithm for the modified Bregman ADMM method is tested in a multi-core environment with adequate scaling efficiency subject to hundreds of CPUs, demonstrating the effectiveness of AD2-clustering.
-
Citations
5 Claims
-
1. A method of clustering data objects representative of images, videos, biological processes, genetic sequences, or documents using a scalable optimization technique for scaling computational cost of the clustering to a reduced level, comprising the steps of:
-
a) providing, at a computer processor including parallel processors, a set of data objects, each object being representative of an image, a video, a biological process, genetic sequence, or a document; b) performing an initial data segmentation on the set of data objects to divide the data into trunks before performing a discrete distribution (D2) clustering operation, wherein one of the parallel processors performs the initial data segmentation step and distributes the data trunks to each of the parallel processors; c) performing, by all the parallel processors, a discrete distribution (D2) clustering operation, including; i) optimizing Wasserstein centroids by using a scalable and parallel optimization technique, each data object being handled on one of the parallel processors that contains the data object, local aggregated results based on each data trunk being communicated to all of the parallel processors that uses the local aggregated results to compute the Wasserstein centroids for all the data objects, ii) assigning each data object to the nearest Wasserstein centroid, the nearest Wasserstein centroid being a label of the data object, and iii) iterating i) and ii) until a single segmentation is achieved, the number of Wasserstein centroids is reduced to a predefined number, a predefined threshold based on the distances of the data objects to the Wasserstein centroids is satisfied, the number of changed labels of objects is less than a predefined number, or the number of iterations reaches a predefined number; wherein the scalable optimization technique is a Bregman alternating direction method of multiplier (B-ADMM) or a ADMM method or a subgradient descent method, such that the computational cost of the clustering is reduced to a level which can be handled by each of the parallel processors, the sub gradient descent method being a method that optimizes for Wasserstein centroids using subgradients in optimal transport problems solved via linear programming, the ADMM method optimizing for Wasserstein centroids using updates of ADMM by decoupling one set of constraint in optimal transport into a distributed formulation, such that a number of submodules are solved from quadratic programming, the B-ADMM approach optimizing for Wasserstein centroids using updates of B-ADMM, which is a variant of ADMM, by decoupling two sets of constraints in an optimal transport into a distributed formulation, such that a number of submodules are solved in a closed formula; and d) outputting information regarding a way in which the objects are clustered in terms of similarity. - View Dependent Claims (2, 3, 4, 5)
-
Specification