Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer
First Claim
1. A method for clustering a set S of n data points to find k final centers, comprising:
- partitioning said set S into P disjoint pieces S1, . . . ,SP;
for each said piece Si, determining a set Di of k intermediate centers;
assigning each data point in each piece Si to the nearest one of said k intermediate centers;
weighting each of said k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and
clustering said weighted intermediate centers together to find said k final centers, said clustering performed using a specific error metric and a clustering method A.
7 Assignments
0 Petitions
Accused Products
Abstract
A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, . . . , Sp; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.
-
Citations
18 Claims
-
1. A method for clustering a set S of n data points to find k final centers, comprising:
-
partitioning said set S into P disjoint pieces S1, . . . ,SP;
for each said piece Si, determining a set Di of k intermediate centers;
assigning each data point in each piece Si to the nearest one of said k intermediate centers;
weighting each of said k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and
clustering said weighted intermediate centers together to find said k final centers, said clustering performed using a specific error metric and a clustering method A. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. An article comprising a computer readable medium having instructions stored thereon which when executed causes clustering a set S of n data points to find k final centers, said clustering implemented by:
-
partitioning said set S into P disjoint pieces S1, . . . ,SP;
for each said piece Si, determining a set Di of k intermediate centers;
assigning each data point in each piece Si to the nearest one of said k intermediate centers;
weighting each of said k intermediate centers in each set Di by the number of points in the corresponding piece Si, assigned to that center; and
clustering said weighted intermediate centers together to find said k final centers, said clustering performed using a specific error metric and a clustering method A. - View Dependent Claims (14, 15, 16)
-
-
17. An apparatus for clustering a set S of n data points to find k final centers, said apparatus comprising:
-
a main memory;
a processor coupled to said memory, said processor configured to partition said set S into P disjoint pieces Si, . . . , SP such that each piece Si fits in main memory, said each piece Si first stored separately in said main memory and then clustered by said processor performing;
for each said piece Si, determining a set Di of k intermediate centers;
assigning each data point in each piece Si, to the nearest one of said k intermediate centers;
weighting each of said k intermediate centers in each set Di by the number of points in the corresponding piece Si, assigned to that center; and
clustering said weighted intermediate centers together to find said k final centers, said clustering performed using a specific error metric and a clustering method A.
-
-
18. An apparatus for clustering a set S of n data points to find k final centers, said apparatus comprising:
-
a main memory;
a plurality of processors coupled to said main memory, one of said processors configured to partition said set S into P disjoint pieces S1, . . . , SP such that each piece Si, fits in main memory, said each piece Si, first stored separately in said main memory and then clustered by each said processor performing;
for each said piece Si, determining a set Di of k intermediate centers;
assigning each data point in each piece Si, to the nearest one of said k intermediate centers; and
weighting each of said k intermediate centers in each set Di by the number of points in the corresponding piece Si, assigned to that center, further wherein after said weighting, one of said processors finally clustering said weighted intermediate centers together to find said k final centers, said clustering performed using a specific error metric and a clustering method A.
-
Specification