Systems and methods for clustering data
First Claim
1. A computer-implemented method for clustering data, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
- identifying a plurality of samples that have been grouped into existing clusters, the existing clusters comprising a predetermined cluster radius;
locating a sample, from within the plurality of samples, that is a centroid of an existing cluster;
restructuring the existing cluster to generate a new cluster that includes samples with matching attributes by;
locating another sample that is, among the plurality of samples, next closest to the centroid relative to a most recently located sample;
determining whether an attribute of the next-closest sample matches an attribute of the centroid;
determining whether to adjust the radius of the existing cluster based on whether the attribute of the next-closest sample matches the attribute of the centroid;
repeating the steps of locating the next-closest sample, determining whether the attributes match, and determining whether to adjust the radius of the existing cluster until the attribute of the next-closest sample does not match the attribute of the centroid, and when the attribute of the next-closest sample does not match the attribute of the centroid, adjusting the radius of the existing cluster by setting the radius as a distance from the centroid to a most-recently located matching sample such that only samples with matching attributes are included within the new cluster; and
restructuring the existing clusters that have not been restructured by generating, using samples within the plurality of samples that are not included in the new cluster, additional new clusters until each sample within the plurality of samples is included within at least one new cluster, wherein for each additional new cluster;
the additional new cluster comprises a variable cluster radius instead of the predetermined cluster radius;
the additional new cluster includes at least one sample whose attribute matches an attribute of a centroid of the additional new cluster; and
the additional new clusters are generated by a computing system such that a single pass over the plurality of samples is needed to assign each sample to a new cluster without requiring additional computing resources that would be needed from the computing system for multiple iterations of a clustering algorithm.
2 Assignments
0 Petitions
Accused Products
Abstract
A computer-implemented method for clustering data may include (1) identifying a plurality of samples, (2) locating a sample, from within the plurality of samples, that is a centroid of a cluster, (3) locating another sample that is, among the plurality of samples, next closest to the centroid relative to a most-recently located sample, (4) determining whether an attribute of the next-closest sample matches an attribute of the centroid, (5) determining whether to adjust a radius of the cluster based on whether the attribute of the next-closest sample matches the attribute of the centroid, and (6) repeating the steps of locating the next-closest sample, determining whether the attributes match, and determining whether to adjust the radius of the cluster, until the attribute of the next-closest sample does not match the attribute of the centroid. Various other methods, systems, and computer-readable media are also disclosed.
29 Citations
16 Claims
-
1. A computer-implemented method for clustering data, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
-
identifying a plurality of samples that have been grouped into existing clusters, the existing clusters comprising a predetermined cluster radius;
locating a sample, from within the plurality of samples, that is a centroid of an existing cluster;restructuring the existing cluster to generate a new cluster that includes samples with matching attributes by; locating another sample that is, among the plurality of samples, next closest to the centroid relative to a most recently located sample; determining whether an attribute of the next-closest sample matches an attribute of the centroid; determining whether to adjust the radius of the existing cluster based on whether the attribute of the next-closest sample matches the attribute of the centroid; repeating the steps of locating the next-closest sample, determining whether the attributes match, and determining whether to adjust the radius of the existing cluster until the attribute of the next-closest sample does not match the attribute of the centroid, and when the attribute of the next-closest sample does not match the attribute of the centroid, adjusting the radius of the existing cluster by setting the radius as a distance from the centroid to a most-recently located matching sample such that only samples with matching attributes are included within the new cluster; and restructuring the existing clusters that have not been restructured by generating, using samples within the plurality of samples that are not included in the new cluster, additional new clusters until each sample within the plurality of samples is included within at least one new cluster, wherein for each additional new cluster; the additional new cluster comprises a variable cluster radius instead of the predetermined cluster radius; the additional new cluster includes at least one sample whose attribute matches an attribute of a centroid of the additional new cluster; and the additional new clusters are generated by a computing system such that a single pass over the plurality of samples is needed to assign each sample to a new cluster without requiring additional computing resources that would be needed from the computing system for multiple iterations of a clustering algorithm. - View Dependent Claims (2, 3, 4, 5, 6, 15, 16)
-
-
7. A system for clustering data samples, the system comprising:
-
an identification module that identifies a plurality of samples that have been grouped into existing clusters, the existing clusters comprising a predetermined cluster radius; a location module that; locates a sample, from within the plurality of samples, that is a centroid of an existing cluster; locates another sample that is, among the plurality of samples, next closest to the centroid relative to a most recently located sample; a determination module that; determines whether an attribute of the next-closest sample matches an attribute of the centroid; determines whether to adjust the radius of the existing cluster based on whether the attribute of the next-closest sample matches the attribute of the centroid; a repetition module that; restructures the existing cluster to generate a new cluster that includes samples with matching attributes by repeating the steps of locating the next-closest sample, determining whether the attributes match, and determining whether to adjust the radius of the existing cluster until the attribute of the next-closest sample does not match the attribute of the centroid, and when the attribute of the next-closest sample does not match the attribute of the centroid, adjusts the radius of the existing cluster by setting the radius as a distance from the centroid to a most-recently located matching sample such that only samples with matching attributes are included within the new cluster; restructures the existing clusters that have not been restructured by generating, using samples within the plurality of samples that are not included in the new cluster, additional new clusters until each sample within the plurality of samples is included within at least one new cluster, wherein for each additional new cluster; the additional new cluster comprises a variable cluster radius instead of the predetermined cluster radius; the additional new cluster includes at least one sample whose attribute matches an attribute of a centroid of the additional new cluster; and the additional new clusters are generated by a computing system such that a single pass over the plurality of samples is needed to assign each sample to a new cluster without requiring additional computing resources that would be needed from the computing system for multiple iterations of a clustering algorithm; and at least one processor that executes the identification module, the determination module and the repetition module. - View Dependent Claims (8, 9, 10, 11, 12)
-
-
13. A non-transitory computer-readable medium comprising one or more computer-executable instructions that, when executed by at least one processor of a computing device, cause the computing device to:
-
identify a plurality of samples that have been grouped into existing clusters, the existing clusters comprising a predetermined cluster radius; locate a sample, from within the plurality of samples, that is a centroid of an existing cluster; restructure the existing cluster to generate a new cluster that includes samples with matching attributes by; locating another sample that is, among the plurality of samples, next closest to the centroid relative to a most recently located sample; determining whether an attribute of the next-closest sample matches an attribute of the centroid; determining whether to adjust the radius of the existing cluster based on whether the attribute of the next-closest sample matches the attribute of the centroid; repeating the steps of locating the next-closest sample, determining whether the attributes match, and determining whether to adjust the radius of the existing cluster until the attribute of the next-closest sample does not match the attribute of the centroid, and when the attribute of the next-closest sample does not match the attribute of the centroid, adjusting the radius of the existing cluster by setting the radius as a distance from the centroid to a most-recently located matching sample such that only samples with matching attributes are included within the new cluster; and restructure the existing clusters that have not been restructured by generating, using samples within the plurality of samples that are not included in the new cluster, additional new clusters until each sample within the plurality of samples is included within at least one new cluster, wherein for each additional new cluster; the additional new cluster comprises a variable cluster radius instead of the predetermined cluster radius; the additional new cluster includes at least one sample whose attribute matches an attribute of a centroid of the additional new cluster; and the additional new clusters are generated by a computing system such that a single pass over the plurality of samples is needed to assign each sample to a new cluster without requiring additional computing resources that would be needed from the computing system for multiple iterations of a clustering algorithm. - View Dependent Claims (14)
-
Specification