
Systems and methods for clustering data

  • US 9,684,705 B1
  • Filed: 03/14/2014
  • Issued: 06/20/2017
  • Est. Priority Date: 03/14/2014
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method for clustering data, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:

    identifying a plurality of samples that have been grouped into existing clusters, the existing clusters comprising a predetermined cluster radius;

    locating a sample, from within the plurality of samples, that is a centroid of an existing cluster;

    restructuring the existing cluster to generate a new cluster that includes samples with matching attributes by:

    locating another sample that is, among the plurality of samples, next closest to the centroid relative to a most recently located sample;

    determining whether an attribute of the next-closest sample matches an attribute of the centroid;

    determining whether to adjust the radius of the existing cluster based on whether the attribute of the next-closest sample matches the attribute of the centroid;

    repeating the steps of locating the next-closest sample, determining whether the attributes match, and determining whether to adjust the radius of the existing cluster until the attribute of the next-closest sample does not match the attribute of the centroid, and when the attribute of the next-closest sample does not match the attribute of the centroid, adjusting the radius of the existing cluster by setting the radius as a distance from the centroid to a most-recently located matching sample such that only samples with matching attributes are included within the new cluster; and

    restructuring the existing clusters that have not been restructured by generating, using samples within the plurality of samples that are not included in the new cluster, additional new clusters until each sample within the plurality of samples is included within at least one new cluster, wherein for each additional new cluster:

    the additional new cluster comprises a variable cluster radius instead of the predetermined cluster radius;

    the additional new cluster includes at least one sample whose attribute matches an attribute of a centroid of the additional new cluster; and

    the additional new clusters are generated by a computing system such that a single pass over the plurality of samples is needed to assign each sample to a new cluster without requiring additional computing resources that would be needed from the computing system for multiple iterations of a clustering algorithm.
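
The restructuring loop recited in the claim can be sketched in Python. This is a minimal, non-authoritative illustration that rests on assumptions the claim does not state: samples are points in Euclidean space compared with `math.dist`, each sample carries a single categorical attribute, the centroid of the first existing cluster is supplied as an index, and each additional cluster is seeded from the lowest-index unassigned sample (a hypothetical rule; the claim does not say how later centroids are chosen). The names `Cluster`, `grow_cluster`, and `restructure` are likewise hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Cluster:
    centroid: int       # index of the centroid sample
    radius: float       # variable radius, fixed by the first attribute mismatch
    members: list[int]  # indices of the samples assigned to this cluster


def grow_cluster(centroid, remaining, points, labels):
    """Visit samples in order of increasing distance from the centroid and stop
    at the first one whose attribute differs from the centroid's attribute."""
    def dist(i):
        return math.dist(points[i], points[centroid])

    # Ties in distance are broken by sample index so the order is deterministic.
    ordered = sorted((i for i in remaining if i != centroid), key=lambda i: (dist(i), i))
    members, radius = [centroid], 0.0
    for i in ordered:
        if labels[i] != labels[centroid]:
            break  # mismatch: keep the radius of the most recent matching sample
        members.append(i)
        radius = dist(i)
    return Cluster(centroid, radius, members)


def restructure(points, labels, first_centroid):
    """Single-assignment restructuring: every sample ends up in exactly one cluster."""
    remaining = set(range(len(points)))
    clusters = []
    centroid = first_centroid
    while remaining:
        cluster = grow_cluster(centroid, remaining, points, labels)
        clusters.append(cluster)
        remaining.difference_update(cluster.members)
        if remaining:
            # Hypothetical seeding rule: the claim does not say how later centroids are chosen.
            centroid = min(remaining)
    return clusters


if __name__ == "__main__":
    points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0)]
    labels = ["a", "a", "a", "b", "b", "c"]
    for c in restructure(points, labels, first_centroid=0):
        print(c)
    # Cluster(centroid=0, radius=2.0, members=[0, 1, 2])
    # Cluster(centroid=3, radius=7.0, members=[3, 4])
    # Cluster(centroid=5, radius=0.0, members=[5])
```

In this sketch each sample is assigned to exactly one new cluster, and each radius is set by the first attribute mismatch rather than fixed in advance. Because it re-sorts the remaining samples for every cluster, it illustrates only the assignment logic and makes no attempt to reproduce the resource savings described in the claim.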
