Centroid detection for clustering
First Claim
1. A computer-implemented method, comprising:
accessing a dataset containing a number of data points;
forming, based at least in part on similarities between the data points, a first number of clusters of the data points, a cluster of the clusters containing a portion of the data points;
determining a stability of the cluster based at least in part on a change to a center of the cluster between iterations of clustering the data points, the center determined based at least in part on the portion of the data points;
determining that the cluster is unstable based at least in part on the change to the center of the cluster falling outside a range, the range being based at least in part on a confidence level; and
forming, based at least in part on the cluster being unstable, a second number of modified clusters of the data points.
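The center-based stability test of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The claim does not say how the range is derived from the confidence level, so the sketch assumes the range is the half-width of a normal-approximation confidence interval for the cluster mean. The helper names `centroid`, `allowed_shift`, and `is_unstable` are hypothetical.

```python
from math import dist, sqrt
from statistics import NormalDist, pstdev


def centroid(points):
    """Center of a cluster: the component-wise mean of its points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def allowed_shift(points, confidence):
    """ASSUMPTION (not from the claim): the range is z * sigma / sqrt(n), the
    half-width of a normal-approximation confidence interval for the cluster
    mean, where sigma combines the per-dimension population standard deviations."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    sigma = sqrt(sum(pstdev(coords) ** 2 for coords in zip(*points)))
    return z * sigma / sqrt(len(points))


def is_unstable(prev_center, curr_points, confidence=0.95):
    """Unstable when the center moved farther between iterations than the range allows."""
    shift = dist(prev_center, centroid(curr_points))
    return shift > allowed_shift(curr_points, confidence)


cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(is_unstable((1.0, 1.5), cluster))  # False: a shift of 0.5 is inside the range (~1.39)
print(is_unstable((5.0, 5.0), cluster))  # True: a shift of ~5.66 is outside it
```

A lower confidence level narrows the assumed range, so the same center shift is more likely to mark the cluster as unstable.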
1 Assignment
0 Petitions
Abstract
A method of categorizing data points is described which, when combined with a clustering algorithm, provides groupings of data points that have an improved confidence interval. The method can be used to find an optimal number of groupings for a dataset, which in turn allows a user to categorize a group of data points for processing. In some examples, a dataset containing a number of data points may be accessed. Additionally, in some aspects, data points within the dataset may be grouped based at least in part on similarities between the data points. Further, the number of groupings may be adjusted so that the distance between the data points within one or more groupings fits within a confidence level.
4 Citations
20 Claims
1. A computer-implemented method, comprising:
accessing a dataset containing a number of data points;
forming, based at least in part on similarities between the data points, a first number of clusters of the data points, a cluster of the clusters containing a portion of the data points;
determining a stability of the cluster based at least in part on a change to a center of the cluster between iterations of clustering the data points, the center determined based at least in part on the portion of the data points;
determining that the cluster is unstable based at least in part on the change to the center of the cluster falling outside a range, the range being based at least in part on a confidence level; and
forming, based at least in part on the cluster being unstable, a second number of modified clusters of the data points.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A system, comprising:
at least one memory that stores computer-executable instructions; and at least one processor configured to access the at least one memory, wherein the at least one processor is configured to execute the computer-executable instructions to collectively at least:
generate a number of clusters of data points based at least in part on similarities between the data points;
determine, based at least in part on respective data points clustered in a cluster of the clusters, a change to the cluster between iterations of generating the clusters, the change being to one or more of: a center of the cluster or a composition of the cluster;
determine that the cluster is unstable based at least in part on a determination that the change to the cluster falls outside a range, the range being based at least in part on a confidence level; and
generate an adjusted number of clusters of the data points based at least in part on the cluster being unstable.
Dependent claims: 12, 13, 14, 15, 16, 17.
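Claim 11 also allows the change to be measured on the composition of a cluster rather than its center. A minimal sketch, assuming composition change is measured as Jaccard distance between member sets from two consecutive iterations and that the allowed range is `1 - confidence`; neither choice is stated in the claim, and the function names are hypothetical.

```python
def jaccard_distance(a, b):
    """Fraction of the combined membership that is not shared: 1 - |A & B| / |A | B|."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def unstable_by_composition(prev_clusters, curr_clusters, confidence=0.9):
    """Return indices of current clusters whose composition changed too much.

    Clusters are sets of point indices from two consecutive iterations. Labels
    may be permuted between iterations, so each current cluster is compared with
    the previous cluster it overlaps most.
    ASSUMPTION (not from the claim): the allowed range of change is 1 - confidence.
    """
    tolerance = 1.0 - confidence
    unstable = []
    for i, members in enumerate(curr_clusters):
        change = min(jaccard_distance(members, prev) for prev in prev_clusters)
        if change > tolerance:
            unstable.append(i)
    return unstable


prev = [set(range(10)), {10, 11, 12, 13, 14}]
curr = [set(range(10)), {10, 11, 15, 16, 17}]
print(unstable_by_composition(prev, curr))  # [1]: cluster 1 shares only 2 of its 8 combined members
```

Matching each current cluster to its best-overlapping predecessor avoids flagging clusters whose only change is a relabeling between runs.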
18. One or more computer-readable media storing computer-executable instructions that, when executed by one or more computer systems, configure the one or more computer systems to perform operations comprising:
generating, in an iteration, a number of clusters based at least in part on a dataset;
determining whether respective stabilities of the clusters are above a confidence level, a cluster of the clusters determined to be unstable based at least in part on a change to a composition of the cluster between iterations of generating the clusters;
determining an adjustment to the number of clusters based at least in part on one or more of the respective stabilities being below the confidence level;
generating new clusters based at least in part on the dataset and the adjustment to the number of clusters; and
storing the dataset within a computer storage device based at least in part on the new clusters.
Dependent claims: 19, 20.
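The iterative adjustment in claim 18 can be illustrated with the loop below. It is a sketch under stated assumptions rather than the claimed implementation: k-means with farthest-first seeding stands in for the unspecified clustering algorithm, an "iteration" is taken to be a rerun with a different seed, a cluster's stability is its best Jaccard overlap with any cluster from the rerun, and the adjustment simply lowers the cluster count by one. The names `kmeans`, `jaccard`, `choose_clusters`, and `serialize_groups` are hypothetical.

```python
import json
import random
from math import dist


def kmeans(points, k, seed, iters=50):
    """Lloyd's k-means; returns the non-empty clusters as sets of point indices."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Farthest-first seeding: next center is the point farthest from all chosen centers.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(idx)
        new_centers = [
            tuple(sum(xs) / len(g) for xs in zip(*(points[i] for i in g))) if g else centers[c]
            for c, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # converged: assignments no longer move the centers
        centers = new_centers
    return [set(g) for g in groups if g]


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def choose_clusters(points, k, confidence=0.9, seed=0):
    """Re-cluster with fewer clusters until every cluster is stable (see ASSUMPTIONS above)."""
    k = min(k, len(points))
    while True:
        clusters = kmeans(points, k, seed)
        rerun = kmeans(points, k, seed + 1)  # the next "iteration" uses a different seed
        stabilities = [max(jaccard(c, r) for r in rerun) for c in clusters]
        if k == 1 or all(s >= confidence for s in stabilities):
            return clusters
        k -= 1  # hypothetical adjustment: lower the number of clusters


def serialize_groups(points, clusters):
    """JSON for the dataset grouped by its final clusters; writing it to a storage
    device (e.g. with pathlib.Path.write_text) is left to the caller."""
    return json.dumps({str(i): [list(points[j]) for j in sorted(c)] for i, c in enumerate(clusters)})


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
clusters = choose_clusters(points, k=2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Lowering k by one is only one possible adjustment; the claim leaves the direction and size of the adjustment open, and a different stability measure or seeding strategy would change which cluster counts are accepted.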
Specification