Centroid detection for clustering
First Claim
1. A computer-implemented method for categorizing data points, comprising:
- identifying a first number of centroids indicating how many centroids are to be used in evaluating a dataset;
selecting a location for the identified first number of centroids within the dataset;
performing a clustering procedure, comprising:
repeating a second number of times:
assigning, to data points within the dataset, a cluster based at least in part on a centroid location;
determining a center point of at least one cluster of the data points; and
moving the centroid location to the center point of its respective cluster;
adjusting the first number of centroids in the dataset and repeating the clustering procedure based at least in part on the movement of at least one centroid location by a delta amount; and
identifying at least one final centroid location.
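The steps recited in claim 1 can be sketched in code. What follows is only one possible reading, not the patent's implementation. The names `run_clustering` and `find_centroids`, the evenly spaced choice of starting locations, the `max_centroids` cap, and the rule "add one centroid while any centroid still moved by at least `delta` in the last repetition" are all assumptions made for illustration.

```python
import math


def run_clustering(points, centroids, iterations):
    """Repeat assign / recompute / move `iterations` times (the "second number")."""
    last_shift = 0.0
    for _ in range(iterations):
        # Assign each data point to the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Determine each cluster's center point and move its centroid there.
        moved = []
        for c, members in zip(centroids, clusters):
            if members:
                center = tuple(sum(axis) / len(members) for axis in zip(*members))
            else:
                center = c  # an empty cluster leaves its centroid in place
            moved.append(center)
        # Largest distance any centroid travelled in this repetition.
        last_shift = max(math.dist(a, b) for a, b in zip(centroids, moved))
        centroids = moved
    return centroids, last_shift


def find_centroids(points, first_number=1, iterations=5, delta=1e-6,
                   max_centroids=10):
    """Hypothetical driver; assumes 1 <= first_number <= len(points)."""
    k = first_number
    limit = min(max_centroids, len(points))
    while True:
        # Select starting locations within the dataset (evenly spaced picks,
        # an assumption; the claim does not say how locations are selected).
        start = [points[i * len(points) // k] for i in range(k)]
        centroids, shift = run_clustering(points, start, iterations)
        if shift < delta or k >= limit:
            return centroids  # final centroid locations
        k += 1  # adjust the number of centroids and repeat the procedure
```

Under this reading, a small number of repetitions combined with a larger `delta` makes the count grow until the centroids land close to their cluster centers; other adjustment rules (for example, removing centroids) would fit the claim language equally well.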
Abstract
A method of categorizing data points is described which, when combined with a clustering algorithm, provides groupings of data points that have an improved confidence interval. The method can be used to find an optimal number of groupings for a dataset, which in turn allows a user to categorize a group of data points for processing. In some examples, a dataset containing a number of data points may be accessed. Additionally, in some aspects, data points within the dataset may be grouped based at least in part on similarities between the data. Further, the number of groupings may be adjusted so that the distances between the data points within one or more groupings fit within a confidence level.
17 Citations
16 Claims
8. A computer-implemented method of categorizing data points, comprising:
- selecting a number of centroids;
assigning, to data points, a cluster based at least in part on a location of the centroid;
determining a center point of the cluster of data points;
determining a difference between the location of the centroid and the center point of the cluster;
adjusting the number of centroids based at least in part on the difference between the location of the centroid and the center point of the cluster; and
identifying a final centroid location based at least in part on the difference between the location of the centroid and the center point of the cluster.
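Claim 8 turns on the difference between a centroid's location and the center point of its cluster. A minimal sketch of that measurement and of one decision rule built on it is given here. The function names `assign_and_measure` and `next_step`, the threshold parameter `delta`, and the choice to add exactly one centroid when the difference is large are hypothetical; the claim itself does not fix any of them.

```python
import math


def assign_and_measure(points, centroids):
    """Return a (center, difference) pair per centroid after one assignment pass."""
    clusters = [[] for _ in centroids]
    for p in points:
        # Assign the point to the cluster of its nearest centroid.
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    results = []
    for c, members in zip(centroids, clusters):
        if members:
            center = tuple(sum(axis) / len(members) for axis in zip(*members))
        else:
            center = c  # empty cluster: center stays at the centroid
        # Difference between the centroid location and the cluster's center.
        results.append((center, math.dist(c, center)))
    return results


def next_step(points, centroids, delta):
    """Assumed rule: a difference of at least `delta` asks for one more centroid;
    otherwise the cluster centers are reported as the final locations."""
    results = assign_and_measure(points, centroids)
    worst = max(diff for _, diff in results)
    if worst >= delta:
        return len(centroids) + 1, None  # adjusted count, nothing final yet
    return len(centroids), [center for center, _ in results]
```

A caller would loop on `next_step`, choosing new starting locations for the adjusted count each time, until the second return value is no longer `None`.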
Specification