Systems and methods for clustering data samples
First Claim
1. A computer-implemented method for clustering data samples, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
identifying a plurality of samples to cluster;
identifying a plurality of candidate features for clustering the plurality of samples;
identifying a plurality of candidate distance functions for clustering the plurality of samples;
selecting a distance function from the plurality of candidate distance functions for clustering the plurality of samples at least in part by:
selecting a set of features from the plurality of candidate features for clustering the plurality of samples based at least in part on determining that a result of clustering a training set of samples using the set of features and the distance function fits an expected clustering of the training set of samples more closely than an additional result of clustering the training set of samples using an alternative set of features from the plurality of candidate features and the distance function, according to a predefined clustering accuracy metric;
determining that the result of clustering the training set of samples using the set of features and the distance function fits the expected clustering of the training set of samples more closely than a best result of clustering the training set of samples for each candidate distance function, aside from the distance function, within the plurality of candidate distance functions, according to the predefined clustering accuracy metric;
clustering the plurality of samples using the set of features and the distance function.
Abstract
A computer-implemented method for clustering data samples may include (1) identifying a plurality of samples, (2) identifying a plurality of candidate features, (3) identifying a plurality of candidate distance functions, (4) selecting a distance function by (i) selecting a set of features based on determining that a result of clustering a training set of samples using the set of features and the distance function fits an expected clustering of the training set of samples more closely than results from using an alternative set of features and (ii) determining that the result of clustering the training set using the set of features and the distance function fits the expected clustering of the training set of samples more closely than a best result of any other distance function, and (5) clustering the plurality of samples using the set of features and the distance function. Various other methods and systems are also disclosed.
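The selection procedure summarized in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the single-linkage clustering routine with a fixed threshold, the Rand index standing in for the "predefined clustering accuracy metric", the exhaustive search over feature subsets, the two candidate distance functions, and the toy training data. The function names (`best_features_for`, `select_distance_and_features`, and so on) are hypothetical.

```python
from itertools import combinations

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cluster(samples, features, distance, threshold=1.5):
    """Assumed clustering step: single linkage with a fixed distance threshold."""
    points = [tuple(s[f] for f in features) for s in samples]
    labels = list(range(len(points)))
    for i, j in combinations(range(len(points)), 2):
        if labels[i] != labels[j] and distance(points[i], points[j]) <= threshold:
            old, new = labels[j], labels[i]
            labels = [new if lab == old else lab for lab in labels]
    return labels

def rand_index(labels, expected):
    """Assumed accuracy metric: fraction of sample pairs on which the
    clustering and the expected clustering agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(labels)), 2))
    agree = sum((labels[i] == labels[j]) == (expected[i] == expected[j])
                for i, j in pairs)
    return agree / len(pairs)

def best_features_for(distance, training, expected, candidate_features):
    """Step (i): for one distance function, find the feature set whose
    training-set clustering fits the expected clustering most closely."""
    scored = []
    for size in range(1, len(candidate_features) + 1):
        for features in combinations(candidate_features, size):
            score = rand_index(cluster(training, features, distance), expected)
            scored.append((score, features))
    return max(scored, key=lambda item: item[0])

def select_distance_and_features(training, expected, candidate_features,
                                 candidate_distances):
    """Step (ii): keep the distance function whose best result beats the
    best result of every other candidate distance function."""
    per_distance = {
        name: best_features_for(fn, training, expected, candidate_features)
        for name, fn in candidate_distances.items()
    }
    name = max(per_distance, key=lambda n: per_distance[n][0])
    score, features = per_distance[name]
    return name, features, score

# Toy training set (invented): features 0 and 1 are informative, 2 is noise.
training = [(0, 0, 3), (1, 1, 7), (0, 10, 7), (1, 11, 3), (10, 0, 3), (11, 1, 7)]
expected = [0, 0, 1, 1, 2, 2]
candidate_features = [0, 1, 2]
candidate_distances = {"euclidean": euclidean, "manhattan": manhattan}

name, features, score = select_distance_and_features(
    training, expected, candidate_features, candidate_distances)

# Final step: cluster the full sample set with the selected pair.
samples = training + [(0.5, 0.5, 5), (10.5, 0.5, 5)]
labels = cluster(samples, features, candidate_distances[name])
print(name, features, score, labels)
```

The sketch keeps the two comparisons separate on purpose: the inner search compares feature sets under a fixed distance function, and the outer step compares each distance function's best result, which mirrors the (i)/(ii) structure in the abstract. An exhaustive subset search is only practical for a small number of candidate features; a real system would likely use a cheaper search strategy.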
16 Citations
20 Claims
1. A computer-implemented method for clustering data samples, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
identifying a plurality of samples to cluster;
identifying a plurality of candidate features for clustering the plurality of samples;
identifying a plurality of candidate distance functions for clustering the plurality of samples;
selecting a distance function from the plurality of candidate distance functions for clustering the plurality of samples at least in part by:
selecting a set of features from the plurality of candidate features for clustering the plurality of samples based at least in part on determining that a result of clustering a training set of samples using the set of features and the distance function fits an expected clustering of the training set of samples more closely than an additional result of clustering the training set of samples using an alternative set of features from the plurality of candidate features and the distance function, according to a predefined clustering accuracy metric;
determining that the result of clustering the training set of samples using the set of features and the distance function fits the expected clustering of the training set of samples more closely than a best result of clustering the training set of samples for each candidate distance function, aside from the distance function, within the plurality of candidate distance functions, according to the predefined clustering accuracy metric;
clustering the plurality of samples using the set of features and the distance function.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system for clustering data samples, the system comprising:
an identification module programmed to:
identify a plurality of samples to cluster;
identify a plurality of candidate features for clustering the plurality of samples;
identify a plurality of candidate distance functions for clustering the plurality of samples;
a selection module programmed to select a distance function from the plurality of candidate distance functions for clustering the plurality of samples at least in part by:
selecting a set of features from the plurality of candidate features for clustering the plurality of samples based at least in part on determining that a result of clustering a training set of samples using the set of features and the distance function fits an expected clustering of the training set of samples more closely than an additional result of clustering the training set of samples using an alternative set of features from the plurality of candidate features and the distance function, according to a predefined clustering accuracy metric;
determining that the result of clustering the training set of samples using the set of features and the distance function fits the expected clustering of the training set of samples more closely than a best result of clustering the training set of samples for each candidate distance function, aside from the distance function, within the plurality of candidate distance functions, according to the predefined clustering accuracy metric;
a clustering module programmed to cluster the plurality of samples using the set of features and the distance function;
at least one processor configured to execute the identification module, the selection module, and the clustering module.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable medium comprising one or more computer-executable instructions that, when executed by at least one processor of a computing device, cause the computing device to:
identify a plurality of samples to cluster;
identify a plurality of candidate features for clustering the plurality of samples;
identify a plurality of candidate distance functions for clustering the plurality of samples;
select a distance function from the plurality of candidate distance functions for clustering the plurality of samples at least in part by:
selecting a set of features from the plurality of candidate features for clustering the plurality of samples based at least in part on determining that a result of clustering a training set of samples using the set of features and the distance function fits an expected clustering of the training set of samples more closely than an additional result of clustering the training set of samples using an alternative set of features from the plurality of candidate features and the distance function, according to a predefined clustering accuracy metric;
determining that the result of clustering the training set of samples using the set of features and the distance function fits the expected clustering of the training set of samples more closely than a best result of clustering the training set of samples for each candidate distance function, aside from the distance function, within the plurality of candidate distance functions, according to the predefined clustering accuracy metric;
cluster the plurality of samples using the set of features and the distance function.
Dependent claims: 16, 17, 18, 19, 20
Specification