Classifying Samples Using Clustering
First Claim
1. A computer-implemented method of classifying a sample, comprising:
establishing a set of samples containing labeled and unlabeled samples;
gathering values of features from the labeled and unlabeled samples;
selecting a subset of the features;
clustering the labeled and unlabeled samples together based on similarity of the gathered values of the selected subset of features to produce a set of clusters, each cluster having a subset of samples from the set of samples;
recursively iterating the selecting and clustering steps on the subset of samples in each cluster in the set of clusters until at least one stopping condition is reached, the iterations producing a cluster having a labeled sample and an unlabeled sample; and
propagating a label from the labeled sample in the cluster to the unlabeled sample in the cluster to classify the unlabeled sample.
Abstract
An unlabeled sample is classified using clustering. A set of samples containing labeled and unlabeled samples is established. Values of features are gathered from the labeled and unlabeled samples, and a subset of the features is selected. The labeled and unlabeled samples are clustered together based on similarity of the gathered values for the selected subset of features to produce a set of clusters, each cluster having a subset of samples from the set of samples. The selecting and clustering steps are recursively iterated on the subset of samples in each cluster in the set of clusters until at least one stopping condition is reached. The iterations produce a cluster having a labeled sample and an unlabeled sample. A label is propagated from the labeled sample in the cluster to the unlabeled sample in the cluster to classify the unlabeled sample.
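The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the text above does not say how features are selected, which clustering algorithm is used, or what the stopping conditions are. The sketch assumes variance-based feature selection, a two-cluster k-means split, and three stopping conditions (no unlabeled samples left, at most one distinct label in the cluster, or a depth limit). The names `Sample`, `select_features`, `two_means`, and `classify`, and the labels `class_a` and `class_b`, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    features: List[float]
    label: Optional[str] = None  # None marks an unlabeled sample


def select_features(samples, k):
    """Assumed heuristic: keep the k features with the highest variance."""
    def variance(j):
        values = [s.features[j] for s in samples]
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    n_features = len(samples[0].features)
    return sorted(range(n_features), key=variance, reverse=True)[:k]


def distance(a, b, feats):
    """Squared Euclidean distance restricted to the selected features."""
    return sum((a[j] - b[j]) ** 2 for j in feats)


def two_means(samples, feats, rounds=10):
    """Assumed clusterer: 2-means with a deterministic farthest-point start."""
    first = samples[0].features
    second = max(samples, key=lambda s: distance(s.features, first, feats)).features
    centers = [list(first), list(second)]
    groups = [[], []]
    for _ in range(rounds):
        groups = [[], []]
        for s in samples:
            d0 = distance(s.features, centers[0], feats)
            d1 = distance(s.features, centers[1], feats)
            groups[0 if d0 <= d1 else 1].append(s)
        for i, group in enumerate(groups):
            if group:  # recompute the center as the mean of its members
                centers[i] = [sum(s.features[j] for s in group) / len(group)
                              for j in range(len(first))]
    return [g for g in groups if g]  # drop an empty cluster, if any


def classify(samples, k=2, depth=0, max_depth=5):
    """Recursively select features and cluster, then propagate labels."""
    labels = [s.label for s in samples if s.label is not None]
    unlabeled = [s for s in samples if s.label is None]
    # Assumed stopping conditions (the source only requires "at least one").
    done = not unlabeled or len(set(labels)) <= 1 or depth >= max_depth
    clusters = [] if done else two_means(samples, select_features(samples, k))
    if len(clusters) < 2:
        # Leaf cluster: propagate the majority label to the unlabeled samples.
        if labels:
            winner = Counter(labels).most_common(1)[0][0]
            for s in unlabeled:
                s.label = winner
        return
    for cluster in clusters:
        classify(cluster, k, depth + 1, max_depth)


data = [
    Sample([0.0, 0.0, 5.0], "class_a"),
    Sample([0.1, 0.2, 5.1]),             # unlabeled, near the class_a sample
    Sample([10.0, 10.0, 5.0], "class_b"),
    Sample([9.8, 10.1, 4.9]),            # unlabeled, near the class_b sample
]
classify(data)
print([s.label for s in data])
# ['class_a', 'class_a', 'class_b', 'class_b']
```

Because feature selection runs again inside each recursive call, a feature that does not separate the full set can still be chosen later if it varies strongly within a smaller cluster. In this sketch a cluster that contains no labeled sample is left unclassified.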
20 Claims
1. A computer-implemented method of classifying a sample, comprising:
establishing a set of samples containing labeled and unlabeled samples;
gathering values of features from the labeled and unlabeled samples;
selecting a subset of the features;
clustering the labeled and unlabeled samples together based on similarity of the gathered values of the selected subset of features to produce a set of clusters, each cluster having a subset of samples from the set of samples;
recursively iterating the selecting and clustering steps on the subset of samples in each cluster in the set of clusters until at least one stopping condition is reached, the iterations producing a cluster having a labeled sample and an unlabeled sample; and
propagating a label from the labeled sample in the cluster to the unlabeled sample in the cluster to classify the unlabeled sample.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A computer for classifying a sample, comprising:
a non-transitory computer-readable storage medium storing computer program modules executable to perform steps comprising:
establishing a set of samples containing labeled samples and unlabeled samples;
gathering values of features from the labeled and unlabeled samples;
selecting a subset of the features;
clustering the labeled and unlabeled samples together based on similarity of the gathered values of the selected subset of features to produce a set of clusters, each cluster having a subset of samples from the set of samples;
recursively iterating the selecting and clustering steps on the subset of samples in each cluster in the set of clusters until at least one stopping condition is reached, the iterations producing a cluster having a labeled sample and an unlabeled sample; and
propagating a label from the labeled sample in the cluster to the unlabeled sample in the cluster to classify the unlabeled sample; and
a computer processor for executing the computer program modules.
Dependent claims: 10, 11, 12, 13, 14.
15. A non-transitory computer-readable storage medium storing computer program modules for classifying a sample, the computer program modules executable to perform steps comprising:
establishing a set of samples containing labeled and unlabeled samples;
gathering values of features from the labeled and unlabeled samples;
selecting a subset of the features;
clustering the labeled and unlabeled samples together based on similarity of the gathered values of the selected subset of features to produce a set of clusters, each cluster having a subset of samples from the set of samples;
recursively iterating the selecting and clustering steps on the subset of samples in each cluster in the set of clusters until at least one stopping condition is reached, the iterations producing a cluster having a labeled sample and an unlabeled sample; and
propagating a label from the labeled sample in the cluster to the unlabeled sample in the cluster to classify the unlabeled sample.
Dependent claims: 16, 17, 18, 19, 20.
Specification