Validation of nearest neighbor classifiers
Abstract
A computer-based system computes a probabilistic bound on the error probability of a nearest neighbor classifier as follows. A subset of the examples in the classifier is used to form a reduced classifier. The error frequency of the reduced classifier on the remaining examples is computed as a baseline estimate of the error probability for the original classifier. Additionally, subsets of the examples outside the reduced classifier are combined with the reduced classifier and applied to the remaining examples in order to estimate the difference in error probability for the reduced classifier and error probability for the original classifier.
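The quantities named in the abstract can be illustrated with a minimal Python sketch. It assumes a 1-nearest-neighbor rule with Euclidean distance; the function names and the toy one-dimensional data are hypothetical and do not come from the patent. The sketch computes only the baseline error frequency and the two difference frequencies. It does not combine them into the probabilistic bound, because the abstract gives no formula for that step.

```python
import math

def nearest_label(examples, x):
    """Label of the example closest to x (1-NN, Euclidean distance; an assumption)."""
    return min(examples, key=lambda ex: math.dist(ex[0], x))[1]

def error_frequency(train, test):
    """Fraction of `test` examples that the 1-NN classifier built on `train` mislabels."""
    wrong = sum(nearest_label(train, x) != y for x, y in test)
    return wrong / len(test)

def difference_frequency(base, extra, test):
    """Over `test`: frequency of (base correct, base+extra incorrect)
    minus frequency of (base incorrect, base+extra correct)."""
    augmented = base + extra
    worse = better = 0
    for x, y in test:
        base_ok = nearest_label(base, x) == y
        aug_ok = nearest_label(augmented, x) == y
        worse += base_ok and not aug_ok
        better += (not base_ok) and aug_ok
    return (worse - better) / len(test)

# Hypothetical toy data: each example is (input tuple, label).
remaining = [((0.0,), 0), ((10.0,), 1)]   # examples in the reduced classifier
holdout_a = [((4.0,), 1), ((7.0,), 1)]    # first holdout set
holdout_b = [((3.0,), 1), ((1.0,), 0)]    # second holdout set
validation = holdout_a + holdout_b

baseline = error_frequency(remaining, validation)               # 0.5
diff_b = difference_frequency(remaining, holdout_a, holdout_b)  # -0.5
diff_a = difference_frequency(remaining, holdout_b, holdout_a)  # -0.5
print(baseline, diff_b, diff_a)
```

In this toy case both difference frequencies are negative: adding either holdout set to the reduced classifier corrects one error on the other holdout set and introduces none.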
4 Claims
1. A system comprising:
a computer comprising a memory device and a central processing unit (CPU);
application software;
means to read a set of in-sample examples into the memory device of the computer;
means to specify a partition of the in-sample examples into a validation set and a remaining set;
means to specify a partition of the validation set into two holdout sets, a first holdout set and a second holdout set;
means to compute the frequency, over the validation set, of inputs for which the classifier based on the remaining set is incorrect;
means to compute the difference in two frequencies over the second holdout set, the first frequency being of inputs for which the classifier based on the remaining set is correct but the classifier based on both the remaining set and the first holdout set is incorrect, the second frequency being of inputs for which the classifier based on the remaining set is incorrect but the classifier based on both the remaining set and the first holdout set is correct; and
means to compute the difference in two frequencies over the first holdout set, the first frequency of inputs for which the classifier based on the remaining set is correct but the classifier based on both the remaining set and the second holdout set is incorrect, the second frequency of inputs for which the classifier based on the remaining set is incorrect but the classifier based on both the remaining set and the second holdout set is correct.
2. The system as recited in claim 1, further comprising:
means to specify a partition of the remaining set into a core set and a non-core set; and
means to compute the frequency, over the non-core set, of inputs with at least one closer input in each holdout set than the closest input in the core set.
3. The system as recited in claim 1, further comprising:
means to read a set of unlabelled inputs into the memory device of the computer; and
means to compute the frequency, over the unlabelled inputs, of inputs with at least one closer input in each holdout set than the closest input in the remaining set.
4. The system as recited in claim 1, further comprising:
means to specify a partition of the remaining set into a core set and a non-core set;
means to read a set of unlabelled inputs into the memory device of the computer; and
means to compute the frequency, over the unlabelled inputs and non-core inputs, of inputs with at least one closer input in each holdout set than the closest input in the core set.
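The frequency recited in the dependent claims (inputs with at least one closer input in each holdout set than the closest input in a reference set) can be sketched in Python as below. This is an illustration under stated assumptions, not the patented implementation: Euclidean distance, a strict inequality for "closer" (ties are not counted), and the function name and toy data are all hypothetical.

```python
import math

def closer_in_each_holdout_frequency(inputs, reference, holdouts):
    """Fraction of `inputs` for which every holdout set contains at least one
    point strictly closer than the closest point in `reference`."""
    count = 0
    for x in inputs:
        ref_dist = min(math.dist(x, r) for r in reference)
        if all(min(math.dist(x, h) for h in hs) < ref_dist for hs in holdouts):
            count += 1
    return count / len(inputs)

# Hypothetical toy inputs (labels are not needed for this frequency).
core = [(0.0,), (10.0,)]
non_core = [(5.0,), (0.5,)]
remaining = core + non_core
holdout_a = [(4.0,), (7.0,)]
holdout_b = [(3.0,), (1.0,)]
unlabelled = [(3.5,), (9.0,), (6.2,)]
holdouts = [holdout_a, holdout_b]

# Non-core inputs against the core set.
freq_non_core = closer_in_each_holdout_frequency(non_core, core, holdouts)
# Unlabelled inputs against the whole remaining set.
freq_unlabelled = closer_in_each_holdout_frequency(unlabelled, remaining, holdouts)
# Unlabelled and non-core inputs together against the core set.
freq_combined = closer_in_each_holdout_frequency(unlabelled + non_core, core, holdouts)
print(freq_non_core, freq_unlabelled, freq_combined)
```

The same helper covers all three variants because they differ only in which inputs are counted and which set serves as the reference.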
Specification