Data classification apparatus and method thereof
First Claim
1. A data classification apparatus comprising:
an input device for receiving a plurality of training classified examples and at least one unclassified example;
a memory for storing said classified and unclassified examples;
an output terminal for outputting a predicted classification for said at least one unclassified example; and
a processor for identifying the predicted classification of said at least one unclassified example, wherein the processor includes:
classification allocation means for allocating potential classifications to each said unclassified example and for generating a plurality of classification sets, each said classification set containing said plurality (l) of training classified examples with their classification and said at least one unclassified example (l+1) with its said allocated potential classification;
assay means including an example valuation device which determines individual strangeness values (α_i) for each said training classified example (i = 1, 2, ..., l) and said at least one unclassified example (i = l+1) having an allocated potential classification (y), the assay means determining a single strangeness value (d(y)) valid under the independently and identically distributed assumption for each said classification set in dependence on said individual strangeness values (α_i) of each example by the formula [formula missing from source];
a comparative device for selecting the classification set to which the most likely allocated potential classification for said at least one unclassified example belongs, wherein said predicted classification output by the output terminal is said most likely allocated classification according to said single strangeness values assigned by said assay means; and
a strength of prediction monitoring device for determining a confidence value for said predicted classification on the basis of said single strangeness value assigned by said assay means to one of said classification sets to which the second most likely allocated potential classification of said at least one unclassified example belongs.
Abstract
The data classification apparatus and method are adapted to high-dimensional classification problems and provide a universal measure of confidence that is valid under the independently and identically distributed (iid) assumption. The method assigns strangeness values to classification sets constructed from classified training examples and an unclassified example. The strangeness values, or p-values, are compared to identify the classification set containing the most likely potential classification for the unclassified example. The measure of confidence is then computed on the basis of the strangeness value of the classification set containing the second most likely potential classification.
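The comparison and confidence steps summarized in the abstract can be illustrated with a minimal Python sketch. Two points are assumptions, not taken from this text: the per-set value d(y) is computed as the fraction of examples whose individual strangeness is at least that of the unclassified example (the usual p-value form in the transductive confidence literature), and the confidence is taken as one minus the second-largest d(y). The function names `set_score` and `predict_with_confidence` are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple


def set_score(alphas: Sequence[float]) -> float:
    """Assumed form of d(y): share of examples at least as strange as the new one.

    The last entry of `alphas` is the strangeness of the unclassified
    example under the candidate classification y.
    """
    alpha_new = alphas[-1]
    return sum(1 for a in alphas if a >= alpha_new) / len(alphas)


def predict_with_confidence(
    alphas_by_label: Dict[Hashable, Sequence[float]]
) -> Tuple[Hashable, float]:
    """Pick the label with the largest d(y); confidence uses the runner-up.

    Assumption: confidence = 1 - (second-largest d(y)).
    """
    scores = {y: set_score(a) for y, a in alphas_by_label.items()}
    ranked = sorted(scores, key=lambda y: scores[y], reverse=True)
    best = ranked[0]
    second = scores[ranked[1]] if len(ranked) > 1 else 0.0
    return best, 1.0 - second


# Illustrative strangeness values for two classification sets (made-up numbers).
example = {
    "A": [0.2, 0.5, 0.3, 0.1],  # new example is the least strange: d("A") = 4/4
    "B": [0.2, 0.5, 0.3, 0.9],  # new example is the strangest:     d("B") = 1/4
}
label, confidence = predict_with_confidence(example)
```

In this toy input the unclassified example fits set "A" far better than set "B", so "A" is predicted and the low score of the runner-up set drives the confidence up.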
12 Citations
6 Claims
1. A data classification apparatus comprising:
an input device for receiving a plurality of training classified examples and at least one unclassified example;
a memory for storing said classified and unclassified examples;
an output terminal for outputting a predicted classification for said at least one unclassified example; and
a processor for identifying the predicted classification of said at least one unclassified example, wherein the processor includes:
classification allocation means for allocating potential classifications to each said unclassified example and for generating a plurality of classification sets, each said classification set containing said plurality (l) of training classified examples with their classification and said at least one unclassified example (l+1) with its said allocated potential classification;
assay means including an example valuation device which determines individual strangeness values (α_i) for each said training classified example (i = 1, 2, ..., l) and said at least one unclassified example (i = l+1) having an allocated potential classification (y), the assay means determining a single strangeness value (d(y)) valid under the independently and identically distributed assumption for each said classification set in dependence on said individual strangeness values (α_i) of each example by the formula [formula missing from source];
a comparative device for selecting the classification set to which the most likely allocated potential classification for said at least one unclassified example belongs, wherein said predicted classification output by the output terminal is said most likely allocated classification according to said single strangeness values assigned by said assay means; and
a strength of prediction monitoring device for determining a confidence value for said predicted classification on the basis of said single strangeness value assigned by said assay means to one of said classification sets to which the second most likely allocated potential classification of said at least one unclassified example belongs.
3. A data classification apparatus comprising:
an input device for receiving a plurality of training classified examples and at least one unclassified example;
a memory for storing said classified and unclassified examples;
stored programs including an example classification program;
an output terminal for outputting a predicted classification for said at least one unclassified example; and
a processor controlled by said stored programs for identifying the predicted classification of said at least one unclassified example, wherein said processor includes:
classification allocation means for allocating potential classifications to each said unclassified example and for generating a plurality of classification sets, each said classification set containing said plurality (l) of training classified examples with their classification and said at least one unclassified example (l+1) with its allocated potential classification;
assay means including an example valuation device which determines individual strangeness values (α_i) for each said training classified example (i = 1, 2, ..., l) and said at least one unclassified example (i = l+1) having an allocated potential classification (y), the assay means determining a single strangeness value (d(y)) valid under the independently and identically distributed assumption for each said classification set in dependence on said individual strangeness values (α_i) of each example by the formula [formula missing from source];
a comparative device for selecting the classification set to which the most likely allocated potential classification for said at least one unclassified example belongs, wherein the predicted classification output by said output terminal is the most likely allocated potential classification according to said single strangeness values assigned by said assay means; and
a strength of prediction monitoring device for determining a confidence value for said predicted classification on the basis of said single strangeness value assigned by said assay means to one of said classification sets to which the second most likely allocated potential classification of said at least one unclassified example belongs.
4. A computer-implemented data classification method comprising:
inputting a plurality of training classified examples and at least one unclassified example;
identifying a predicted classification of said at least one unclassified example which includes,
allocating potential classifications to each said unclassified example;
generating a plurality (l) of classification sets, each said classification set containing said plurality of training classified examples with their classification and said at least one unclassified example (l+1) with its allocated potential classification;
determining an individual strangeness value (α_i) for each said training classified example (i = 1, 2, ..., l) and said at least one unclassified example (i = l+1) having an allocated potential classification (y), and a single strangeness value (d(y)) valid under the independently and identically distributed assumption for each said classification set in dependence on the individual strangeness values (α_i) of each example by the formula [formula missing from source];
selecting the said classification set to which the most likely allocated potential classification for said at least one unclassified example belongs, wherein said predicted classification is the most likely allocated potential classification in dependence on said single strangeness values;
determining a confidence value for said predicted classification on the basis of the single strangeness value assigned to one of said classification sets to which the second most likely allocated potential classification for said at least one unclassified example belongs; and
outputting said predicted classification for said at least one unclassified example and said confidence value for said predicted classification.
6. A classification program stored on a computer readable medium for classifying data by performing the following steps:
generating a plurality of classification sets, each said classification set containing a plurality (l) of training classified examples with their classification and at least one unclassified example (l+1) that has been allocated a potential classification;
determining an individual strangeness value (α_i) for each said training classified example (i = 1, 2, ..., l) and said at least one unclassified example (i = l+1) having an allocated potential classification (y), and a single strangeness value (d(y)) valid under the independently and identically distributed assumption for each said classification set in dependence on said individual strangeness values (α_i) of each example by the formula [formula missing from source];
selecting the classification set to which the most likely allocated potential classification for the said at least one unclassified example belongs, wherein the predicted classification is the most likely allocated potential classification in dependence on said single strangeness values; and
determining a confidence value for said predicted classification on the basis of said single strangeness value assigned to one of said classification sets to which the second most likely allocated potential classification for said at least one unclassified example belongs.
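The first two steps recited in the method and program claims, generating one classification set per candidate classification and valuing each example in it, can be sketched as follows. The claims do not fix how an individual strangeness value α_i is measured, so the nearest-neighbour distance ratio used here is purely an illustrative assumption, and the names `classification_sets` and `nn_strangeness` are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Example = Tuple[Sequence[float], Hashable]  # (feature vector, classification)


def classification_sets(
    training: List[Example], x_new: Sequence[float], labels: Sequence[Hashable]
) -> Dict[Hashable, List[Example]]:
    """One set per candidate label y: the l training examples plus (x_new, y)."""
    return {y: list(training) + [(x_new, y)] for y in labels}


def nn_strangeness(examples: List[Example]) -> List[float]:
    """Assumed strangeness measure: distance to the nearest same-label example
    divided by distance to the nearest other-label example (larger = stranger)."""
    alphas = []
    for i, (x_i, y_i) in enumerate(examples):
        same = [math.dist(x_i, x_j)
                for j, (x_j, y_j) in enumerate(examples) if j != i and y_j == y_i]
        other = [math.dist(x_i, x_j)
                 for (x_j, y_j) in examples if y_j != y_i]
        d_same = min(same) if same else math.inf
        d_other = min(other) if other else math.inf
        # Guard against a zero denominator when an other-label point coincides.
        alphas.append(math.inf if d_other == 0 else d_same / d_other)
    return alphas


# Made-up one-dimensional data: class "A" near 0, class "B" near 10.
training = [((0.0,), "A"), ((1.0,), "A"), ((10.0,), "B"), ((11.0,), "B")]
sets = classification_sets(training, (0.5,), ["A", "B"])
alphas_a = nn_strangeness(sets["A"])  # last entry: new example labelled "A"
alphas_b = nn_strangeness(sets["B"])  # last entry: new example labelled "B"
```

With this data the unclassified example is far stranger when allocated "B" than when allocated "A", which is the signal the later selection and confidence steps rely on.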
Specification