Data labelling apparatus and method thereof
Abstract
The transductive confidence machine consists of a data labelling apparatus capable of identifying, for an unknown example, a range of most suitable labels from an infinite number of potential labels. The method identifies a range of possible label sets having a strangeness value below a predetermined strangeness threshold, without pre-calculating the strangeness value of every possible label set. Each label set comprises the training labelled examples and at least one unlabelled example, and in each label set each unlabelled example is associated with a different one of the infinite number of potential labels. The apparatus and method enable a mode of inference known as transductive inference, in which every new unlabelled example is labelled independently. In general, no computations carried out for other unlabelled examples can be re-used when a different unlabelled example is to be assigned a range of labels that are members of label sets having a strangeness value below the threshold strangeness value.
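The idea of finding a label range without scoring every possible label can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the strangeness measure (distance of a label from the mean label of the k nearest neighbours), the acceptance rule (the fraction of examples at least as strange as the new one must exceed `eps`), and the names `knn_coefficients`, `p_value` and `predict_range`. Because each strangeness value is of the form |a + b*y| in the candidate label y, the ranking of the new example can change only at finitely many critical labels, so one probe per segment between those labels is enough.

```python
import math

def knn_coefficients(X_train, y_train, x_new, k=3):
    """Return lists (a, b) so that the strangeness of example i is
    |a[i] + b[i] * y| when the new example (stored last) gets label y.
    Assumed strangeness: |own label - mean label of k nearest neighbours|."""
    X = list(X_train) + [x_new]
    n = len(y_train)
    a, b = [], []
    for i in range(n + 1):
        others = [j for j in range(n + 1) if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        nbrs = others[:k]
        known = sum(y_train[j] for j in nbrs if j < n) / k  # part with known labels
        uses_new = 1.0 / k if n in nbrs else 0.0            # weight of the unknown label
        if i < n:
            a.append(y_train[i] - known)
            b.append(-uses_new)
        else:
            a.append(-known)
            b.append(1.0)
    return a, b

def p_value(a, b, y):
    """Fraction of examples at least as strange as the new one for label y."""
    alpha = [abs(ai + bi * y) for ai, bi in zip(a, b)]
    return sum(s >= alpha[-1] for s in alpha) / len(alpha)

def predict_range(X_train, y_train, x_new, eps=0.2, k=3):
    """Smallest interval (lo, hi) covering all accepted labels, or None.
    Only one probe label per segment between critical labels is scored."""
    a, b = knn_coefficients(X_train, y_train, x_new, k)
    crit = set()
    for ai, bi in zip(a[:-1], b[:-1]):
        for sign in (1.0, -1.0):
            # Solve a_i + b_i*y = sign * (a_new + b_new*y) for y.
            den = bi - sign * b[-1]
            if abs(den) > 1e-12:
                crit.add(-(ai - sign * a[-1]) / den)
    edges = [-math.inf] + sorted(crit) + [math.inf]
    accepted = []
    for lo, hi in zip(edges, edges[1:]):
        if math.isinf(lo) and math.isinf(hi):
            probe = 0.0
        elif math.isinf(lo):
            probe = hi - 1.0
        elif math.isinf(hi):
            probe = lo + 1.0
        else:
            probe = 0.5 * (lo + hi)
        if p_value(a, b, probe) > eps:
            accepted.append((lo, hi))
    if not accepted:
        return None
    return accepted[0][0], accepted[-1][1]

# Usage: ten training examples on a noisy line, one unlabelled example at 4.5.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1]
X_train = [(float(i),) for i in range(10)]
y_train = [2.0 * i + noise[i] for i in range(10)]
label_range = predict_range(X_train, y_train, (4.5,), eps=0.2, k=3)
```

The sketch returns a single enclosing interval for simplicity; the accepted set itself may in principle be a union of several segments. Each call recomputes the coefficients for its own unlabelled example, which mirrors the point that work done for one unlabelled example is generally not re-used for another.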
29 Claims
1. Data labelling apparatus comprising:
an input device for receiving a plurality of training labelled examples, each training labelled example comprising a training set of attributes and an associated known label, and at least one unlabelled example, each unlabelled example comprising a set of attributes for which an associated label is to be identified; and
a processor for identifying one or more potential labels for each unlabelled example, wherein the processor includes a program memory in which is stored a set of instructions for performing analytically or computationally the following steps:
defining an infinite sample space with respect to label sets, each label set comprising the plurality of training labelled examples and the at least one unlabelled example, in each of the label sets each unlabelled example being associated with a different one of an infinite number of potential labels;
identifying a relationship between the label sets populating the infinite sample space and strangeness in which the individual label sets each have a calculable strangeness value; and
identifying a range of potential labels for each unlabelled example on the basis of a predetermined strangeness threshold corresponding to a maximum accepted strangeness value, the range of potential labels being members of a set of label sets having strangeness values falling within the strangeness threshold.
16. A data labelling method comprising the following steps that are performed analytically or computationally:
inputting a plurality of training labelled examples, each training labelled example comprising a training set of attributes and an associated known label, and inputting at least one unlabelled example, each unlabelled example comprising a set of attributes for which an associated range of labels is to be identified;
defining an infinite sample space with respect to label sets, each label set comprising the plurality of training labelled examples and the at least one unlabelled example, in each of the label sets each unlabelled example being associated with a different one of an infinite number of potential labels;
identifying a relationship between the label sets populating the infinite sample space and strangeness in which the individual label sets each have a calculable strangeness value; and
identifying a range of potential labels for each unlabelled example on the basis of a predetermined strangeness threshold corresponding to a maximum accepted strangeness value, the range of potential labels being members of a set of label sets having strangeness values falling within the strangeness threshold.
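The three claimed steps can be written compactly for the case of a single unlabelled example with a real-valued label. The notation below is assumed for illustration only and does not appear in the claims: $(x_i, y_i)$ are the training labelled examples, $x_{n+1}$ is the unlabelled example, $S$ is a strangeness function on label sets, and $s_{\max}$ is the predetermined strangeness threshold.

```latex
% Notation assumed for illustration; not taken from the claims.
\begin{align*}
  % Step 1: one label set per candidate label y (infinitely many).
  Z_y &= \{(x_1, y_1), \dots, (x_n, y_n), (x_{n+1}, y)\},
      \qquad y \in \mathbb{R} \\
  % Step 2: each label set has a calculable strangeness value.
  y &\mapsto S(Z_y) \\
  % Step 3: the range of labels whose label sets fall within the threshold.
  \Gamma &= \{\, y \in \mathbb{R} : S(Z_y) \le s_{\max} \,\}
\end{align*}
```

Under this reading, if $S(Z_y)$ has a known functional form in $y$, the boundary of $\Gamma$ could be obtained by solving $S(Z_y) = s_{\max}$ rather than by evaluating $S$ for every candidate label.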
Specification