Data labelling apparatus and method thereof

US 20030236578A1
Filed: 06/25/2002
Published: 12/25/2003
Est. Priority Date: 07/20/2000
Status: Abandoned Application

First Claim

1. Data labelling apparatus comprising:

an input device for receiving a plurality of training labelled examples, each training labelled example comprising a training set of attributes and an associated known label, and at least one unlabelled example, each unlabelled example comprising a set of attributes for which an associated label is to be identified; and

a processor for identifying one or more potential labels for each unlabelled example, wherein the processor includes a program memory in which is stored a set of instructions for performing analytically or computationally the following steps;

defining an infinite sample space with respect to label sets, each label set comprising the plurality of training labelled examples and the at least one unlabelled example, in each of the label sets each unlabelled example being associated with a different one of an infinite number of potential labels;

identifying a relationship between the label sets populating the infinite sample space and strangeness in which the individual label sets each have a calculable strangeness value; and

identifying a range of potential labels for each unlabelled example on the basis of a predetermined strangeness threshold corresponding to a maximum accepted strangeness value, the range of potential labels being members of a set of label sets having strangeness values falling within the strangeness threshold.

Abstract

The transductive confidence machine consists of data labelling apparatus which is capable of identifying, for an unknown example, a range of most suitable labels from an infinite number of potential labels. The method identifies a range of possible label sets having a strangeness value below a certain pre-determined strangeness threshold, without pre-calculating the strangeness value of all of the possible label sets. The label sets each comprise training labelled examples and at least one unlabelled example; in each of the label sets, each unlabelled example is associated with a different one of an infinite number of potential labels. The apparatus and method enable a mode of inference known as transductive inference, in which the labelling of every new unlabelled example is done independently. In general, no computations carried out in relation to other unlabelled examples can be re-used when a different unlabelled example is to be assigned a range of labels which are members of label sets having a strangeness value below the threshold strangeness value.
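The idea that strangeness need not be pre-calculated for every possible label set can be illustrated with ridge regression, which claims 7 and 18 name as one option. The following is a minimal sketch under stated assumptions, not the method of the specification: it assumes a single scalar attribute, no intercept, the absolute residual as the strangeness of an example, and a rank-based score (`p_value`) that is low when the candidate label makes the new example unusually strange. All function names are hypothetical. Under these assumptions every residual is a linear function `a[i] + b[i] * t` of the candidate label `t`, so one pair of coefficient lists describes the whole infinite family of label sets.

```python
def residual_coefficients(xs, ys, x_new, ridge=1.0):
    """Return (a, b) such that residual_i(t) = a[i] + b[i] * t.

    Assumed model: ridge regression with one attribute and no intercept,
    fitted on the training examples plus (x_new, t). The last entry of
    each list belongs to the unlabelled example.
    """
    denom = sum(x * x for x in xs) + x_new * x_new + ridge
    s_xy = sum(x * y for x, y in zip(xs, ys))  # training part of sum(x * y)
    # Fitted weight is w(t) = (s_xy + x_new * t) / denom, so each residual
    # y_i - x_i * w(t) is linear in the candidate label t.
    a = [y - x * s_xy / denom for x, y in zip(xs, ys)]
    b = [-x * x_new / denom for x in xs]
    a.append(-x_new * s_xy / denom)
    b.append(1.0 - x_new * x_new / denom)
    return a, b


def p_value(a, b, t):
    """Fraction of examples at least as strange as the new one for label t."""
    alphas = [abs(ai + bi * t) for ai, bi in zip(a, b)]
    return sum(alpha >= alphas[-1] for alpha in alphas) / len(alphas)


def refit_residuals(xs, ys, x_new, t, ridge=1.0):
    """Brute-force check: refit the same model for one specific label t."""
    all_x = list(xs) + [x_new]
    all_y = list(ys) + [t]
    w = sum(x * y for x, y in zip(all_x, all_y)) / (sum(x * x for x in all_x) + ridge)
    return [y - x * w for x, y in zip(all_x, all_y)]


# Illustrative data (made up for this sketch).
xs, ys, x_new = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9], 2.5
a, b = residual_coefficients(xs, ys, x_new)
print(p_value(a, b, 2.5), p_value(a, b, 100.0))
```

The coefficients are computed once per unlabelled example; evaluating a further candidate label then costs only one pass over the examples, with no refitting.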

29 Claims

1. Data labelling apparatus comprising:
- an input device for receiving a plurality of training labelled examples, each training labelled example comprising a training set of attributes and an associated known label, and at least one unlabelled example, each unlabelled example comprising a set of attributes for which an associated label is to be identified; and
  
  a processor for identifying one or more potential labels for each unlabelled example, wherein the processor includes a program memory in which is stored a set of instructions for performing analytically or computationally the following steps;
  
  defining an infinite sample space with respect to label sets, each label set comprising the plurality of training labelled examples and the at least one unlabelled example, in each of the label sets each unlabelled example being associated with a different one of an infinite number of potential labels;
  
  identifying a relationship between the label sets populating the infinite sample space and strangeness in which the individual label sets each have a calculable strangeness value; and
  
  identifying a range of potential labels for each unlabelled example on the basis of a predetermined strangeness threshold corresponding to a maximum accepted strangeness value, the range of potential labels being members of a set of label sets having strangeness values falling within the strangeness threshold.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
  - 2. Data labelling apparatus as claimed in claim 1, wherein the program memory stores an optimisation algorithm for identifying the relationship between the label sets populating the infinite sample space, and strangeness.
  - 3. Data labelling apparatus as claimed in claim 1, further comprising a data memory for storing the labelled and unlabelled examples.
  - 4. Data labelling apparatus as claimed in claim 1, wherein the set of instructions in the program memory identifies a range of label sets, and the relationship is used to calculate boundary values of potential labels of that range of label sets.
  - 5. Data labelling apparatus as claimed in claim 1, further comprising an output terminal for outputting information concerning the one or more predicted labels for the at least one unlabelled example.
  - 6. Data labelling apparatus as claimed in claim 5, wherein the output terminal outputs a range of predicted labels for the at least one unlabelled example.
  - 7. Data labelling apparatus as claimed in claim 2, wherein the optimisation algorithm stored in the program memory is the Ridge Regression algorithm.
  - 8. Data labelling apparatus as claimed in claim 2, wherein the optimisation algorithm stored in the program memory is a Nearest Neighbours algorithm.
  - 9. Data labelling apparatus as claimed in claim 2, wherein the optimisation algorithm stored in the program memory is the Aggregating algorithm.
  - 10. Data labelling apparatus as claimed in claim 2, wherein the optimisation algorithm stored in the program memory is the Support Vector Machine.
  - 11. Data labelling apparatus as claimed in claim 2, wherein the optimisation algorithm stored in the program memory is a neural network.
  - 12. Data labelling apparatus as claimed in claim 1, wherein the input device includes means for inputting a chosen strangeness threshold.
  - 13. Data labelling apparatus as claimed in claim 1, wherein the program memory includes a set of instructions for outputting a graphical representation of the relationship of strangeness values with respect to potential labels.
  - 14. Data labelling apparatus as claimed in claim 2, wherein the program memory includes a set of instructions for transforming the optimisation algorithm using Lagrange multipliers.
  - 15. Data labelling apparatus as claimed in claim 2, wherein the program memory includes a set of instructions for applying the optimisation algorithm to images of the attribute vectors in a Hilbert Space.
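Claims 4 and 12 refer to boundary values of the accepted label range and to a chosen strangeness threshold. A hedged sketch of how such boundaries could be found in closed form follows. It assumes, beyond what the claims state, that the strangeness of example `i` for candidate label `t` has the form `|a[i] + b[i] * t|` (last entry is the unlabelled example) and that a label is accepted when a rank-based score exceeds a threshold `epsilon`. The names `p_value`, `label_range`, `epsilon` and `tol` are hypothetical. Because the score can only change where two strangeness curves cross, it is enough to probe the crossing points, the gaps between them, and the two tails.

```python
import math


def p_value(a, b, t, tol=1e-12):
    """Fraction of examples at least as strange as the last one for label t."""
    alphas = [abs(ai + bi * t) for ai, bi in zip(a, b)]
    return sum(alpha >= alphas[-1] - tol for alpha in alphas) / len(alphas)


def label_range(a, b, epsilon, tol=1e-12):
    """Smallest interval (lo, hi) containing every label with p_value > epsilon.

    Returns None if no label is accepted. The accepted set itself may have
    gaps; this sketch reports only its outer boundary values.
    """
    a_n, b_n = a[-1], b[-1]
    points = set()
    for a_i, b_i in zip(a[:-1], b[:-1]):
        # |a_i + b_i t| = |a_n + b_n t| has at most two solutions in t.
        if b_i != b_n:
            points.add((a_n - a_i) / (b_i - b_n))
        if b_i != -b_n:
            points.add(-(a_n + a_i) / (b_i + b_n))
    points = sorted(points)
    if not points:  # the score does not depend on t at all
        return (-math.inf, math.inf) if p_value(a, b, 0.0, tol) > epsilon else None

    # Each probe is (test label, lower end, upper end of the piece it represents).
    probes = [(points[0] - 1.0, -math.inf, points[0])]
    for left, right in zip(points, points[1:]):
        probes.append((left, left, left))
        probes.append(((left + right) / 2.0, left, right))
    probes.append((points[-1], points[-1], points[-1]))
    probes.append((points[-1] + 1.0, points[-1], math.inf))

    kept = [(lo, hi) for t, lo, hi in probes if p_value(a, b, t, tol) > epsilon]
    if not kept:
        return None
    return min(lo for lo, _ in kept), max(hi for _, hi in kept)


# Hand-made coefficients: three fixed training strangeness values (1, 1, 3)
# and a new example whose strangeness is |t|.
a = [1.0, -1.0, 3.0, 0.0]
b = [0.0, 0.0, 0.0, 1.0]
print(label_range(a, b, 0.3))
print(label_range(a, b, 0.5))
```

A stricter threshold narrows the reported range, and a threshold below `1 / len(a)` accepts every label, since the unlabelled example always counts itself.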

16. A data labelling method comprising the following steps that are performed analytically or computationally:
- inputting a plurality of training labelled examples, each training labelled example comprising a training set of attributes and an associated known label, and inputting at least one unlabelled example, each unlabelled example comprising a set of attributes for which an associated range of labels is to be identified;
  
  defining an infinite sample space with respect to label sets, each label set comprising the plurality of training labelled examples and the at least one unlabelled example, in each of the label sets each unlabelled example being associated with a different one of an infinite number of potential labels;
  
  identifying a relationship between the label sets populating the infinite sample space and strangeness in which the individual label sets each have a calculable strangeness value; and
  
  identifying a range of potential labels for each unlabelled example on the basis of a predetermined strangeness threshold corresponding to a maximum accepted strangeness value, the range of potential labels being members of a set of label sets having strangeness values falling within the strangeness threshold.
- Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
  - 17. A data labelling method as claimed in claim 16, wherein an optimisation algorithm stored in the program memory identifies the relationship between the label sets populating the infinite sample space, and strangeness.
  - 18. A data labelling method as claimed in claim 17, wherein the optimisation algorithm stored in the program memory is the Ridge Regression algorithm.
  - 19. A data labelling method as claimed in claim 17, wherein the optimisation algorithm stored in the program memory is a Nearest Neighbours algorithm.
  - 20. A data labelling method as claimed in claim 17, wherein the optimisation algorithm stored in the program memory is the Aggregating Algorithm.
  - 21. A data labelling method as claimed in claim 17, wherein the optimisation algorithm stored in the program memory is the Support Vector Machine.
  - 22. A data labelling method as claimed in claim 17, wherein the optimisation algorithm stored in the program memory is a neural network.
  - 23. A data labelling method as claimed in claim 16, wherein the set of instructions in the program memory identifies a range of label sets, and the relationship is used to calculate boundary values of potential labels of that range of label sets.
  - 24. A data labelling method as claimed in claim 16, further comprising outputting information concerning the one or more predicted labels for the at least one unlabelled example.
  - 25. A data labelling method as claimed in claim 24, further comprising outputting a range of predicted labels for the at least one unlabelled example.
  - 26. A data labelling method as claimed in claim 16, further comprising inputting a chosen strangeness threshold.
  - 27. A data labelling method as claimed in claim 16, further comprising plotting the relationship between strangeness values and potential labels.
  - 28. A data labelling method as claimed in claim 17, wherein the optimisation algorithm is transformed using Lagrange multipliers.
  - 29. A data labelling method as claimed in claim 17, wherein the optimisation algorithm is applied to images of the attribute vectors in a Hilbert space.
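Claims 8 and 19 name a Nearest Neighbours algorithm without defining its strangeness measure. As an illustration only, the sketch below assumes one scalar attribute and takes the strangeness of an example to be the absolute difference between its label and the label of its single nearest neighbour. This measure and the names `nn_coefficients` and `strangeness` are assumptions made here. Under that assumption each strangeness value again has the form `|a[i] + b[i] * t|` in the candidate label `t`, so the relationship between labels and strangeness is available without enumerating labels.

```python
def nn_coefficients(xs, ys, x_new):
    """Return (a, b) so that example i has strangeness |a[i] + b[i] * t|.

    Assumed measure: |own label - label of the single nearest neighbour|.
    Needs at least two training examples; the last entry of each list
    belongs to the unlabelled example with candidate label t.
    """
    a, b = [], []
    for i, (x_i, y_i) in enumerate(zip(xs, ys)):
        # Nearest neighbour among the other training examples.
        j = min((k for k in range(len(xs)) if k != i),
                key=lambda k: abs(xs[k] - x_i))
        if abs(x_new - x_i) < abs(xs[j] - x_i):
            # The unlabelled example is nearer, so its label t enters: |y_i - t|.
            a.append(y_i)
            b.append(-1.0)
        else:
            # Strangeness does not depend on t: |y_i - y_j|.
            a.append(y_i - ys[j])
            b.append(0.0)
    # The unlabelled example itself: |t - label of its nearest training example|.
    j = min(range(len(xs)), key=lambda k: abs(xs[k] - x_new))
    a.append(-ys[j])
    b.append(1.0)
    return a, b


def strangeness(a, b, t):
    """Strangeness of every example when the new example is given label t."""
    return [abs(ai + bi * t) for ai, bi in zip(a, b)]


# Illustrative data (made up for this sketch).
a, b = nn_coefficients([0.0, 1.0, 5.0], [0.0, 1.0, 5.0], x_new=4.0)
print(strangeness(a, b, 5.0))
```

Only the examples whose nearest neighbour is the unlabelled example have a strangeness that varies with `t`; the rest are constants computed once.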

Current Assignee
Royal Holloway University of London (University of London)
Original Assignee
Royal Holloway University of London (University of London)
Inventors
Vovk, Volodya; Gammerman, Alex
Application Number
US10/179,649
Publication Number
US 20030236578A1
US Class Current
700/47
CPC Class Codes
G06F 18/21 Design or setup of recognit...
