GENERATING GOLD QUESTIONS FOR CROWDSOURCING
First Claim
1. A method for generating a gold question for a labeling task comprising:
sampling a positive class from a predefined set of classes to be used in labeling documents, based on a computed measure of class popularity;
for the positive class, identifying a set of negative classes from the set of classes based on a distance measure between the positive class and other classes in the set of classes;
generating a gold question which includes a document representative of the positive class and a set of candidate answers, the candidate answers including a label for the positive class and a label for each of the negative classes in the identified set of negative classes; and
outputting the gold question, wherein at least one of the sampling, identifying, and generating is performed with a computer processor.
Abstract
A system and method for generating gold questions for labeling tasks are disclosed. The method includes sampling a positive class from a predefined set of classes to be used in labeling documents, based on a computed measure of class popularity. A set of negative classes is identified from the set of classes based on a distance measure between the positive class and other classes in the set of classes. A gold question is generated which includes a document representative of the positive class and a set of candidate answers. The candidate answers include a label for the positive class and a label for each of the negative classes in the identified set of negative classes. A task may be generated which includes the gold question and a plurality of standard questions which each include a document to be labeled. A computer processor may implement all or part of the method.
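The steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. All names here (`sample_positive_class`, `select_negative_classes`, `generate_gold_question`, the toy popularity scores, and the one-dimensional class positions) are hypothetical. Two choices are assumptions rather than statements from the text: popularity is treated as a sampling weight, and the negative classes are taken to be the k classes nearest to the positive class.

```python
import random

# Hypothetical popularity scores (e.g. how often each class has been assigned).
popularity = {"invoice": 50, "receipt": 30, "contract": 15, "resume": 5}

# Hypothetical 1-D positions standing in for any class representation.
positions = {"invoice": 0.0, "receipt": 1.0, "contract": 3.0, "resume": 7.0}

# One document known to belong to each class (hypothetical file names).
representative_docs = {c: f"sample_{c}.pdf" for c in popularity}


def distance(a, b):
    """Toy distance measure between two classes."""
    return abs(positions[a] - positions[b])


def sample_positive_class(popularity, rng):
    """Sample one class with probability proportional to its popularity."""
    classes = list(popularity)
    weights = [popularity[c] for c in classes]
    return rng.choices(classes, weights=weights, k=1)[0]


def select_negative_classes(positive, classes, distance, k):
    """Assumed rule: return the k classes closest to the positive class."""
    others = [c for c in classes if c != positive]
    return sorted(others, key=lambda c: distance(positive, c))[:k]


def generate_gold_question(popularity, distance, representative_docs, k, rng):
    """Build a gold question: a document plus positive and negative labels."""
    positive = sample_positive_class(popularity, rng)
    negatives = select_negative_classes(positive, popularity, distance, k)
    answers = [positive] + negatives
    rng.shuffle(answers)  # avoid always listing the correct label first
    return {
        "document": representative_docs[positive],
        "candidate_answers": answers,
        "correct_answer": positive,
    }


if __name__ == "__main__":
    print(generate_gold_question(popularity, distance, representative_docs,
                                 k=2, rng=random.Random(0)))
```

Choosing the nearest classes as negatives makes the distractors plausible. A different selection rule based on the same distance measure would fit the description equally well.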
22 Claims
1. A method for generating a gold question for a labeling task comprising:
sampling a positive class from a predefined set of classes to be used in labeling documents, based on a computed measure of class popularity;
for the positive class, identifying a set of negative classes from the set of classes based on a distance measure between the positive class and other classes in the set of classes;
generating a gold question which includes a document representative of the positive class and a set of candidate answers, the candidate answers including a label for the positive class and a label for each of the negative classes in the identified set of negative classes; and
outputting the gold question, wherein at least one of the sampling, identifying, and generating is performed with a computer processor.
Dependent claims: 2-18.
19. A system for generating a gold question for a labeling task comprising:
a positive class selector for sampling a positive class from a predefined set of classes to be used in labeling documents, the sampling being based on a computed measure of class popularity;
a negative class selector for identifying a set of negative classes from the predefined set of classes based on a distance measure between the positive class and other classes in the set of classes;
a gold question generator which generates a gold question that includes a document representative of the positive class and a set of candidate answers, the candidate answers including a label for the positive class and a label for each of the negative classes in the identified set of negative classes;
a task outsource component which outputs a task including the gold question; and
a computer processor which implements the positive class selector, negative class selector, and gold question generator.
Dependent claims: 20, 21.
22. A method for generating a human intelligence task comprising:
computing a measure of popularity for each of a set of classes to be used in labeling documents;
sampling a positive class from the set of classes based on the computed measure of popularity;
identifying a set of negative classes from the set of classes based on a distance measure between the positive class and other classes in the set of classes;
generating a gold question which includes a document representative of the positive class and a set of candidate answers, the candidate answers including a label for the positive class and a label for each of the negative classes in the identified set of negative classes; and
generating a human intelligence task comprising combining the gold question with a set of standard questions, each of the standard questions including a document to be labeled and a set of candidate answers, the candidate answers including labels for at least a subset of classes from the set of classes; and
outputting the human intelligence task, wherein at least one of the computing, sampling, identifying, generating the gold question, and generating the task is performed with a computer processor.
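The task-assembly step of claim 22 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation. `build_task`, the `is_gold` flag, and the example data are hypothetical. Shuffling the question order so that the gold question is not identifiable by position is an assumption, as is offering every class as a candidate answer for the standard questions (the claim only requires at least a subset).

```python
import random


def build_task(gold_question, documents, classes, rng):
    """Combine one gold question with standard questions into a single task."""
    standard = [
        {
            "document": doc,
            "candidate_answers": list(classes),  # assumption: offer all classes
            "is_gold": False,
        }
        for doc in documents
    ]
    # is_gold is requester-side bookkeeping and would not be shown to workers.
    gold = dict(gold_question, is_gold=True)
    questions = standard + [gold]
    rng.shuffle(questions)  # assumption: hide the gold question's position
    return {"questions": questions}


# Hypothetical inputs: a gold question with a known answer and three
# unlabeled documents.
classes = ["invoice", "receipt", "contract", "resume"]
gold_question = {
    "document": "sample_invoice.pdf",
    "candidate_answers": ["receipt", "invoice", "contract"],
    "correct_answer": "invoice",
}
documents = ["doc_001.pdf", "doc_002.pdf", "doc_003.pdf"]

task = build_task(gold_question, documents, classes, random.Random(0))

if __name__ == "__main__":
    for question in task["questions"]:
        print(question["document"], question["candidate_answers"])
```

Because the correct answer to the gold question is known in advance, a worker's response to it can later be compared against `correct_answer` without exposing which question was the check.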
Specification