METHODS, SYSTEMS, AND MEDIA FOR PROVIDING DIRECT AND HYBRID DATA ACQUISITION APPROACHES
Abstract
Methods, systems, and media for providing direct and hybrid data acquisition approaches are provided. In accordance with some embodiments of the disclosed subject matter, a method of data acquisition for construction of classification models that incorporates multiple human reviewing resources is provided, the method comprising: receiving a cost structure for constructing a classification model using a data set; instructing a plurality of human reviewing resources to search through the data set and select one or more instances of a class that satisfy at least one criterion, wherein the plurality of human reviewing resources are provided with a definition of the class; training the classification model with the one or more instances from the plurality of human reviewing resources; determining when an expected gain for performing additional searches by the plurality of human reviewing resources as a function of the cost structure is lower than a given threshold; and, in response to determining that the expected gain as a function of the cost structure is lower than the given threshold, instructing the plurality of human reviewing resources that was searching through the data set to label one or more examples from the data set.
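The switch from searching to labeling described above can be illustrated with a minimal Python sketch. It is not the patented implementation. All names here (`CostStructure`, `expected_gain_per_cost`, `run_hybrid_acquisition`) are hypothetical. The gain estimator is also an assumption, since the text above does not say how the expected gain is computed: this sketch uses the most recent improvement in model score divided by the cost of a search round. Each callable stands in for one round in which the human reviewing resources act, the model is retrained, and a score is returned.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CostStructure:
    """Hypothetical cost structure: cost of one search round and one labeling round."""
    search_cost: float
    label_cost: float


def expected_gain_per_cost(prev_score: float, new_score: float, cost: float) -> float:
    """Assumed estimator: last observed score improvement per unit of cost."""
    return (new_score - prev_score) / cost


def run_hybrid_acquisition(
    search_round: Callable[[], float],
    label_round: Callable[[], float],
    costs: CostStructure,
    threshold: float,
    max_rounds: int,
) -> Tuple[List[str], float]:
    """Run search rounds until the estimated gain of further searching drops
    below the threshold, then run labeling rounds for the remaining budget.
    Returns the sequence of modes used and the total cost incurred."""
    modes: List[str] = []
    total_cost = 0.0
    mode = "search"
    prev_score = 0.0
    for _ in range(max_rounds):
        if mode == "search":
            # Reviewers search for class instances; the model is retrained.
            score = search_round()
            modes.append("search")
            total_cost += costs.search_cost
            gain = expected_gain_per_cost(prev_score, score, costs.search_cost)
            if gain < threshold:
                # Further searching is no longer worth its cost: switch to labeling.
                mode = "label"
        else:
            # The same reviewers now label examples from the data set.
            score = label_round()
            modes.append("label")
            total_cost += costs.label_cost
        prev_score = score
    return modes, total_cost


# Simulated model scores after each round (illustrative values only).
search_scores = iter([0.60, 0.70, 0.72, 0.73])
label_scores = iter([0.75, 0.78])
modes, total_cost = run_hybrid_acquisition(
    search_round=lambda: next(search_scores),
    label_round=lambda: next(label_scores),
    costs=CostStructure(search_cost=1.0, label_cost=0.5),
    threshold=0.05,
    max_rounds=5,
)
print(modes, total_cost)
```

In this simulated run the third search round improves the score by only 0.02 per unit cost, which is below the 0.05 threshold, so the remaining rounds are labeling rounds.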
19 Claims
1. A method for data acquisition for construction of classification models that incorporates multiple human reviewing resources, the method comprising:
receiving a cost structure for constructing a classification model using a data set;
instructing a plurality of human reviewing resources to search through the data set and select one or more instances of a class that satisfy at least one criterion, wherein the plurality of human reviewing resources are provided with a definition of the class;
training the classification model with the one or more instances from the plurality of human reviewing resources;
determining when an expected gain for performing additional searches by the plurality of human reviewing resources as a function of the cost structure is lower than a given threshold; and
in response to determining that the expected gain as a function of the cost structure is lower than the given threshold, instructing the plurality of human reviewing resources that was searching through the data set to label one or more examples from the data set.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system for data acquisition for construction of classification models that incorporates multiple human reviewing resources, the system comprising:
a processor that:
receives a cost structure for constructing a classification model using a data set;
instructs a plurality of human reviewing resources to search through the data set and select one or more instances of a class that satisfy at least one criterion, wherein the plurality of human reviewing resources are provided with a definition of the class;
trains the classification model with the one or more instances from the plurality of human reviewing resources;
determines when an expected gain for performing additional searches by the plurality of human reviewing resources as a function of the cost structure is lower than a given threshold; and
in response to determining that the expected gain as a function of the cost structure is lower than the given threshold, instructs the plurality of human reviewing resources that was searching through the data set to label the data set.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A non-transitory computer-readable medium containing computer-executable instructions that, when executed by a processor, cause the processor to perform a method for data acquisition for construction of classification models that incorporates multiple human reviewing resources, the method comprising:
receiving a cost structure for constructing a classification model using a data set;
instructing a plurality of human reviewing resources to search through the data set and select one or more instances of a class that satisfy at least one criterion, wherein the plurality of human reviewing resources are provided with a definition of the class;
training the classification model with the one or more instances from the plurality of human reviewing resources;
determining when an expected gain for performing additional searches by the plurality of human reviewing resources as a function of the cost structure is lower than a given threshold; and
in response to determining that the expected gain as a function of the cost structure is lower than the given threshold, instructing the plurality of human reviewing resources that was searching through the data set to label one or more examples from the data set.
Specification