Rapid iterative development of classifiers
Abstract
A classifier development process seamlessly and intelligently integrates different forms of human feedback on instances and features into the data preparation, learning and evaluation stages. A query utility based active learning approach is applicable to different types of editorial feedback. A bi-clustering based technique may be used to further speed up the active learning process.
23 Claims
1. A computer-implemented method of training a classifier, such that the trained classifier is configured to map an instance to one of a plurality of classes, comprising:
- providing a query space, wherein each of a plurality of queries of the query space is associated with a subset of a plurality of instances and a subset of a plurality of features, where each of the plurality of queries of the query space includes a relevant characteristic function that describes, for that query, a function of instance-feature values associated with that query and of instance-class probabilities associated with that query, wherein the instance-class probabilities are an indication of a probabilistic model of mapping of the instances associated with the query to at least one of the plurality of classes, the query space tangibly embodied in a computer-readable medium;
- for each of the plurality of queries, applying a query utility function to determine a query utility value for that query;
- based, at least in part, upon the query utility value determined for each of the plurality of queries, identifying one or more queries of the plurality of queries to be presented for editorial feedback;
- operating a computing device to present the identified queries and receive an indication of commentary from at least one editor for each of the identified one or more queries of the plurality of queries;
- maintaining a classifier framework tangibly embodied in a computer-readable medium, the classifier framework configured to provide class probabilities for the instances according to a tunable parameter vector;
- operating a computing device to determine a distortion value for the identified queries by applying a distortion function to a deviation of the classifier framework response from the indication of editorial commentary for that query; and
- operating a computing device to adjust the tunable parameter vector based on a cost function that considers a regularization component and the distortion values over the queries for which the editors gave commentary.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
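The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation; every concrete choice here is an assumption made for illustration: a binary logistic model as the classifier framework, single-instance label queries as the query space, prediction entropy as the query utility function, log loss as the distortion function, an L2 penalty as the regularization component, and plain gradient descent as the parameter adjustment. The names `predict_proba`, `query_utility`, `distortion`, `cost`, `adjust`, and `train` are hypothetical.

```python
import math


def predict_proba(w, x):
    """Class-1 probability from a logistic model with parameter vector w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def query_utility(w, x):
    """Assumed utility: binary entropy of the current prediction."""
    p = min(max(predict_proba(w, x), 1e-12), 1 - 1e-12)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def distortion(p, label):
    """Assumed distortion: log loss between response p and the editor's label."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def cost(w, labeled, lam):
    """Regularization component plus distortion summed over answered queries."""
    reg = lam * sum(wi * wi for wi in w)
    return reg + sum(distortion(predict_proba(w, x), y) for x, y in labeled)


def adjust(w, labeled, lam, lr=0.05, steps=200):
    """Adjust the parameter vector by gradient descent on the cost."""
    w = list(w)
    for _ in range(steps):
        grad = [2 * lam * wi for wi in w]
        for x, y in labeled:
            err = predict_proba(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def train(pool, editor, rounds=10, lam=0.01):
    """Repeatedly pick the highest-utility query, ask the editor, refit."""
    w = [0.0] * len(pool[0])
    unlabeled = list(pool)
    labeled = []
    for _ in range(min(rounds, len(unlabeled))):
        best = max(unlabeled, key=lambda x: query_utility(w, x))
        unlabeled.remove(best)
        labeled.append((best, editor(best)))  # editorial commentary
        w = adjust(w, labeled, lam)
    return w, labeled
```

Here `editor` stands in for the human editor: any callable that returns 0 or 1 for an instance. The claim covers richer queries (subsets of instances and features) and other forms of commentary; the sketch restricts itself to single-instance labels to stay short.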
14. A computer program product comprising at least one non-transitory computer readable medium having computer program instructions tangibly embodied thereon, the computer program instructions to configure at least one computing device to train a classifier such that the trained classifier is configured to map an instance to one of a plurality of classes, the computer program instructions, when executed by a processor, causing the processor to perform steps, comprising:
- determining, from a plurality of instances that each have corresponding features of a plurality of features, clusters of instances that are homogeneous with respect to dependency of instance class assignments on the instance-feature values;
- constructing a reduced query space from a query space based on the determined clusters, the query space including a plurality of queries, each of the plurality of queries being associated with a subset of the plurality of instances and a subset of the plurality of features, wherein the reduced query space includes a second plurality of queries, the second plurality of queries including a subset of the plurality of queries;
- approximating a query utility function using the determined clusters;
- for each of the second plurality of queries, applying the query utility function to determine a query utility value for that query;
- based, at least in part, upon the query utility value determined for each of the second plurality of queries, identifying one or more queries of the second plurality of queries to be presented for editorial feedback;
- operate a computing device to present the identified queries and receive an indication of commentary from at least one editor for each of the identified queries;
- maintain a classifier framework tangibly embodied in a computer-readable medium, the classifier framework configured to provide class probabilities for the instances according to a tunable parameter vector;
- determine a distortion value for each of the identified queries by applying a distortion function to a deviation of the classifier framework response from the indication of editorial commentary for that query; and
- adjust the tunable parameter vector based on a cost function that considers a regularization component and the distortion values over the queries for which the editors gave commentary.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22
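The cluster-based reduction in claim 14 can be sketched as follows. This is only an illustration under stated assumptions: plain k-means over feature vectors stands in for the clustering step (the abstract mentions a bi-clustering based technique, which this sketch does not implement), the reduced query space keeps one representative instance per cluster, and the approximate utility of a cluster is the prediction entropy at its centroid multiplied by the cluster size. The names `kmeans`, `entropy`, and `reduced_query_space` are hypothetical, and `k` must not exceed the number of points.

```python
import math


def predict_proba(w, x):
    """Class-1 probability from a logistic model with parameter vector w."""
    z = max(-30.0, min(30.0, sum(wi * xi for wi, xi in zip(w, x))))
    return 1.0 / (1.0 + math.exp(-z))


def entropy(p):
    """Binary entropy, clipped away from 0 and 1."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic, evenly spaced initial centroids."""
    step = max(1, len(points) // k)
    centroids = [list(points[i * step]) for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: dist2(p, centroids[c]))
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def reduced_query_space(points, w, k):
    """One representative query per cluster, sorted by approximate utility."""
    centroids, assign = kmeans(points, k)
    reduced = []
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        # Assumed approximation: centroid entropy weighted by cluster size.
        utility = len(members) * entropy(predict_proba(w, centroid))
        rep = min(members, key=lambda i: dist2(points[i], centroid))
        reduced.append((utility, rep, members))
    reduced.sort(key=lambda t: t[0], reverse=True)
    return reduced
```

Each returned tuple holds the approximate utility, the index of the representative instance, and the indices of the cluster members. Scoring `k` clusters instead of every instance is what makes the reduced query space cheaper to search; the selected representatives would then go through the same editorial feedback and parameter adjustment steps as in claim 1.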
23. A computer program product comprising at least one non-transitory computer readable medium having computer program instructions tangibly embodied thereon, the computer program instructions to configure at least one computing device to train a classifier such that the trained classifier is configured to map an instance to one of a plurality of classes, the computer program instructions, when executed by a processor, causing the processor to perform steps, comprising:
- providing a query space, wherein each of a plurality of queries of the query space pertains to a subset of a plurality of instances and a subset of a plurality of features, and wherein each of the plurality of queries of the query space includes a relevant characteristic function that describes, for that query, a function of instance-feature values associated with that query and of instance-class probabilities associated with that query, wherein the instance-class probabilities are an indication of a probabilistic model of mapping of the instances associated with the query to at least one of the plurality of classes, the query space tangibly embodied in a computer-readable medium;
- for each of the plurality of queries, applying a query utility function to determine a query utility value for that query;
- based, at least in part, upon the query utility value determined for each of the plurality of queries, identifying one or more queries of the plurality of queries to be presented for editorial feedback;
- operating a computing device to present the identified queries and receive an indication of commentary from at least one editor for each of the identified one or more queries of the plurality of queries;
- maintaining a classifier framework tangibly embodied in a computer-readable medium, the classifier framework configured to provide class probabilities for the instances according to a tunable parameter vector;
- operating a computing device to determine a distortion value for the identified queries by applying a distortion function to a deviation of the classifier framework response from the indication of editorial commentary for that query; and
- operating a computing device to adjust the tunable parameter vector based on a cost function that considers a regularization component and the distortion values over the queries for which the editors gave commentary.
Specification