Error-driven feature ideation in machine learning
First Claim
1. A method for textual classification, comprising:
receiving, by a processing unit, a training set of textual data;
classifying, by the processing unit, the training set of textual data to obtain a first plurality of classifications for the training set of textual data;
determining, by the processing unit, a plurality of errors based on differences between the first plurality of classifications and a first plurality of labels having been previously assigned to the training set of textual data;
determining, by the processing unit, a set of candidate features based on the determined plurality of errors to correct at least one error of the plurality of errors;
causing, by the processing unit, a display of one or more candidate features from the determined set of candidate features for selection as an applied feature;
receiving, by the processing unit, a selection of at least one candidate feature of the displayed one or more candidate features to be an applied feature; and
retraining a classifier, using the applied feature, to re-classify the training set of textual data.
Abstract
Disclosed herein are technologies directed to a feature ideator. The feature ideator can initiate a classifier that analyzes a training set of data in a classification process. The feature ideator can generate one or more suggested features relating to errors generated during the classification process. The feature ideator can generate an output to cause the errors to be rendered in a format that provides for an interaction with a user. A user can review the summary of the errors or the individual errors and select one or more features to increase the accuracy of the classifier.
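For orientation, the disclosed loop (classify, surface errors, suggest features, accept a user selection, retrain) can be sketched in a few lines of Python. The toy corpus, the keyword-voting classifier, and all names below are illustrative assumptions, not the patent's implementation:

```python
from collections import Counter

# Toy corpus of (text, label) pairs; purely illustrative.
TRAINING_SET = [
    ("great product works well", "pos"),
    ("terrible product broke fast", "neg"),
    ("works great love it", "pos"),
    ("broke after a day terrible", "neg"),
]

def classify(text, features):
    """Majority vote over applied feature terms found in the text.
    `features` maps a term to the label it votes for."""
    votes = Counter(features[t] for t in text.split() if t in features)
    if not votes:
        return "pos"  # arbitrary default when no applied feature fires
    return votes.most_common(1)[0][0]

def find_errors(dataset, features):
    """Errors are differences between predictions and assigned labels."""
    return [(text, label) for text, label in dataset
            if classify(text, features) != label]

def suggest_candidate_features(errors):
    """Rank terms appearing in misclassified examples as candidates."""
    counts = Counter(t for text, _ in errors for t in text.split())
    return [term for term, _ in counts.most_common()]

# One iteration: classify, surface errors, suggest, select, retrain.
features = {"great": "pos"}                 # initially applied features
errors = find_errors(TRAINING_SET, features)
candidates = suggest_candidate_features(errors)
features["terrible"] = "neg"                # simulated user selection
assert len(find_errors(TRAINING_SET, features)) < len(errors)
```

In this sketch, "retraining" is simply re-running the classifier with the enlarged feature set; the disclosure's interactive display of errors and candidates is reduced to the `candidates` list a user would review.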
20 Claims
1. A method for textual classification, comprising the steps set forth in the First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
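Claim 1 requires that candidate features be determined "to correct at least one error of the plurality of errors." One way to read that limitation is as a filter: a candidate is only surfaced if applying it would fix at least one current error. The check below is a minimal sketch of that reading; the two-example dataset and single-keyword classifier are our assumptions:

```python
def classify(text, features):
    # Predict the label voted for by any applied feature term, else "pos".
    for term, label in features.items():
        if term in text.split():
            return label
    return "pos"

TRAINING = [("slow and noisy", "neg"), ("fast and quiet", "pos")]

def errors_for(features):
    """Examples whose prediction differs from the previously assigned label."""
    return [(t, l) for t, l in TRAINING if classify(t, features) != l]

def corrects_at_least_one(candidate, features):
    """True if applying the candidate fixes >= 1 currently misclassified example."""
    term, label = candidate
    trial = {**features, term: label}
    return any(classify(t, trial) == l for t, l in errors_for(features))

applied = {}  # no features yet: "slow and noisy" is misclassified as "pos"
assert corrects_at_least_one(("noisy", "neg"), applied)       # fixes that error
assert not corrects_at_least_one(("quiet", "pos"), applied)   # fixes nothing
```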
10. A computer comprising:
a processor; and
a non-transitory, computer-readable storage medium in communication with the processor, the non-transitory, computer-readable storage medium comprising computer-executable instructions for textual classification that, when executed by the processor, cause the processor to:
initiate a classifier of a feature ideator to obtain a first plurality of classifications by classifying a training set of textual data;
initiate the classifier of the feature ideator to determine a plurality of errors in the training set of textual data based on differences between the first plurality of classifications and a first plurality of labels having been previously assigned to the training set of textual data;
initiate a candidate feature generator of the feature ideator to determine a set of candidate features based on the determined plurality of errors to correct at least one error of the plurality of errors;
cause a display of one or more candidate features from the determined set of candidate features for selection as an applied feature; and
initiate the feature ideator to receive a selection of the displayed one or more candidate features to be an applied feature and to retrain the classifier to re-classify the training set of textual data based on the applied feature.
Dependent claims: 11, 12, 13, 14, 15, 16.
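Claim 10 names two components inside the feature ideator: the classifier and a candidate feature generator. A structural sketch of that decomposition follows; the class names, the keyword classifier, and the toy data are our own reading of the claim, not the patent's code:

```python
class Classifier:
    """Tiny keyword classifier standing in for the claim's classifier."""
    def __init__(self, features=None):
        self.features = dict(features or {})
    def predict(self, text):
        for term, label in self.features.items():
            if term in text.split():
                return label
        return "pos"  # arbitrary default when no feature fires

class CandidateFeatureGenerator:
    """Proposes (term, label) candidates drawn from misclassified examples."""
    def generate(self, errors):
        seen = []
        for text, label in errors:
            for term in text.split():
                if (term, label) not in seen:
                    seen.append((term, label))
        return seen

class FeatureIdeator:
    """Wires the two components together, as in the claimed computer."""
    def __init__(self, training):
        self.training = training
        self.classifier = Classifier()
        self.generator = CandidateFeatureGenerator()
    def errors(self):
        return [(t, l) for t, l in self.training
                if self.classifier.predict(t) != l]
    def candidates(self):
        return self.generator.generate(self.errors())
    def apply(self, candidate):
        # Simulated user selection followed by "retraining" (here, just
        # re-running prediction with the enlarged feature set).
        term, label = candidate
        self.classifier.features[term] = label

ideator = FeatureIdeator([("buggy and slow", "neg"), ("fast and stable", "pos")])
first = ideator.candidates()[0]     # e.g. ("buggy", "neg")
ideator.apply(first)
assert ideator.errors() == []
```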
17. A non-transitory, computer-readable storage medium having computer-executable instructions for textual classification that, when executed by a computer, cause the computer to:
receive a training set of textual data;
classify the training set of textual data to obtain a first plurality of classifications for the training set of textual data;
determine a plurality of errors based on differences between the first plurality of classifications and a first plurality of labels having been previously assigned to the training set of textual data;
determine a plurality of candidate features based on the determined plurality of errors to correct at least one error of the plurality of errors;
render a feature ideation user interface comprising:
a featuring area comprising a create feature section for receiving an input to initiate a feature ideation process and an applied feature section for displaying currently applied features;
a feature candidate section for displaying the candidate features; and
a contrast term section for displaying contrast terms, the contrast terms comprising terms that are properly classified; and
retrain a classifier to re-classify the training set of textual data based on the contrast terms.
Dependent claims: 18, 19, 20.
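Claim 17's "contrast terms" are terms drawn from properly classified examples, displayed alongside the error-derived candidates for contrast. One plausible way to compute them is to keep terms that occur in correctly classified examples but cancel any that also occur in errors; that scoring is our assumption, not language from the claim:

```python
from collections import Counter

def contrast_terms(dataset, predict):
    """Terms from properly classified examples, minus terms seen in errors."""
    correct, wrong = Counter(), Counter()
    for text, label in dataset:
        bucket = correct if predict(text) == label else wrong
        bucket.update(text.split())
    # Counter subtraction keeps only strictly positive counts.
    return [t for t, _ in (correct - wrong).most_common()]

data = [("good sturdy hinge", "pos"), ("good looks but broke", "neg")]
predict = lambda text: "pos"          # toy classifier: always predicts "pos"
terms = contrast_terms(data, predict)
assert "sturdy" in terms and "broke" not in terms
assert "good" not in terms            # appears in an error too, so it cancels
```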
Specification