Method and apparatus for analysis and decomposition of classifier data anomalies
First Claim
1. A human assisted method, implemented with a computing device, of debugging training data used to train a machine learning classifier, the method comprising:
obtaining a machine learning classifier training data set, wherein the machine learning classifier training data set comprises data instances mapped to predictions for those data instances, wherein each of the data instances comprises a data triplet containing a prediction ID, a weight, and an input description;
evaluating potential errors in the data set according to one or more prediction-centric metrics;
displaying one or more of the potential errors in the data set along with one or more of the metrics in a format that is user-configurable with reference to one or more of the metrics; and
debugging the machine learning classifier training data set using an integrated debugging tool configured to implement a debugging loop, including removing one or more of the potential errors in the data set, to obtain a debugged machine learning classifier data set for use in training a machine learning classifier.
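The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The names `Instance`, `conflicting_inputs`, and `debug_loop` are hypothetical, and the metric shown (flagging an input description that maps to more than one prediction ID) is only an assumed example of a prediction-centric metric. The `approve` callback stands in for the human decision in the loop.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One training instance: the data triplet named in the claim."""
    prediction_id: str
    weight: float
    input_description: str


def conflicting_inputs(data):
    """Assumed example metric: for each input description mapped to more than
    one prediction ID, flag every instance whose prediction differs from the
    highest-weight mapping."""
    by_input = defaultdict(list)
    for inst in data:
        by_input[inst.input_description].append(inst)
    flagged = []
    for group in by_input.values():
        if len({inst.prediction_id for inst in group}) > 1:
            best = max(group, key=lambda inst: inst.weight)
            flagged.extend(
                inst for inst in group
                if inst.prediction_id != best.prediction_id
            )
    return flagged


def debug_loop(data, metric, approve, max_rounds=10):
    """Repeat: evaluate potential errors, let the user approve removals,
    remove them, and re-evaluate until nothing more is removed."""
    data = list(data)
    for _ in range(max_rounds):
        candidates = metric(data)
        to_remove = {c for c in candidates if approve(c)}
        if not to_remove:
            break
        data = [inst for inst in data if inst not in to_remove]
    return data
```

In an interactive tool, `approve` would be backed by the user's selection in the display; passing `lambda inst: True` removes every flagged instance automatically.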
Abstract
A human assisted method of debugging training data used to train a machine learning classifier is provided. The method includes obtaining a classifier training data set. The training data set is then debugged using an integrated debugging tool configured to implement a debugging loop to obtain a debugged data set. The debugging tool can be configured to perform an estimation and simplification step to reduce data noise in the training data set prior to further analysis. The debugging tool also runs a panel of prediction-centric diagnostic metrics on the training data set, and provides the user with prediction-based listings of the results of the panel.
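As a rough illustration of running a panel of prediction-centric metrics and producing a prediction-based listing, consider the sketch below. The three metrics (`instances`, `total_weight`, `shared_inputs`) and the function names are assumptions chosen for the example; the abstract does not name specific metrics.

```python
from collections import defaultdict


def run_metric_panel(triplets):
    """Compute assumed example metrics per prediction ID.

    triplets: iterable of (prediction_id, weight, input_description).
    """
    triplets = list(triplets)
    # Which predictions does each input description map to?
    preds_by_input = defaultdict(set)
    for pred, _weight, text in triplets:
        preds_by_input[text].add(pred)

    panel = defaultdict(
        lambda: {"instances": 0, "total_weight": 0.0, "shared_inputs": 0}
    )
    for pred, weight, text in triplets:
        row = panel[pred]
        row["instances"] += 1
        row["total_weight"] += weight
        # An input shared with another prediction is a potential error.
        if len(preds_by_input[text]) > 1:
            row["shared_inputs"] += 1
    return dict(panel)


def prediction_listing(panel, sort_by="shared_inputs", descending=True):
    """Return one row per prediction, ordered by a user-chosen metric."""
    rows = [{"prediction_id": pred, **metrics} for pred, metrics in panel.items()]
    return sorted(rows, key=lambda row: row[sort_by], reverse=descending)
```

The `sort_by` and `descending` parameters stand in for the user-configurable aspect of the listing: the same panel results can be reordered by whichever metric the user wants to inspect.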
31 Claims
1. A human assisted method, implemented with a computing device, of debugging training data used to train a machine learning classifier, the method comprising:
obtaining a machine learning classifier training data set, wherein the machine learning classifier training data set comprises data instances mapped to predictions for those data instances, wherein each of the data instances comprises a data triplet containing a prediction ID, a weight, and an input description;
evaluating potential errors in the data set according to one or more prediction-centric metrics;
displaying one or more of the potential errors in the data set along with one or more of the metrics in a format that is user-configurable with reference to one or more of the metrics; and
debugging the machine learning classifier training data set using an integrated debugging tool configured to implement a debugging loop, including removing one or more of the potential errors in the data set, to obtain a debugged machine learning classifier data set for use in training a machine learning classifier.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
17. A human assisted method, implemented by a computing device, of debugging training data used to train a machine learning classifier, the method comprising:
obtaining a machine learning classifier training data set, wherein the machine learning classifier training data set comprises data instances mapped to predictions for those data instances;
debugging the machine learning classifier training data set using a computer-implemented integrated debugging tool configured to implement a debugging loop to obtain a debugged machine learning classifier data set for use in training a machine learning classifier, wherein debugging the machine learning classifier training data set using the integrated debugging tool further comprises:
running a panel of prediction-centric diagnostic metrics on the machine learning classifier training data set; and
providing to a user prediction-based listings of the results of the panel of prediction-centric diagnostic metrics;
wherein providing to the user the prediction-based listings of the results of the panel of prediction-centric diagnostic metrics further comprises providing user-configurable prediction based listings of the results;
wherein providing the user-configurable prediction based listings of the results further comprises generating a graphical user interface which displays the prediction based listings of the results, and which is configured to receive user inputs and in response to configure the prediction based listings of the results;
wherein generating the graphical user interface further comprises highlighting failed queries to associate the failed queries with failure causes; and
wherein the graphical user interface is configured to receive a user input corresponding to a prediction cluster, and in response to zoom into the prediction cluster to display individual predictions included in the prediction cluster.
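The cluster zoom and failed-query highlighting recited in claim 17 can be sketched without a graphical toolkit. Everything below is a hypothetical illustration: the cluster mapping, the per-prediction error counts, the function names, and the text markers used for highlighting are all assumptions, and a real tool would render these rows in a graphical user interface.

```python
def summarize_clusters(clusters, errors_by_prediction):
    """Top-level view: one row per prediction cluster with its summed error
    count, worst cluster first.

    clusters: dict mapping cluster name -> list of prediction IDs.
    errors_by_prediction: dict mapping prediction ID -> error count.
    """
    rows = [
        (name, sum(errors_by_prediction.get(pred, 0) for pred in preds))
        for name, preds in clusters.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def zoom_into(clusters, errors_by_prediction, cluster_name):
    """Zoomed view: the individual predictions inside the selected cluster."""
    rows = [
        (pred, errors_by_prediction.get(pred, 0))
        for pred in clusters[cluster_name]
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def highlight_failed(queries, failure_causes):
    """Mark each failed query and attach its failure cause.

    failure_causes: dict mapping a failed query -> short cause description.
    """
    return [
        f"! {query}  [{failure_causes[query]}]" if query in failure_causes
        else f"  {query}"
        for query in queries
    ]
```

Here a user input selecting a cluster corresponds to the `cluster_name` argument of `zoom_into`, which replaces the cluster-level rows with rows for the individual predictions in that cluster.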
18. A classifier analyzer, executed on a computing device, which provides human assisted debugging of machine learning classifier training data used to train a machine learning classifier, the classifier analyzer being configured to implement steps comprising:
obtaining a machine learning classifier training data set, wherein the machine learning classifier training data set comprises data instances mapped to predictions for those data instances, wherein each of the data instances comprises a data triplet containing a prediction ID, a weight and an input description;
evaluating potential errors in the data set according to one or more prediction-centric metrics;
displaying one or more of the potential errors in the data set along with one or more of the metrics in a format that is user-configurable with reference to one or more of the metrics; and
debugging the machine learning classifier training data set using a debugging loop, including removing one or more of the potential errors in the data set to obtain a debugged machine learning classifier data set for use in training a machine learning classifier.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
Specification