EXPLORING DATA USING MULTIPLE MACHINE-LEARNING MODELS
Abstract
A multiple-model data exploration system and method for running multiple machine-learning models simultaneously to understand and explore data. Embodiments of the system and method allow a user to gain a greater understanding of, and new insights into, their data. Embodiments also allow a user to interactively explore the problem and to navigate different views of the data. Many different classifier training and evaluation experiments are run simultaneously and results are obtained. The results are aggregated and visualized across the experiments to show how each example is classified by each classifier. These results are then summarized in a variety of ways so that users can better understand the data, both in terms of the individual examples and in terms of the features associated with the data.
20 Claims
1. A method for exploring data in a dataset, comprising:
inputting the dataset containing labeled data;
generating multiple models from the dataset so that each of the multiple models is based on a different configuration of the dataset; and
simultaneously running different classifier training and evaluation experiments using multiple models to explore and understand the data based on results of running different classifier training and evaluation experiments.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
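
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that a "configuration of the dataset" means a choice of feature columns, uses a toy nearest-centroid classifier with leave-one-out evaluation, and submits the experiments through a thread pool. The dataset values and all names (`DATASET`, `CONFIGS`, `run_experiment`, `run_all`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy labeled dataset: (features, label). Values are made up for illustration.
DATASET = [
    ((1.0, 1.2, 0.1), "a"), ((0.9, 1.0, 0.9), "a"), ((1.1, 0.8, 0.2), "a"),
    ((3.0, 3.1, 0.8), "b"), ((3.2, 2.9, 0.1), "b"), ((2.9, 3.3, 0.5), "b"),
]

# Assumption: each configuration is a different subset of feature columns.
CONFIGS = {"f0": (0,), "f0_f1": (0, 1), "f2_only": (2,)}


def train_centroids(rows, cols):
    """Fit a nearest-centroid classifier on the selected feature columns."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(cols))
        for i, c in enumerate(cols):
            acc[i] += features[c]
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, features, cols):
    """Return the label whose centroid is closest in squared distance."""
    picked = [features[c] for c in cols]
    return min(
        centroids,
        key=lambda lab: sum((p - c) ** 2 for p, c in zip(picked, centroids[lab])),
    )


def run_experiment(name):
    """Leave-one-out training and evaluation for one configuration."""
    cols = CONFIGS[name]
    correct = 0
    for i, (features, label) in enumerate(DATASET):
        train = DATASET[:i] + DATASET[i + 1:]
        model = train_centroids(train, cols)
        correct += predict(model, features, cols) == label
    return name, correct / len(DATASET)


def run_all():
    """Submit every experiment to a thread pool and collect accuracies."""
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(run_experiment, CONFIGS))
```

Calling `run_all()` returns one accuracy per configuration, which can then be compared across models. The thread pool only demonstrates concurrent submission; a process pool or a cluster scheduler could be substituted where true parallelism is needed.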
9. A method for aggregating data into a plurality of visualizations to allow a user to gain insight about the data, comprising:
generating multiple models from a dataset containing labeled data;
generating a set of trials from an initial set of tuples corresponding to labels for the label data;
executing each trial in the set of trials;
generating a new set of tuples from the executed trials corresponding to predicted labels;
aggregating sets of predicted labels to obtain aggregated predicted labels;
computing summary statistics for the aggregated predicted labels;
visualizing the summary statistics using incorrectness versus entropy graphs; and
examining the visualized summary statistics to understand and explore trends in the data.
Dependent claims: 10, 11, 12, 13, 14, 15
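
The aggregation and summary-statistics steps of claim 9 can be sketched as follows. The claim does not define the two statistics, so this sketch assumes that incorrectness is the fraction of trials whose predicted label differs from the actual label, and that entropy is the Shannon entropy (in bits) of the predicted-label distribution for an example. The tuple layout `(example_id, trial_id, predicted_label)` and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def aggregate(predicted_tuples):
    """Group (example_id, trial_id, predicted_label) tuples by example."""
    by_example = defaultdict(list)
    for example_id, _trial_id, predicted in predicted_tuples:
        by_example[example_id].append(predicted)
    return dict(by_example)


def summarize(aggregated, actual):
    """Return {example_id: (incorrectness, entropy)}.

    Assumed definitions (not taken from the claim text):
      incorrectness = share of trials that predicted the wrong label
      entropy       = Shannon entropy of the predicted labels, in bits
    """
    summary = {}
    for example_id, preds in aggregated.items():
        n = len(preds)
        incorrectness = sum(p != actual[example_id] for p in preds) / n
        counts = Counter(preds)
        entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
        summary[example_id] = (incorrectness, entropy)
    return summary


# Hypothetical predictions from four trials over three examples.
tuples = [
    ("x1", 0, "a"), ("x1", 1, "a"), ("x1", 2, "a"), ("x1", 3, "a"),
    ("x2", 0, "a"), ("x2", 1, "b"), ("x2", 2, "a"), ("x2", 3, "b"),
    ("x3", 0, "b"), ("x3", 1, "b"), ("x3", 2, "b"), ("x3", 3, "b"),
]
actual = {"x1": "a", "x2": "a", "x3": "a"}
summary = summarize(aggregate(tuples), actual)
# summary == {"x1": (0.0, 0.0), "x2": (0.5, 1.0), "x3": (1.0, 0.0)}
```

Under these assumed definitions, each example becomes one point on an incorrectness versus entropy graph. Here `x3` is wrong in every trial with zero entropy, meaning the models agree on a label that contradicts the recorded one, while `x2` splits the models evenly.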
16. A method for exploring visualizations of labeled data to formulate an optimal model for a machine learning problem, comprising:
inputting an initial set of tuples from the labeled data such that the tuples correspond to labels for the labeled data;
generating a set of trials from the initial set of tuples;
simultaneously executing each trial in the set of trials to run different classifier training and evaluation experiments using the multiple models;
generating a new set of tuples from the executed trials that correspond to predicted labels such that a different set of predicted labels is generated for each of the multiple models;
aggregating each set of predicted labels to obtain an aggregated set of predicted labels;
computing summary statistics for each set of predicted labels and actual labels for the labeled data;
visualizing the summary statistics using incorrectness versus entropy graphs; and
allowing a user to interact with the incorrectness versus entropy graphs to formulate the optimal model for the machine learning problem.
Dependent claims: 17, 18, 19, 20
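
Two steps of claim 16, generating trials from the initial tuples and letting a user interact with the graph, can be sketched as follows. The claim does not say how trials are built or what the interaction looks like. This sketch assumes that a trial pairs a model configuration with one cross-validation fold, and that interaction amounts to selecting the examples inside a rectangular region of the incorrectness versus entropy plane. `Trial`, `make_trials`, and `select_region` are hypothetical names.

```python
from itertools import product
from typing import NamedTuple


class Trial(NamedTuple):
    config: str
    fold: int
    train_ids: tuple
    test_ids: tuple


def make_trials(tuples, configs, k):
    """Cross every configuration with k round-robin folds of the example ids.

    tuples: iterable of (example_id, label) pairs (the initial set of tuples).
    """
    ids = [example_id for example_id, _label in tuples]
    trials = []
    for config, fold in product(configs, range(k)):
        test = tuple(ids[fold::k])
        train = tuple(i for i in ids if i not in test)
        trials.append(Trial(config, fold, train, test))
    return trials


def select_region(summary, min_incorrect=0.0, max_incorrect=1.0,
                  min_entropy=0.0, max_entropy=float("inf")):
    """Return the ids whose (incorrectness, entropy) point lies in the box.

    A text stand-in for brushing a region of the graph with the mouse.
    """
    return sorted(
        example_id
        for example_id, (inc, ent) in summary.items()
        if min_incorrect <= inc <= max_incorrect
        and min_entropy <= ent <= max_entropy
    )


initial = [("x1", "a"), ("x2", "a"), ("x3", "b"), ("x4", "b")]
trials = make_trials(initial, ["cfg_a", "cfg_b"], k=2)

# Hypothetical summary statistics, one (incorrectness, entropy) pair per example.
stats = {"x1": (0.0, 0.0), "x2": (0.5, 1.0), "x3": (1.0, 0.0)}
suspects = select_region(stats, min_incorrect=0.9, max_entropy=0.1)
```

Selecting the high-incorrectness, low-entropy corner picks out examples that every model gets wrong in the same way. A user might inspect those examples for labeling errors before choosing a model, though the claim leaves the exact workflow open.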
Specification