Data analytics lifecycle processes
First Claim
1. A method comprising:
defining a data analytic plan for analyzing a given data set associated with a given data problem associated with a data analytics lifecycle;
obtaining a test data set and a training data set from the given data set associated with the given data problem;
executing at least one model to confirm an adequacy of the at least one model for the data analytic plan by fitting the at least one model on the training data set and evaluating the at least one model fitted on the training data set against the test data set, wherein the evaluation comprises assessing a validity of the at least one model and a validity of results of the execution of the at least one model on the test data set;
refining the at least one model based on the assessment;
conditioning at least a portion of raw data in the given data set to generate conditioned data;
creating an analytics environment in which the executing, evaluating and conditioning steps are performed, the analytics environment comprising parameters including at least a capacity and a bandwidth of the analytics environment; and
dynamically changing the parameters in response to the refining step to include parameters to perform additional executing and evaluating steps on refinements of the at least one model;
wherein the step of dynamically changing the parameters is performed such that the data analytics lifecycle is configured to continue from a point in the lifecycle where the parameters were changed;
wherein the execution of the at least one model is performed prior to implementation of the data analytic plan in a destination environment;
wherein the training data set is used to train the at least one model and the test data set is used to determine the accuracy of the at least one model fitted on the training data set; and
wherein the defining, obtaining, executing, refining, conditioning, creating and dynamically changing steps are performed on one or more processing elements associated with a computing system and automate at least part of the data analytics lifecycle.
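The environment-related steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation; it only shows one way an environment's capacity and bandwidth parameters might be changed after a model refinement while the lifecycle continues from its current position instead of restarting. The class names, the phase ordering, the units (`capacity_gb`, `bandwidth_mbps`), and the numeric values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical phase ordering, assumed for this sketch only.
PHASES = ("discovery", "data_preparation", "model_building")


@dataclass
class AnalyticsEnvironment:
    capacity_gb: int      # assumed unit for "capacity"
    bandwidth_mbps: int   # assumed unit for "bandwidth"


@dataclass
class Lifecycle:
    env: AnalyticsEnvironment
    position: int = 0                      # index of the current phase
    history: list = field(default_factory=list)

    def step(self):
        """Run the current phase and record the parameters it ran with."""
        self.history.append(
            (PHASES[self.position], self.env.capacity_gb, self.env.bandwidth_mbps)
        )

    def advance(self):
        """Move to the next phase, stopping at the last one."""
        self.position = min(self.position + 1, len(PHASES) - 1)

    def change_parameters(self, capacity_gb, bandwidth_mbps):
        """Replace the environment parameters without resetting the position."""
        self.env = AnalyticsEnvironment(capacity_gb, bandwidth_mbps)


lifecycle = Lifecycle(AnalyticsEnvironment(capacity_gb=100, bandwidth_mbps=1000))

# Run discovery and data preparation, then the first model-building pass.
for _ in PHASES[:-1]:
    lifecycle.step()
    lifecycle.advance()
lifecycle.step()

# Suppose refinement of the model calls for a larger environment.
lifecycle.change_parameters(capacity_gb=200, bandwidth_mbps=2000)

# The lifecycle resumes at model building; earlier phases are not repeated.
lifecycle.step()
```

In this sketch, `history` ends with two `model_building` entries, the second recorded with the enlarged parameters, which mirrors the idea of continuing from the point where the parameters were changed.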
Abstract
A data analytic plan is defined for analyzing a given data set associated with a given data problem. A test data set and a training data set are obtained from the given data set associated with the given data problem. At least one model is executed to confirm an adequacy of the at least one model for the data analytic plan by fitting the at least one model on the training data set and evaluating the at least one model fitted on the training data set against the test data set. The defining, obtaining and executing steps are performed on one or more processing elements associated with a computing system and automate at least part of a data analytics lifecycle.
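The fit-then-evaluate step described in the abstract can be sketched as follows. This is an illustrative example under stated assumptions, not the method as implemented by the patentee: the one-variable least-squares model, the 25% test fraction, the mean-absolute-error metric, the `tolerance` threshold, and the synthetic data are all hypothetical choices.

```python
import random


def split_data(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(training):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(training)
    mean_x = sum(x for x, _ in training) / n
    mean_y = sum(y for _, y in training) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in training)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in training)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, test):
    """Average absolute prediction error of the fitted model on the test set."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in test) / len(test)


def is_adequate(model, test, tolerance):
    """Treat the model as adequate if its test error is within tolerance."""
    return mean_abs_error(model, test) <= tolerance


# Hypothetical data set: y = 2x + 1 with a small alternating offset.
data = [(x, 2.0 * x + 1.0 + (0.1 if x % 2 else -0.1)) for x in range(20)]

training, test = split_data(data)      # obtain training and test sets
model = fit_line(training)             # fit on the training set only
error = mean_abs_error(model, test)    # evaluate against the held-out test set
adequate = is_adequate(model, test, tolerance=0.5)
```

Keeping the test rows out of `fit_line` is the point of the split: the error is measured on data the model has not seen, so it gives a fairer indication of whether the model is adequate for the plan.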
18 Claims
2. An apparatus comprising:
a memory; and
at least one processor operatively coupled to the memory and configured to:
define a data analytic plan for analyzing a given data set associated with a given data problem associated with a data analytics lifecycle;
obtain a test data set and a training data set from the given data set associated with the given data problem;
execute the at least one model to confirm an adequacy of the at least one model for the data analytic plan by fitting the at least one model on the training data set and evaluate the at least one model fitted on the training data set against the test data set, wherein the evaluation comprises assessing a validity of the at least one model and a validity of results of the execution of the at least one model on the test data set;
refine the at least one model based on the assessment;
condition at least a portion of raw data in the given data set to generate conditioned data;
create an analytics environment in which the executing, evaluating and conditioning operations are performed, the analytics environment comprising parameters including at least a capacity and a bandwidth of the analytics environment; and
dynamically change the parameters in response to the refining operation to include parameters to perform additional executing and evaluating operations on refinements of the at least one model;
wherein the operation of dynamically changing the parameters is performed such that the data analytics lifecycle is configured to continue from a point in the lifecycle where the parameters were changed;
wherein the execution of the at least one model is performed prior to implementation of the data analytic plan in a destination environment;
wherein the training data set is used to train the at least one model and the test data set is used to determine the accuracy of the at least one model fitted on the training data set; and
wherein the defining, obtaining, executing, refining, conditioning, creating and dynamically changing operations automate at least part of the data analytics lifecycle.
3. A system comprising:
a discovery phase module for defining a data analytic plan for analyzing a given data set associated with a given data problem associated with a data analytics lifecycle;
a model building phase module for obtaining a test data set and a training data set from the given data set associated with the given data problem, executing the at least one model to confirm an adequacy of the at least one model for the data analytic plan by fitting the at least one model on the training data set and evaluating the at least one model fitted on the training data set against the test data set, wherein the evaluation comprises assessing a validity of the at least one model and a validity of results of the execution of the at least one model on the test data set, refining the at least one model based on the assessment; and
a data preparation phase module for conditioning at least a portion of raw data in the given data set to generate conditioned data, and for creating an analytics environment in which the executing, evaluating and conditioning operations are performed, the analytics environment comprising parameters including at least a capacity and a bandwidth of the analytics environment;
wherein the parameters are dynamically changed in response to the refining operation to include parameters to perform additional executing and evaluating operations on refinements of the at least one model;
wherein the operation of dynamically changing the parameters is performed such that the data analytics lifecycle is configured to continue from a point in the lifecycle where the parameters were changed;
wherein the execution of the at least one model is performed prior to implementation of the data analytic plan in a destination environment;
wherein the training data set is used to train the at least one model and the test data set is used to determine the accuracy of the at least one model fitted on the training data set; and
wherein the discovery phase module, the model building phase module, and the data preparation phase module are implemented on one or more processing elements associated with a computing system so as to automate at least part of the data analytics lifecycle.
Specification