Data analytics lifecycle automation
First Claim
1. A method comprising:
defining an initial data analytic plan for analyzing a given data set associated with a given data problem;
conditioning at least a portion of original data in the given data set to generate conditioned data;
selecting at least one model to analyze at least one of the original data and the conditioned data;
executing the at least one selected model on at least one of a portion of the original data and a portion of the conditioned data to confirm an adequacy of the at least one selected model;
communicating results of the model execution to at least one entity, the results comprising a refined data analytic plan for analyzing the given data set; and
provisioning, via generating and deploying, one or more computing resources to implement the refined data analytic plan;
wherein the defining, conditioning, selecting, executing, communicating and provisioning steps correspond to respective phases of a data analytics lifecycle;
wherein the method further comprises:
providing, to a user during multiple ones of the respective phases prior to the phase corresponding to the provisioning, an inventory of one or more computing resources to implement the at least one selected model; and
changing, by the user during one or more of the respective phases prior to the phase corresponding to the provisioning, the at least one selected model based on the provided inventory;
wherein the step of conditioning at least a portion of original data in the given data set to generate conditioned data comprises creating a separate analytics environment used to condition the portion of the original data, the separate analytics environment is created to have a capacity that is a selectable multiple of a capacity associated with the original data in the given data set, and one of one or more selectable multiples of the capacity are selected by the user between any two of the respective phases to dynamically provision and adjust the capacity of the separate analytics environment;
wherein, in response to the user returning to at least one previous phase of the data analytics lifecycle and altering the previous phase, one or more subsequent phases of the data analytics lifecycle are automatically updated based on the user-altered previous phase; and
wherein the defining, conditioning, selecting, executing, communicating, provisioning, providing, and changing steps are performed on one or more processing elements associated with a computing system and automate the data analytics lifecycle.
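Claim 1 maps each step to a phase of the data analytics lifecycle and requires that, when the user returns to an earlier phase and alters it, the later phases are updated automatically. The claim does not say how that update is carried out. The following is a minimal Python sketch, assuming each phase is a function of a shared state and that "updating" means re-running every phase after the altered one; the names `PHASES`, `Lifecycle`, `run_from` and `alter_phase` are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Lifecycle phases in claim order (names are illustrative, not from the patent).
PHASES: List[str] = [
    "discovery",            # define the initial data analytic plan
    "data_preparation",     # condition the original data
    "model_planning",       # select at least one model
    "model_building",       # execute the model on a portion of the data
    "communicate_results",  # produce the refined data analytic plan
    "operationalize",       # provision computing resources
]

# A phase handler reads the shared state and returns that phase's output.
Handler = Callable[[Dict[str, object]], object]


@dataclass
class Lifecycle:
    handlers: Dict[str, Handler]
    state: Dict[str, object] = field(default_factory=dict)
    run_log: List[str] = field(default_factory=list)

    def run_from(self, start: str) -> None:
        """Run the `start` phase and every later phase, in lifecycle order."""
        for name in PHASES[PHASES.index(start):]:
            self.state[name] = self.handlers[name](self.state)
            self.run_log.append(name)

    def alter_phase(self, name: str, new_handler: Handler) -> None:
        """The user returns to an earlier phase and changes it; every
        subsequent phase is then recomputed automatically."""
        self.handlers[name] = new_handler
        self.run_from(name)


# Usage: run the whole lifecycle, then revise the model-planning phase.
handlers: Dict[str, Handler] = {p: (lambda s, p=p: f"{p}-v1") for p in PHASES}
handlers["model_building"] = lambda s: f"built({s['model_planning']})"

lc = Lifecycle(handlers)
lc.run_from("discovery")
lc.run_log.clear()
lc.alter_phase("model_planning", lambda s: "model_planning-v2")
print(lc.run_log)  # ['model_planning', 'model_building', 'communicate_results', 'operationalize']
print(lc.state["model_building"])  # built(model_planning-v2)
```

Re-running everything downstream is the simplest reading of "automatically updated"; a production system might instead compare outputs and skip phases whose inputs did not change.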
9 Assignments
0 Petitions
Abstract
An initial data analytic plan for analyzing a given data set associated with a given data problem is defined. At least a portion of original data in the given data set is conditioned to generate conditioned data. At least one model is selected to analyze at least one of the original data and the conditioned data. The at least one selected model is executed on at least one of a portion of the original data and a portion of the conditioned data. Results of the model execution are communicated to at least one entity, the results comprising a refined data analytic plan for analyzing the given data set. One or more computing resources are provisioned to implement the refined data analytic plan. The defining, conditioning, selecting, executing, communicating and provisioning steps are performed on one or more processing elements associated with a computing system and automate a data analytics lifecycle.
19 Claims
1. A method comprising the steps recited in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
12. An apparatus comprising:
a memory; and at least one processor operatively coupled to the memory and configured to:
define an initial data analytic plan for analyzing a given data set associated with a given data problem;
condition at least a portion of original data in the given data set to generate conditioned data;
select at least one model to analyze at least one of the original data and the conditioned data;
execute the at least one selected model on at least one of a portion of the original data and a portion of the conditioned data to confirm an adequacy of the at least one selected model;
communicate results of the model execution to at least one entity, the results comprising a refined data analytic plan for analyzing the given data set; and
provision, via generating and deploying, one or more computing resources to implement the refined data analytic plan;
wherein the defining, conditioning, selecting, executing, communicating and provisioning operations automate and correspond to respective phases of a data analytics lifecycle; and wherein the at least one processor is further configured to:
provide, to a user during multiple ones of the respective phases prior to the phase corresponding to the provisioning, an inventory of one or more computing resources to implement the at least one selected model; and
change, based on input from the user, during one or more of the respective phases prior to the phase corresponding to the provisioning, the at least one selected model based on the provided inventory;
wherein conditioning at least a portion of original data in the given data set to generate conditioned data comprises creating a separate analytics environment used to condition the portion of the original data, the separate analytics environment is created to have a capacity that is a selectable multiple of a capacity associated with the original data in the given data set, and one of one or more selectable multiples of the capacity are selected by the user between any two of the respective phases to dynamically provision and adjust the capacity of the separate analytics environment; and
wherein, in response to the user returning to at least one previous phase of the data analytics lifecycle and altering the previous phase, one or more subsequent phases of the data analytics lifecycle are automatically updated based on the user-altered previous phase.
Dependent claims: 13, 14, 15, 16, 17, 18.
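All three independent claims require that, before provisioning, the user is shown an inventory of computing resources and may change the selected model based on it. The claims do not define how the inventory is represented or how a model is judged to fit it. The sketch below assumes each resource and each candidate model is described only by CPU cores and memory, and that a model fits if a single resource meets both requirements; `Resource`, `ModelOption`, `fits` and `suggest_model_change` are hypothetical names. Because the claims leave the change itself to the user, the function only proposes a replacement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    name: str
    cpu_cores: int
    memory_gb: int


@dataclass(frozen=True)
class ModelOption:
    name: str
    cpu_cores: int  # assumed minimum cores the model needs
    memory_gb: int  # assumed minimum memory the model needs


def fits(model: ModelOption, inventory: List[Resource]) -> bool:
    """True if at least one resource in the inventory can host the model."""
    return any(r.cpu_cores >= model.cpu_cores and r.memory_gb >= model.memory_gb
               for r in inventory)


def suggest_model_change(selected: ModelOption, candidates: List[ModelOption],
                         inventory: List[Resource]) -> Optional[ModelOption]:
    """Keep the selected model if the inventory can host it; otherwise propose
    the first candidate that fits, or None if nothing fits."""
    if fits(selected, inventory):
        return selected
    return next((m for m in candidates if fits(m, inventory)), None)


inventory = [Resource("node-a", 8, 32), Resource("node-b", 16, 64)]
selected = ModelOption("gradient_boosting_large", 32, 256)
candidates = [ModelOption("random_forest", 16, 48),
              ModelOption("logistic_regression", 2, 4)]
proposal = suggest_model_change(selected, candidates, inventory)
print(proposal.name if proposal else None)  # random_forest
```

A first-fit rule keeps the example short; a user-facing tool would more likely list every feasible candidate and let the user choose.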
19. A system comprising:
a memory;
one or more processing elements operatively coupled to the memory;
a graphical user interface;
a discovery phase module for defining an initial data analytic plan for analyzing a given data set associated with a given data problem;
a data preparation phase module for conditioning at least a portion of original data in the given data set to generate conditioned data;
a model planning phase module for selecting at least one model to analyze at least one of the original data and the conditioned data;
a model building phase module for executing the at least one selected model on at least one of a portion of the original data and a portion of the conditioned data to confirm an adequacy of the at least one selected model;
a results communication phase module for communicating results of the model execution to at least one entity, the results comprising a refined data analytic plan for analyzing the given data set; and
an operationalizing phase module for provisioning, via generating and deploying, one or more computing resources to implement the refined data analytic plan;
wherein the discovery phase module, the data preparation phase module, the model planning phase module, the model building phase module, the results communication phase module, and the operationalizing phase module are configured to implement respective phases of a data analytics lifecycle and are implemented on the one or more processing elements associated with a computing system so as to automate the data analytics lifecycle; and wherein the system is configured to:
provide to a user, via the graphical user interface and during multiple ones of the respective phases prior to the phase implemented by the operationalizing phase module, an inventory of one or more computing resources to implement the at least one selected model; and
permit a user, via the graphical user interface, to change, during one or more of the respective phases prior to the phase implemented by the operationalizing phase module, the at least one selected model based on the provided inventory;
wherein conditioning at least a portion of original data in the given data set to generate conditioned data comprises creating a separate analytics environment used to condition the portion of the original data, the separate analytics environment is created to have a capacity that is a selectable multiple of a capacity associated with the original data in the given data set, and one of one or more selectable multiples of the capacity are selected by the user between any two of the respective phases to dynamically provision and adjust the capacity of the separate analytics environment; and
wherein, in response to the user returning to at least one previous phase of the data analytics lifecycle and altering the previous phase, one or more subsequent phases of the data analytics lifecycle are automatically updated based on the user-altered previous phase.
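The separate analytics environment used for data conditioning has a capacity that is a user-selectable multiple of the capacity associated with the original data, and the user may change that multiple between any two phases. A minimal sketch, assuming capacity is measured in gigabytes and that the menu of multiples is (5, 10, 20); both values are illustrative assumptions, as are the names `Sandbox`, `ALLOWED_MULTIPLES` and `resize`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical menu of capacity multiples offered to the user.
ALLOWED_MULTIPLES: Tuple[int, ...] = (5, 10, 20)


@dataclass
class Sandbox:
    original_data_gb: float  # capacity associated with the original data set
    multiple: int            # user-selected multiple
    history: List[Tuple[str, int, float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self._check(self.multiple)

    @staticmethod
    def _check(multiple: int) -> None:
        if multiple not in ALLOWED_MULTIPLES:
            raise ValueError(f"multiple must be one of {ALLOWED_MULTIPLES}")

    @property
    def capacity_gb(self) -> float:
        """Sandbox capacity = original data capacity x selected multiple."""
        return self.original_data_gb * self.multiple

    def resize(self, multiple: int, between_phases: str) -> float:
        """Apply a multiple chosen between two phases; return the new capacity."""
        self._check(multiple)
        self.multiple = multiple
        self.history.append((between_phases, multiple, self.capacity_gb))
        return self.capacity_gb


sandbox = Sandbox(original_data_gb=200.0, multiple=10)
print(sandbox.capacity_gb)  # 2000.0
sandbox.resize(20, "model_planning -> model_building")
print(sandbox.capacity_gb)  # 4000.0
```

Recording each resize with the phase boundary at which it happened makes it easy to trace why the environment grew or shrank during the lifecycle.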
Specification