Dataset discovery in data analytics
First Claim
1. A method comprising:
obtaining an initial work package defining at least one hypothesis associated with a given data problem, the initial work package being generated in accordance with one or more phases of an automated data analytics lifecycle;
identifying a plurality of datasets;
discovering one or more datasets in the plurality of datasets that are relevant to the at least one hypothesis, wherein discovery comprises performing data mining on the plurality of datasets; and
testing the at least one hypothesis using at least a portion of the one or more discovered datasets;
wherein the obtaining, identifying, discovering, and testing steps are performed on one or more processing elements associated with a computing system.
Abstract
An initial work package is obtained. The initial work package defines at least one hypothesis associated with a given data problem, and is generated in accordance with one or more phases of an automated data analytics lifecycle. A plurality of datasets is identified. One or more datasets in the plurality of datasets that are relevant to the at least one hypothesis are discovered. The at least one hypothesis is tested using at least a portion of the one or more discovered datasets.
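The four steps in the abstract (obtain, identify, discover, test) can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than the patented implementation: the names `WorkPackage`, `Dataset`, `discover`, and `evaluate_hypothesis` are hypothetical, discovery is reduced to a column-name filter standing in for data mining, and the hypothesis test is a plain Pearson correlation compared against an arbitrary threshold.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class WorkPackage:
    """Hypothetical work package: a hypothesis that two variables are correlated."""
    problem: str
    x_var: str
    y_var: str


@dataclass
class Dataset:
    """Hypothetical dataset: a name plus named numeric columns."""
    name: str
    columns: dict  # column name -> list of numeric values


def discover(package, datasets):
    """Keep datasets containing both hypothesis variables (stand-in for data mining)."""
    needed = {package.x_var, package.y_var}
    return [d for d in datasets if needed.issubset(d.columns)]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    denom = sx * sy
    return cov / denom if denom else 0.0


def evaluate_hypothesis(package, dataset, threshold=0.8):
    """Treat the hypothesis as supported when |r| meets the (assumed) threshold."""
    r = pearson(dataset.columns[package.x_var], dataset.columns[package.y_var])
    return abs(r) >= threshold, r


# Obtain a work package and identify candidate datasets (toy, made-up values).
package = WorkPackage("customer churn", "support_calls", "churned")
datasets = [
    Dataset("crm", {"support_calls": [0, 1, 2, 3, 4], "churned": [0, 0, 1, 1, 1]}),
    Dataset("weather", {"temp_c": [10, 12, 9, 14, 11]}),
]

# Discover the relevant datasets, then test the hypothesis on each.
relevant = discover(package, datasets)
results = {d.name: evaluate_hypothesis(package, d) for d in relevant}
print(results)
```

In this toy run only the `crm` dataset survives discovery, because it is the only one holding both variables named in the hypothesis. A real system would presumably replace the column-name filter with an actual data mining step over the dataset contents.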
20 Claims
1. A method comprising:
obtaining an initial work package defining at least one hypothesis associated with a given data problem, the initial work package being generated in accordance with one or more phases of an automated data analytics lifecycle;
identifying a plurality of datasets;
discovering one or more datasets in the plurality of datasets that are relevant to the at least one hypothesis, wherein discovery comprises performing data mining on the plurality of datasets; and
testing the at least one hypothesis using at least a portion of the one or more discovered datasets;
wherein the obtaining, identifying, discovering, and testing steps are performed on one or more processing elements associated with a computing system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A computer program product comprising a processor-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed by one or more processing elements of the computing system implement the steps of:
obtaining an initial work package defining at least one hypothesis associated with a given data problem, the initial work package being generated in accordance with one or more phases of an automated data analytics lifecycle;
identifying a plurality of datasets;
discovering one or more datasets in the plurality of datasets that are relevant to the at least one hypothesis, wherein discovery comprises performing data mining on the plurality of datasets; and
testing the at least one hypothesis using at least a portion of the one or more discovered datasets.
15. An apparatus comprising:
a memory; and
at least one processor operatively coupled to the memory and configured to:
obtain an initial work package defining at least one hypothesis associated with a given data problem, the initial work package being generated in accordance with one or more phases of an automated data analytics lifecycle;
identify a plurality of datasets;
discover one or more datasets in the plurality of datasets that are relevant to the at least one hypothesis, wherein discovery comprises performing data mining on the plurality of datasets; and
test the at least one hypothesis using at least a portion of the one or more discovered datasets.
Dependent claims: 16, 17, 18, 19, 20
Specification