Architecture for automated data analysis
Abstract
An architecture for automated data analysis. In one embodiment, a computerized system comprises an automated problem formulation layer, a first learning engine, and a second learning engine. The automated problem formulation layer receives a data set. The data set has a plurality of records, where each record has a value for each of a plurality of raw transactional variables. The layer abstracts the raw transactional variables into cooked transactional variables. The first learning engine generates a model for the cooked transactional variables, while the second learning engine generates a model for the raw transactional variables.
28 Claims
1. A computerized system comprising:
an automated problem formulation layer to receive a data set comprising a plurality of records, each record having a value for each of a plurality of raw transactional variables, and to abstract the raw transactional variables organized into a hierarchy of nodes over which a concept is defined as one of a root node of the hierarchy, a node in the hierarchy, and a set of nodes in the hierarchy having a same parent node into a plurality of cooked transactional variables, each cooked transactional variable corresponding to an unrefined concept after the hierarchy has been successively refined a number of times starting with an initial concept equal to the root node, where a refined concept is defined as a first concept refined into two second concepts such that one of the two second concepts is a most populous node of the hierarchy as measured by the plurality of records having non-zero values for a raw transactional variable corresponding to the most populous node and the two second concepts cover a same set of nodes of the hierarchy as the first concept does;
a first learning engine to generate a model for the plurality of cooked transactional variables based on the cooked transactional variables, the model having a type; and
a second learning engine to generate a model for the plurality of raw transactional variables based on the model for the plurality of cooked transactional variables and the type of the model for the plurality of cooked transactional variables.
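The refinement recited in claim 1 can be illustrated with a minimal Python sketch. It is not taken from the specification, and it rests on several assumptions: the hierarchy is a plain parent-to-children dictionary whose leaves are the raw transactional variables; the population of a node is the number of records with a non-zero value for any raw variable beneath it; at each step the concept refined is the one containing the most populous candidate node; and a cooked value is the sum of the raw values it covers. All names (`refine`, `cook`, `population`, the sample store hierarchy) are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Hierarchy = Dict[str, List[str]]  # parent -> children; leaves are raw variables
Record = Dict[str, float]         # raw variable -> value (missing means zero)
Concept = FrozenSet[str]          # a single node, or sibling nodes sharing a parent


def leaves_under(node: str, hierarchy: Hierarchy) -> Set[str]:
    """Raw variables (leaf nodes) covered by a node."""
    children = hierarchy.get(node, [])
    if not children:
        return {node}
    out: Set[str] = set()
    for child in children:
        out |= leaves_under(child, hierarchy)
    return out


def population(node: str, hierarchy: Hierarchy, records: List[Record]) -> int:
    """Assumed measure: records with a non-zero value under the node."""
    leaves = leaves_under(node, hierarchy)
    return sum(1 for r in records if any(r.get(v, 0) != 0 for v in leaves))


def candidates(concept: Concept, hierarchy: Hierarchy) -> List[str]:
    """Nodes a concept can be split over: its children if it is one node."""
    if len(concept) == 1:
        (node,) = concept
        return list(hierarchy.get(node, []))
    return sorted(concept)


def refine(root: str, hierarchy: Hierarchy, records: List[Record],
           steps: int) -> List[Concept]:
    """Start from the root concept and refine it `steps` times."""
    concepts: List[Concept] = [frozenset([root])]
    for _ in range(steps):
        best = None  # (count, node, concept it belongs to)
        for concept in concepts:
            nodes = candidates(concept, hierarchy)
            if len(nodes) < 2:
                continue  # cannot be split into two concepts
            for node in nodes:
                count = population(node, hierarchy, records)
                if best is None or count > best[0]:
                    best = (count, node, concept)
        if best is None:
            break  # nothing left to refine
        _, node, concept = best
        rest = frozenset(candidates(concept, hierarchy)) - {node}
        concepts.remove(concept)
        # The two new concepts cover the same nodes as the one they replace.
        concepts += [frozenset([node]), rest]
    return concepts


def cook(record: Record, concepts: List[Concept],
         hierarchy: Hierarchy) -> Dict[str, float]:
    """One cooked variable per unrefined concept (assumed: sum of raw values)."""
    cooked = {}
    for concept in concepts:
        leaves = set().union(*(leaves_under(n, hierarchy) for n in concept))
        cooked["+".join(sorted(concept))] = sum(record.get(v, 0) for v in leaves)
    return cooked


hierarchy = {
    "store": ["drinks", "snacks", "produce"],
    "drinks": ["cola", "tea"],
    "snacks": ["chips", "nuts"],
    "produce": ["apples"],
}
records = [
    {"cola": 2, "chips": 1},
    {"cola": 1},
    {"tea": 1, "apples": 3},
    {"nuts": 1},
    {"cola": 1, "tea": 2},
]
concepts = refine("store", hierarchy, records, steps=2)
cooked_first = cook(records[0], concepts, hierarchy)
```

With this sample data the first refinement splits the root into "drinks" and the sibling set {"snacks", "produce"}, and the second splits "drinks" into "cola" and "tea", leaving three unrefined concepts and therefore three cooked variables.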
9. A computer-implemented method comprising:
inputting a data set comprising a plurality of records, each record having a value for each of a plurality of raw transactional variables organized into a hierarchy of nodes over which a concept is defined as one of a root node of the hierarchy, a node in the hierarchy, and a set of nodes in the hierarchy having a same parent node;
abstracting the raw transactional variables into a plurality of cooked transactional variables, each cooked transactional variable corresponding to an unrefined concept after the hierarchy has been successively refined a number of times starting with an initial concept equal to the root node, where a refined concept is defined as a first concept refined into two second concepts such that one of the two second concepts is a most populous node of the hierarchy as measured by the plurality of records having non-zero values for a raw transactional variable corresponding to the most populous node and the two second concepts cover a same set of nodes of the hierarchy as the first concept does;
generating a model for the plurality of cooked transactional variables based on the cooked transactional variables, the model having a type;
generating a model for the plurality of raw transactional variables based on the model for the plurality of cooked transactional variables and the type of the model for the plurality of cooked transactional variables; and
outputting at least one of the model for the plurality of cooked transactional variables and the model for the plurality of raw transactional variables.
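The steps of claim 9 can be lined up as a small pipeline. The following Python sketch is a toy illustration only: the claim does not say what kind of model either learning step produces, so the sketch assumes a hypothetical model type, "independent-bernoulli", that stores the probability of each variable being non-zero. It also assumes non-negative values and that the grouping of raw variables into cooked variables (`groups`) has already been produced by the abstraction step. All function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Record = Dict[str, float]       # raw variable -> value (missing means zero)
Groups = Dict[str, List[str]]   # cooked variable -> raw variables it covers


@dataclass
class Model:
    type: str                   # hypothetical model type tag
    params: Dict[str, float]    # variable -> P(variable is non-zero)


def abstract(records: List[Record], groups: Groups) -> List[Record]:
    """Map each raw record to cooked variables (assumed: sum of raw values)."""
    return [{c: sum(r.get(v, 0) for v in raws) for c, raws in groups.items()}
            for r in records]


def learn_cooked(cooked: List[Record]) -> Model:
    """First learning step: a toy model over the cooked variables."""
    n = len(cooked)
    params = {c: sum(1 for r in cooked if r[c] != 0) / n for c in cooked[0]}
    return Model("independent-bernoulli", params)


def learn_raw(records: List[Record], groups: Groups, cooked_model: Model) -> Model:
    """Second learning step: uses the cooked model and dispatches on its type."""
    if cooked_model.type != "independent-bernoulli":
        raise ValueError(f"unsupported model type: {cooked_model.type}")
    params: Dict[str, float] = {}
    for c, raws in groups.items():
        # Records in which the cooked variable is non-zero.
        active = [r for r in records if sum(r.get(v, 0) for v in raws) != 0]
        for v in raws:
            hits = sum(1 for r in active if r.get(v, 0) != 0)
            conditional = hits / len(active) if active else 0.0
            # P(raw non-zero) = P(cooked non-zero) * P(raw non-zero | cooked non-zero)
            params[v] = cooked_model.params[c] * conditional
    return Model(cooked_model.type, params)


def run(records: List[Record], groups: Groups) -> Tuple[Model, Model]:
    """Input, abstract, learn the cooked model, learn the raw model, output both."""
    cooked_model = learn_cooked(abstract(records, groups))
    raw_model = learn_raw(records, groups, cooked_model)
    return cooked_model, raw_model


records = [
    {"cola": 2, "chips": 1},
    {"cola": 1},
    {"tea": 1, "apples": 3},
    {"nuts": 1},
]
groups = {
    "cola": ["cola"],
    "tea": ["tea"],
    "produce+snacks": ["chips", "nuts", "apples"],
}
cooked_model, raw_model = run(records, groups)
```

The point of the sketch is the data flow: the raw model is not learned from scratch but is seeded from the cooked model's parameters, and the routine that builds it is chosen according to the cooked model's type.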
24. A computer-readable medium having instructions stored thereon for execution by a processor to perform a method comprising:
inputting a data set comprising a plurality of records, each record having a value for each of a plurality of raw transactional variables organized into a hierarchy of nodes over which a concept is defined as one of a root node of the hierarchy, a node in the hierarchy, and a set of nodes in the hierarchy having a same parent node;
abstracting the raw transactional variables into a plurality of cooked transactional variables, each cooked transactional variable corresponding to an unrefined concept after the hierarchy has been successively refined a number of times starting with an initial concept equal to the root node, where a refined concept is defined as a first concept refined into two second concepts such that one of the two second concepts is a most populous node of the hierarchy as measured by the plurality of records having non-zero values for a raw transactional variable corresponding to the most populous node and the two second concepts cover a same set of nodes of the hierarchy as the first concept does;
generating a model for the plurality of cooked transactional variables based on the cooked transactional variables, the model having a type;
generating a model for the plurality of raw transactional variables based on the model for the plurality of cooked transactional variables and the type of the model for the plurality of cooked transactional variables; and
outputting at least one of the model for the plurality of cooked transactional variables and the model for the plurality of raw transactional variables.
Specification