Framework for calculating grouped optimization algorithms within a distributed data store
Abstract
A framework executes iterative grouped optimization algorithms, such as machine learning and other analytic algorithms, directly on unsorted data within a SQL data store without first redistributing the data. Its architecture provides C++ abstraction layers that contain the algorithms and sit over the SQL data store, and a higher-level Python abstraction layer that contains the grouping and iteration controllers and calls into the C++ layer to invoke the algorithms.
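One way to picture grouping on unsorted data without redistribution is a per-segment partial aggregate followed by a merge. The sketch below is an illustration only, not the framework's code: the function names, the in-memory lists standing in for database segments, and the choice of a running (count, sum) state are all assumptions made for the example.

```python
from collections import defaultdict

def segment_partial_states(rows):
    """Single pass over one segment's unsorted rows.

    Returns {group_key: (count, sum)}; rows never leave the segment.
    """
    states = defaultdict(lambda: (0, 0.0))
    for key, x in rows:
        n, s = states[key]
        states[key] = (n + 1, s + x)
    return dict(states)

def merge_states(per_segment_states):
    """Combine the small per-group states from every segment."""
    merged = defaultdict(lambda: (0, 0.0))
    for states in per_segment_states:
        for key, (n, s) in states.items():
            mn, ms = merged[key]
            merged[key] = (mn + n, ms + s)
    return dict(merged)

# Hypothetical data: two segments, rows in arbitrary order, groups "a" and "b".
segments = [
    [("a", 1.0), ("b", 10.0), ("a", 3.0)],
    [("b", 20.0), ("a", 5.0)],
]

merged = merge_states(segment_partial_states(rows) for rows in segments)
group_means = {key: s / n for key, (n, s) in merged.items()}
```

Only the compact per-group states cross segment boundaries here; the rows themselves stay where they are stored, which is the property the abstract emphasizes.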
20 Claims
1. A method of analyzing data within a distributed database having a plurality of database segments, comprising:
- grouping, using a grouping process running within the database, instances of data into a one or more groups such that each group comprises data instances having one or more common attribute values that characterize the group, wherein the instances of data are grouped into the one or more groups without the data instances being redistributed;
- running a first iteration of an analytic algorithm within the database on each of the one or more groups to generate a predictive model for each group;
- running subsequent iterations of said analytic algorithm on each group using as an input model for each subsequent iteration for each group the predictive model for such group generated by a preceding iteration; and
- updating in said database said predictive model generated by the preceding iteration with results of said analytic algorithm generated by said subsequent iteration.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
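The steps of claim 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the claimed implementation: the analytic algorithm is taken to be one gradient-descent step for a one-parameter model y ≈ w·x, a plain dictionary named `model_table` stands in for the model stored in the database, and every name and data value is hypothetical.

```python
from collections import defaultdict

def group_rows(rows):
    """Group rows by their common attribute value (the first field)."""
    groups = defaultdict(list)
    for key, x, y in rows:
        groups[key].append((x, y))
    return groups

def gradient_step(w, points, lr=0.05):
    """One iteration: a single gradient-descent step on mean squared error."""
    grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
    return w - lr * grad

# Hypothetical unsorted rows: (group key, x, y).
rows = [("a", 1.0, 2.0), ("b", 1.0, -1.0), ("a", 2.0, 4.0), ("b", 2.0, -2.0)]
groups = group_rows(rows)

# First iteration: produce an initial model for each group.
model_table = {key: gradient_step(0.0, pts) for key, pts in groups.items()}

# Subsequent iterations: each takes the group's previous model as input
# and overwrites the stored model with the new result.
for _ in range(200):
    for key, pts in groups.items():
        model_table[key] = gradient_step(model_table[key], pts)
```

With this data the per-group slopes approach 2 for group "a" and -1 for group "b". Each group keeps its own model, and no iteration reads another group's rows.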
10. A computer program product comprising a non-transitory computer readable medium storing executable instructions for controlling the operation of a computer in a distributed database having a plurality of database segments to perform a method comprising:
- grouping, using a grouping process running within the database, instances of data into a one or more groups such that each group comprises data instances having one or more common attribute values that characterize the group, wherein the instances of data are grouped into the one or more groups without the data instances being redistributed;
- running a first iteration of an analytic algorithm within the database on each of the one or more groups to generate a predictive model for each group;
- running subsequent iterations of said analytic algorithm on each group using as an input model for each subsequent iteration for each group the predictive model for such group generated by a preceding iteration; and
- updating in said database said predictive model generated by the preceding iteration with results of said analytic algorithm generated by said subsequent iteration.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 19, 20
18. A computer system for a distributed database having a plurality of database segments, comprising:
- at least one processor configured to generate an aggregated prediction model based on data stored in the distributed database by executing one or more instructions provided in a database management system, a first abstraction layer, and a second abstraction layer, wherein the database management system embodies first executable instructions for controlling the operation of the computer system to group data instances in the database into one or more groups without redistributing the data instances, the groups having one or more common attribute values that characterize the groups;
- the first abstraction layer arranged over said database management system, the first abstraction layer embodying second executable instructions providing an analytic algorithm and for controlling the database management system to run said algorithm sequentially directly on data instances of each of said groups of a segment; and
- the second abstraction layer arranged over said first abstraction layer, the second abstraction layer embodying third executable instructions providing an iteration controller that controls said first abstraction layer to run iteratively said analytic algorithm directly on the data instances in each of said groups using a predictive model for each iteration of the algorithm on a group that results from a prior iteration of the algorithm on said group, the iteration controller stopping said iterative analysis after a predetermined number of iterations or upon the prediction models for the groups converging, and upon said stopping said second abstraction layer aggregating all of said predictive models for said groups from said plurality of segments and providing the aggregated prediction models to a user, and
- a memory coupled to the at least one processor and configured to provide the at least one processor with instructions.
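The iteration controller in claim 18 stops on either of two conditions and then gathers the per-group models from every segment. The sketch below is one possible reading, with several assumptions: the function names are hypothetical, a toy update that moves a scalar model halfway toward the group mean stands in for the analytic algorithm, convergence is taken to mean that no model changed by more than a tolerance, and aggregation is taken to mean collecting each group's models across segments.

```python
def run_until_converged(step, models, data, max_iter=100, tol=1e-8):
    """Iterate `step` on every group; stop at max_iter or on convergence.

    Assumes `models` is non-empty. Returns (models, iterations_run).
    """
    for iteration in range(1, max_iter + 1):
        new_models = {key: step(models[key], data[key]) for key in models}
        change = max(abs(new_models[k] - models[k]) for k in models)
        models = new_models
        if change <= tol:
            break
    return models, iteration

def halfway_to_mean(model, values):
    """Toy stand-in for the analytic algorithm: move halfway to the mean."""
    mean = sum(values) / len(values)
    return model + 0.5 * (mean - model)

# Hypothetical segments, each already holding its own per-group data.
segments = [
    {"a": [1.0, 3.0], "b": [10.0]},
    {"a": [5.0, 7.0]},
]

per_segment = []
for data in segments:
    start = {key: 0.0 for key in data}
    models, n_iter = run_until_converged(halfway_to_mean, start, data)
    per_segment.append(models)

# Aggregate: collect every segment's model for each group.
aggregated = {}
for models in per_segment:
    for key, model in models.items():
        aggregated.setdefault(key, []).append(model)

# The iteration cap applies when convergence is not reached first.
capped_models, capped_iters = run_until_converged(
    halfway_to_mean, {"a": 0.0}, {"a": [4.0]}, max_iter=3
)
```

Whether the aggregated result should keep one model per segment, as here, or combine them into a single model per group depends on the algorithm; the claim text does not settle that, so the sketch takes the simpler option.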
Specification