Framework for calculating grouped optimization algorithms within a distributed data store

US 9,324,036 B1
Filed: 06/29/2013
Issued: 04/26/2016
Est. Priority Date: 06/29/2013
Status: Active Grant



Abstract

A framework for executing iterative grouped optimization algorithms such as machine learning and other analytic algorithms directly on unsorted data within a SQL data store without first redistributing the data comprises an architecture that provides C++ abstraction layers that include the algorithms over a SQL data store, and a higher Python abstraction layer that includes grouping and iteration controllers and call functionality to the C++ layer for invocation of the algorithms.


20 Claims

1. A method of analyzing data within a distributed database having a plurality of database segments, comprising:
- grouping, using a grouping process running within the database, instances of data into one or more groups such that each group comprises data instances having one or more common attribute values that characterize the group, wherein the instances of data are grouped into the one or more groups without the data instances being redistributed;
  
  running a first iteration of an analytic algorithm within the database on each of the one or more groups to generate a predictive model for each group;
  
  running subsequent iterations of said analytic algorithm on each group using as an input model for each subsequent iteration for each group the predictive model for such group generated by a preceding iteration; and
  
  updating in said database said predictive model generated by the preceding iteration with results of said analytic algorithm generated by said subsequent iteration.
  - 2. The method of claim 1, wherein said grouping process comprises grouping said data groups without redistributing the data instances by running a GROUPBY query language operation for each group directly on data instances of a database segment.
  - 3. The method of claim 2, wherein said data instances comprise rows of table data in said database, and said attribute values that characterize said groups comprise one or more columns of said table data in each data instance.
  - 4. The method of claim 1, wherein said analytic algorithm comprises an iterative machine learning algorithm that learns from said data groups to improve and update stored prediction models for each of the groups upon each iteration of the machine learning algorithm.
  - 5. The method of claim 1, wherein said running said first iteration comprises running said analytic algorithm sequentially on each group within a database segment while concurrently running said analytic algorithm on groups of all other database segments.
  - 6. The method of claim 1, wherein said running subsequent iterations comprises repeatedly running iterations of said algorithm for a predetermined number of iterations or until the models for each group converge, wherein each subsequent iteration runs said algorithm on said data groups using said updated predictive model from said preceding iteration.
  - 7. The method of claim 6 further comprising, upon said iterations reaching said predetermined number or upon said converging, aggregating and returning all grouped predictive models to a user.
  - 8. The method of claim 6 further comprising storing within the database the predictive models for each of said one or more groups from said first iteration, and updating said stored models using the results of said subsequent iterations.
  - 9. The method of claim 1, wherein said analytic algorithm is implemented in a first application program within said database system, and said analytic algorithm is invoked and iterated by a second application program within said database system that controls said first application program.
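
The steps recited in claims 1 and 6 through 8 can be illustrated with a minimal Python sketch. It is not the patented implementation: the row layout `(group_key, x, y)`, the one-parameter gradient-descent step, the starting value of `0.0`, and the names `group_rows`, `gradient_step`, and `fit_grouped` are all assumptions made for illustration, and a plain dictionary stands in for the model table stored in the database.

```python
from collections import defaultdict


def group_rows(rows, key_index=0):
    """Bucket rows by one attribute column; rows stay where they are (no redistribution)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)


def gradient_step(model, rows, learning_rate=0.1):
    """One iteration of a hypothetical analytic algorithm: a gradient step for y ~ w * x."""
    w = model
    grad = sum(2 * (w * x - y) * x for _, x, y in rows) / len(rows)
    return w - learning_rate * grad


def fit_grouped(rows, max_iterations=100, tolerance=1e-9):
    """Iterate per group until every model converges or the iteration cap is reached."""
    groups = group_rows(rows)
    # Stand-in for the per-group models stored in the database (assumed initial state).
    models = {key: 0.0 for key in groups}
    iterations_run = 0
    for _ in range(max_iterations):
        iterations_run += 1
        converged = True
        for key, group in groups.items():
            # The model from the preceding iteration is the input to this one.
            updated = gradient_step(models[key], group)
            if abs(updated - models[key]) > tolerance:
                converged = False
            models[key] = updated  # update the stored model with the new result
        if converged:
            break
    return models, iterations_run
```

Calling `fit_grouped` on rows such as `("a", 1.0, 2.0)` and `("b", 1.0, -1.0)` returns one fitted slope per group key together with the number of iterations that were run.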

10. A computer program product comprising a non-transitory computer readable medium storing executable instructions for controlling the operation of a computer in a distributed database having a plurality of database segments to perform a method comprising:
- grouping, using a grouping process running within the database, instances of data into one or more groups such that each group comprises data instances having one or more common attribute values that characterize the group, wherein the instances of data are grouped into the one or more groups without the data instances being redistributed;
  
  running a first iteration of an analytic algorithm within the database on each of the one or more groups to generate a predictive model for each group;
  
  running subsequent iterations of said analytic algorithm on each group using as an input model for each subsequent iteration for each group the predictive model for such group generated by a preceding iteration; and
  
  updating in said database said predictive model generated by the preceding iteration with results of said analytic algorithm generated by said subsequent iteration.
  - 11. The computer program product of claim 10, wherein said grouping process comprises grouping said data groups without redistributing the data instances by running a GROUPBY query language operation for each group directly on data instances of a database segment.
  - 12. The computer program product of claim 10, wherein said analytic algorithm comprises an iterative machine learning algorithm that learns from said data groups to improve the prediction models associated with each of the groups upon each iteration of the machine learning algorithm.
  - 13. The computer program product of claim 10, wherein said running said first iteration comprises running said analytic algorithm sequentially on each group within a database segment while concurrently running said analytic algorithm on groups of all other database segments.
  - 14. The computer program product of claim 10, wherein said running subsequent iterations comprises repeatedly running iterations of said algorithm for a predetermined number of iterations or until the models for each group converge, wherein each iteration runs said algorithm on said data groups using said updated predictive model from said preceding iteration.
  - 15. The computer program product of claim 14, further comprising, upon said iterations reaching said predetermined number or upon said converging, aggregating and returning all grouped predictive models to a user.
  - 16. The computer program product of claim 14, further comprising storing within the database the predictive models for each of said one or more groups from said first iteration, and updating said stored models using the results of said subsequent iterations.
  - 17. The computer program product of claim 10, wherein said analytic algorithm comprises first application program instructions running on said computer in said database system, and said analytic algorithm is invoked and iterated by second application program layer instructions running on said computer in said database system, the second application program controlling said first application program.
  - 19. The computer system of claim 17, wherein said data instances comprise rows of table data in said database, and said attribute values that characterize said groups comprise one or more columns of said table data in each data instance.
  - 20. The computer system of claim 17, wherein said analytical algorithm is one of a plurality of iterative machine learning algorithms in said first abstraction layer, and said second executable instructions interface with said first executable instructions to execute said analytic algorithm on said data instances.

18. A computer system for a distributed database having a plurality of database segments, comprising:
- at least one processor configured to generate an aggregated prediction model based on data stored in the distributed database by executing one or more instructions provided in a database management system, a first abstraction layer, and a second abstraction layer, wherein the database management system embodies first executable instructions for controlling the operation of the computer system to group data instances in the database into one or more groups without redistributing the data instances, the groups having one or more common attribute values that characterize the groups;
  
  the first abstraction layer arranged over said database management system, the first abstraction layer embodying second executable instructions providing an analytic algorithm and for controlling the database management system to run said algorithm sequentially directly on data instances of each of said groups of a segment; and
  
  the second abstraction layer arranged over said first abstraction layer, the second abstraction layer embodying third executable instructions providing an iteration controller that controls said first abstraction layer to run iteratively said analytic algorithm directly on the data instances in each of said groups using a predictive model for each iteration of the algorithm on a group that results from a prior iteration of the algorithm on said group, the iteration controller stopping said iterative analysis after a predetermined number of iterations or upon the prediction models for the groups converging, and upon said stopping said second abstraction layer aggregating all of said predictive models for said groups from said plurality of segments and providing the aggregated prediction models to a user, and
  
  a memory coupled to the at least one processor and configured to provide the at least one processor with instructions.
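
The layering recited in claim 18 can be sketched in Python as three cooperating pieces. This is a rough illustration under stated assumptions, not the claimed system: segments are modeled as in-memory lists of `(key, value)` rows, threads stand in for segments working concurrently, the "analytic algorithm" is a toy update that moves each estimate halfway toward its group mean, and models are keyed by `(segment index, group key)` because the claim does not say how a group that spans several segments would be combined. The names `group_segment`, `run_segment`, and `iterate` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def group_segment(segment):
    """Database-layer stand-in: group a segment's own rows by key; no row leaves the segment."""
    groups = {}
    for key, value in segment:
        groups.setdefault(key, []).append(value)
    return groups


def run_segment(groups, models):
    """First-layer stand-in: visit the groups of one segment sequentially, one update each."""
    updated = {}
    for key, values in groups.items():
        target = sum(values) / len(values)
        previous = models.get(key, 0.0)  # assumed starting model of 0.0
        updated[key] = previous + 0.5 * (target - previous)
    return updated


def iterate(segments, max_iterations=60, tolerance=1e-9):
    """Second-layer stand-in: iteration controller that stops on convergence or at the cap."""
    grouped = [group_segment(segment) for segment in segments]
    models = [{} for _ in segments]
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        for _ in range(max_iterations):
            # Segments run concurrently; each receives its own models from the prior iteration.
            new_models = list(pool.map(run_segment, grouped, models))
            converged = all(
                abs(new[key] - old.get(key, 0.0)) <= tolerance
                for new, old in zip(new_models, models)
                for key in new
            )
            models = new_models
            if converged:
                break
    # Aggregate the per-group models from all segments into one result for the caller.
    return {
        (index, key): model
        for index, segment_models in enumerate(models)
        for key, model in segment_models.items()
    }
```

Keeping the per-segment work in `run_segment` and the stopping logic in `iterate` mirrors the split between the algorithm layer and the controller layer; either piece could be swapped out without touching the other.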


Current Assignee
EMC IP Holding Company LLC (Dell Technologies Inc.), Vertex Pharmaceuticals Incorporated
Original Assignee
EMC Corporation (Dell Technologies Inc.)
Inventors
Iyer, Rahul; Qian, Hai; Yang, Shengwen; Welton, Caleb E.
Primary Examiner(s)
Hill, Stanley K
Assistant Examiner(s)
Chubb, Mikayla

Application Number

US13/931,876
Time in Patent Office

1,032 Days
Field of Search

None
US Class Current

1/1
CPC Class Codes

G06F 16/244   Grouping and aggregation

G06F 16/2471   Distributed queries

G06F 16/25   Integrating or interfacing ...

G06F 16/285   Clustering or classification

G06N 20/00   Machine learning

G06N 7/01   Probabilistic graphical mod...
