PARALLEL DATA COMPUTING OPTIMIZATION
Abstract
Statistics collected during the parallel distributed execution of the tasks of a job may be used to optimize the performance of those tasks or of similar recurring tasks. A first execution plan that includes a plurality of tasks is initially generated for a job. Statistics regarding operations performed in the tasks are collected while the tasks are executed via parallel distributed execution. A second execution plan is then generated for a recurring job, the second execution plan having at least one task in common with the first execution plan. The second execution plan is subsequently optimized based at least on the statistics to produce an optimized execution plan.
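The feedback loop described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Task` and `ExecutionPlan` types, the `signature` key used to recognize a task shared by two plans, the row-count statistic, and the `rows_per_partition` sizing rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    signature: str        # hypothetical key identifying an operation
    partitions: int = 4   # default parallelism chosen without statistics


@dataclass
class ExecutionPlan:
    job_name: str
    tasks: List[Task] = field(default_factory=list)


def collect_statistics(plan: ExecutionPlan, observed_rows: Dict[str, int]) -> Dict[str, int]:
    """Record the output row count observed for each task while the plan ran."""
    return {t.signature: observed_rows[t.signature] for t in plan.tasks}


def optimize(plan: ExecutionPlan, stats: Dict[str, int],
             rows_per_partition: int = 1000) -> ExecutionPlan:
    """Return a new plan in which tasks with known statistics are resized."""
    tuned = []
    for t in plan.tasks:
        if t.signature in stats:
            # Ceiling division, with at least one partition (assumed sizing rule).
            parts = max(1, -(-stats[t.signature] // rows_per_partition))
            tuned.append(Task(t.signature, parts))
        else:
            # No statistics for this task: keep the default.
            tuned.append(Task(t.signature, t.partitions))
    return ExecutionPlan(plan.job_name, tuned)


# First run of the job: execute, then keep the observed statistics.
first = ExecutionPlan("daily_report_day1", [Task("scan:logs"), Task("filter:clicks")])
stats = collect_statistics(first, {"scan:logs": 10000, "filter:clicks": 250})

# Recurring job: shares "filter:clicks" with the first plan.
second = ExecutionPlan("daily_report_day2", [Task("filter:clicks"), Task("join:users")])
optimized = optimize(second, stats)
```

In this sketch only the shared task `filter:clicks` is resized; `join:users` has no recorded statistics and keeps its default parallelism.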
20 Claims
1. A computer-readable medium storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising:
generating a first execution plan for a job that includes a plurality of tasks;
collecting statistics regarding operations performed in the plurality of tasks while the plurality of tasks are executed via parallel distributed execution;
generating a second execution plan for a recurring job, the second execution plan having at least one task in common with the first execution plan for the job; and
optimizing the second execution plan based at least on the statistics to produce an optimized execution plan.
(Dependent claims: 2-11)
12. A computer-implemented method, comprising:
generating a first execution plan for a job that includes a plurality of tasks;
collecting statistics regarding operations performed in the plurality of tasks while the plurality of tasks are executed via parallel distributed execution;
optimizing a second execution plan for a recurring job based at least on the statistics to produce an optimized execution plan, the second execution plan having at least one task in common with the first execution plan;
executing a plurality of further tasks specified in the optimized execution plan to generate execution results for the recurring job; and
outputting the execution results of the recurring job.
(Dependent claims: 13-16)
17. A computing device, comprising:
one or more processors; and
a memory that includes a plurality of computer-executable components executed by the one or more processors, comprising:
a compiler that generates a first execution plan for a job that includes a plurality of tasks;
at least one job manager that collects statistics regarding operations performed in the plurality of tasks while the plurality of tasks are executed via parallel distributed execution; and
a compile-time optimizer that optimizes a second execution plan based at least on the statistics to produce an optimized execution plan, the second execution plan having at least one task in common with the first execution plan.
(Dependent claims: 18-20)
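The three components recited in claim 17 (compiler, job manager, compile-time optimizer) might be wired together roughly as in the Python sketch below. This is only an illustration under stated assumptions: a local thread pool stands in for parallel distributed execution, and `compile_plan`, `JobManager`, `optimize_plan`, and the "largest observed input first" ordering rule are hypothetical names and heuristics, not anything specified by the claims.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def compile_plan(job):
    """Hypothetical compiler: turn a job into a list of (name, fn, data) tasks."""
    return [(name, fn, data) for name, fn, data in job]


class JobManager:
    """Runs tasks in parallel and records statistics for each task as it executes."""

    def __init__(self):
        self.stats = {}
        self._lock = threading.Lock()

    def _run(self, task):
        name, fn, data = task
        start = time.perf_counter()
        result = fn(data)
        elapsed = time.perf_counter() - start
        with self._lock:  # guard the shared statistics store
            self.stats[name] = {
                "rows_in": len(data),
                "rows_out": len(result),
                "seconds": elapsed,
            }
        return name, result

    def execute(self, plan, workers=4):
        # A thread pool is used here as a stand-in for a distributed cluster.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return dict(pool.map(self._run, plan))


def optimize_plan(plan, stats):
    """Hypothetical optimizer: schedule tasks with the largest observed input first."""
    return sorted(plan, key=lambda t: stats.get(t[0], {}).get("rows_in", 0), reverse=True)


job = [
    ("count_small", lambda rows: [len(rows)], [1, 2, 3]),
    ("evens_large", lambda rows: [r for r in rows if r % 2 == 0], list(range(100))),
]

manager = JobManager()
results = manager.execute(compile_plan(job))               # first run, statistics collected
second_plan = optimize_plan(compile_plan(job), manager.stats)  # plan for the recurring run
```

Tasks that have no recorded statistics sort last and keep their original relative order, since Python's sort is stable.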
Specification