Managing parallel execution of work granules according to their affinity
First Claim
1. A method for breaking a task into work granules to assign to processes, the method comprising the steps of:
- determining how many processes will be used to execute said task;
determining how many granules to divide said task into based on how many processes will be used to execute said task, and a range defined by a first threshold and a second threshold;
wherein the first threshold is a minimum number of work granules to assign to each of the processes that will be used to execute said task;
wherein the second threshold is a maximum number of work granules to assign to each of the processes that will be used to execute said task; and
dividing said task into a number of work granules that allows each process that will be used to execute said task to be assigned a number of work granules that falls within said range.
0 Assignments
0 Petitions
Accused Products
Abstract
A method and apparatus are provided for managing work granules being executed in parallel. A task is evenly divided between a number of work granules. The number of work granules falls between a threshold minimum and a threshold maximum. The threshold minimum and maximum may be configured to balance a variety of efficiency factors affected by the number of work granules, including workload skew and overhead incurred in managing larger number of work granules. Work granules are distributed to processes on nodes according to which of the nodes, if any, may execute the work granule efficiently. A variety of factors may used to determine where a work granule may be performed efficiently, including whether data accessed during the execution of a work granule may be locally accessed by a node.
-
Citations
16 Claims
-
1. A method for breaking a task into work granules to assign to processes, the method comprising the steps of:
-
determining how many processes will be used to execute said task; determining how many granules to divide said task into based on how many processes will be used to execute said task, and a range defined by a first threshold and a second threshold; wherein the first threshold is a minimum number of work granules to assign to each of the processes that will be used to execute said task; wherein the second threshold is a maximum number of work granules to assign to each of the processes that will be used to execute said task; and dividing said task into a number of work granules that allows each process that will be used to execute said task to be assigned a number of work granules that falls within said range. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A computer-readable storage medium storing one or more sequences of one or more instructions for managing the assignment of a plurality of work granules to a plurality of processes on a plurality of nodes, the one or more sequences of one or more instructions including instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
-
making a determination that; a portion of a first work granule of said plurality of work granules has an affinity to a particular node of said plurality of nodes, and another portion of said first work granule has an affinity to another node of said plurality of nodes; based on said determination, generating first data that establishes said first work granule as having an affinity to a particular node of said plurality of nodes; assigning each work granule from the plurality of work granules to a process from said plurality of processes based on said first data; wherein a portion of each work granule of said plurality of work granules has an affinity for a given node of said plurality of nodes when the portion of said each work granule can be executed more efficiently by said given node relative to another node of said plurality of nodes; and wherein the step of generating first data includes generating second data that establishes as not having an affinity for any particular node of said plurality of nodes a second work granule that includes a portion of work which has an affinity for a node from said plurality of nodes, and another portion of work which has an affinity for another node from said plurality of nodes. - View Dependent Claims (12)
-
-
13. A computer-readable storage medium storing one or more sequences of one or more instructions for managing the assignment of a plurality of work granules to a plurality of processes on a plurality of nodes, the one or more sequences of one or more instructions including instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
-
making a determination that; a portion of a first work granule of said plurality of work granules has an affinity to a particular node of said plurality of nodes, and another portion of said first work granule has an affinity to another node of said plurality of nodes; based on said determination, generating first data that establishes said first work granule as having an affinity to a particular node of said plurality of nodes; assigning each work granule from the plurality of work granules to a process from said plurality of processes based on said first data; wherein a portion of each work granule of said plurality of work granules has an affinity for a given node of said plurality of nodes when the portion of said each work granule can be executed more efficiently by said given node relative to another node of said plurality of nodes; and wherein the step of generating first data includes generating first data that establishes as having an affinity to a particular node from said plurality of nodes a particular work granule that specifies a plurality of database table partitions to scan.
-
-
14. A computer-readable storage medium storing one or more sequences of one or more instructions for managing the assignment of a plurality of work granules to a plurality of processes on a plurality of nodes, the one or more sequences of one or more instructions including instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
-
making a determination that; a portion of a first work granule of said plurality of work granules has an affinity to a particular node of said plurality of nodes, and another portion of said first work granule has an affinity to another node of said plurality of nodes; based on said determination, generating first data that establishes said first work granule as having an affinity to a particular node of said plurality of nodes; assigning each work granule from the plurality of work granules to a process from said plurality of processes based on said first data; wherein a portion of each work granule of said plurality of work granules has an affinity for a given node of said plurality of nodes when the portion of said each work granule can be executed more efficiently by said given node relative to another node of said plurality of nodes; determining that a quantity of certain data is accessed during execution of said first work granule; determining that a majority of said quantity of said certain data resides at a location local to said first node; and wherein the step of establishing said first work granule as having an affinity to said particular node is performed in response to determining that a majority of said quantity of third data resides at a location local to said first node.
-
-
15. A computer-readable storage medium storing one or more sequences of one or more instructions for assigning work granules to a plurality of processes on a plurality of nodes, the one or more sequences of one or more instructions including instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
-
dividing a task into a plurality of work granules that includes a first set of work granules that each define work by specifying one or more ranges of blocks to scan, and a second set of work granules that each define work by specifying at least two database table partitions of a database table to scan; generating first data that specifies, for each work granule of said plurality of work granules, whether said each work granule has an affinity for a particular node of said plurality of nodes; assigning said plurality of work granules to said plurality of processes based on said first data; wherein said database table is partitioned into said at least two database table partitions by values in one or more columns of said database table; and wherein the step of generating first data includes generating for each of said first set and said second set; at least one list that includes work granules having an affinity for a particular node from said plurality of nodes; and another list that includes work granules having no particular affinity for a particular node. - View Dependent Claims (16)
-
Specification