TQ distribution that increases parallelism by distributing one slave to a particular data block
First Claim
1. A method, the method comprising the computer-implemented steps of:
assigning a first plurality of slaves and a second plurality of slaves to participate in execution of a distributed operation, wherein the distributed operation involves accessing base rows that are contained in at least one table and that are stored in a plurality of data blocks;
wherein said first plurality of slaves generates output rows for processing by said second plurality of slaves;
wherein said generated output rows contain data from said accessed base rows;
generating a data structure that indicates associations of said second plurality of slaves with said plurality of data blocks;
distributing said generated output rows to said second plurality of slaves based on:
particular data blocks that contain the accessed base rows of the generated output rows; and
the associations of said second plurality of slaves with said plurality of data blocks;
wherein a first slave of said first plurality of slaves produces a first output row having a first base row from a certain data block of said plurality of data blocks;
wherein a second slave of said first plurality of slaves produces a second output row having a second base row from said certain data block of said plurality of data blocks; and
wherein distributing said output rows includes:
assigning, based on the generated data structure and said certain data block containing said first base row, said first output row to a certain slave of said second plurality of slaves that is associated with said certain data block; and
assigning, based on the generated data structure and said certain data block containing said second base row, said second output row to said certain slave of said second plurality of slaves that is associated with said certain data block.
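The distribution step of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `OutputRow`, `build_block_map`, and `distribute` are hypothetical names, each output row is assumed to carry the identifier of the data block that holds its base row, and blocks are associated with second-stage slaves round-robin, a policy the claim does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputRow:
    """A row produced by a first-stage (producer) slave.

    block_id identifies the data block holding the base row this output row
    was derived from (assumed to travel with the row); payload stands in for
    the row's data.
    """
    producer: int
    block_id: int
    payload: str


def build_block_map(block_ids, num_consumers):
    """Build the data structure associating each data block with one consumer slave.

    Round-robin over sorted block IDs is an illustrative assumption; any policy
    works as long as every block maps to exactly one second-stage slave.
    """
    return {block: i % num_consumers for i, block in enumerate(sorted(block_ids))}


def distribute(rows, block_map):
    """Route each output row to the consumer associated with its base row's block."""
    queues = defaultdict(list)
    for row in rows:
        queues[block_map[row.block_id]].append(row)
    return dict(queues)


if __name__ == "__main__":
    block_map = build_block_map(block_ids=[10, 11, 12, 13], num_consumers=2)
    rows = [
        OutputRow(producer=0, block_id=11, payload="r1"),  # first producer slave
        OutputRow(producer=1, block_id=11, payload="r2"),  # second producer, same block
        OutputRow(producer=1, block_id=12, payload="r3"),
    ]
    queues = distribute(rows, block_map)
    for consumer, assigned in sorted(queues.items()):
        print(consumer, [r.payload for r in assigned])
```

Running the script prints `0 ['r3']` and then `1 ['r1', 'r2']`: the two rows produced by different first-stage slaves from block 11 both go to slave 1, which is the behavior the last two clauses of the claim describe.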
Abstract
Provided herein are techniques that may be used to dramatically increase parallelism for distributed DML operations. The work of distributed DML operations is distributed in a way that avoids self-deadlocks by ensuring that, for a given data block, no more than one slave is assigned to modify a row that is wholly contained in the data block or whose head row piece is contained in the data block. Assigning slaves in this way not only allows more slaves to be assigned to modify a partition, but also allows greater flexibility in load balancing.
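The invariant in the abstract (at most one modifying slave per data block, keyed on the block that holds the row or its head row piece) can be checked mechanically. The sketch below is an illustration under assumptions: `assign_blocks` and `conflicting_blocks` are hypothetical helpers, each row is represented only by the ID of its head-piece block, and blocks are handed heaviest-first to the least-loaded slave, a greedy heuristic chosen here for illustration rather than taken from the patent.

```python
import heapq
from collections import Counter, defaultdict


def assign_blocks(row_head_blocks, num_slaves):
    """Assign every data block to exactly one modifying slave, balancing row counts.

    row_head_blocks has one entry per row to modify: the block that wholly
    contains the row or contains its head row piece. Blocks are handed out
    heaviest-first to the currently least-loaded slave (assumed heuristic).
    """
    rows_per_block = Counter(row_head_blocks)
    heap = [(0, slave) for slave in range(num_slaves)]  # (rows assigned so far, slave id)
    heapq.heapify(heap)
    owner = {}
    for block, count in sorted(rows_per_block.items(), key=lambda kv: (-kv[1], kv[0])):
        load, slave = heapq.heappop(heap)
        owner[block] = slave
        heapq.heappush(heap, (load + count, slave))
    return owner


def conflicting_blocks(row_head_blocks, slave_of_row):
    """Return the blocks whose rows would be modified by more than one slave."""
    touching = defaultdict(set)
    for block, slave in zip(row_head_blocks, slave_of_row):
        touching[block].add(slave)
    return {block for block, slaves in touching.items() if len(slaves) > 1}


if __name__ == "__main__":
    heads = [1, 1, 1, 2, 2, 3, 3, 4]  # head-piece block of each row in one partition
    owner = assign_blocks(heads, num_slaves=2)
    print("block owners:", owner)
    print("block-based conflicts:", conflicting_blocks(heads, [owner[b] for b in heads]))
    naive = [i % 2 for i in range(len(heads))]  # row-by-row round-robin
    print("row-based conflicts:", conflicting_blocks(heads, naive))
```

With these inputs, `assign_blocks` yields `{1: 0, 2: 1, 3: 1, 4: 0}`: two slaves share one partition with four rows each, and `conflicting_blocks` returns an empty set. A naive row-by-row round-robin over the same rows returns `{1, 2, 3}`, the blocks that two slaves would modify concurrently.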
71 Citations
20 Claims
11. A computer-readable storage medium storing one or more sequences of instructions for executing distributed operations, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
assigning a first plurality of slaves and a second plurality of slaves to participate in execution of a distributed operation, wherein the distributed operation involves accessing base rows that are contained in at least one table and that are stored in a plurality of data blocks;
wherein said first plurality of slaves generates output rows for processing by said second plurality of slaves;
wherein said generated output rows contain data from said base rows;
generating a data structure that indicates associations of said second plurality of slaves with said plurality of data blocks;
distributing said generated output rows to said second plurality of slaves based on:
particular data blocks that contain the accessed base rows of the generated output rows; and
the associations of said second plurality of slaves with said plurality of data blocks;
wherein a first slave of said first plurality of slaves produces a first output row having a first base row from a certain data block of said plurality of data blocks;
wherein a second slave of said first plurality of slaves produces a second output row having a second base row from said certain data block of said plurality of data blocks; and
wherein distributing said output rows includes:
assigning, based on the generated data structure and said certain data block containing said first base row, said first output row to a certain slave of said second plurality of slaves that is associated with said certain data block; and
assigning, based on the generated data structure and said certain data block containing said second base row, said second output row to said certain slave of said second plurality of slaves that is associated with said certain data block.
Specification