RANDOM NUMBER GENERATOR IN A PARALLEL PROCESSING DATABASE
First Claim
1. A method of generating random numbers on parallel processing segments of a massively parallel processing (MPP) database, comprising:
- generating a same random number sequence on each segment;
establishing on each segment a different starting position in the random number sequence generated on said each segment;
calculating a step size that determines return positions in the random number sequence generated on said each segment at which random numbers are returned, the step size being the same on all segments, the step size corresponding to a number of query slices and a number of segments, wherein a larger number of query slices and a larger number of segments correspond to a large step size, wherein each query slice is a partition of a relational database query; and
returning uncorrelated random numbers at said return positions in the random number sequence at said segments.
3 Assignments
0 Petitions
Accused Products
Abstract
A random number generation process generated uncorrelated random numbers from identical random number sequences on parallel processing database segments of an MPP database without communications between the segments by establishing a different starting position in the sequence on each segment using an identifier that is unique to each segment, query slice information and the number of segments. A master node dispatches a seed value to initialize the random number sequence generation on all segments, and dispatches the query slice information and information as to the number of segments, during a normal query plan dispatch process.
17 Citations
36 Claims
-
1. A method of generating random numbers on parallel processing segments of a massively parallel processing (MPP) database, comprising:
-
generating a same random number sequence on each segment; establishing on each segment a different starting position in the random number sequence generated on said each segment; calculating a step size that determines return positions in the random number sequence generated on said each segment at which random numbers are returned, the step size being the same on all segments, the step size corresponding to a number of query slices and a number of segments, wherein a larger number of query slices and a larger number of segments correspond to a large step size, wherein each query slice is a partition of a relational database query; and returning uncorrelated random numbers at said return positions in the random number sequence at said segments. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. Computer readable non-transitory storage medium product embodying instructions for controlling the operation of a computer to generate random numbers on parallel processing segments of a massively parallel processing (MPP) database, comprising instructions for:
-
generating the same random number sequence on each segment; establishing on each segment a different starting position in the random number sequence generated on said each segment; calculating a step size that determines return positions in the random number sequence generated on said each segment at which random numbers are returned, the step size being the same on all segments, the step size corresponding to a number of query slices and a number of segments, wherein a larger number of query slices and a larger number of segments correspond to a large step size, wherein each query slice is a partition of a relational database query; and returning uncorrelated random numbers at said return positions in the random number sequence at said segments. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A system comprising:
-
a plurality of processors; a non-transitory storage medium storing computer instructions operable to cause the processors to perform operations comprising; generating a same random number sequence on each segment; establishing on each segment a different starting position in the random number sequence generated on said each segment; calculating a step size that determines return positions in the random number sequence generated on said each segment at which random numbers are returned, the step size being the same on all segments, the step size corresponding to a number of query slices and a number of segments, wherein a larger number of query slices and a larger number of segments correspond to a large step size, wherein each query slice is a partition of a relational database query; and returning uncorrelated random numbers at said return positions in the random number sequence at said segments.
-
-
16. A method comprising:
-
receiving, by a computing device, a query plan, the query plan specifying that a relational database query is partitioned into one or more query slices, each query slice being a portion of operations of the query, each query slice being executable on one or more segment nodes independently of another query slice, each segment node being a node of a parallel processing database system comprising a plurality of nodes each having a processor that is independent of a processor of another node; determining a partition plan of partitioning a master random number sequence to each of the segment nodes, wherein determining the partition plan comprises determining a step size based on a count of query slices and a count of segment nodes of the parallel processing database system, wherein the step size in the partition plan usable to specify uncorrelated subsets of the master random sequence for each different combination of a query slice and a segment node, and wherein a larger count of query slices and a larger count of segment nodes correspond to a larger step size; dispatching each query slice to at least one of the segment nodes according to the partition plan, including designating a different subset of the master random sequence to each different query slice and segment node combination based on the step size and an offset, each offset corresponding to a position of a segment node in an ordered list of segment nodes and a position of the query slice in an order list of query slices; and executing each query slice, including returning a random number in response to a call to a random function in the query slice, the random number being a number in the master random number sequence selected based on the step size and offset. - View Dependent Claims (17, 18, 19, 20, 21, 22)
-
-
23. A non-transitory computer-readable medium storing instructions to cause a plurality of processors to perform operations comprising:
-
receiving a query plan, the query plan specifying that a relational database query is partitioned into one or more query slices, each query slice being a portion of operations of the query, each query slice being executable on a segment node independently of another portion of the query, the segment node being a node of a parallel processing database system comprising a plurality of nodes each having a processor that is independent of a processor of another node; determining a partition plan of partitioning a master random number sequence to each of the segment nodes, wherein determining the partition plan comprises determining a step size based on a count of query slices and a count of segment nodes of the parallel processing database system, wherein the step size in the partition plan usable to specify uncorrelated subsets of the master random sequence for each different combination of a query slice and a segment node, and wherein a larger count of query slices and a larger count of segment nodes correspond to a larger step size; dispatching each query slice to at least one of the segment nodes according to the partition plan, including designating a different subset of the master random sequence to each different query slice and segment node combination based on the step size and an offset, each offset corresponding to a position of a segment node in an ordered list of segment nodes and a position of the query slice in an order list of query slices; and executing each query slice, including returning a random number in response to a call to a random function in the query slice, the random number being a number in the master random number sequence selected based on the step size and offset. - View Dependent Claims (24, 25, 26, 27, 28, 29)
-
-
30. A system comprising:
-
a plurality of processors; and a non-transitory computer-readable medium storing instructions to cause the processors to perform operations comprising; receiving, by a computing device, a query plan, the query plan specifying that a relational database query is partitioned into one or more query slices, each query slice being a portion of operations of the query, each query slice being executable on one or more segment nodes independently of another query slice, each segment node being a node of a parallel processing database system comprising a plurality of nodes each having a processor that is independent of a processor of another node; determining a partition plan of partitioning a master random number sequence to each of the segment nodes, wherein determining the partition plan comprises determining a step size based on a count of query slices and a count of segment nodes of the parallel processing database system, wherein the step size in the partition plan usable to specify uncorrelated subsets of the master random sequence for each different combination of a query slice and a segment node, and wherein a larger count of query slices and a larger count of segment nodes correspond to a larger step size; dispatching each query slice to at least one of the segment nodes according to the partition plan, including designating a different subset of the master random sequence to each different query slice and segment node combination based on the step size and an offset, each offset corresponding to a position of a segment node in an ordered list of segment nodes and a position of the query slice in an order list of query slices; and executing each query slice, including returning a random number in response to a call to a random function in the query slice, the random number being a number in the master random number sequence selected based on the step size and offset. - View Dependent Claims (31, 32, 33, 34, 35, 36)
-
Specification