Reducing skew for database operation processing with randomization
First Claim
1. A computer program, stored in a non-transitory computer-readable medium, on which is recorded a computer program, the computer program comprising executable instructions, that, when executed, perform a method for performing a database operation in a file system residing on a plurality of processing modules, the file system including a first relation having a plurality of first-relation entries, each of the plurality of first-relation entries having a first-relation attribute that is of interest in the database operation, the file system including a second relation having a plurality of second-relation entries, each of the plurality of second-relation entries having a second-relation attribute that is of interest in the computational operation, the method comprising:
setting a value of a distribution attribute in each of the first-relation entries to a unique value selected from among a domain of unique values;
redistributing the first-relation entries of the first relation among the plurality of processing modules based on the first-relation attribute and the distribution attribute;
making n copies of the second relation, where n is the number of unique values in the domain of unique values;
redistributing each of the copies of the second relation to a respective processing module to which the first-relation entries of the first relation have been redistributed by;
setting a number attribute for the second-relation entries for each respective copy of the second relation to a respective unique value selected from the domain of unique values, and redistributing each of the copies of the second relation based on the number attribute; and
performing the computational operation to produce a result.
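The steps of the claim can be sketched in a few lines of Python. This is only an illustrative reading of the claim language, not the patented implementation: the names `module_for` and `salted_join`, the use of CRC32 as the redistribution hash, the random choice of the distribution value, and the decision to place each copy of the second relation by both its join attribute and its number attribute are all assumptions made for the example.

```python
import random
import zlib
from collections import defaultdict


def module_for(parts, num_modules):
    """Hypothetical redistribution hash: map a tuple of values to a module id."""
    text = "|".join(str(p) for p in parts)
    return zlib.crc32(text.encode()) % num_modules


def salted_join(first, second, num_modules, n, seed=0):
    """Equi-join two lists of (attribute, payload) rows across simulated modules.

    n is the size of the domain of unique values (assumed here to be 0..n-1).
    """
    rng = random.Random(seed)

    # Set the distribution attribute of each first-relation entry to a value
    # from the domain (chosen at random here, which is an assumption), then
    # redistribute on (first-relation attribute, distribution attribute).
    first_by_module = defaultdict(list)
    for attr, payload in first:
        dist = rng.randrange(n)
        first_by_module[module_for((attr, dist), num_modules)].append(
            (attr, dist, payload)
        )

    # Make n copies of the second relation, set the number attribute of copy k
    # to k, and redistribute each copy so that it lands on the module holding
    # the first-relation entries with the matching distribution value.
    second_by_module = defaultdict(list)
    for number in range(n):
        for attr, payload in second:
            second_by_module[module_for((attr, number), num_modules)].append(
                (attr, number, payload)
            )

    # Perform the operation (an equi-join in this sketch) locally per module.
    result = []
    for module in range(num_modules):
        index = defaultdict(list)
        for attr, number, payload in second_by_module[module]:
            index[(attr, number)].append(payload)
        for attr, dist, payload in first_by_module[module]:
            for other in index.get((attr, dist), []):
                result.append((attr, payload, other))
    return result


if __name__ == "__main__":
    first = [(1, "a%d" % i) for i in range(5)] + [(2, "b")]
    second = [(1, "x"), (2, "y"), (3, "z")]
    for row in sorted(salted_join(first, second, num_modules=4, n=3)):
        print(row)
```

Because every first-relation entry carries exactly one distribution value and exactly one copy of the second relation carries the same number, each matching pair meets once, so the sketch returns the same rows as an ordinary join while spreading a heavily repeated attribute value over up to n placements.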
Abstract
A database operation is performed in a file system residing on a plurality of processing modules. The file system includes a first relation having a plurality of first-relation entries. Each of the plurality of first-relation entries has a first-relation attribute that is of interest in the database operation. A value of a distribution attribute in each of the first-relation entries is set to a unique value selected from among a domain of unique values. The first-relation entries of the first relation are redistributed among the plurality of processing modules based on the first-relation attribute and the distribution attribute. The computational operation is performed to produce a result.
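The effect the abstract describes on the first relation can be illustrated with a small simulation. The sketch below is a hedged example, not the patent's method: `module_for` and `load_per_module` are hypothetical names, CRC32 stands in for whatever hash the file system uses, and the distribution value is assumed to be drawn at random from 0..n-1. It counts how many entries each processing module receives when redistributing on the attribute alone versus on the attribute together with the distribution attribute.

```python
import random
import zlib
from collections import Counter


def module_for(parts, num_modules):
    """Hypothetical redistribution hash: map a tuple of values to a module id."""
    text = "|".join(str(p) for p in parts)
    return zlib.crc32(text.encode()) % num_modules


def load_per_module(attrs, num_modules, n=None, seed=0):
    """Return the number of entries sent to each module.

    With n=None, entries are placed by attribute value only. Otherwise each
    entry also gets a distribution value in 0..n-1 that takes part in the hash.
    """
    rng = random.Random(seed)
    load = Counter()
    for attr in attrs:
        parts = (attr,) if n is None else (attr, rng.randrange(n))
        load[module_for(parts, num_modules)] += 1
    return [load[m] for m in range(num_modules)]


if __name__ == "__main__":
    # A fully skewed relation: every entry has the same attribute value.
    attrs = ["hot"] * 1000
    print("attribute only:        ", load_per_module(attrs, 4))
    print("attribute + dist value:", load_per_module(attrs, 4, n=16))
```

With the attribute alone, all entries sharing a value land on a single module. Adding the distribution attribute lets the same value hash to several modules, which is the skew reduction the title refers to.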
8 Citations
18 Claims
1. A computer program, stored in a non-transitory computer-readable medium, on which is recorded a computer program, the computer program comprising executable instructions, that, when executed, perform a method for performing a database operation in a file system residing on a plurality of processing modules, the file system including a first relation having a plurality of first-relation entries, each of the plurality of first-relation entries having a first-relation attribute that is of interest in the database operation, the file system including a second relation having a plurality of second-relation entries, each of the plurality of second-relation entries having a second-relation attribute that is of interest in the computational operation, the method comprising:
setting a value of a distribution attribute in each of the first-relation entries to a unique value selected from among a domain of unique values;
redistributing the first-relation entries of the first relation among the plurality of processing modules based on the first-relation attribute and the distribution attribute;
making n copies of the second relation, where n is the number of unique values in the domain of unique values;
redistributing each of the copies of the second relation to a respective processing module to which the first-relation entries of the first relation have been redistributed by;
setting a number attribute for the second-relation entries for each respective copy of the second relation to a respective unique value selected from the domain of unique values, and redistributing each of the copies of the second relation based on the number attribute; and
performing the computational operation to produce a result.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method for performing a database operation in a file system residing on a plurality of processing modules, the file system including a first relation having a plurality of first-relation entries, each of the plurality of first-relation entries having a first-relation attribute that is of interest in the database operation, the file system including a second relation having a plurality of second-relation entries, each of the plurality of second-relation entries having a second relation attribute that is of interest in the computational operation, the method comprising:
setting a value of a distribution attribute in each of the first-relation entries to a unique value selected from among a domain of unique values;
redistributing the first-relation entries of the first relation among the plurality of processing modules based on the first-relation attribute and the distribution attribute;
making n copies of the second relation, where n is the number of unique values in the domain of unique values;
redistributing each of the copies of the second relation to a respective processing module to which the first-relation entries of the first relation have been redistributed by;
setting a number attribute for the second-relation entries for each respective copy of the second relation to a respective unique value selected from the domain of unique values, and redistributing each of the copies of the second relation based on the number attribute; and
performing the computational operation to produce a result.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
Specification