Functionality of decomposition data skew in asymmetric massively parallel processing databases
Abstract
Database queries are optimized through the functionality of decomposition data skew in an asymmetric massively parallel processing database system. A table having data skew is restructured by (1) storing original data values of a distribution key in a special switch column added to the table, (2) replacing the original data values of the distribution key with modified data values such as randomly generated data values, and (3) partitioning the rows across the nodes of the asymmetric massively parallel processing database system based on the distribution key. The original data values that are stored and replaced may comprise only a subset of the original data values, namely one or more of the frequent values that cause data skew in the table. Data skew is reduced, which improves performance, yet the original data values remain available, which reduces the impact on collocated joins.
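The three steps in the abstract can be illustrated with a small in-memory sketch. This is not the patented implementation: the four-node cluster, the modulo function standing in for the database's hash partitioning, the row layout, and the names `node_for`, `rows_per_node`, `restructure`, `dist_key` and `switch_col` are all assumptions made for the example.

```python
import random
from collections import Counter

NUM_NODES = 4  # assumed cluster size, for illustration only


def node_for(key: int) -> int:
    """Hypothetical stand-in for the database's hash partitioning function."""
    return key % NUM_NODES


def rows_per_node(rows):
    """Count how many rows land on each node when partitioned by dist_key."""
    counts = Counter(node_for(row["dist_key"]) for row in rows)
    return [counts.get(node, 0) for node in range(NUM_NODES)]


def restructure(rows, rng):
    """Apply steps (1) and (2) of the abstract to every row."""
    restructured = []
    for row in rows:
        new_row = dict(row)
        new_row["switch_col"] = row["dist_key"]     # (1) keep the original value
        new_row["dist_key"] = rng.randrange(10**6)  # (2) random replacement value
        restructured.append(new_row)
    return restructured


# A skewed table: 900 of 1000 rows share the distribution key value 7.
table = [{"dist_key": 7, "payload": i} for i in range(900)]
table += [{"dist_key": k, "payload": k} for k in range(100, 200)]

before = rows_per_node(table)
# (3) partition again, now on the replaced distribution key values
after = rows_per_node(restructure(table, random.Random(0)))

print("rows per node before:", before)
print("rows per node after: ", after)
```

In this toy table one node initially holds 925 of the 1000 rows. After the replacement the rows spread roughly evenly, while `switch_col` still carries every original key value.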
12 Claims
1. A method of restructuring a table having data skew in a computer system, the computer system storing data from a database in partitions on one or more nodes of the computer system, the method comprising:
determining whether original data values of a distribution key column of the table include frequent data values that cause data skew in the table;
after the original data values of the distribution key column have been determined to include the frequent data values, copying only the original data values of the distribution key column that comprise the frequent data values to a switch column added to the table;
after the original data values of the distribution key column that comprise the frequent data values have been copied to the switch column, replacing only the original data values in the distribution key column that comprise the frequent data values with modified data values that reduce the data skew in the table during partitioning, wherein the original data values that are copied and replaced comprise a subset of the original data values and the subset of the original data values comprises one or more of the frequent data values that cause the data skew in the table;
after the original data values in the distribution key column that comprise the frequent data values have been replaced, partitioning the rows of the table across the nodes of the computer system using the distribution key column with the modified data values; and
performing database operations other than the partitioning using the original data values, but not the modified data values.
Dependent claims: 2, 3, 4.
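The sequence of steps in claim 1 can be sketched as follows. The sketch rests on several assumptions that the claim does not specify: a value counts as "frequent" when it exceeds a 10% share of the rows, the modified values are random integers, partitioning is a simple modulo over four nodes, and all function and column names (`find_frequent_values`, `restructure`, `partition`, `original_key`, `dist_key`, `switch_col`) are hypothetical.

```python
import random
from collections import Counter

NUM_NODES = 4          # assumed cluster size
SKEW_THRESHOLD = 0.10  # assumed: a value is "frequent" above 10% of all rows


def find_frequent_values(rows, threshold=SKEW_THRESHOLD):
    """Return the distribution key values whose share of rows exceeds threshold."""
    counts = Counter(row["dist_key"] for row in rows)
    return {value for value, n in counts.items() if n / len(rows) > threshold}


def restructure(rows, frequent, rng):
    """Copy and replace only the frequent values; other rows keep their key."""
    result = []
    for row in rows:
        new_row = dict(row, switch_col=None)  # switch column added to the table
        if row["dist_key"] in frequent:
            new_row["switch_col"] = row["dist_key"]     # copy the original value
            new_row["dist_key"] = rng.randrange(10**6)  # replace with modified value
        result.append(new_row)
    return result


def partition(rows):
    """Distribute rows across nodes using the (possibly modified) dist_key."""
    nodes = [[] for _ in range(NUM_NODES)]
    for row in rows:
        nodes[row["dist_key"] % NUM_NODES].append(row)
    return nodes


def original_key(row):
    """Key value to use for every operation other than partitioning."""
    return row["dist_key"] if row["switch_col"] is None else row["switch_col"]


# Skewed example table: the key value 7 accounts for 900 of 1000 rows.
table = [{"dist_key": 7, "amount": 1} for _ in range(900)]
table += [{"dist_key": k, "amount": 1} for k in range(100, 200)]

frequent = find_frequent_values(table)
nodes = partition(restructure(table, frequent, random.Random(0)))

# A non-partitioning operation (a grouped count) reads the original values.
counts_by_key = Counter(original_key(row) for node in nodes for row in node)
print(sorted(frequent), counts_by_key[7], [len(node) for node in nodes])
```

Only rows holding a frequent value receive a non-empty `switch_col`, so `original_key` can distinguish replaced rows from untouched ones without a separate flag. That is a design choice of this sketch, not something stated in the claim.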
5. An apparatus for restructuring a table having the data skew, comprising:
a computer system for storing data from a database in partitions on one or more nodes of the computer system; and
a process performed by the computer system, the process configured to:
determine whether original data values of a distribution key column of the table include frequent data values that cause data skew in the table;
after the original data values of the distribution key column have been determined to include the frequent data values, copy only the original data values that comprise the frequent data values of the distribution key column to a switch column added to the table;
after the original data values of the distribution key column have been copied to the switch column, replace only the original data values in the distribution key column that comprise the frequent data values with modified data values that reduce the data skew in the table during partition, wherein the original data values that are copied and replaced comprise a subset of the original data values and the subset of the original data values comprises one or more of the frequent data values that cause the data skew in the table;
after the original data values in the distribution key column that comprise the frequent data values have been replaced, partition the rows of the table across the nodes of the computer system using the distribution key column with the modified data values; and
perform database operations other than the partition using the original data values, but not the modified data values.
Dependent claims: 6, 7, 8.
9. An article of manufacture comprising a computer readable storage medium encoded with computer program instructions which, when accessed by a computer system storing data from a database in partitions on one or more nodes of the computer system, cause the computer system to operate as a specially programmed computer system, executing a method for restructuring a table having the data skew, the method comprising:
determining whether original data values of a distribution key column of the table include frequent data values that cause data skew in the table;
after the original data values of the distribution key column have been determined to include the frequent data values, copying only the original data values of the distribution key column that comprise the frequent data values to a switch column added to the table;
after the original data values of the distribution key column have been copied to the switch column, replacing only the original data values in the distribution key column that comprise the frequent data values with modified data values that reduce the data skew in the table during partitioning, wherein the original data values that are copied and replaced comprise a subset of the original data values and the subset of the original data values comprises one or more of the frequent data values that cause the data skew in the table;
after the original data values in the distribution key column that comprise the frequent data values have been replaced, partitioning the rows of the table across the nodes of the computer system using the distribution key column with the modified data values; and
performing database operations other than the partitioning using the original data values, but not the modified data values.
Dependent claims: 10, 11, 12.
Specification