Data arrangement management in a distributed data cluster environment of a shared pool of configurable computing resources
First Claim
1. A computer-implemented method for data arrangement management in a distributed data cluster environment of a shared pool of configurable computing resources, the method comprising:
monitoring, in the distributed data cluster environment, a set of data for a data redistribution candidate trigger;
detecting, in the distributed data cluster environment, the data redistribution candidate trigger with respect to the set of data, wherein detecting the data redistribution candidate trigger comprises:
detecting a data structure which indicates a workload pattern;
building a new distribution key for the data structure to change the workload pattern to reduce data movement during a query operation;
determining, based on the new distribution key, a new data arrangement associated with the set of data, and comparing the new data arrangement with a current data arrangement to determine which data arrangement is more efficient based on resource usage; and
in response to determining that the new data arrangement is more efficient than the current data arrangement, establishing, based on the new distribution key, the new data arrangement in the distributed data cluster environment such that at least a portion of the set of data comprises a different physical location in the new data arrangement.
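The steps of this claim can be illustrated with a minimal sketch. It is not the patented implementation: the join log format, the function names, and the cost model (rows shuffled when a join column differs from the distribution key) are all assumptions made for illustration, and picking the most frequently joined column is just one plausible heuristic for a "workload pattern".

```python
from collections import Counter

# Hypothetical workload log: (join column, rows touched by that query).
join_log = [("customer_id", 1000), ("customer_id", 1500), ("order_id", 400)]


def propose_distribution_key(log):
    """Pick the column that appears most often as a join column (assumed heuristic)."""
    counts = Counter(column for column, _rows in log)
    return counts.most_common(1)[0][0]


def estimated_rows_moved(log, distribution_key):
    """Assumed cost model: a join on any column other than the key shuffles its rows."""
    return sum(rows for column, rows in log if column != distribution_key)


def choose_arrangement(log, current_key):
    """Return the new key only if it is estimated to move fewer rows than the current one."""
    new_key = propose_distribution_key(log)
    current_cost = estimated_rows_moved(log, current_key)
    new_cost = estimated_rows_moved(log, new_key)
    return new_key if new_cost < current_cost else current_key


chosen_key = choose_arrangement(join_log, current_key="order_id")
```

With the sample log, distributing on `order_id` is estimated to move 2500 rows and distributing on `customer_id` 400 rows, so the sketch selects `customer_id`. A real system would measure resource usage rather than estimate it from a toy log.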
Abstract
Disclosed aspects relate to data arrangement management in a distributed data cluster environment of a shared pool of configurable computing resources. In the distributed data cluster environment, a set of data is monitored for a data redistribution candidate trigger. The data redistribution candidate trigger is detected with respect to the set of data. Based on the data redistribution candidate trigger, the set of data is analyzed with respect to a candidate data redistribution action. Using the candidate data redistribution action, a new data arrangement associated with the set of data is determined. Accordingly, the new data arrangement is established.
27 Citations
20 Claims
1. A computer-implemented method for data arrangement management in a distributed data cluster environment of a shared pool of configurable computing resources, the method comprising:
monitoring, in the distributed data cluster environment, a set of data for a data redistribution candidate trigger;
detecting, in the distributed data cluster environment, the data redistribution candidate trigger with respect to the set of data, wherein detecting the data redistribution candidate trigger comprises:
detecting a data structure which indicates a workload pattern;
building a new distribution key for the data structure to change the workload pattern to reduce data movement during a query operation;
determining, based on the new distribution key, a new data arrangement associated with the set of data, and comparing the new data arrangement with a current data arrangement to determine which data arrangement is more efficient based on resource usage; and
in response to determining that the new data arrangement is more efficient than the current data arrangement, establishing, based on the new distribution key, the new data arrangement in the distributed data cluster environment such that at least a portion of the set of data comprises a different physical location in the new data arrangement.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer-implemented method for data arrangement management in a distributed data cluster environment of a shared pool of configurable computing resources, the method comprising:
detecting, in the distributed data cluster environment, a data redistribution candidate trigger with respect to a set of data, wherein detecting the data redistribution candidate trigger comprises:
detecting a data skew of a data structure which exceeds a threshold data skew value, the data skew being a disproportionate distribution of the set of data across multiple partitions of the distributed data cluster environment;
building a new distribution key for the data structure which exceeds the threshold data skew value to reduce the data skew of the data structure;
determining, based on the new distribution key, a new data arrangement associated with the set of data, and comparing the new data arrangement with a current data arrangement to determine which data arrangement is more efficient based on resource usage; and
in response to determining that the new data arrangement is more efficient than the current data arrangement, establishing, based on the new distribution key, the new data arrangement in the distributed data cluster environment such that at least a portion of the set of data comprises a different physical location in the new data arrangement.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A computer-implemented method for data arrangement management in a distributed data cluster environment of a shared pool of configurable computing resources, the method comprising:
detecting, in the distributed data cluster environment, a data redistribution candidate trigger with respect to a set of data, wherein detecting the data redistribution candidate trigger comprises:
detecting a data structure which exceeds a data transmission frequency threshold by detecting data structures using network bandwidth beyond a threshold amount during a particular temporal period; and
establishing, in the distributed data cluster environment, a new data arrangement by deploying the data structure which exceeds the data transmission frequency threshold to at least a threshold number of partitions in the distributed data cluster environment to reduce overall computing resources utilization such that at least a portion of the set of data comprises a different physical location in the new data arrangement.
Dependent claims: 17, 18, 19, 20
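The data skew test recited in claim 9 can be sketched as follows. This is an illustrative assumption, not the claimed method itself: the skew metric (largest partition divided by mean partition size), the CRC32-based hash placement, the threshold value, and all function and column names are hypothetical.

```python
import zlib


def partition_counts(rows, key, num_partitions):
    """Count rows per partition under hash distribution on `key` (CRC32 is an assumed hash)."""
    counts = [0] * num_partitions
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        counts[digest % num_partitions] += 1
    return counts


def skew_ratio(counts):
    """Assumed skew metric: largest partition relative to the mean partition size."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


def pick_key(rows, current_key, candidate_keys, num_partitions, threshold):
    """Keep the current key unless its skew exceeds the threshold and a candidate does better."""
    current_skew = skew_ratio(partition_counts(rows, current_key, num_partitions))
    if current_skew <= threshold:
        return current_key
    best = min(
        candidate_keys,
        key=lambda k: skew_ratio(partition_counts(rows, k, num_partitions)),
    )
    best_skew = skew_ratio(partition_counts(rows, best, num_partitions))
    return best if best_skew < current_skew else current_key


# Hypothetical table: every row shares one region, but order ids are unique.
rows = [{"region": "EU", "order_id": i} for i in range(100)]
by_region = partition_counts(rows, "region", 4)
by_order = partition_counts(rows, "order_id", 4)
new_key = pick_key(rows, "region", ["order_id"], num_partitions=4, threshold=1.5)
```

Distributing on `region` places all 100 rows on one of four partitions, so the skew ratio is 4.0 and exceeds the assumed threshold of 1.5. Distributing on `order_id` spreads rows over several partitions, so the sketch switches keys. A deterministic hash is used instead of Python's built-in `hash`, which is randomized per process for strings.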
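The bandwidth trigger and deployment step of claim 16 can be sketched in the same spirit. The transfer log format, the time window, the byte threshold, and the replication plan below are assumptions for illustration only; the claim does not prescribe how bandwidth is measured or how target partitions are chosen.

```python
def tables_over_bandwidth(transfer_log, window_start, window_end, threshold_bytes):
    """Return tables whose bytes sent within [window_start, window_end) exceed the threshold."""
    totals = {}
    for table, timestamp, nbytes in transfer_log:
        if window_start <= timestamp < window_end:
            totals[table] = totals.get(table, 0) + nbytes
    return sorted(t for t, total in totals.items() if total > threshold_bytes)


def plan_replication(hot_tables, partitions, min_partitions):
    """Map each hot table to at least `min_partitions` target partitions (first N, by assumption)."""
    if len(partitions) < min_partitions:
        raise ValueError("not enough partitions to satisfy the replication threshold")
    targets = partitions[:min_partitions]
    return {table: list(targets) for table in hot_tables}


# Hypothetical log entries: (table name, timestamp in seconds, bytes sent over the network).
transfer_log = [
    ("dim_date", 10, 600),
    ("dim_date", 20, 700),
    ("fact_sales", 15, 900),
    ("dim_date", 120, 5000),  # outside the window below, so it is ignored
]
hot = tables_over_bandwidth(transfer_log, window_start=0, window_end=100, threshold_bytes=1000)
plan = plan_replication(hot, ["p0", "p1", "p2", "p3"], min_partitions=3)
```

In the sample log, `dim_date` sends 1300 bytes inside the window and crosses the assumed 1000-byte threshold, so it is planned for deployment to three partitions, while `fact_sales` stays where it is. Copying a frequently transmitted table to more partitions trades storage for fewer network transfers at query time.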
Specification