Memory optimized data shuffle
7 Assignments
0 Petitions
Abstract
In a distributed data processing system with a set of multiple nodes, a first data shuffle memory pool is maintained at a data shuffle writer node, and a second data shuffle memory pool is maintained at a data shuffle reader node. The data shuffle writer node and the data shuffle reader node are part of the set of multiple nodes of the distributed data processing system. In-memory compression is performed on at least a portion of a data set from the first data shuffle memory pool. At least a portion of the compressed data is transmitted from the first data shuffle memory pool to the second data shuffle memory pool in a peer-to-peer manner. Each of the first data shuffle memory pool and the second data shuffle memory pool may include a hybrid memory configuration.
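The abstract says each shuffle memory pool may use a hybrid memory configuration, but it does not specify the tiers or the placement policy. The following is a minimal Python sketch that assumes two tiers: a small fast tier (standing in for DRAM) and a larger slow tier (standing in for, e.g., non-volatile or remote memory). It also assumes a simple fill-fast-then-spill policy. The class `HybridMemoryPool` and its methods are hypothetical illustrations, not the patent's design.

```python
class HybridMemoryPool:
    """Hypothetical two-tier shuffle memory pool.

    Blocks are placed in the fast tier while its byte budget allows;
    anything that does not fit spills to the slow tier.
    """

    def __init__(self, fast_capacity_bytes: int) -> None:
        self.fast_capacity_bytes = fast_capacity_bytes
        self.fast_used = 0          # bytes currently held in the fast tier
        self.fast_tier = {}         # block_id -> bytes (e.g. DRAM)
        self.slow_tier = {}         # block_id -> bytes (e.g. NVM / remote memory)

    def put(self, block_id: str, data: bytes) -> str:
        """Store a block and return the name of the tier that now holds it."""
        if block_id in self.fast_tier or block_id in self.slow_tier:
            self.pop(block_id)      # replace an existing block without leaking budget
        if self.fast_used + len(data) <= self.fast_capacity_bytes:
            self.fast_tier[block_id] = data
            self.fast_used += len(data)
            return "fast"
        self.slow_tier[block_id] = data
        return "slow"

    def get(self, block_id: str) -> bytes:
        """Read a block from whichever tier holds it."""
        if block_id in self.fast_tier:
            return self.fast_tier[block_id]
        return self.slow_tier[block_id]

    def pop(self, block_id: str) -> bytes:
        """Remove a block (e.g. after it has been shuffled out) and free its space."""
        if block_id in self.fast_tier:
            data = self.fast_tier.pop(block_id)
            self.fast_used -= len(data)
            return data
        return self.slow_tier.pop(block_id)
```

For example, with `HybridMemoryPool(fast_capacity_bytes=8)`, a 5-byte block lands in the fast tier and a following 4-byte block spills to the slow tier. A real hybrid pool would likely also promote or demote blocks between tiers; that policy is left out here.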
6 Citations
20 Claims
1. A method comprising:
maintaining a first data shuffle memory pool at a data shuffle writer node and a second data shuffle memory pool at a data shuffle reader node, wherein the data shuffle writer node and the data shuffle reader node are part of a set of multiple nodes of a distributed data processing system;
performing an in-memory compression on at least a portion of a data set from the first data shuffle memory pool;
performing a data shuffle operation on the at least a portion of the compressed data from the first shuffle memory pool, wherein the data shuffle operation maps different parts of the compressed data for transmission to different nodes of the distributed data processing system; and
transmitting, in response to the data shuffle operation, the at least a portion of the compressed data from the first data shuffle memory pool to the second data shuffle memory pool in a peer-to-peer manner;
wherein the distributed data processing system is implemented via one or more processing devices.
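In claim 1, compression happens before the shuffle, and the shuffle then maps parts of the already-compressed data to different nodes. The following is a minimal Python sketch of that ordering. It assumes that the data set consists of keyed records. Records are grouped into blocks by a CRC32 hash of the key, each block is serialized with JSON and compressed with `zlib`, and compressed blocks are then assigned to reader nodes by `partition_id % number_of_readers` without ever being decompressed. The hashing, serialization, block-to-node rule, and all function names (`partition_of`, `compress_blocks`, `shuffle`, `decompress_block`) are illustrative assumptions, not the claimed implementation.

```python
import json
import zlib
from collections import defaultdict


def partition_of(key: str, num_partitions: int) -> int:
    # Deterministic hash; Python's built-in str hash is salted per process.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def compress_blocks(records, num_partitions):
    """Group (key, value) records into blocks and compress each block in memory."""
    blocks = defaultdict(list)
    for key, value in records:
        blocks[partition_of(key, num_partitions)].append([key, value])
    return {
        pid: zlib.compress(json.dumps(rows).encode("utf-8"))
        for pid, rows in blocks.items()
    }


def shuffle(compressed_blocks, reader_nodes):
    """Map each compressed block to a reader node; blocks stay compressed."""
    plan = defaultdict(dict)
    for pid, blob in compressed_blocks.items():
        plan[reader_nodes[pid % len(reader_nodes)]][pid] = blob
    return dict(plan)


def decompress_block(blob):
    """Reader-side helper: restore the (key, value) records of one block."""
    return [tuple(row) for row in json.loads(zlib.decompress(blob).decode("utf-8"))]
```

A usage example would be `plan = shuffle(compress_blocks(records, num_partitions=4), ["reader-0", "reader-1"])`. All records for a given key end up in one compressed block on one reader node. Keeping the routing key outside the compressed payload (here, the partition id) is what lets the shuffle step route data without decompressing it.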
15. A distributed data processing system, comprising:
a plurality of nodes, wherein at least one node is configured as a data shuffle writer node and at least another node is configured as a data shuffle reader node, wherein:
the data shuffle writer node is configured to maintain a first data shuffle memory pool;
the data shuffle reader node is configured to maintain a second data shuffle memory pool; and
further wherein:
the data shuffle writer node is configured to perform an in-memory compression on at least a portion of a data set from the first data shuffle memory pool;
the data shuffle writer node is configured to perform a data shuffle operation on at least a portion of the compressed data from the first shuffle memory pool, wherein the data shuffle operation maps different parts of the compressed data for transmission to different nodes of the distributed data processing system; and
the data shuffle writer node, in response to the data shuffle operation, is configured to transmit the at least a portion of the compressed data from the first data shuffle memory pool to the second data shuffle memory pool in a peer-to-peer manner;
wherein the distributed data processing system is implemented via one or more processing devices.
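The following is a minimal Python sketch of the node roles in claim 15. A writer node owns the first memory pool and compresses data into it. Each reader node owns a second memory pool. The writer pushes each compressed block directly into the target reader's pool, with no disk stage or central broker in between. Method calls between in-process objects stand in for the peer-to-peer network transfer (for example, RDMA or sockets), which the claim does not specify. The classes `WriterNode` and `ReaderNode`, the shuffle-plan dictionary, and the use of `zlib` are hypothetical.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ReaderNode:
    name: str
    pool: dict = field(default_factory=dict)  # second data shuffle memory pool

    def receive(self, block_id: str, blob: bytes) -> None:
        # The block lands directly in this reader's memory pool (still compressed).
        self.pool[block_id] = blob

    def read(self, block_id: str) -> bytes:
        # Decompress only when the consumer actually needs the data.
        return zlib.decompress(self.pool[block_id])


@dataclass
class WriterNode:
    name: str
    pool: dict = field(default_factory=dict)  # first data shuffle memory pool
    bytes_sent: int = 0

    def write(self, block_id: str, data: bytes) -> None:
        # In-memory compression of data held in the writer's pool.
        self.pool[block_id] = zlib.compress(data)

    def transmit(self, plan: dict, peers: dict) -> None:
        """Send each compressed block straight to its mapped reader (peer-to-peer)."""
        for block_id, reader_name in plan.items():
            blob = self.pool.pop(block_id)  # free writer memory once sent
            peers[reader_name].receive(block_id, blob)
            self.bytes_sent += len(blob)
```

A usage example would be to call `writer.write("blk-0", data)` and then `writer.transmit({"blk-0": "r0"}, readers)`. Because only compressed bytes cross between pools, `bytes_sent` reflects the compressed size rather than the raw size.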
20. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by one or more processing devices of a distributed data processing system causes said one or more processing devices to:
maintain a first data shuffle memory pool at a data shuffle writer node and a second data shuffle memory pool at a data shuffle reader node, wherein the data shuffle writer node and the data shuffle reader node are part of a set of multiple nodes of the distributed data processing system;
perform an in-memory compression on at least a portion of a data set from the first data shuffle memory pool;
perform a data shuffle operation on the at least a portion of the compressed data from the first shuffle memory pool, wherein the data shuffle operation maps different parts of the compressed data for transmission to different nodes of the distributed data processing system; and
transmit, in response to the data shuffle operation, the at least a portion of the compressed data from the first data shuffle memory pool to the second data shuffle memory pool in a peer-to-peer manner.
Specification