Scalable distributed in-memory computation
Abstract
An apparatus in one embodiment comprises at least one processing device having a processor coupled to a memory. The processing device is configured to distribute in-memory computations across at least first and second nodes of respective distinct data processing clusters of a plurality of data processing clusters over at least one network, and to aggregate results of the distributed in-memory computations for delivery to a requesting client device. The data processing clusters are associated with respective distinct data zones, and the first and second nodes of the respective distinct data processing clusters are configured to perform corresponding portions of the distributed in-memory computations utilizing respective ones of first and second in-memory datasets locally accessible within their respective data zones. The in-memory computations in some embodiments illustratively comprise Spark computations, such as Spark Core batch computations. The in-memory datasets in such an arrangement may comprise respective Spark resilient distributed datasets.
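The arrangement described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: plain Python lists stand in for the in-memory datasets (Spark resilient distributed datasets in the Spark embodiments), and the names `ZoneNode`, `local_result`, and `global_mean` are hypothetical. Each node computes over the dataset held in its own data zone and returns only a summary, and the aggregator derives a global mean from those summaries.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ZoneNode:
    """Hypothetical stand-in for one node of a cluster in a single data zone."""
    zone: str
    dataset: List[float]  # in-memory dataset, accessible only inside the zone

    def local_result(self) -> Tuple[float, int]:
        # Only a (sum, count) summary leaves the zone, never the raw records.
        return sum(self.dataset), len(self.dataset)


def global_mean(local_results: List[Tuple[float, int]]) -> float:
    """Generate a global result as a function of the local results."""
    total = sum(s for s, _ in local_results)
    count = sum(n for _, n in local_results)
    if count == 0:
        raise ValueError("no records in any data zone")
    return total / count


# Two nodes in distinct data zones, each with its own in-memory dataset.
nodes = [
    ZoneNode("zone-a", [1.0, 2.0, 3.0]),
    ZoneNode("zone-b", [10.0, 20.0]),
]
local_results = [node.local_result() for node in nodes]
result = global_mean(local_results)
print(result)
```

Combining sums and counts, rather than averaging the per-zone means, keeps the global mean exact when the zones hold different numbers of records.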
22 Claims
1. A method comprising:
- distributing in-memory computations across at least first and second nodes of respective distinct data processing clusters of a plurality of data processing clusters over at least one network; and
- aggregating results of the distributed in-memory computations for delivery to a requesting client device, wherein the results of the distributed in-memory computations are generated in respective ones of the at least first and second nodes in a decentralized and privacy-preserving manner;
- wherein the data processing clusters are associated with respective distinct data zones, the first and second nodes of the respective distinct data processing clusters being configured to perform corresponding portions of the distributed in-memory computations utilizing respective ones of first and second in-memory datasets locally accessible within their respective data zones;
- wherein the aggregating comprises processing local results received from respective ones of the at least first and second nodes of the data processing clusters to generate a global result as a function of the local results; and
- wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
Dependent claims: 2-16
17. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device:
- to distribute in-memory computations across at least first and second nodes of respective distinct data processing clusters of a plurality of data processing clusters over at least one network; and
- to aggregate results of the distributed in-memory computations for delivery to a requesting client device, wherein the results of the distributed in-memory computations are generated in respective ones of the at least first and second nodes in a decentralized and privacy-preserving manner;
- wherein the data processing clusters are associated with respective distinct data zones, the first and second nodes of the respective distinct data processing clusters being configured to perform corresponding portions of the distributed in-memory computations utilizing respective ones of first and second in-memory datasets locally accessible within their respective data zones; and
- wherein the aggregating comprises processing local results received from respective ones of the at least first and second nodes of the data processing clusters to generate a global result as a function of the local results.
Dependent claims: 18, 19
20. An apparatus comprising:
- at least one processing device having a processor coupled to a memory;
- wherein said at least one processing device is configured:
- to distribute in-memory computations across at least first and second nodes of respective distinct data processing clusters of a plurality of data processing clusters over at least one network; and
- to aggregate results of the distributed in-memory computations for delivery to a requesting client device, wherein the results of the distributed in-memory computations are generated in respective ones of the at least first and second nodes in a decentralized and privacy-preserving manner;
- wherein the data processing clusters are associated with respective distinct data zones, the first and second nodes of the respective distinct data processing clusters being configured to perform corresponding portions of the distributed in-memory computations utilizing respective ones of first and second in-memory datasets locally accessible within their respective data zones; and
- wherein the aggregating comprises processing local results received from respective ones of the at least first and second nodes of the data processing clusters to generate a global result as a function of the local results.
Dependent claims: 21, 22
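The distribute and aggregate steps common to the three independent claims might be sketched as follows. This is an illustration under stated assumptions only: a local thread pool stands in for dispatch over a network, and `count_by_key`, `distribute`, and `aggregate` are hypothetical helper names, not terms from the claims. The same computation runs against each zone's local dataset, and the global result is a merged count built from the local counts.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def count_by_key(records: List[str]) -> Counter:
    """Portion of the computation run inside one data zone."""
    return Counter(records)


def distribute(
    computation: Callable[[List[str]], Counter],
    zone_datasets: Dict[str, List[str]],
) -> List[Counter]:
    # Assumption: threads stand in for remote nodes reached over a network.
    # Each worker sees only the dataset of its own zone.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(computation, zone_datasets.values()))


def aggregate(local_results: Iterable[Counter]) -> Counter:
    """Generate the global result as a function of the local results."""
    merged: Counter = Counter()
    for local in local_results:
        merged.update(local)  # adds counts key by key
    return merged


zone_datasets = {
    "zone-a": ["x", "y", "x"],
    "zone-b": ["y", "z"],
}
local_counts = distribute(count_by_key, zone_datasets)
global_counts = aggregate(local_counts)
print(dict(global_counts))
```

Only the per-zone counts reach the aggregator, so the raw records stay within their data zones.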
Specification