Scalable distributed computations utilizing multiple distinct clouds
First Claim
1. A method comprising:
initiating distributed computations across a plurality of data processing clusters associated with respective data zones; and
combining local processing results of the distributed computations from respective ones of the data processing clusters;
the data processing clusters being configured to perform respective portions of the distributed computations by processing data local to their respective data zones utilizing at least one local data structure configured to support at least one computational framework;
a first one of data processing clusters being implemented in a first cloud of a first type provided by a first cloud service provider;
at least a second one of the data processing clusters being implemented in a second cloud of a second type different than the first type, provided by a second cloud service provider different than the first cloud service provider;
wherein the plurality of data processing clusters associated with the respective data zones are organized in accordance with a global computation graph for performance of the distributed computations and wherein the global computation graph comprises a plurality of nodes corresponding to respective ones of the data processing clusters and further wherein the plurality of nodes are arranged in multiple levels each including at least one of the nodes;
wherein a global data structure is organized in levels with different levels of the global data structure corresponding to respective ones of the levels of the global computation graph and wherein a given one of the levels of the global data structure comprises local processing results generated by nodes of the corresponding level in the global computation graph;
wherein the local processing results of the distributed computations from respective ones of the data processing clusters are combined utilizing the global data structure configured based at least in part on the at least one local data structure in order to produce global processing results of the distributed computations; and
wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
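The leveled arrangement recited in this claim can be illustrated with a minimal sketch. It is not the patented implementation: the `Node` class, the cloud labels, the `(sum, count)` local data structure, and the choice of a global mean as the computation are all assumptions made for illustration. Each node stands in for one data processing cluster, each level of the global data structure holds the local results generated by the nodes of the matching graph level, and the global result is produced from that structure alone.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the global computation graph (stands in for one cluster)."""
    name: str
    cloud: str          # hypothetical cloud/provider label
    level: int          # level of this node in the global computation graph
    local_data: list    # data that stays inside this node's data zone
    children: list = field(default_factory=list)


def run_local(node):
    """Assumed local data structure: a (sum, count) pair over local data."""
    return (sum(node.local_data), len(node.local_data))


def execute(root):
    """Walk the graph and file each local result under its node's level."""
    global_ds = defaultdict(dict)   # level -> {node name -> local result}
    stack = [root]
    while stack:
        node = stack.pop()
        global_ds[node.level][node.name] = run_local(node)
        stack.extend(node.children)
    return global_ds


def combine(global_ds):
    """Produce the global result (a mean) from the leveled local results."""
    total = sum(s for level in global_ds.values() for s, _ in level.values())
    count = sum(c for level in global_ds.values() for _, c in level.values())
    return total / count


# Two levels; the level-1 clusters sit in two different (hypothetical) clouds.
root = Node("coordinator", "cloud-a", 0, [10, 20], children=[
    Node("zone-1", "cloud-a", 1, [1, 2, 3]),
    Node("zone-2", "cloud-b", 1, [4, 5]),
])
global_ds = execute(root)
global_mean = combine(global_ds)
```

In this sketch only the small `(sum, count)` pairs are placed in the global data structure; the raw `local_data` lists are never copied out of their nodes.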
Abstract
An apparatus in one embodiment comprises at least one processing device having a processor coupled to a memory. The processing device is configured to initiate distributed computations across a plurality of data processing clusters associated with respective data zones, and to combine local processing results of the distributed computations from respective ones of the data processing clusters. The data processing clusters are configured to perform respective portions of the distributed computations by processing data local to their respective data zones utilizing at least one local data structure configured to support at least one computational framework. A first one of data processing clusters is implemented in a first cloud of a first type provided by a first cloud service provider. At least a second one of the data processing clusters is implemented in a second cloud of a second type different than the first type, provided by a second cloud service provider.
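As a rough illustration of the flow the abstract describes, the sketch below initiates a computation on two clusters and combines their local results. Everything named here is hypothetical: the `CLUSTERS` table, the provider labels, and the min/max/count summary chosen as the local result. A thread pool stands in for remote dispatch; a real deployment would call out to each cloud's own job submission mechanism.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical clusters in two clouds from two different providers.
# Each "data" list represents data that is local to that cluster's data zone.
CLUSTERS = {
    "cluster-1": {"cloud": "provider-a", "data": [3, 1, 4, 1, 5]},
    "cluster-2": {"cloud": "provider-b", "data": [9, 2, 6]},
}


def local_computation(cluster):
    """Runs inside the data zone; returns only a small summary."""
    data = cluster["data"]
    return {"min": min(data), "max": max(data), "count": len(data)}


def initiate(clusters):
    """Start the local computation on every cluster and gather the results."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(local_computation, c)
                   for name, c in clusters.items()}
        return {name: f.result() for name, f in futures.items()}


def combine(local_results):
    """Merge the per-cluster summaries into one global summary."""
    results = list(local_results.values())
    return {
        "min": min(r["min"] for r in results),
        "max": max(r["max"] for r in results),
        "count": sum(r["count"] for r in results),
    }


local_results = initiate(CLUSTERS)
global_result = combine(local_results)
```

Only the per-cluster summaries cross the boundary between clouds in this sketch; the combining step never sees the underlying records.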
20 Claims
1. A method comprising:
initiating distributed computations across a plurality of data processing clusters associated with respective data zones; and
combining local processing results of the distributed computations from respective ones of the data processing clusters;
the data processing clusters being configured to perform respective portions of the distributed computations by processing data local to their respective data zones utilizing at least one local data structure configured to support at least one computational framework;
a first one of data processing clusters being implemented in a first cloud of a first type provided by a first cloud service provider;
at least a second one of the data processing clusters being implemented in a second cloud of a second type different than the first type, provided by a second cloud service provider different than the first cloud service provider;
wherein the plurality of data processing clusters associated with the respective data zones are organized in accordance with a global computation graph for performance of the distributed computations and wherein the global computation graph comprises a plurality of nodes corresponding to respective ones of the data processing clusters and further wherein the plurality of nodes are arranged in multiple levels each including at least one of the nodes;
wherein a global data structure is organized in levels with different levels of the global data structure corresponding to respective ones of the levels of the global computation graph and wherein a given one of the levels of the global data structure comprises local processing results generated by nodes of the corresponding level in the global computation graph;
wherein the local processing results of the distributed computations from respective ones of the data processing clusters are combined utilizing the global data structure configured based at least in part on the at least one local data structure in order to produce global processing results of the distributed computations; and
wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
Dependent Claims (2, 3, 4, 5, 6, 14)
7. A method comprising:
initiating distributed computations across a plurality of data processing clusters associated with respective data zones; and
combining local processing results of the distributed computations from respective ones of the data processing clusters;
the data processing clusters being configured to perform respective portions of the distributed computations by processing data local to their respective data zones utilizing at least one local data structure configured to support at least one computational framework;
a first one of data processing clusters being implemented in a first cloud of a first type provided by a first cloud service provider;
at least a second one of the data processing clusters being implemented in a second cloud of a second type different than the first type, provided by a second cloud service provider different than the first cloud service provider;
wherein the first one of data processing clusters implemented in the first cloud utilizes a first local data structure configured to support a first computational framework and wherein at least the second one of the data processing clusters implemented in the second cloud utilizes a second local data structure different than the first local data structure and configured to support a second computational framework different than the first computational framework;
wherein the first computational framework comprises a Spark batch framework and the second computational framework comprises a Spark streaming framework;
wherein the local processing results of the distributed computations from respective ones of the data processing clusters are combined utilizing a global data structure configured based at least in part on the at least one local data structure in order to produce global processing results of the distributed computations; and
wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
Dependent Claims (8, 9, 10, 11, 12, 13)
15. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device:
to initiate distributed computations across a plurality of data processing clusters associated with respective data zones; and
to combine local processing results of the distributed computations from respective ones of the data processing clusters;
the data processing clusters being configured to perform respective portions of the distributed computations by processing data local to their respective data zones utilizing at least one local data structure configured to support at least one computational framework;
a first one of data processing clusters being implemented in a first cloud of a first type provided by a first cloud service provider;
at least a second one of the data processing clusters being implemented in a second cloud of a second type different than the first type, provided by a second cloud service provider different than the first cloud service provider;
wherein the plurality of data processing clusters associated with the respective data zones are organized in accordance with a global computation graph for performance of the distributed computations and wherein the global computation graph comprises a plurality of nodes corresponding to respective ones of the data processing clusters and further wherein the plurality of nodes are arranged in multiple levels each including at least one of the nodes;
wherein a global data structure is organized in levels with different levels of the global data structure corresponding to respective ones of the levels of the global computation graph and wherein a given one of the levels of the global data structure comprises local processing results generated by nodes of the corresponding level in the global computation graph; and
wherein the local processing results of the distributed computations from respective ones of the data processing clusters are combined utilizing the global data structure configured based at least in part on the at least one local data structure in order to produce global processing results of the distributed computations.
Dependent Claims (16, 17)
18. An apparatus comprising:
at least one processing device having a processor coupled to a memory;
wherein said at least one processing device is configured:
to initiate distributed computations across a plurality of data processing clusters associated with respective data zones; and
to combine local processing results of the distributed computations from respective ones of the data processing clusters;
the data processing clusters being configured to perform respective portions of the distributed computations by processing data local to their respective data zones utilizing at least one local data structure configured to support at least one computational framework;
a first one of data processing clusters being implemented in a first cloud of a first type provided by a first cloud service provider;
at least a second one of the data processing clusters being implemented in a second cloud of a second type different than the first type, provided by a second cloud service provider different than the first cloud service provider;
wherein the plurality of data processing clusters associated with the respective data zones are organized in accordance with a global computation graph for performance of the distributed computations and wherein the global computation graph comprises a plurality of nodes corresponding to respective ones of the data processing clusters and further wherein the plurality of nodes are arranged in multiple levels each including at least one of the nodes;
wherein a global data structure is organized in levels with different levels of the global data structure corresponding to respective ones of the levels of the global computation graph and wherein a given one of the levels of the global data structure comprises local processing results generated by nodes of the corresponding level in the global computation graph; and
wherein the local processing results of the distributed computations from respective ones of the data processing clusters are combined utilizing the global data structure configured based at least in part on the at least one local data structure in order to produce global processing results of the distributed computations.
Dependent Claims (19, 20)
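Claim 7 above pairs a batch framework in one cloud with a streaming framework in another, each with its own local data structure. The sketch below illustrates that pairing in plain Python only; it does not use any Spark API, and the function names, the word-count task, and the use of `Counter` as the global data structure are assumptions for illustration. The batch-style cluster returns a single result for its whole dataset, while the streaming-style cluster yields one partial result per micro-batch.

```python
from collections import Counter


def batch_cluster(records):
    """Batch-style stand-in: process the whole local dataset in one pass."""
    return Counter(word for line in records for word in line.split())


def streaming_cluster(micro_batches):
    """Streaming-style stand-in: yield one partial count per micro-batch."""
    for batch in micro_batches:
        yield Counter(word for line in batch for word in line.split())


def combine(batch_result, stream_results):
    """Fold both kinds of local result into one global Counter."""
    global_counts = Counter(batch_result)
    for partial in stream_results:
        global_counts.update(partial)   # Counter.update adds counts
    return global_counts


batch_data = ["a b a", "c"]        # local to the first (hypothetical) cloud
stream_data = [["a c"], ["b b"]]   # micro-batches local to the second cloud
global_counts = combine(batch_cluster(batch_data),
                        streaming_cluster(stream_data))
```

The two local results have different shapes (one value versus a sequence of values), and the combining step is the only place that has to know about both.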
Specification