Scalable distributed in-memory computation utilizing batch mode extensions
First Claim
1. A method comprising:
- distributing in-memory computations of a batch computation framework across a plurality of data processing clusters associated with respective data zones; and
combining local processing results of the distributed in-memory computations from respective ones of the data processing clusters;
wherein the distributed in-memory computations utilize local data structures of respective ones of the data processing clusters;
wherein a given one of the local data structures in one of the data processing clusters receives local data of the corresponding data zone and is utilized to generate the local processing results of that data processing cluster that are combined with local processing results of other ones of the data processing clusters;
wherein the local data structures are configured to support one or more batch mode extensions of the batch computation framework for performance of the distributed in-memory computations;
wherein the local data structures comprise respective portions of a global data structure characterizing the distributed in-memory computations of the batch computation framework;
wherein the global data structure comprises at least one of a global table, a global dataset and a global property graph associated with respective local data structures comprising local tables, local datasets and local property graphs; and
wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
Abstract
An apparatus in one embodiment comprises at least one processing device having a processor coupled to a memory. The processing device is configured to distribute in-memory computations across a plurality of data processing clusters associated with respective data zones, and to combine local processing results of the distributed in-memory computations from the data processing clusters. The distributed in-memory computations utilize local data structures of respective ones of the data processing clusters. A given one of the local data structures in one of the data processing clusters receives local data of the corresponding data zone and is utilized to generate the local processing results of that data processing cluster that are combined with local processing results of other ones of the data processing clusters. The local data structures are configured to support batch mode extensions such as Spark SQL, Spark MLlib or Spark GraphX for performance of the distributed in-memory computations.
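By way of illustration only, the distribute-then-combine pattern described in the abstract can be sketched in a few lines of Python. Each data processing cluster is modeled as a function that sees only the rows of its own data zone and returns a small local result, and a coordinator combines those results. The names `LocalResult`, `local_computation`, `combine` and `distributed_mean`, and the zone labels, are hypothetical. A thread pool stands in for remote clusters, and no Spark API is used. This is a sketch under those assumptions, not the patented implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalResult:
    """Hypothetical local processing result produced by one cluster."""
    zone: str
    total: float
    count: int


def local_computation(zone, local_rows):
    """Runs 'inside' one cluster and reads only that zone's local data."""
    return LocalResult(zone=zone, total=float(sum(local_rows)), count=len(local_rows))


def combine(results):
    """Combines local results from all clusters into one global mean."""
    total = sum(r.total for r in results)
    count = sum(r.count for r in results)
    return total / count if count else float("nan")


def distributed_mean(zones):
    """Fans the computation out to one worker per zone, then combines."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(local_computation, zone, rows)
            for zone, rows in zones.items()
        ]
        results = [f.result() for f in futures]
    return combine(results)


# Assumed toy data: each key is a data zone, each list is that zone's local data.
ZONES = {
    "zone-a": [1.0, 2.0, 3.0],
    "zone-b": [10.0],
    "zone-c": [4.0, 6.0],
}

if __name__ == "__main__":
    print(distributed_mean(ZONES))
```

In this sketch only the per-zone sums and counts are passed to the coordinator; the raw rows stay with the function handling their zone.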
20 Claims
1. A method comprising:
- distributing in-memory computations of a batch computation framework across a plurality of data processing clusters associated with respective data zones; and
combining local processing results of the distributed in-memory computations from respective ones of the data processing clusters;
wherein the distributed in-memory computations utilize local data structures of respective ones of the data processing clusters;
wherein a given one of the local data structures in one of the data processing clusters receives local data of the corresponding data zone and is utilized to generate the local processing results of that data processing cluster that are combined with local processing results of other ones of the data processing clusters;
wherein the local data structures are configured to support one or more batch mode extensions of the batch computation framework for performance of the distributed in-memory computations;
wherein the local data structures comprise respective portions of a global data structure characterizing the distributed in-memory computations of the batch computation framework;
wherein the global data structure comprises at least one of a global table, a global dataset and a global property graph associated with respective local data structures comprising local tables, local datasets and local property graphs; and
wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device:
- to distribute in-memory computations of a batch computation framework across a plurality of data processing clusters associated with respective data zones; and
to combine local processing results of the distributed in-memory computations from respective ones of the data processing clusters;
wherein the distributed in-memory computations utilize local data structures of respective ones of the data processing clusters;
wherein a given one of the local data structures in one of the data processing clusters receives local data of the corresponding data zone and is utilized to generate the local processing results of that data processing cluster that are combined with local processing results of other ones of the data processing clusters;
wherein the local data structures are configured to support one or more batch mode extensions of the batch computation framework for performance of the distributed in-memory computations;
wherein the local data structures comprise respective portions of a global data structure characterizing the distributed in-memory computations of the batch computation framework; and
wherein the global data structure comprises at least one of a global table, a global dataset and a global property graph associated with respective local data structures comprising local tables, local datasets and local property graphs.
Dependent claims: 15, 16
17. An apparatus comprising:
- at least one processing device having a processor coupled to a memory;
wherein said at least one processing device is configured:
to distribute in-memory computations of a batch computation framework across a plurality of data processing clusters associated with respective data zones; and
to combine local processing results of the distributed in-memory computations from respective ones of the data processing clusters;
wherein the distributed in-memory computations utilize local data structures of respective ones of the data processing clusters;
wherein a given one of the local data structures in one of the data processing clusters receives local data of the corresponding data zone and is utilized to generate the local processing results of that data processing cluster that are combined with local processing results of other ones of the data processing clusters;
wherein the local data structures are configured to support one or more batch mode extensions of the batch computation framework for performance of the distributed in-memory computations;
wherein the local data structures comprise respective portions of a global data structure characterizing the distributed in-memory computations of the batch computation framework; and
wherein the global data structure comprises at least one of a global table, a global dataset and a global property graph associated with respective local data structures comprising local tables, local datasets and local property graphs.
Dependent claims: 18, 19, 20
Specification