Data-driven automation mechanism for analytics workload distribution
First Claim
1. A method comprising:
receiving a data processing request in a first workload distribution node configured to communicate with a plurality of distributed data processing clusters over at least one network;
identifying particular ones of the plurality of distributed data processing clusters that are suitable for handling at least a portion of the data processing request;
separating the data processing request into a plurality of data tasks;
providing each of the data tasks to one or more of the identified distributed data processing clusters;
receiving for each of the data tasks an indication from one or more of the distributed data processing clusters of its ability to perform the data task;
assigning the data tasks to one or more of the distributed data processing clusters responsive to the received indications;
receiving results of performance of the data tasks from the one or more assigned distributed data processing clusters; and
aggregating the results into a response that is returned to a source of the data processing request;
wherein the source of the data processing request comprises another workload distribution node and further wherein the data processing request comprises a given data task of a higher-level data processing request separated into a plurality of data tasks by the other workload distribution node for handling by the first workload distribution node and one or more additional workload distribution nodes;
wherein the first workload distribution node comprises an analytics workload distribution node and the given data task of the higher-level data processing request comprises a request to process at least a portion of an analytics workload using at least a subset of the plurality of distributed data processing clusters;
wherein the data tasks are assigned and the corresponding results are aggregated in a manner that ensures satisfaction of one or more privacy policies of the one or more distributed data processing clusters;
wherein the method is performed by at least one processing device comprising a processor coupled to a memory; and
wherein said at least one processing device implements the first workload distribution node.
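The steps recited in the first claim can be pictured with a short Python sketch. It is illustrative only and is not taken from the patent: the `Cluster` class, the `free_slots` capacity measure, the one-task-per-dataset split, and the "most free slots wins" assignment rule are all assumptions made for the example. The privacy limitation is approximated by having each cluster return only a local sum.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical stand-in for a distributed data processing cluster."""
    name: str
    data: dict[str, list[float]]  # dataset name -> locally held values
    free_slots: int = 1           # assumed capacity measure

    def can_perform(self, task: str) -> bool:
        # The "indication of ability": holds the data and has spare capacity.
        return task in self.data and self.free_slots > 0

    def perform(self, task: str) -> float:
        self.free_slots -= 1
        return sum(self.data[task])  # only a local summary leaves the cluster


def handle_request(request: list[str], clusters: list[Cluster]) -> dict:
    # Identify clusters suitable for at least a portion of the request.
    suitable = [c for c in clusters if any(t in c.data for t in request)]
    # Separate the request into data tasks (assumed: one task per dataset name).
    tasks = list(request)
    results, unassigned = {}, []
    for task in tasks:
        # Provide the task to the identified clusters and collect indications.
        willing = [c for c in suitable if c.can_perform(task)]
        if not willing:
            unassigned.append(task)
            continue
        # Assign responsive to the indications (assumed rule: most free slots).
        chosen = max(willing, key=lambda c: c.free_slots)
        results[task] = chosen.perform(task)
    # Aggregate the results into one response for the source of the request.
    return {"total": sum(results.values()),
            "per_task": results,
            "unassigned": unassigned}


if __name__ == "__main__":
    clusters = [
        Cluster("a", {"sales": [1, 2, 3]}, free_slots=1),
        Cluster("b", {"sales": [1, 2, 3], "clicks": [10]}, free_slots=2),
    ]
    print(handle_request(["sales", "clicks", "logs"], clusters))
```

In this sketch a task that no cluster offers to perform is reported back as unassigned rather than dropped; the claim itself does not say how that case is handled.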
Abstract
An apparatus in one embodiment comprises at least one processing device having a processor coupled to a memory. The processing device implements a first workload distribution node configured to communicate with multiple distributed data processing clusters over at least one network. The workload distribution node is further configured to receive a data processing request, to identify particular ones of the distributed data processing clusters that are suitable for handling at least a portion of the data processing request, to separate the data processing request into a plurality of data tasks, and to assign the data tasks to one or more of the distributed data processing clusters. Results of performance of the data tasks from the one or more assigned distributed data processing clusters are received by the first workload distribution node and aggregated into a response that is returned to a source of the data processing request. The source of the data processing request in some embodiments is another workload distribution node.
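The case where the source of a request is itself another workload distribution node suggests a tree of nodes, each splitting its task among its children and aggregating what comes back. The following Python sketch shows one way to read that arrangement. The class names, the `handle` method, the task-naming scheme, and the choice to pass only `(sum, count)` pairs upward are assumptions made for illustration, not details from the patent.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Cluster:
    """Hypothetical leaf: a data processing cluster holding local records."""
    name: str
    records: list[float]

    def handle(self, task: str) -> tuple[float, int]:
        # Return only (sum, count); raw records stay local (assumed policy).
        return sum(self.records), len(self.records)


@dataclass
class WorkloadNode:
    """Hypothetical workload distribution node with clusters or nodes below it."""
    name: str
    children: list[Union["WorkloadNode", Cluster]] = field(default_factory=list)

    def handle(self, task: str) -> tuple[float, int]:
        # Separate the incoming task into one sub-task per child ...
        partials = [child.handle(f"{task}/{child.name}") for child in self.children]
        # ... then aggregate the partial results for whoever sent the task.
        return sum(p[0] for p in partials), sum(p[1] for p in partials)


if __name__ == "__main__":
    eu = WorkloadNode("eu", [Cluster("paris", [1.0, 2.0]), Cluster("berlin", [3.0])])
    us = WorkloadNode("us", [Cluster("nyc", [4.0, 6.0])])
    root = WorkloadNode("root", [eu, us])
    total, count = root.handle("mean-latency")
    print(total / count)  # global mean computed from per-node summaries
```

Because every level exposes the same `handle` interface, a node does not need to know whether the request came from an end user or from a higher-level node, which mirrors the recursive structure described in the abstract.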
20 Claims
1. A method comprising:
receiving a data processing request in a first workload distribution node configured to communicate with a plurality of distributed data processing clusters over at least one network;
identifying particular ones of the plurality of distributed data processing clusters that are suitable for handling at least a portion of the data processing request;
separating the data processing request into a plurality of data tasks;
providing each of the data tasks to one or more of the identified distributed data processing clusters;
receiving for each of the data tasks an indication from one or more of the distributed data processing clusters of its ability to perform the data task;
assigning the data tasks to one or more of the distributed data processing clusters responsive to the received indications;
receiving results of performance of the data tasks from the one or more assigned distributed data processing clusters; and
aggregating the results into a response that is returned to a source of the data processing request;
wherein the source of the data processing request comprises another workload distribution node and further wherein the data processing request comprises a given data task of a higher-level data processing request separated into a plurality of data tasks by the other workload distribution node for handling by the first workload distribution node and one or more additional workload distribution nodes;
wherein the first workload distribution node comprises an analytics workload distribution node and the given data task of the higher-level data processing request comprises a request to process at least a portion of an analytics workload using at least a subset of the plurality of distributed data processing clusters;
wherein the data tasks are assigned and the corresponding results are aggregated in a manner that ensures satisfaction of one or more privacy policies of the one or more distributed data processing clusters;
wherein the method is performed by at least one processing device comprising a processor coupled to a memory; and
wherein said at least one processing device implements the first workload distribution node.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
12. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device to implement a first workload distribution node configured to communicate with a plurality of distributed data processing clusters over at least one network, the first workload distribution node being further configured:
to receive a data processing request;
to identify particular ones of the plurality of distributed data processing clusters that are suitable for handling at least a portion of the data processing request;
to separate the data processing request into a plurality of data tasks;
to provide each of the data tasks to one or more of the identified distributed data processing clusters;
to receive for each of the data tasks an indication from one or more of the distributed data processing clusters of its ability to perform the data task;
to assign the data tasks to one or more of the distributed data processing clusters responsive to the received indications;
to receive results of performance of the data tasks from the one or more assigned distributed data processing clusters; and
to aggregate the results into a response that is returned to a source of the data processing request;
wherein the source of the data processing request comprises another workload distribution node and further wherein the data processing request comprises a given data task of a higher-level data processing request separated into a plurality of data tasks by the other workload distribution node for handling by the first workload distribution node and one or more additional workload distribution nodes; and
wherein the first workload distribution node comprises an analytics workload distribution node and the given data task of the higher-level data processing request comprises a request to process at least a portion of an analytics workload using at least a subset of the plurality of distributed data processing clusters;
wherein the data tasks are assigned and the corresponding results are aggregated in a manner that ensures satisfaction of one or more privacy policies of the one or more distributed data processing clusters.
Dependent Claims (13, 14, 15, 16)
17. An apparatus comprising:
at least one processing device having a processor coupled to a memory;
wherein said at least one processing device implements a first workload distribution node configured to communicate with a plurality of distributed data processing clusters over at least one network;
the workload distribution node being further configured:
to receive a data processing request;
to identify particular ones of the plurality of distributed data processing clusters that are suitable for handling at least a portion of the data processing request;
to separate the data processing request into a plurality of data tasks;
to provide each of the data tasks to one or more of the identified distributed data processing clusters;
to receive for each of the data tasks an indication from one or more of the distributed data processing clusters of its ability to perform the data task;
to assign the data tasks to one or more of the distributed data processing clusters responsive to the received indications;
to receive results of performance of the data tasks from the one or more assigned distributed data processing clusters; and
to aggregate the results into a response that is returned to a source of the data processing request;
wherein the source of the data processing request comprises another workload distribution node and further wherein the data processing request comprises a given data task of a higher-level data processing request separated into a plurality of data tasks by the other workload distribution node for handling by the first workload distribution node and one or more additional workload distribution nodes;
wherein the first workload distribution node comprises an analytics workload distribution node and the given data task of the higher-level data processing request comprises a request to process at least a portion of an analytics workload using at least a subset of the plurality of distributed data processing clusters; and
wherein the data tasks are assigned and the corresponding results are aggregated in a manner that ensures satisfaction of one or more privacy policies of the one or more distributed data processing clusters.
Dependent Claims (18, 19, 20)
Specification