Method and apparatus for distributed data processing
Abstract
Embodiments of the present disclosure provide a method and apparatus for distributed data processing. The method comprises: receiving from an upstream node an input message attached with a shared count, the shared count being used for determining a processing status of a root message associated with the input message; processing the input message to generate one or more new messages; allocating to each of the one or more new messages a respective new shared count based on the received shared count; and transmitting the one or more new messages to one or more downstream nodes respectively. Compared with the prior art, the methods and apparatuses for distributed data processing according to the embodiments of the present disclosure can reduce network traffic overhead and CPU and memory consumption, and can scale across the different topologies of various distributed data processing systems.
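The allocation step can be illustrated with a small sketch. The text here does not state the allocation rule; the version below assumes the received shared count is divided evenly among the new messages, using exact fractions so that the parts always sum back to the received count. The helper name `split_shared_count` is hypothetical.

```python
from fractions import Fraction


def split_shared_count(received, n_new):
    """Divide a received shared count among n_new messages.

    Assumption: an even split. The only property relied on is that
    the new counts sum to the received count.
    """
    if n_new < 1:
        raise ValueError("at least one new message is required")
    share = Fraction(received) / n_new
    return [share] * n_new


shares = split_shared_count(Fraction(1, 2), 3)
# The parts add back up to the received count, so no count is lost or created.
assert sum(shares) == Fraction(1, 2)
```

Exact fractions are used instead of floats so that repeated splitting across many hops cannot accumulate rounding error.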
14 Claims
1. A method for distributed data processing at a current node, comprising:
receiving from an upstream node an input message attached with a shared count, the shared count being used for determining a processing status of a root message associated with the input message;
processing the input message to generate one or more new messages;
allocating to each of the one or more new messages a respective new shared count based on the received shared count; and
transmitting the one or more new messages to one or more downstream nodes respectively,
wherein the current node is one of a root node, a leaf node, and an intermediate working node,
wherein, in response to the current node being the root node: the receiving an input message comprises receiving the root message from an external source, allocating a predetermined shared count for the root message, and, in response to receiving the root message from the external source, reporting first status information to a tracking task,
wherein, in response to the current node being the leaf node: in response to receiving the input message attached with the shared count, reporting second status information to the tracking task, and
wherein, in response to the current node being the intermediate working node: avoiding reporting status information to the tracking task.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8)
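The role-dependent reporting in claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the predetermined count of 1, the even split, the operating-code strings `"ADD"` and `"SUB"`, and all names (`Role`, `Message`, `handle`, `report`) are hypothetical. A leaf is modelled as a node with zero fan-out.

```python
from dataclasses import dataclass
from enum import Enum
from fractions import Fraction

PREDETERMINED_COUNT = Fraction(1)  # assumed value for the root's count


class Role(Enum):
    ROOT = "root"
    INTERMEDIATE = "intermediate"
    LEAF = "leaf"


@dataclass(frozen=True)
class Message:
    root_id: str                 # identifier of the originating root message
    payload: str
    shared_count: Fraction = Fraction(0)


def handle(role, msg, fan_out, report):
    """Process one message and return the messages for downstream nodes.

    report(root_id, shared_count, op_code) stands in for the tracking task.
    """
    if role is Role.ROOT:
        # Root: allocate the predetermined count and report it once.
        msg = Message(msg.root_id, msg.payload, PREDETERMINED_COUNT)
        report(msg.root_id, msg.shared_count, "ADD")
    elif role is Role.LEAF:
        # Leaf: report the count that arrived with the input message.
        report(msg.root_id, msg.shared_count, "SUB")
    # Intermediate working node: no report is sent at all.
    if fan_out == 0:
        return []
    share = msg.shared_count / fan_out  # assumed even split
    return [Message(msg.root_id, f"{msg.payload}#{i}", share)
            for i in range(fan_out)]


reports = []
record = lambda *r: reports.append(r)
root_out = handle(Role.ROOT, Message("m1", "data"), 2, record)
mid_out = handle(Role.INTERMEDIATE, root_out[0], 2, record)
handle(Role.LEAF, mid_out[0], 0, record)
# Only the root and the leaf contacted the tracking task.
assert len(reports) == 2
```

Because intermediate nodes never report, the number of tracking messages depends on the number of roots and leaves rather than on the depth of the topology.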
9. A method for distributed data processing, comprising:
in response to receiving, at a root node, a root message from an external source:
allocating a predetermined shared count for the root message; and
reporting first status information associated with the root message to a tracking task;
in response to receiving, at a leaf node, an input message associated with the root message:
reporting second status information associated with the root message to the tracking task;
receiving, at the tracking task, the first status information associated with the root message from the root node, the first status information including an identifier of the root message, a first shared count allocated to the root message, and a first operating code;
receiving, at the tracking task, the second status information associated with the root message from the leaf node, the second status information including the identifier of the root message, a second shared count received by the leaf node, and a second operating code, the first status information and the second status information being received while avoiding receiving at the tracking task status information associated with the root message from an intermediate working node;
processing, based on the first operating code and the second operating code included in the first status information and the second status information, respectively, the first shared count and the second shared count to determine a total shared count corresponding to the root message; and
determining, based on the total shared count, a processing status of the root message.
(Dependent claims: 10, 11, 12, 13)
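The tracking-task side of claim 9 can be sketched in the same spirit. The claim says only that the operating codes govern how the shared counts are combined into a total; the sketch below assumes that the root's code means "add", the leaf's code means "subtract", and a total of zero means the root message is fully processed. The class name `TrackingTask`, its methods, and the code strings are hypothetical.

```python
from fractions import Fraction


class TrackingTask:
    """Keeps one running total shared count per root message identifier."""

    def __init__(self):
        self.totals = {}

    def on_status(self, root_id, shared_count, op_code):
        # Combine the reported count according to the operating code.
        total = self.totals.get(root_id, Fraction(0))
        if op_code == "ADD":      # assumed code sent by the root node
            total += shared_count
        elif op_code == "SUB":    # assumed code sent by a leaf node
            total -= shared_count
        else:
            raise ValueError(f"unknown operating code: {op_code!r}")
        self.totals[root_id] = total

    def is_fully_processed(self, root_id):
        # Assumed rule: done once every allocated share has been returned.
        return root_id in self.totals and self.totals[root_id] == 0


tracker = TrackingTask()
tracker.on_status("m1", Fraction(1), "ADD")      # first status information
tracker.on_status("m1", Fraction(1, 2), "SUB")   # one leaf has finished
assert not tracker.is_fully_processed("m1")
tracker.on_status("m1", Fraction(1, 2), "SUB")   # the other leaf has finished
assert tracker.is_fully_processed("m1")
```

Under these assumptions the result does not depend on arrival order: if leaf reports arrive before the root's report, the total is negative until the root's count is added, so it cannot reach zero early.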
14. An apparatus for distributed data processing at a current node, comprising:
a processor configured to execute a plurality of software modules including a message receiving module, a message processing module, a shared count allocating module, a message transmitting module, and a first reporting module,
wherein the processor is configured to execute the message receiving module to receive from an upstream node an input message attached with a shared count, the shared count being used for determining a processing status of a root message associated with the input message;
wherein the processor is configured to execute the message processing module to process the input message to generate one or more new messages;
wherein the processor is configured to execute the shared count allocating module to allocate to each of the one or more new messages a respective new shared count based on the received shared count; and
wherein the processor is configured to execute the message transmitting module to transmit the one or more new messages to one or more downstream nodes respectively,
wherein the current node is one of a root node, a leaf node, and an intermediate working node,
wherein, in response to the current node being the root node:
the processor is further configured to execute the message receiving module to receive the root message from an external source, and allocate a predetermined shared count for the root message; and
the processor is configured to execute the first reporting module, in response to receiving the root message from the external source, to report first status information to a tracking task,
wherein, in response to the current node being the leaf node:
the processor is further configured to execute the first reporting module, in response to receiving the input message attached with the shared count, to report second status information to the tracking task, and
wherein, in response to the current node being the intermediate working node:
the processor is configured to avoid reporting status information to the tracking task.
Specification