PARTITION-BASED DATA STREAM PROCESSING FRAMEWORK
First Claim
Patent Images
1. A system, comprising:
- one or more computing devices configured to;
implement one or more programmatic interfaces enabling a client of a multi-tenant stream processing service to indicate, corresponding to a particular processing stage associated with a specified data stream;
(a) a processing operation to be performed on data records of the specified data stream in accordance with a partitioning policy and (b) an output distribution descriptor for results of the processing operations;
receive, from a particular client via the one or more programmatic interfaces, an indication of a particular processing operation to be performed on data records of a particular data stream at the particular processing stage, and a particular output distribution descriptor for results of the particular processing operation;
determine, based at least in part on the partitioning policy and at least in part on an estimated performance capability of resources to be deployed as worker nodes for the processing stage, an initial number of worker nodes for the specified data stream;
configure a particular worker node of the initial number of worker nodes to (a) receive data records of one or more partitions of the particular data stream, (b) perform the particular processing operation on received data records, (c) store progress records indicating the portion of the one or more partitions that have been processed at the worker node, and (d) transfer results of the particular processing operations to one or more destinations in accordance with the particular output distribution descriptor;
monitor a health state of the particular worker node; and
in response to a determination that the particular worker node is in an undesired state, configure a replacement worker node to replace the particular worker node, wherein the replacement worker node accesses a progress record stored by the particular worker node to identify at least one data record of the one or more partitions on which the particular processing operation is to be performed by the replacement worker node.
1 Assignment
0 Petitions
Accused Products
Abstract
A control node of a multi-tenant stream processing service receives a request indicating an operation to be performed on data records of a particular data stream. Based on a stream partitioning policy, the control node determines an initial number of worker nodes to be used. The control node configures a worker node to perform the operation on received records. In response to a determination that the worker node is in an unhealthy state, the control node configures a replacement worker node.
199 Citations
25 Claims
-
1. A system, comprising:
one or more computing devices configured to; implement one or more programmatic interfaces enabling a client of a multi-tenant stream processing service to indicate, corresponding to a particular processing stage associated with a specified data stream;
(a) a processing operation to be performed on data records of the specified data stream in accordance with a partitioning policy and (b) an output distribution descriptor for results of the processing operations;receive, from a particular client via the one or more programmatic interfaces, an indication of a particular processing operation to be performed on data records of a particular data stream at the particular processing stage, and a particular output distribution descriptor for results of the particular processing operation; determine, based at least in part on the partitioning policy and at least in part on an estimated performance capability of resources to be deployed as worker nodes for the processing stage, an initial number of worker nodes for the specified data stream; configure a particular worker node of the initial number of worker nodes to (a) receive data records of one or more partitions of the particular data stream, (b) perform the particular processing operation on received data records, (c) store progress records indicating the portion of the one or more partitions that have been processed at the worker node, and (d) transfer results of the particular processing operations to one or more destinations in accordance with the particular output distribution descriptor; monitor a health state of the particular worker node; and in response to a determination that the particular worker node is in an undesired state, configure a replacement worker node to replace the particular worker node, wherein the replacement worker node accesses a progress record stored by the particular worker node to identify at least one data record of the one or more partitions on which the particular processing operation is to be performed by the replacement worker node. - View Dependent Claims (2, 3, 4, 5)
-
6. A method, comprising:
performing, by one or more computing devices; receiving, at a multi-tenant stream processing service from a particular client, an indication of a particular operation to be performed on data records of a particular data stream at a specified processing stage, and a particular output distribution descriptor for results of the particular operation; determining, based at least in part on the particular operation, an initial number of worker nodes to be configured for the specified processing stage; configuring a particular worker node of the initial number of worker nodes to (a) perform the particular operation on received data records of one or more partitions of the particular data stream, (b) store progress records indicating the portion of the one or more partitions that have been processed at the worker node, and (c) transfer results of the particular operations to one or more destinations in accordance with the particular output distribution descriptor; and in response to a determination that the particular worker node is in an unhealthy state, selecting a replacement worker node to replace the particular worker node, wherein the replacement worker node accesses a progress record stored by the particular worker node to identify at least one data record of the one or more partitions on which the particular operation is to be performed by the replacement worker node. - View Dependent Claims (7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
-
21. A non-transitory computer-accessible storage medium storing program instructions that when executed on one or more processors implement a control node of a multi-tenant stream processing service, wherein the control node is operable to:
-
receive, from a particular client via a programmatic interface, an indication of a particular operation to be performed on data records of a particular data stream; determine, based at least in part on a partitioning policy associated with the particular data stream, an initial number of worker nodes for the specified data stream at a processing stage; configure a particular worker node of the initial number of worker nodes to perform the particular operation on received data records of one or more partitions of the particular data stream; and in response to a determination that the particular worker node is in an unhealthy state, configure a replacement worker node to replace the particular worker node. - View Dependent Claims (22, 23, 24, 25)
-
Specification