
PARTITION-BASED DATA STREAM PROCESSING FRAMEWORK

  • US 20150134626A1
  • Filed: 11/11/2013
  • Published: 05/14/2015
  • Est. Priority Date: 11/11/2013
  • Status: Active Grant
First Claim

1. A system, comprising:

  • one or more computing devices configured to:

    implement one or more programmatic interfaces enabling a client of a multi-tenant stream processing service to indicate, corresponding to a particular processing stage associated with a specified data stream:

    (a) a processing operation to be performed on data records of the specified data stream in accordance with a partitioning policy and (b) an output distribution descriptor for results of the processing operations;

    receive, from a particular client via the one or more programmatic interfaces, an indication of a particular processing operation to be performed on data records of a particular data stream at the particular processing stage, and a particular output distribution descriptor for results of the particular processing operation;

    determine, based at least in part on the partitioning policy and at least in part on an estimated performance capability of resources to be deployed as worker nodes for the processing stage, an initial number of worker nodes for the specified data stream;

    configure a particular worker node of the initial number of worker nodes to (a) receive data records of one or more partitions of the particular data stream, (b) perform the particular processing operation on received data records, (c) store progress records indicating the portion of the one or more partitions that have been processed at the worker node, and (d) transfer results of the particular processing operations to one or more destinations in accordance with the particular output distribution descriptor;

    monitor a health state of the particular worker node; and

    in response to a determination that the particular worker node is in an undesired state, configure a replacement worker node to replace the particular worker node, wherein the replacement worker node accesses a progress record stored by the particular worker node to identify at least one data record of the one or more partitions on which the particular processing operation is to be performed by the replacement worker node.
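
As an illustration only, the sizing, progress-record, and replacement steps recited in the claim might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation described in the patent. Every name in it (`initial_worker_count`, `ProgressStore`, `Worker`, `run_demo`) is hypothetical, the sizing rule is an assumed heuristic, the progress store is an in-memory stand-in for durable storage, and a plain Python list stands in for the destination named by the output distribution descriptor.

```python
import math


def initial_worker_count(num_partitions, records_per_sec_per_partition,
                         worker_capacity_records_per_sec):
    """Assumed sizing heuristic (not taken from the claim): enough workers to
    absorb the estimated load, at least one, and at most one per partition."""
    total_load = num_partitions * records_per_sec_per_partition
    needed = math.ceil(total_load / worker_capacity_records_per_sec)
    return max(1, min(num_partitions, needed))


class ProgressStore:
    """In-memory stand-in for durable storage of progress records."""

    def __init__(self):
        self._next_index = {}

    def save(self, partition_id, next_index):
        self._next_index[partition_id] = next_index

    def load(self, partition_id):
        # A partition with no progress record starts from its first record.
        return self._next_index.get(partition_id, 0)


class Worker:
    """Hypothetical worker node bound to a single partition."""

    def __init__(self, name, partition_id, operation, store, sink):
        self.name = name
        self.partition_id = partition_id
        self.operation = operation
        self.store = store
        self.sink = sink      # stand-in for the configured output destination
        self.healthy = True   # flipped by the caller to simulate a failure

    def process(self, records, max_records=None):
        """Resume from the stored progress record, apply the operation, and
        record progress after each record. Returns the number processed."""
        start = self.store.load(self.partition_id)
        end = len(records)
        if max_records is not None:
            end = min(end, start + max_records)
        for i in range(start, end):
            self.sink.append(self.operation(records[i]))
            self.store.save(self.partition_id, i + 1)
        return end - start


def run_demo():
    records = [1, 2, 3, 4, 5]
    store = ProgressStore()
    sink = []

    def double(record):
        return record * 2

    first = Worker("w1", "p0", double, store, sink)
    first.process(records, max_records=2)  # handles two records, then "fails"
    first.healthy = False                  # simulated health-monitor verdict

    if not first.healthy:
        # The replacement reads the progress record left by the failed worker
        # and continues with the first unprocessed record.
        replacement = Worker("w2", "p0", double, store, sink)
        replacement.process(records)

    return sink, store.load("p0")
```

In this sketch the replacement worker never reprocesses the first two records, because the progress record is updated after every record. A real deployment would more likely checkpoint in batches and would therefore have to tolerate some reprocessing after a failure.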
