Data stream ingestion and persistence techniques
Abstract
A programmatic interface is implemented, enabling a client of a stream management service to select a data ingestion policy for a data stream. A client request selecting an at-least-once ingestion policy is received. In accordance with the at-least-once policy, a client may transmit an indication of a data record one or more times to the service until a positive acknowledgement is received. In response to receiving a plurality of transmissions indicating a particular data record, respective positive acknowledgements are sent to the client. Based on a persistence policy selected for the stream, copies of the data record are stored at one or more storage locations in response to one particular transmission of the plurality of transmissions.
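The at-least-once behavior described in the abstract can be illustrated with a minimal in-memory sketch. Everything named here is an assumption made for illustration and does not come from the patent text: the `StreamService` class, the client-supplied `record_id` used to recognize repeated transmissions, the `replica_count` setting standing in for the persistence policy, and the `lossy_ack` helper that simulates acknowledgements lost in transit.

```python
from dataclasses import dataclass, field


@dataclass
class StreamService:
    """Toy stand-in for the stream management service (hypothetical)."""
    replica_count: int = 2  # assumed persistence-policy setting
    replicas: list = field(default_factory=list, init=False)
    seen_ids: set = field(default_factory=set)
    acks_sent: int = 0

    def __post_init__(self):
        # One dict per storage location.
        self.replicas = [dict() for _ in range(self.replica_count)]

    def put_record(self, record_id: str, payload: bytes) -> bool:
        # Store copies only for the first transmission of a given record.
        if record_id not in self.seen_ids:
            for replica in self.replicas:
                replica[record_id] = payload
            self.seen_ids.add(record_id)
        # Every transmission, including repeats, is positively acknowledged.
        self.acks_sent += 1
        return True


def lossy_ack(service: StreamService, drop_first: int):
    """Wrap put_record so the first `drop_first` acks never reach the client."""
    state = {"calls": 0}

    def send(record_id: str, payload: bytes) -> bool:
        acked = service.put_record(record_id, payload)
        state["calls"] += 1
        if state["calls"] <= drop_first:
            raise ConnectionError("acknowledgement lost in transit")
        return acked

    return send


def submit_at_least_once(send, record_id: str, payload: bytes,
                         max_attempts: int = 5) -> int:
    """Retransmit until a positive acknowledgement arrives; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            if send(record_id, payload):
                return attempt
        except ConnectionError:
            continue  # no ack seen, so transmit the record again
    raise RuntimeError(f"no acknowledgement after {max_attempts} attempts")
```

With two lost acknowledgements, the submitter transmits three times, the service acknowledges each transmission, and each storage location still holds a single copy of the record.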
26 Claims
1. A system, comprising:
  one or more computing devices comprising one or more processors and memory and configured to:
    implement a first set of programmatic interfaces enabling a client of a multi-tenant stream management service to select, for a particular data stream, a data ingestion policy from among a plurality of data ingestion policies, wherein the plurality of data ingestion policies includes an at-least-once ingestion policy in accordance with which a record submitter transmits an indication of a data record one or more times to the stream management service until a positive acknowledgement is received;
    implement a second set of programmatic interfaces enabling the client to select, for the particular data stream, a data persistence policy from among a plurality of data persistence policies, wherein the plurality of data persistence policies comprises a multiple-replica persistence policy, in accordance with which multiple copies of the data record are to be stored at respective storage locations by the stream management service;
    receive, at the stream management service via respective programmatic interfaces of the first and second set, a first indication that the client has selected the at-least-once ingestion policy for the particular data stream and a second indication that the client has selected the multiple-replica persistence policy for the particular data stream;
    determine a number of data ingestion nodes or a number of data storage nodes to be configured for the particular data stream based at least in part on a partitioning policy in accordance with which a data ingestion node of the number of data ingestion nodes is selected to ingest data records of a particular partition of the particular data stream or a data storage node of the number of data storage nodes is selected to store data records of the particular partition of the particular data stream; and
    in response to a plurality of transmissions indicating a particular data record to the stream management service,
      send at least one positive acknowledgement corresponding to the plurality of transmissions in accordance with the at-least-once ingestion policy; and
      store, in response to a particular transmission of the plurality of transmissions, copies of the particular data record at a plurality of storage locations in accordance with the multiple-replica persistence policy.
  Dependent claims: 2, 3, 4, 5
6. A method, comprising:
  performing, by one or more computing devices:
    implementing a set of programmatic interfaces enabling a client of a stream management service to select, for a particular data stream, a data ingestion policy from among a plurality of data ingestion policies, wherein the plurality of data ingestion policies includes an at-least-once ingestion policy in accordance with which a record submitter is to transmit an indication of a data record one or more times to the stream management service until a positive acknowledgement is received;
    receiving a request via a programmatic interface of the set, indicating that the client has selected the at-least-once ingestion policy for the particular data stream;
    determining a number of data ingestion nodes to be configured for the particular data stream based at least in part on a partitioning policy in accordance with which a data ingestion node of the number of data ingestion nodes is selected to ingest data records of a particular partition of the particular data stream; and
    in response to receiving a plurality of transmissions indicating a particular data record at the stream management service,
      sending respective positive acknowledgements corresponding to each transmission of the plurality of transmissions in accordance with the at-least-once ingestion policy; and
      storing, in response to receiving a particular transmission of the plurality of transmissions, copies of the particular data record at one or more storage locations in accordance with a data persistence policy selected for the particular data stream.
  Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
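The step of determining a number of ingestion nodes from a partitioning policy could look roughly like the sketch below. The claim does not specify any particular policy, so the choices here are all assumptions for illustration: hashing a partition key with MD5, a fixed `partitions_per_node` capacity, and assigning contiguous partitions to the same node. The function names are hypothetical.

```python
import hashlib
import math


def partition_for(key: str, partition_count: int) -> int:
    """Map a record's partition key to a partition index (assumed hash policy)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % partition_count


def ingestion_node_count(partition_count: int, partitions_per_node: int) -> int:
    """Number of ingestion nodes needed if each node handles a fixed number of partitions."""
    return math.ceil(partition_count / partitions_per_node)


def node_for_partition(partition: int, partitions_per_node: int) -> int:
    """Select the ingestion node responsible for a given partition."""
    return partition // partitions_per_node
```

For example, a stream with 10 partitions and an assumed capacity of 4 partitions per node would be configured with 3 ingestion nodes, and partition 9 would be ingested by node 2.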
22. A non-transitory computer-accessible storage medium storing program instructions that, when executed on one or more processors, cause the one or more processors to:
    implement a set of programmatic interfaces enabling a client of a network-based data stream management service to select, for a particular data stream, a data persistence policy from among a plurality of data persistence policies, wherein the plurality of data persistence policies includes (a) a multiple-replica persistence policy in accordance with which multiple copies of a data record of the particular data stream are to be stored at respective storage locations and (b) a single-replica persistence policy in which a single copy of a data record of the particular data stream is to be stored, wherein data records for the particular data stream are ingested into the particular data stream based on a selected data ingestion policy;
    receive, via a programmatic interface of the set, a request indicating that the client has selected the multiple-replica persistence policy for the particular data stream;
    determine a number of data storage nodes to be configured for the particular data stream based at least in part on a partitioning policy in accordance with which a data storage node of the number of data storage nodes is selected to store data records of a particular partition of the particular data stream; and
    configure a plurality of storage nodes to implement the multiple-replica persistence policy for data records of the particular data stream.
  Dependent claims: 23, 24, 25, 26
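The choice between the single-replica and multiple-replica persistence policies, and the resulting storage-node configuration, can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the `PersistencePolicy` enum, the `plan_storage` function, the default of three replicas, and the round-robin placement over a pool of node names are all hypothetical.

```python
from enum import Enum


class PersistencePolicy(Enum):
    SINGLE_REPLICA = "single-replica"
    MULTIPLE_REPLICA = "multiple-replica"


def plan_storage(policy: PersistencePolicy, partition_count: int,
                 node_pool: list, replicas: int = 3) -> dict:
    """Return {partition index: [storage nodes holding that partition's copies]}."""
    copies = replicas if policy is PersistencePolicy.MULTIPLE_REPLICA else 1
    if copies > len(node_pool):
        raise ValueError("not enough storage nodes for distinct replicas")
    plan = {}
    for partition in range(partition_count):
        # Consecutive pool positions give each copy a distinct storage node.
        plan[partition] = [node_pool[(partition + i) % len(node_pool)]
                           for i in range(copies)]
    return plan
```

Under these assumptions, a two-partition stream with the multiple-replica policy and a pool of four nodes places three copies of each partition on three different nodes, while the single-replica policy places one copy per partition.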
Specification