Cache based efficient access scheduling for super scaled stream processing systems
First Claim
1. A method of servicing a plurality of client bins making multi-dimensional queries against a streaming data store, the method including:
receiving a multitude of queries from a plurality of client bins, the queries including a topic-partition pair and start offsets that reference events in an unbounded event stream, the start offsets mixing historical offsets and current offsets;
establishing separate caches for active, unique topic-partition pairs;
periodically determining query increments that request data not already present in the separate caches and grouping the query increments across topic-partition pairs into a query group;
submitting the query group to a streaming data store that provides access to one or more unbounded data streams;
receiving stream data responsive to the query group, including larger data blocks responsive to historical offsets and smaller data blocks responsive to current offsets, and updating the separate caches for the topic-partition pairs; and
simultaneously servicing the multitude of queries from the separate caches.
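The steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation: the names `PartitionCache`, `InMemoryStore`, `refresh`, and `serve` are hypothetical, the data store is an in-memory stand-in, and offsets are assumed to be zero-based list indices. One cache is kept per topic-partition pair, each refresh asks only for data the caches do not yet hold, and all increments go to the store as a single query group.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionCache:
    """Contiguous cached events for one topic-partition pair (hypothetical)."""
    start: int                                   # offset of the first cached event
    events: list = field(default_factory=list)

    @property
    def end(self):
        # First offset that is NOT yet cached.
        return self.start + len(self.events)


class InMemoryStore:
    """Stand-in for the streaming data store; counts group queries received."""

    def __init__(self, streams):
        self.streams = streams                   # {(topic, partition): [event, ...]}
        self.calls = 0

    def fetch_group(self, group):
        # group maps each topic-partition pair to the first offset wanted.
        self.calls += 1
        return {tp: self.streams[tp][start:] for tp, start in group.items()}


def refresh(caches, queries, store):
    """One periodic cycle: compute query increments, submit one query group."""
    for tp, start in queries:
        cache = caches.setdefault(tp, PartitionCache(start=start))
        if start < cache.start:
            # Simplifying assumption: a query older than the cached range
            # resets the cache, so the next fetch is one large historical block.
            cache.start, cache.events = start, []
    # The increment for each pair starts where its cache currently ends.
    group = {tp: cache.end for tp, cache in caches.items()}
    for tp, events in store.fetch_group(group).items():
        caches[tp].events.extend(events)


def serve(caches, queries):
    """Answer every query from the caches, without touching the store."""
    return [caches[tp].events[start - caches[tp].start:] for tp, start in queries]
```

In this sketch, a first `refresh` with a historical start offset pulls a large block for that pair, while later cycles pull only the few events appended at the tip. The store sees one call per cycle regardless of how many queries are pending.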
Abstract
The technology disclosed relates to discovering a previously unknown attribute of stream processing systems according to which client offsets or client subscription queries for a streaming data store rapidly converge to a dynamic tip of a data stream that includes the most recent messages or events. In particular, it relates to grouping clients into bins to reduce a number of queries to the streaming data store by several orders of magnitude when servicing tens, hundreds, thousands or millions of clients. The bin count is further reduced by coalescing bins that have overlapping offsets. It also relates to establishing separate caches only for the current tips of data streams and serving the bins from the caches instead of the backend data store using group queries. Further, the caches are periodically updated to include the most recent messages or events appended to the dynamic tips of the data streams.
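The binning and coalescing described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the disclosed method: the function names `bin_clients` and `coalesce_bins` are hypothetical, and "overlapping offsets" is assumed here to mean that two bins on the same topic-partition pair have read windows `[offset, offset + window)` that intersect, with `window` a made-up parameter.

```python
from collections import defaultdict


def bin_clients(client_offsets):
    """Group clients that share a topic, partition, and offset into one bin.

    client_offsets: {client_id: (topic, partition, offset)}
    returns: {(topic, partition, offset): [client_id, ...]}
    """
    bins = defaultdict(list)
    for client_id, key in client_offsets.items():
        bins[key].append(client_id)
    return dict(bins)


def coalesce_bins(bins, window):
    """Merge bins on the same topic-partition whose assumed read windows overlap.

    Each merged bin is keyed by the lowest offset among the bins it absorbed.
    """
    by_tp = defaultdict(list)
    for (topic, partition, offset), clients in bins.items():
        by_tp[(topic, partition)].append((offset, clients))

    merged = {}
    for (topic, partition), entries in by_tp.items():
        entries.sort(key=lambda entry: entry[0])
        cur_start, cur_end, cur_clients = None, None, []
        for offset, clients in entries:
            if cur_start is not None and offset < cur_end:
                # Overlaps the running bin: absorb it and extend the window.
                cur_end = max(cur_end, offset + window)
                cur_clients.extend(clients)
            else:
                if cur_start is not None:
                    merged[(topic, partition, cur_start)] = cur_clients
                cur_start, cur_end = offset, offset + window
                cur_clients = list(clients)
        merged[(topic, partition, cur_start)] = cur_clients
    return merged
```

Because the number of queries sent to the data store scales with the number of bins rather than the number of clients, clients that have converged on the tip of a stream collapse into very few bins in this sketch.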
20 Claims
1. A method of servicing a plurality of client bins making multi-dimensional queries against a streaming data store, the method including:
receiving a multitude of queries from a plurality of client bins, the queries including a topic-partition pair and start offsets that reference events in an unbounded event stream, the start offsets mixing historical offsets and current offsets;
establishing separate caches for active, unique topic-partition pairs;
periodically determining query increments that request data not already present in the separate caches and grouping the query increments across topic-partition pairs into a query group;
submitting the query group to a streaming data store that provides access to one or more unbounded data streams;
receiving stream data responsive to the query group, including larger data blocks responsive to historical offsets and smaller data blocks responsive to current offsets, and updating the separate caches for the topic-partition pairs; and
simultaneously servicing the multitude of queries from the separate caches.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
17. A non-transitory computer readable storage medium impressed with computer program instructions to service a plurality of clients from an unbounded data stream, while allowing each client to select any available service offset in the data stream as a starting offset for streaming service to the client, the instructions, when executed on a processor, implement a method comprising:
receiving a multitude of queries from a plurality of client bins, the queries including a topic-partition pair and start offsets that reference events in an unbounded event stream, the start offsets mixing historical offsets and current offsets;
establishing separate caches for active, unique topic-partition pairs;
periodically determining query increments that request data not already present in the separate caches and grouping the query increments across topic-partition pairs into a query group;
submitting the query group to a streaming data store that provides access to one or more unbounded data streams;
receiving stream data responsive to the query group, including larger data blocks responsive to historical offsets and smaller data blocks responsive to current offsets, and updating the separate caches for the topic-partition pairs; and
simultaneously servicing the multitude of queries from the separate caches.
Dependent claims: 18, 19
20. A system including one or more processors coupled to memory, the memory loaded with computer instructions to service a plurality of clients from an unbounded data stream, while allowing each client to select any available service offset in the data stream as a starting offset for streaming service to the client, the instructions, when executed on the processors, implement actions comprising:
receiving a multitude of queries from a plurality of client bins, the queries including a topic-partition pair and start offsets that reference events in an unbounded event stream, the start offsets mixing historical offsets and current offsets;
establishing separate caches for active, unique topic-partition pairs;
periodically determining query increments that request data not already present in the separate caches and grouping the query increments across topic-partition pairs into a query group;
submitting the query group to a streaming data store that provides access to one or more unbounded data streams;
receiving stream data responsive to the query group, including larger data blocks responsive to historical offsets and smaller data blocks responsive to current offsets, and updating the separate caches for the topic-partition pairs; and
simultaneously servicing the multitude of queries from the separate caches.
Specification