EFFICIENT ACCESS SCHEDULING FOR SUPER SCALED STREAM PROCESSING SYSTEMS
Abstract
The technology disclosed relates to discovering a previously unknown attribute of stream processing systems according to which client offsets or client subscription queries for a streaming data store rapidly converge to a dynamic tip of a data stream that includes the most recent messages or events. In particular, it relates to grouping clients into bins to reduce a number of queries to the streaming data store by several orders of magnitude when servicing tens, hundreds, thousands or millions of clients. The bin count is further reduced by coalescing bins that have overlapping offsets. It also relates to establishing separate caches only for the current tips of data streams and serving the bins from the caches instead of the backend data store using group queries. Further, the caches are periodically updated to include the most recent messages or events appended to the dynamic tips of the data streams.
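The binning and coalescing described in the abstract can be illustrated with a minimal Python sketch. This is not the patent's implementation: the `Bin` class, the `assign` and `coalesce` helpers, and the fixed bin `width` are all hypothetical names and simplifications chosen for illustration. A request joins an existing bin when its offset falls inside that bin's current range; otherwise it opens a new bin. Bins whose ranges overlap are then merged, so the number of block queries equals the number of remaining bins rather than the number of clients.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    """Hypothetical bin: an inclusive offset range plus the clients it serves."""
    start: int
    end: int
    clients: dict = field(default_factory=dict)  # client_id -> subscription offset


def assign(bins, client_id, offset, width):
    """Place a request in the first bin whose range covers its offset.

    If no bin covers the offset, open a new bin of `width` offsets
    (the fixed width is an assumption made for this sketch).
    """
    for b in bins:
        if b.start <= offset <= b.end:
            b.clients[client_id] = offset
            return b
    new_bin = Bin(start=offset, end=offset + width - 1, clients={client_id: offset})
    bins.append(new_bin)
    return new_bin


def coalesce(bins):
    """Merge bins whose offset ranges overlap, reducing the bin count."""
    merged = []
    for b in sorted(bins, key=lambda x: x.start):
        if merged and b.start <= merged[-1].end:
            merged[-1].end = max(merged[-1].end, b.end)
            merged[-1].clients.update(b.clients)
        else:
            merged.append(Bin(b.start, b.end, dict(b.clients)))
    return merged


# Four subscription requests, three of them close together in the stream.
requests = [("a", 100), ("b", 103), ("c", 95), ("d", 500)]
bins = []
for client_id, offset in requests:
    assign(bins, client_id, offset, width=10)

merged = coalesce(bins)
# One block query per merged bin instead of one query per client.
block_queries = [(b.start, b.end) for b in merged]
```

In this example, four client requests collapse into two block queries. As more client offsets converge toward the tip of the stream, more of them fall into the same bin and the ratio of clients to queries grows.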
23 Citations
20 Claims
1. A method of reducing a number of queries to a message data store by several orders of magnitude when servicing a plurality of clients, with each client in the plurality of clients requesting subscription to the data store at any available message offset in the data store as a starting offset for streaming messages, the method including:
- grouping client subscription requests into one or more bins based on the clients' respective subscription offsets identified in the requests and the bins' respective current bin offset ranges; and
- reducing queries to the data store when streaming messages to the plurality of clients starting from their respective subscription offsets by issuing against the data store a single block query for each of the bins instead of issuing individual queries for each of the client subscription requests.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
19. A non-transitory computer readable storage medium impressed with computer program instructions to service a plurality of clients from an unbounded data stream, while allowing each client to select any available service offset in the data stream as a starting offset for streaming service to the client, the instructions, when executed on a processor, implement a method comprising:
- binning the clients in bins based on the clients' respective selected starting offsets in the data stream and current offsets of the bins; and
- serving the bins from block queries against the data stream.
20. A system including one or more processors coupled to memory, the memory loaded with computer instructions to service a plurality of clients from an unbounded data stream, while allowing each client to select any available service offset in the data stream as a starting offset for streaming service to the client, the instructions, when executed on the processors, implement actions comprising:
- binning the clients in bins based on the clients' respective selected starting offsets in the data stream and current offsets of the bins; and
- serving the bins from block queries against the data stream.
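The step of serving a bin from a block query, as recited in the claims, might look roughly like the following Python sketch. Everything here is an assumption made for illustration: `CountingStore` is a hypothetical in-memory stand-in for the message data store, offsets are treated as plain list indices, and `serve_bin` is an invented helper. The point is only that one read against the store covers every client in the bin, and each client then receives the slice of that block starting at its own subscription offset.

```python
class CountingStore:
    """Hypothetical in-memory store that counts how many queries it receives."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.queries = 0

    def read_block(self, start, end):
        """Return messages at offsets start..end (inclusive) in one query."""
        self.queries += 1
        return self.messages[start:end + 1]


def serve_bin(store, client_offsets, bin_end):
    """Serve every client in a bin from a single block query.

    client_offsets maps client_id -> subscription offset; bin_end is the
    last offset the bin currently covers (assumed known to the caller).
    """
    start = min(client_offsets.values())
    block = store.read_block(start, bin_end)  # the only query issued
    # Each client gets the part of the block from its own offset onward.
    return {cid: block[off - start:] for cid, off in client_offsets.items()}


store = CountingStore(f"m{i}" for i in range(20))
delivered = serve_bin(store, {"a": 5, "b": 7, "c": 9}, bin_end=12)
```

Here three clients with different starting offsets are served while `store.queries` stays at 1; issuing an individual query per subscription request would have cost three.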
Specification