Efficient access scheduling for super scaled stream processing systems
First Claim
1. A method of reducing a number of queries to a message data store of at least one storage device when servicing a plurality of clients, with each client in the plurality of clients requesting subscription to the data store of the at least one storage device at any available message offset in the data store of the at least one storage device as a starting offset for streaming messages, the method including:
- grouping, by at least one processor communicatively coupled to the at least one storage device, client subscription requests of the plurality of clients into one or more bins based on respective subscription offsets identified in the requests of the plurality of clients and respective current bin offset ranges of the one or more bins; and
reducing, by the at least one processor, queries to the data store of the at least one storage device when streaming messages to the client subscription requests of the plurality of clients starting from their respective subscription offsets by issuing against the data store a single block query for each of the bins instead of issuing individual queries for each of the client subscription requests.
1 Assignment
0 Petitions
Accused Products
Abstract
The technology disclosed relates to discovering a previously unknown attribute of stream processing systems according to which client offsets or client subscription queries for a streaming data store rapidly converge to a dynamic tip of a data stream that includes the most recent messages or events. In particular, it relates to grouping clients into bins to reduce a number of queries to the streaming data store by several orders of magnitude when servicing tens, hundreds, thousands or millions of clients. The bin count is further reduced by coalescing bins that have overlapping offsets. It also relates to establishing separate caches only for the current tips of data streams and serving the bins from the caches instead of the backend data store using group queries. Further, the caches are periodically updated to include the most recent messages or events appended to the dynamic tips of the data streams.
168 Citations
20 Claims
-
1. A method of reducing a number of queries to a message data store of at least one storage device when servicing a plurality of clients, with each client in the plurality of clients requesting subscription to the data store of the at least one storage device at any available message offset in the data store of the at least one storage device as a starting offset for streaming messages, the method including:
-
grouping, by at least one processor communicatively coupled to the at least one storage device, client subscription requests of the plurality of clients into one or more bins based on respective subscription offsets identified in the requests of the plurality of clients and respective current bin offset ranges of the one or more bins; and reducing, by the at least one processor, queries to the data store of the at least one storage device when streaming messages to the client subscription requests of the plurality of clients starting from their respective subscription offsets by issuing against the data store a single block query for each of the bins instead of issuing individual queries for each of the client subscription requests. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. A non-transitory computer readable storage medium including computer program instructions that, when executed on a processor, implement a method comprising:
-
receiving selections, at the processor, for a plurality of client subscription requests for available service offsets in an unbounded data stream from a data store as starting offsets for streaming services to the respective client subscription requests; binning, by the processor, the client subscription requests in bins based on the respective selected starting offsets in the data stream and current offsets of the bins for the client subscription requests; receiving, at the data store, at least one block query against the data stream based on the binned clients; and accessing data, by the processor, returned by the at least one block query for clients in a particular bin using a plurality of worker threads.
-
-
20. A system including one or more processors coupled to memory, the memory including computer instructions that, when executed on the one or more processors, implement actions comprising:
-
receiving selections, at the one or more processor, for a plurality of client subscription requests for available service offsets in an unbounded data stream from a data store as starting offsets for streaming services to the respective client subscription requests; binning, at the one or more processors, the client subscription services in bins based on the respective selected starting offsets in the data stream and current offsets of the bins for the client subscription requests; receiving, at the data store, at least one block query against the data stream based on the binned clients; and accessing data, by the processor, returned by the at least one block query for clients in a particular bin using a plurality of worker threads.
-
Specification