Providing strong ordering in multi-stage streaming processing
First Claim
1. A method, comprising:
- maintaining, by a grid scheduler, current batch stage information comprising current batch units and downstream batch units that depend on completion of the current batch units;
determining, by the grid scheduler, a batch unit pending dispatch from the downstream batch units for a current batch stage identified in the current batch stage information;
identifying, by the grid scheduler, one or more physical threads in a computing grid that processed batch units for the current batch stage on which the batch unit pending dispatch depends and have registered pending tasks for the current batch stage; and
causing, by the grid scheduler, a dispatch of the batch unit pending dispatch to the one or more identified physical threads subsequent to complete processing of the batch units for the current batch stage,wherein at least one of the maintaining, determining, identifying, and causing are performed by one or more computers.
1 Assignment
0 Petitions
Accused Products
Abstract
The technology disclosed relates to providing strong ordering in multi-stage processing of near real-time (NRT) data streams. In particular, it relates to maintaining current batch-stage information for a batch at a grid-scheduler in communication with a grid-coordinator that controls dispatch of batch-units to the physical threads for a batch-stage. This includes operating a computing grid, and queuing data from the NRT data streams as batches in pipelines for processing over multiple stages in the computing grid. Also included is determining, for a current batch-stage, batch-units pending dispatch, in response to receiving the current batch-stage information; identifying physical threads that processed batch-units for a previous batch-stage on which the current batch-stage depends and have registered pending tasks for the current batch-stage; and dispatching the batch-units for the current batch-stage to the identified physical threads subsequent to complete processing of the batch-units for the previous batch-stage.
189 Citations
20 Claims
-
1. A method, comprising:
-
maintaining, by a grid scheduler, current batch stage information comprising current batch units and downstream batch units that depend on completion of the current batch units; determining, by the grid scheduler, a batch unit pending dispatch from the downstream batch units for a current batch stage identified in the current batch stage information; identifying, by the grid scheduler, one or more physical threads in a computing grid that processed batch units for the current batch stage on which the batch unit pending dispatch depends and have registered pending tasks for the current batch stage; and causing, by the grid scheduler, a dispatch of the batch unit pending dispatch to the one or more identified physical threads subsequent to complete processing of the batch units for the current batch stage, wherein at least one of the maintaining, determining, identifying, and causing are performed by one or more computers. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A system, comprising:
-
a memory; and at least one processor coupled to the memory and configured to; maintain current batch stage information comprising current batch units and downstream batch units that depend on completion of the current batch units at a grid scheduler; determine a batch unit pending dispatch from the downstream batch units for a current batch stage identified in the current batch stage information; identify one or more physical threads in a computing grid that processed batch units for the current batch stage on which the batch unit pending dispatch depends and have registered pending tasks for the current batch stage; and cause a dispatch of the batch unit pending dispatch to the one or more identified physical threads subsequent to complete processing of the batch units for the current batch stage. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A non-transitory computer-readable device having instructions stored thereon that, when executed by at least one computing device, causes the at least one computing device to perform operations comprising:
-
maintaining current batch stage information comprising current batch units and downstream batch units that depend on completion of the current batch units at a grid scheduler; determining a batch unit pending dispatch from the downstream batch units for a current batch stage identified in the current batch stage information; identifying one or more physical threads in a computing grid that processed batch units for the current batch stage on which the batch unit pending dispatch depends and have registered pending tasks for the current batch stage; and causing a dispatch of the batch unit pending dispatch to the one or more identified physical threads subsequent to complete processing of the batch units for the current batch stage. - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification