Managing processing of long tail task sequences in a stream processing framework
First Claim
1. A method including:
- operations to process long tail task sequences in a stream processing framework in a computing grid, the operations including:
operating a computing grid that includes a plurality of physical threads which processes data from one or more near real-time (NRT) data streams for multiple task sequences;
queuing data from the one or more NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads;
assigning a priority-level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to respective priority levels of the first and second pipelines;
detecting that an NRT data stream is emitting measurably less data than before and determining that the NRT data stream should be classified as a long tail task sequence;
in response to the determining, migrating the NRT data stream for the long tail task sequence to a low-priority pipeline; and
processing data from the migrated NRT data stream using the low-priority pipeline.
1 Assignment
0 Petitions
Abstract
The technology disclosed relates to managing processing of long tail task sequences in a stream processing framework. In particular, it relates to operating a computing grid that includes a plurality of physical threads which processes data from one or more near real-time (NRT) data streams for multiple task sequences, and queuing data from the NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads. The method also includes assigning a priority-level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to respective priority levels of the first and second pipelines.
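Read as a scheduling policy, the priority-based dispatch described in the abstract can be sketched roughly as follows; `Pipeline`, `GridScheduler`, and `next_batch` are illustrative names, not identifiers from the patent:

```python
from collections import deque

class Pipeline:
    """Holds queued batches of NRT stream data at a given priority level."""
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority          # lower number = higher priority
        self.batches = deque()            # batches queued by the grid-coordinator

class GridScheduler:
    """Dispatches batches from higher-priority pipelines before lower ones."""
    def __init__(self, pipelines):
        # Scan order follows priority, so a first number of batches from a
        # high-priority pipeline executes before batches from a lower one.
        self.pipelines = sorted(pipelines, key=lambda p: p.priority)

    def next_batch(self):
        for p in self.pipelines:
            if p.batches:
                return p.name, p.batches.popleft()
        return None                       # nothing queued in any pipeline
```

A physical thread would repeatedly call `next_batch` to pull its next unit of work; the actual grid-coordinator and grid-scheduler interaction is more involved than this sketch.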
178 Citations
20 Claims
-
1. A method including:
operations to process long tail task sequences in a stream processing framework in a computing grid, the operations including:
operating a computing grid that includes a plurality of physical threads which processes data from one or more near real-time (NRT) data streams for multiple task sequences;
queuing data from the one or more NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads;
assigning a priority-level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to respective priority levels of the first and second pipelines;
detecting that an NRT data stream is emitting measurably less data than before and determining that the NRT data stream should be classified as a long tail task sequence;
in response to the determining, migrating the NRT data stream for the long tail task sequence to a low-priority pipeline; and
processing data from the migrated NRT data stream using the low-priority pipeline.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 11)
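The detect-and-migrate steps of claim 1 amount to a rate-drop test followed by a pipeline reassignment. A minimal sketch, assuming a simple rate-ratio threshold (`drop_factor` and both function names are assumptions, not from the claims):

```python
def is_long_tail(recent_rate, historical_rate, drop_factor=0.5):
    """True when a stream emits measurably less data than before,
    i.e. its recent rate has fallen below an assumed fraction of
    its historical rate."""
    return recent_rate < drop_factor * historical_rate

def assign_pipeline(recent_rate, historical_rate):
    # Long tail streams migrate to the low-priority pipeline so their
    # remaining batches no longer compete with busier streams for threads.
    if is_long_tail(recent_rate, historical_rate):
        return "low-priority"
    return "default"
```

For example, a stream that once emitted 100 events/sec but now emits 2 events/sec would be reassigned to the low-priority pipeline, while one still emitting 90 events/sec would stay put.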
-
9. A method including:
operations to process surging task sequences in a stream processing framework in a computing grid, the operations including:
operating a computing grid that includes a plurality of physical threads which processes data from one or more near real-time (NRT) data streams for multiple task sequences;
queuing data from the one or more NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads;
assigning a priority-level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to respective priority levels of the first and second pipelines;
detecting that at least one NRT data stream is emitting measurably more data than before and determining that it should be classified as a surging NRT data stream;
in response to the determining, migrating the surging NRT data stream to a high-priority pipeline; and
processing data from the surging NRT data stream using the high-priority pipeline.
- View Dependent Claims (10)
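Claim 9 is the mirror image of claim 1: a rate-increase test followed by migration toward a high-priority pipeline. A sketch under the same rate-ratio assumption (`surge_factor` and the function names are illustrative):

```python
def is_surging(recent_rate, historical_rate, surge_factor=2.0):
    """True when a stream emits measurably more data than before,
    i.e. its recent rate exceeds an assumed multiple of its
    historical rate."""
    return recent_rate > surge_factor * historical_rate

def assign_pipeline(recent_rate, historical_rate):
    # Surging streams migrate to the high-priority pipeline so their
    # extra batches are dispatched ahead of other queued work.
    if is_surging(recent_rate, historical_rate):
        return "high-priority"
    return "default"
```

Together with the long tail path of claim 1, this gives the scheduler a symmetric policy: streams are demoted as they quiet down and promoted as they spike.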
-
12. A system including:
one or more processors coupled to memory, the memory loaded with computer instructions, the instructions, when executed on the one or more processors, implement operations to process long tail task sequences in a stream processing framework in a computing grid, the operations including:
operating a computing grid that includes a plurality of physical threads which processes data from one or more near real-time (NRT) data streams for multiple task sequences;
queuing data from the one or more NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads;
assigning a priority-level to each of the pipelines using a grid-scheduler to control execution of a number of batches from a first pipeline before execution of a number of batches from a second pipeline;
detecting that an NRT data stream is emitting measurably less data than before;
determining that the NRT data stream should be classified as a long tail task sequence;
in response to the determining, migrating the NRT data stream for the long tail task sequence to a low-priority pipeline; and
processing data from the migrated NRT data stream using the low-priority pipeline.
- View Dependent Claims (13, 14, 15, 16)
-
17. A non-transitory computer readable storage medium impressed with computer program instructions to implement a method comprising:
operations to process long tail task sequences in a stream processing framework in a computing grid, the operations including:
operating a computing grid that includes a plurality of physical threads which processes data from one or more near real-time (NRT) data streams for multiple task sequences;
queuing data from the one or more NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads;
assigning a priority-level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to respective priority levels of the first and second pipelines;
detecting that an NRT data stream for a long tail task sequence is emitting measurably less data than before and determining that the NRT data stream should be classified as a long tail task sequence;
in response to the determining, migrating the NRT data stream for the long tail task sequence to a low-priority pipeline; and
processing data from the migrated NRT data stream using the low-priority pipeline.
- View Dependent Claims (18, 19, 20)
Specification