Managing processing of long tail task sequences in a stream processing framework
First Claim
1. A method including:
- operations to process long tail task sequences in a stream processing framework in a computing grid, the operations including:
operating a computing grid that includes a plurality of physical threads which processes data from one or more near real-time (NRT) data streams for multiple task sequences;
queuing data from the one or more NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads;
assigning a priority-level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to respective priority levels of the first and second pipelines;
detecting that an NRT data stream is emitting measurably less data than before and determining that the NRT data stream should be classified as a long tail task sequence;
in response to the determining, migrating the NRT data stream for the long tail task sequence to a low-priority pipeline; and
processing data from the migrated NRT data stream using the low-priority pipeline.
1 Assignment
0 Petitions
Abstract
The technology disclosed relates to managing processing of long tail task sequences in a stream processing framework. In particular, it relates to operating a computing grid that includes a plurality of physical threads which processes data from one or more near real-time (NRT) data streams for multiple task sequences, and queuing data from the NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads. The method also includes assigning a priority-level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to respective priority levels of the first and second pipelines.
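Read as a scheduling policy, the priority-based dispatch described in the abstract can be sketched roughly as follows; `Pipeline`, `GridScheduler`, and `next_batch` are illustrative names, not identifiers from the patent:

```python
from collections import deque

class Pipeline:
    """Holds queued batches of NRT stream data at a given priority level."""
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority          # lower number = higher priority
        self.batches = deque()            # batches queued by the grid-coordinator

class GridScheduler:
    """Dispatches batches from higher-priority pipelines before lower ones."""
    def __init__(self, pipelines):
        # Scan order follows priority, so a first number of batches from a
        # high-priority pipeline executes before batches from a lower one.
        self.pipelines = sorted(pipelines, key=lambda p: p.priority)

    def next_batch(self):
        for p in self.pipelines:
            if p.batches:
                return p.name, p.batches.popleft()
        return None                       # nothing queued in any pipeline
```

A physical thread would repeatedly call `next_batch` to pull its next unit of work; the actual grid-coordinator and grid-scheduler interaction is more involved than this sketch.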
178 Citations
20 Claims
-
1. A method including:
operations to process long tail task sequences in a stream processing framework in a computing grid, the operations including:
operating a computing grid that includes a plurality of physical threads which processes data from one or more near real-time (NRT) data streams for multiple task sequences;
queuing data from the one or more NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads;
assigning a priority-level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to respective priority levels of the first and second pipelines;
detecting that an NRT data stream is emitting measurably less data than before and determining that the NRT data stream should be classified as a long tail task sequence;
in response to the determining, migrating the NRT data stream for the long tail task sequence to a low-priority pipeline; and
processing data from the migrated NRT data stream using the low-priority pipeline.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 11)
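The detect-and-migrate steps of claim 1 amount to a rate-drop test followed by a pipeline reassignment. A minimal sketch, assuming a simple rate-ratio threshold (`drop_factor` and both function names are assumptions, not from the claims):

```python
def is_long_tail(recent_rate, historical_rate, drop_factor=0.5):
    """True when a stream emits measurably less data than before,
    i.e. its recent rate has fallen below an assumed fraction of
    its historical rate."""
    return recent_rate < drop_factor * historical_rate

def assign_pipeline(recent_rate, historical_rate):
    # Long tail streams migrate to the low-priority pipeline so their
    # remaining batches no longer compete with busier streams for threads.
    if is_long_tail(recent_rate, historical_rate):
        return "low-priority"
    return "default"
```

For example, a stream that once emitted 100 events/sec but now emits 2 events/sec would be reassigned to the low-priority pipeline, while one still emitting 90 events/sec would stay put.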
-
9. A method including:
operations to process surging task sequences in a stream processing framework in a computing grid, the operations including:
operating a computing grid that includes a plurality of physical threads which processes data from one or more near real-time (NRT) data streams for multiple task sequences;
queuing data from the one or more NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads;
assigning a priority-level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to respective priority levels of the first and second pipelines;
detecting that at least one NRT data stream is emitting measurably more data than before and determining that it should be classified as a surging NRT data stream;
in response to the determining, migrating the surging NRT data stream to a high-priority pipeline; and
processing data from the surging NRT data stream using the high-priority pipeline.
- View Dependent Claims (10)
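Claim 9 is the mirror image of claim 1: a rate-increase test followed by migration toward a high-priority pipeline. A sketch under the same rate-ratio assumption (`surge_factor` and the function names are illustrative):

```python
def is_surging(recent_rate, historical_rate, surge_factor=2.0):
    """True when a stream emits measurably more data than before,
    i.e. its recent rate exceeds an assumed multiple of its
    historical rate."""
    return recent_rate > surge_factor * historical_rate

def assign_pipeline(recent_rate, historical_rate):
    # Surging streams migrate to the high-priority pipeline so their
    # extra batches are dispatched ahead of other queued work.
    if is_surging(recent_rate, historical_rate):
        return "high-priority"
    return "default"
```

Together with the long tail path of claim 1, this gives the scheduler a symmetric policy: streams are demoted as they quiet down and promoted as they spike.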
-
12. A system including:
one or more processors coupled to memory, the memory loaded with computer instructions, the instructions, when executed on the one or more processors, implement operations to process long tail task sequences in a stream processing framework in a computing grid, the operations including:
operating a computing grid that includes a plurality of physical threads which processes data from one or more near real-time (NRT) data streams for multiple task sequences;
queuing data from the one or more NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads;
assigning a priority-level to each of the pipelines using a grid-scheduler to control execution of a number of batches from a first pipeline before execution of a number of batches from a second pipeline;
detecting that an NRT data stream is emitting measurably less data than before;
determining that the NRT data stream should be classified as a long tail task sequence;
in response to the determining, migrating the NRT data stream for the long tail task sequence to a low-priority pipeline; and
processing data from the migrated NRT data stream using the low-priority pipeline.
- View Dependent Claims (13, 14, 15, 16)
-
17. A non-transitory computer readable storage medium impressed with computer program instructions to implement a method comprising:
operations to process long tail task sequences in a stream processing framework in a computing grid, the operations including:
operating a computing grid that includes a plurality of physical threads which processes data from one or more near real-time (NRT) data streams for multiple task sequences;
queuing data from the one or more NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads;
assigning a priority-level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to respective priority levels of the first and second pipelines;
detecting that an NRT data stream for a long tail task sequence is emitting measurably less data than before and determining that the NRT data stream should be classified as a long tail task sequence;
in response to the determining, migrating the NRT data stream for the long tail task sequence to a low-priority pipeline; and
processing data from the migrated NRT data stream using the low-priority pipeline.
- View Dependent Claims (18, 19, 20)
Specification