MANAGING RESOURCE ALLOCATION IN A STREAM PROCESSING FRAMEWORK
Abstract
The technology disclosed relates to managing resource allocation to task sequences in a stream processing framework. In particular, it relates to operating a computing grid that includes machine resources, with heterogeneous containers defined over whole machines and some containers including multiple machines. It also includes initially allocating multiple machines to a first container, initially allocating a first set of stateful task sequences to the first container, and running the first set of stateful task sequences as multiplexed units of work under control of a container-scheduler, where each unit of work for a first task sequence runs to completion on first machine resources in the first container, unless it overruns a time-out, before a next unit of work for a second task sequence runs multiplexed on the first machine resources. It further includes automatically modifying the number of machine resources and/or the number of task sequences assigned to a container.
40 Citations
20 Claims
1. A method of managing resource allocation to task sequences that have long tails, the method including:
operating a computing grid that includes machine resources, with heterogeneous containers defined over whole machines and some containers including multiple machines;
initially allocating multiple machines to a first container;
initially allocating a first set of stateful task sequences to the first container;
running the first set of stateful task sequences as multiplexed units of work in the first container under control of a container-scheduler, wherein each unit of work for a first task sequence runs to completion on first machine resources in the first container, unless it overruns a time-out, before a next unit of work for a second task sequence runs multiplexed on the first machine resources;
detecting that at least one long tail task sequence is consuming measurably fewer resources than initially allocated; and
responsive to the detecting, automatically allocating one or more additional stateful task sequences to the first container or deallocating one or more machines from the first container.
Dependent claims: 2, 3, 4, 18
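As an illustration of the long-tail handling recited above, the detect-and-rebalance step might look like the following Python sketch. This is an illustrative sketch only, not the claimed implementation: the `Container` shape, the `usage` metric (fraction of initially allocated resources a sequence actually consumes), and the 0.2 threshold are all hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass
class Container:
    """A container defined over whole machines; may span several machines."""
    machines: list        # machine ids allocated to this container
    task_sequences: list  # stateful task sequences multiplexed in it

def rebalance_long_tail(container, usage, pending, threshold=0.2):
    """If any sequence is a long tail (consuming measurably fewer resources
    than initially allocated), either pack an additional pending sequence
    into the container or deallocate one of its machines."""
    long_tails = [s for s in container.task_sequences if usage[s] < threshold]
    if not long_tails:
        return "unchanged"
    if pending:  # more work is waiting: raise the container's utilization
        container.task_sequences.append(pending.pop(0))
        return "added_sequence"
    if len(container.machines) > 1:  # otherwise shrink the container
        container.machines.pop()
        return "removed_machine"
    return "unchanged"
```

Either branch restores utilization: adding a sequence soaks up the idle capacity, while releasing a machine returns it to the grid for other containers.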
5. A method of managing resource allocation to surging task sequences, the method including:
operating a computing grid that includes machine resources, with heterogeneous containers defined over whole machines and some containers including multiple machines;
initially allocating multiple machines to a first container;
initially allocating a first set of stateful task sequences to the first container;
running the first set of stateful task sequences as multiplexed units of work in the first container under control of a container-scheduler, wherein each unit of work for a first task sequence runs to completion on first machine resources in the first container, unless it overruns a time-out, before a next unit of work for a second task sequence runs multiplexed on the first machine resources;
detecting that at least one task sequence is requiring measurably more resources than initially allocated;
determining that the multiple machines allocated to the first container have not yet reached a predetermined maximum; and
automatically allocating more machines to the first container or reallocating some task sequences in the first set of task sequences from the first container to a second container.
Dependent claims: 6, 7, 8, 9, 19
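The surge handling recited above can be sketched the same way: scale the container up while it is below its predetermined machine maximum, otherwise shed non-surging sequences to a second container. Again a hypothetical sketch: `usage` (where a value above 1.0 means a sequence needs more than its initial allocation), `free_machines`, and the default cap of 4 machines are assumptions, not the patented mechanism.

```python
from dataclasses import dataclass

@dataclass
class Container:
    machines: list        # machine ids allocated to this container
    task_sequences: list  # stateful task sequences multiplexed in it

def handle_surge(first, second, usage, free_machines, max_machines=4):
    """Scale up a container whose sequences are surging, or shed load.

    usage[s] > 1.0 means sequence s requires measurably more resources
    than it was initially allocated (hypothetical metric)."""
    surging = [s for s in first.task_sequences if usage[s] > 1.0]
    if not surging:
        return "unchanged"
    # Below the predetermined maximum: allocate more machines.
    if len(first.machines) < max_machines and free_machines:
        first.machines.append(free_machines.pop())
        return "scaled_up"
    # At the cap: reallocate non-surging sequences to a second container.
    for s in [s for s in first.task_sequences if usage[s] <= 1.0]:
        first.task_sequences.remove(s)
        second.task_sequences.append(s)
    return "reallocated"
```

Shedding the non-surging sequences (rather than the surging one) leaves the hot sequence with the whole container to itself, which avoids migrating the state of the sequence that is busiest.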
10. A method of managing resource allocation to faulty task sequences, the method including:
initially allocating a first set of stateful task sequences to a first container;
receiving input from a replayable input source and triggering the first set of stateful task sequences to process the input;
running the first set of stateful task sequences as multiplexed units of work in the first container under control of a container-scheduler, where each unit of work for a first task sequence runs to completion on first machine resources in the first container, unless it overruns a time-out, before a next unit of work for a second task sequence runs multiplexed on the first machine resources;
during running, persisting state information of the first set of task sequences;
detecting runtime of a unit of work in a faulty task sequence exceeding a predetermined timeout threshold;
restarting the faulty task sequence by automatically reloading persisted state information of the faulty task sequence;
automatically rewinding a replayable input to the faulty task sequence to a point preceding the detecting and synchronized with the persisted state information for the faulty task sequence; and
rerunning the faulty task sequence to completion of the unit of work without exceeding the predetermined timeout threshold.
Dependent claims: 11, 20
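A minimal sketch of the fault-recovery loop recited above, assuming a rewindable, offset-based input (e.g. a message log) and an in-memory (state, offset) checkpoint kept synchronized with the input position. All names, interfaces, and the timeout value are hypothetical illustrations of the claim, not the disclosed implementation.

```python
import time

class ReplayableSource:
    """An input source that can be rewound to an earlier offset,
    e.g. a message log (hypothetical stand-in)."""
    def __init__(self, events):
        self.events = events
        self.offset = 0

    def read(self):
        event = self.events[self.offset]
        self.offset += 1
        return event

    def rewind(self, offset):
        self.offset = offset

def run_with_recovery(source, process, state, timeout=1.0):
    """Run units of work to completion; when a unit exceeds the timeout,
    reload the persisted state, rewind the replayable input to the
    synchronized offset, and rerun the unit."""
    checkpoint = (state, source.offset)  # persisted state + input offset
    while source.offset < len(source.events):
        start = time.monotonic()
        event = source.read()
        new_state = process(state, event)
        if time.monotonic() - start > timeout:
            # Faulty unit of work: restore persisted state and rewind the
            # input to the point synchronized with that state.
            state, offset = checkpoint
            source.rewind(offset)
            continue
        state = new_state
        checkpoint = (state, source.offset)  # persist after each unit
    return state
```

Checkpointing the state together with the input offset is what makes the rewind "synchronized": after a restart, the sequence reprocesses exactly the events that arrived after the last persisted state.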
12. A system including one or more processors coupled to memory, the memory loaded with computer instructions to manage resource allocation to task sequences that have long tails, the instructions, when executed on the processors, implement actions comprising:
operating a computing grid that includes machine resources, with heterogeneous containers defined over whole machines and some containers including multiple machines;
initially allocating multiple machines to a first container;
initially allocating a first set of stateful task sequences to the first container;
running the first set of stateful task sequences as multiplexed units of work in the first container under control of a container-scheduler, where each unit of work for a first task sequence runs to completion on first machine resources in the first container, unless it overruns a time-out, before a next unit of work for a second task sequence runs multiplexed on the first machine resources;
detecting that at least one long tail task sequence is consuming measurably fewer resources than initially allocated; and
responsive to the detecting, automatically allocating one or more additional stateful task sequences to the first container or deallocating one or more machines from the first container.
Dependent claims: 13, 14, 15, 16, 17
Specification