Dynamic memory tuning for in-memory data analytic platforms
1 Assignment
0 Petitions
Abstract
At a cache manager of a directed acyclic graph-based data analytic platform, memory usage statistics are obtained from each of a plurality of monitor components on a plurality of worker nodes. The worker nodes have a plurality of tasks executing thereon, and each of the tasks has at least one distributed dataset associated therewith. Each of the worker nodes has a distributed dataset cache. At least one of the following is carried out: increasing a size of a given one of the distributed dataset caches if the memory usage statistics indicate that corresponding ones of the tasks are using too little memory; and decreasing a size of another given one of the distributed dataset caches if the memory usage statistics indicate contention between corresponding ones of the tasks and a corresponding one of the distributed datasets.
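The abstract describes a two-way feedback loop: grow a worker's dataset cache when its tasks leave memory idle, shrink it when the statistics show contention between tasks and cached datasets. The following Python sketch illustrates only that decision rule; the statistic names, thresholds, and step size are illustrative assumptions, not values from the patent.

```python
def adjust_cache_sizes(stats_by_node,
                       low_usage_threshold=0.5,
                       contention_gc_threshold=0.1,
                       step_bytes=64 * 1024 * 1024):
    """Decide per-node cache resizing from reported memory statistics.

    stats_by_node maps a worker-node id to a dict with:
      'task_heap_used' - fraction of the task heap actually in use
      'gc_time_ratio'  - garbage-collection time / elapsed time
                         (used here as the contention signal)
    Returns a dict mapping node id -> signed size delta in bytes.
    """
    deltas = {}
    for node, stats in stats_by_node.items():
        if stats['task_heap_used'] < low_usage_threshold:
            # Tasks are using too little memory: grow the dataset cache.
            deltas[node] = +step_bytes
        elif stats['gc_time_ratio'] > contention_gc_threshold:
            # Heavy GC suggests contention between tasks and cached
            # datasets: shrink the dataset cache.
            deltas[node] = -step_bytes
        else:
            deltas[node] = 0
    return deltas
```

In practice the resize step would be bounded by the worker's total heap; the fixed 64 MB step here is only a placeholder.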
14 Claims
1. A method comprising:
obtaining, at a cache manager of a directed acyclic graph-based data analytic platform, from each of a plurality of monitor components on a plurality of worker nodes of said directed acyclic graph-based data analytic platform, memory usage statistics for said worker nodes of said directed acyclic graph-based data analytic platform, said worker nodes having a plurality of tasks executing thereon, each of said tasks having at least one distributed dataset associated therewith, each of said worker nodes having a distributed dataset cache, said memory usage statistics comprising garbage collection time;
initiating, with said plurality of monitor components, a message to a decider component to reduce a memory allocation size of a given one of said distributed dataset caches upon determining, with said plurality of monitor components, that said garbage collection time exceeds a threshold;
when said decider component obtains said message to reduce said size of said given one of said distributed dataset caches:
obtaining, by said decider component, from a directed acyclic graph scheduler component, a current stage directed acyclic graph;
with said decider component, determining at least one of said distributed datasets to drop, based on said current stage directed acyclic graph, to avoid dropping other, needed ones of said distributed datasets; and
initiating, with said decider component, a message to said cache manager to drop said at least one of said distributed datasets; and
when said cache manager receives said message to drop said at least one of said distributed datasets, reducing said memory allocation size of said given one of said distributed dataset caches after dropping said at least one of said distributed datasets, wherein determining at least one of said distributed datasets to drop further comprises:
identifying, from said current stage directed acyclic graph, a first plurality of said distributed datasets currently stored in said given one of said distributed dataset caches that are being processed by tasks in a current epoch;
identifying, from said current stage directed acyclic graph, a second plurality of said distributed datasets currently stored in said given one of said distributed dataset caches that will be used by tasks to be processed in a next epoch;
identifying, in said given one of said distributed dataset caches, at least one of said distributed datasets that is not a member of said first or second pluralities of said distributed datasets; and
selecting to drop said at least one of said distributed datasets that is not a member of said first or second pluralities of said distributed datasets.
Dependent claims: 2, 3, 4, 5, 6
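The eviction rule in claim 1 reduces to a set operation: a cached dataset is safe to drop only if it is neither being processed by tasks in the current epoch nor needed by tasks in the next epoch. A minimal Python sketch, with all identifiers hypothetical:

```python
def select_datasets_to_drop(cached, current_epoch_datasets, next_epoch_datasets):
    """Pick cached datasets that are safe to evict.

    A dataset is kept if tasks in the current epoch are processing it,
    or if tasks in the next epoch will use it (both sets are derived
    from the current stage DAG); every other cached dataset is a
    drop candidate.
    """
    needed = set(current_epoch_datasets) | set(next_epoch_datasets)
    return [d for d in cached if d not in needed]
```

Consulting the stage DAG before evicting is what distinguishes this from a generic LRU policy: a dataset untouched for a long time is still kept if the DAG shows an upcoming stage will read it.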
7. A directed acyclic graph-based data analytic platform comprising:
a plurality of worker nodes;
a plurality of monitor components on said plurality of worker nodes;
a plurality of distributed dataset caches on said plurality of worker nodes;
a cache manager, coupled to said plurality of monitor components, which:
obtains, from each of said plurality of monitor components on said plurality of worker nodes, memory usage statistics for said worker nodes, said worker nodes having a plurality of tasks executing thereon, each of said tasks having at least one distributed dataset associated therewith, said memory usage statistics comprising garbage collection time; and
carries out at least one of:
increasing a memory allocation size of a given one of said distributed dataset caches if said memory usage statistics indicate that corresponding ones of said tasks are using too little memory; and
decreasing a memory allocation size of another given one of said distributed dataset caches if said memory usage statistics indicate contention between corresponding ones of said tasks and a corresponding one of said distributed datasets;
a decider component coupled to said cache manager and to said plurality of monitor components, wherein said plurality of monitor components:
determine whether said garbage collection time exceeds a threshold; and
when said garbage collection time does exceed said threshold, initiate a message to said decider component to reduce said memory allocation size of said another given one of said distributed dataset caches; and
a directed acyclic graph scheduler component coupled to said decider component, wherein, when said decider component obtains said message to reduce said memory allocation size of said another given one of said distributed dataset caches:
said decider component obtains, from said directed acyclic graph scheduler component, a current stage directed acyclic graph;
said decider component determines at least one of said distributed datasets to drop, based on said current stage directed acyclic graph, to avoid dropping other, needed ones of said distributed datasets; and
said decider component initiates a message to said cache manager to drop said at least one of said distributed datasets;
when said cache manager receives said message to drop said at least one of said distributed datasets, said cache manager reduces said memory allocation size of said another given one of said distributed dataset caches after dropping said at least one of said distributed datasets; and
wherein determining at least one of said distributed datasets to drop further comprises:
identifying, from said current stage directed acyclic graph, a first plurality of said distributed datasets currently stored in said another given one of said distributed dataset caches that are being processed by tasks in a current epoch;
identifying, from said current stage directed acyclic graph, a second plurality of said distributed datasets currently stored in said another given one of said distributed dataset caches that will be used by tasks to be processed in a next epoch;
identifying, in said another given one of said distributed dataset caches, at least one of said distributed datasets that is not a member of said first or second pluralities of said distributed datasets; and
selecting to drop said at least one of said distributed datasets that is not a member of said first or second pluralities of said distributed datasets.
Dependent claims: 8, 9, 10, 11
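Claim 7 casts the method as communicating components: each monitor compares garbage-collection time against a threshold and, on breach, messages the decider. A minimal sketch of that trigger path using Python's standard `queue` module as a stand-in for whatever messaging the platform actually uses; the message fields and threshold are hypothetical:

```python
import queue

def monitor_tick(node_id, gc_time_ms, threshold_ms, decider_inbox):
    """One monitoring cycle on a worker node: if garbage-collection
    time exceeds the threshold, ask the decider component to reduce
    this node's distributed-dataset-cache allocation."""
    if gc_time_ms > threshold_ms:
        decider_inbox.put({
            'type': 'reduce_cache',
            'node': node_id,
            'gc_time_ms': gc_time_ms,
        })

# The decider's inbox; each monitor component pushes into it.
inbox = queue.Queue()
monitor_tick('worker-3', gc_time_ms=450, threshold_ms=200, decider_inbox=inbox)
monitor_tick('worker-4', gc_time_ms=80, threshold_ms=200, decider_inbox=inbox)
```

Only worker-3 enqueues a message here; worker-4's GC time is under the threshold, so the decider is not bothered.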
12. A non-transitory computer readable medium comprising computer executable instructions which when executed by a computer cause the computer to perform a method comprising:
obtaining, at a cache manager of a directed acyclic graph-based data analytic platform, from each of a plurality of monitor components on a plurality of worker nodes of said directed acyclic graph-based data analytic platform, memory usage statistics for said worker nodes of said directed acyclic graph-based data analytic platform, said worker nodes having a plurality of tasks executing thereon, each of said tasks having at least one distributed dataset associated therewith, each of said worker nodes having a distributed dataset cache, said memory usage statistics comprising garbage collection time;
determining, with said plurality of monitor components, whether said garbage collection time exceeds a threshold;
when said garbage collection time does exceed said threshold, initiating, with said plurality of monitor components, a message to a decider component to reduce a memory allocation size of a given one of said distributed dataset caches;
when said decider component obtains said message to reduce said memory allocation size of said given one of said distributed dataset caches:
obtaining, by said decider component, from a directed acyclic graph scheduler component, a current stage directed acyclic graph;
with said decider component, determining at least one of said distributed datasets to drop, based on said current stage directed acyclic graph, to avoid dropping other, needed ones of said distributed datasets; and
initiating, with said decider component, a message to said cache manager to drop said at least one of said distributed datasets; and
when said cache manager receives said message to drop said at least one of said distributed datasets, reducing said memory allocation size of said given one of said distributed dataset caches after dropping said at least one of said distributed datasets;
wherein determining at least one of said distributed datasets to drop further comprises:
identifying, from said current stage directed acyclic graph, a first plurality of said distributed datasets currently stored in said given one of said distributed dataset caches that are being processed by tasks in a current epoch;
identifying, from said current stage directed acyclic graph, a second plurality of said distributed datasets currently stored in said given one of said distributed dataset caches that will be used by tasks to be processed in a next epoch;
identifying, in said given one of said distributed dataset caches, at least one of said distributed datasets that is not a member of said first or second pluralities of said distributed datasets; and
selecting to drop said at least one of said distributed datasets that is not a member of said first or second pluralities of said distributed datasets.
Dependent claims: 13, 14
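The final step shared by all three independent claims is ordering-sensitive: the cache manager drops the selected datasets first and only then reduces the cache's memory allocation, so the shrunken cache never holds more than it is allowed. A toy Python sketch of that handler, with the cache layout and field names invented for illustration:

```python
class CacheManager:
    """Toy cache manager: on a drop message, evict the named datasets,
    then shrink the node's cache allocation."""

    def __init__(self, caches):
        # caches: node id -> {'size_bytes': allocation,
        #                     'datasets': {name: size_in_bytes}}
        self.caches = caches

    def handle_drop(self, node, datasets_to_drop, shrink_bytes):
        cache = self.caches[node]
        for name in datasets_to_drop:
            cache['datasets'].pop(name, None)      # drop first...
        cache['size_bytes'] = max(0, cache['size_bytes'] - shrink_bytes)
        # ...then reduce the allocation, so live contents always fit.

manager = CacheManager({
    'worker-1': {'size_bytes': 1_000_000,
                 'datasets': {'rdd_a': 400_000, 'rdd_b': 300_000}},
})
manager.handle_drop('worker-1', ['rdd_a'], shrink_bytes=400_000)
```

After the call, `rdd_a` is gone and worker-1's allocation is down to 600,000 bytes while `rdd_b` remains cached.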
Specification