Dynamic memory tuning for in-memory data analytic platforms
1 Assignment
0 Petitions
Abstract
At a cache manager of a directed acyclic graph-based data analytic platform, memory usage statistics are obtained from each of a plurality of monitor components on a plurality of worker nodes. The worker nodes have a plurality of tasks executing thereon, and each of the tasks has at least one distributed dataset associated therewith. Each of the worker nodes has a distributed dataset cache. At least one of the following is carried out: increasing a size of a given one of the distributed dataset caches if the memory usage statistics indicate that corresponding ones of the tasks are using too little memory; and decreasing a size of another given one of the distributed dataset caches if the memory usage statistics indicate contention between corresponding ones of the tasks and a corresponding one of the distributed datasets.
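The abstract describes a two-way feedback loop: grow a worker's dataset cache when its tasks leave memory idle, shrink it when the statistics show contention between tasks and cached datasets. The following Python sketch illustrates only that decision rule; the statistic names, thresholds, and step size are illustrative assumptions, not values from the patent.

```python
def adjust_cache_sizes(stats_by_node,
                       low_usage_threshold=0.5,
                       contention_gc_threshold=0.1,
                       step_bytes=64 * 1024 * 1024):
    """Decide per-node cache resizing from reported memory statistics.

    stats_by_node maps a worker-node id to a dict with:
      'task_heap_used' - fraction of the task heap actually in use
      'gc_time_ratio'  - garbage-collection time / elapsed time
                         (used here as the contention signal)
    Returns a dict mapping node id -> signed size delta in bytes.
    """
    deltas = {}
    for node, stats in stats_by_node.items():
        if stats['task_heap_used'] < low_usage_threshold:
            # Tasks are using too little memory: grow the dataset cache.
            deltas[node] = +step_bytes
        elif stats['gc_time_ratio'] > contention_gc_threshold:
            # Heavy GC suggests contention between tasks and cached
            # datasets: shrink the dataset cache.
            deltas[node] = -step_bytes
        else:
            deltas[node] = 0
    return deltas
```

In practice the resize step would be bounded by the worker's total heap; the fixed 64 MB step here is only a placeholder.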
14 Claims
1. A method comprising:
obtaining, at a cache manager of a directed acyclic graph-based data analytic platform, from each of a plurality of monitor components on a plurality of worker nodes of said directed acyclic graph-based data analytic platform, memory usage statistics for said worker nodes of said directed acyclic graph-based data analytic platform, said worker nodes having a plurality of tasks executing thereon, each of said tasks having at least one distributed dataset associated therewith, each of said worker nodes having a distributed dataset cache, said memory usage statistics comprising garbage collection time;
initiating, with said plurality of monitor components, a message to a decider component to reduce a memory allocation size of a given one of said distributed dataset caches upon determining, with said plurality of monitor components, that said garbage collection time exceeds a threshold;
when said decider component obtains said message to reduce said size of said given one of said distributed dataset caches:
obtaining, by said decider component, from a directed acyclic graph scheduler component, a current stage directed acyclic graph;
with said decider component, determining at least one of said distributed datasets to drop, based on said current stage directed acyclic graph, to avoid dropping other, needed ones of said distributed datasets; and
initiating, with said decider component, a message to said cache manager to drop said at least one of said distributed datasets; and
when said cache manager receives said message to drop said at least one of said distributed datasets, reducing said memory allocation size of said given one of said distributed dataset caches after dropping said at least one of said distributed datasets, wherein determining at least one of said distributed datasets to drop further comprises:
identifying, from said current stage directed acyclic graph, a first plurality of said distributed datasets currently stored in said given one of said distributed dataset caches that are being processed by tasks in a current epoch;
identifying, from said current stage directed acyclic graph, a second plurality of said distributed datasets currently stored in said given one of said distributed dataset caches that will be used by tasks to be processed in a next epoch;
identifying, in said given one of said distributed dataset caches, at least one of said distributed datasets that is not a member of said first or second pluralities of said distributed datasets; and
selecting to drop said at least one of said distributed datasets that is not a member of said first or second pluralities of said distributed datasets.
Dependent claims: 2, 3, 4, 5, 6
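The eviction rule in claim 1 reduces to a set operation: a cached dataset is safe to drop only if it is neither being processed by tasks in the current epoch nor needed by tasks in the next epoch. A minimal Python sketch, with all identifiers hypothetical:

```python
def select_datasets_to_drop(cached, current_epoch_datasets, next_epoch_datasets):
    """Pick cached datasets that are safe to evict.

    A dataset is kept if tasks in the current epoch are processing it,
    or if tasks in the next epoch will use it (both sets are derived
    from the current stage DAG); every other cached dataset is a
    drop candidate.
    """
    needed = set(current_epoch_datasets) | set(next_epoch_datasets)
    return [d for d in cached if d not in needed]
```

Consulting the stage DAG before evicting is what distinguishes this from a generic LRU policy: a dataset untouched for a long time is still kept if the DAG shows an upcoming stage will read it.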
7. A directed acyclic graph-based data analytic platform comprising:
a plurality of worker nodes;
a plurality of monitor components on said plurality of worker nodes;
a plurality of distributed dataset caches on said plurality of worker nodes;
a cache manager, coupled to said plurality of monitor components, which:
obtains, from each of said plurality of monitor components on said plurality of worker nodes, memory usage statistics for said worker nodes, said worker nodes having a plurality of tasks executing thereon, each of said tasks having at least one distributed dataset associated therewith, said memory usage statistics comprising garbage collection time; and
carries out at least one of:
increasing a memory allocation size of a given one of said distributed dataset caches if said memory usage statistics indicate that corresponding ones of said tasks are using too little memory; and
decreasing a memory allocation size of another given one of said distributed dataset caches if said memory usage statistics indicate contention between corresponding ones of said tasks and a corresponding one of said distributed datasets;
a decider component coupled to said cache manager and to said plurality of monitor components, wherein said plurality of monitor components:
determine whether said garbage collection time exceeds a threshold; and
when said garbage collection time does exceed said threshold, initiate a message to said decider component to reduce said memory allocation size of said another given one of said distributed dataset caches; and
a directed acyclic graph scheduler component coupled to said decider component, wherein, when said decider component obtains said message to reduce said memory allocation size of said another given one of said distributed dataset caches:
said decider component obtains, from said directed acyclic graph scheduler component, a current stage directed acyclic graph;
said decider component determines at least one of said distributed datasets to drop, based on said current stage directed acyclic graph, to avoid dropping other, needed ones of said distributed datasets; and
said decider component initiates a message to said cache manager to drop said at least one of said distributed datasets;
when said cache manager receives said message to drop said at least one of said distributed datasets, said cache manager reduces said memory allocation size of said another given one of said distributed dataset caches after dropping said at least one of said distributed datasets; and
wherein determining at least one of said distributed datasets to drop further comprises:
identifying, from said current stage directed acyclic graph, a first plurality of said distributed datasets currently stored in said another given one of said distributed dataset caches that are being processed by tasks in a current epoch;
identifying, from said current stage directed acyclic graph, a second plurality of said distributed datasets currently stored in said another given one of said distributed dataset caches that will be used by tasks to be processed in a next epoch;
identifying, in said another given one of said distributed dataset caches, at least one of said distributed datasets that is not a member of said first or second pluralities of said distributed datasets; and
selecting to drop said at least one of said distributed datasets that is not a member of said first or second pluralities of said distributed datasets.
Dependent claims: 8, 9, 10, 11
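Claim 7 casts the method as communicating components: each monitor compares garbage-collection time against a threshold and, on breach, messages the decider. A minimal sketch of that trigger path using Python's standard `queue` module as a stand-in for whatever messaging the platform actually uses; the message fields and threshold are hypothetical:

```python
import queue

def monitor_tick(node_id, gc_time_ms, threshold_ms, decider_inbox):
    """One monitoring cycle on a worker node: if garbage-collection
    time exceeds the threshold, ask the decider component to reduce
    this node's distributed-dataset-cache allocation."""
    if gc_time_ms > threshold_ms:
        decider_inbox.put({
            'type': 'reduce_cache',
            'node': node_id,
            'gc_time_ms': gc_time_ms,
        })

# The decider's inbox; each monitor component pushes into it.
inbox = queue.Queue()
monitor_tick('worker-3', gc_time_ms=450, threshold_ms=200, decider_inbox=inbox)
monitor_tick('worker-4', gc_time_ms=80, threshold_ms=200, decider_inbox=inbox)
```

Only worker-3 enqueues a message here; worker-4's GC time is under the threshold, so the decider is not bothered.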
12. A non-transitory computer readable medium comprising computer executable instructions which when executed by a computer cause the computer to perform a method comprising:
obtaining, at a cache manager of a directed acyclic graph-based data analytic platform, from each of a plurality of monitor components on a plurality of worker nodes of said directed acyclic graph-based data analytic platform, memory usage statistics for said worker nodes of said directed acyclic graph-based data analytic platform, said worker nodes having a plurality of tasks executing thereon, each of said tasks having at least one distributed dataset associated therewith, each of said worker nodes having a distributed dataset cache, said memory usage statistics comprising garbage collection time;
determining, with said plurality of monitor components, whether said garbage collection time exceeds a threshold;
when said garbage collection time does exceed said threshold, initiating, with said plurality of monitor components, a message to a decider component to reduce a memory allocation size of a given one of said distributed dataset caches;
when said decider component obtains said message to reduce said memory allocation size of said given one of said distributed dataset caches:
obtaining, by said decider component, from a directed acyclic graph scheduler component, a current stage directed acyclic graph;
with said decider component, determining at least one of said distributed datasets to drop, based on said current stage directed acyclic graph, to avoid dropping other, needed ones of said distributed datasets; and
initiating, with said decider component, a message to said cache manager to drop said at least one of said distributed datasets; and
when said cache manager receives said message to drop said at least one of said distributed datasets, reducing said memory allocation size of said given one of said distributed dataset caches after dropping said at least one of said distributed datasets;
wherein determining at least one of said distributed datasets to drop further comprises:
identifying, from said current stage directed acyclic graph, a first plurality of said distributed datasets currently stored in said given one of said distributed dataset caches that are being processed by tasks in a current epoch;
identifying, from said current stage directed acyclic graph, a second plurality of said distributed datasets currently stored in said given one of said distributed dataset caches that will be used by tasks to be processed in a next epoch;
identifying, in said given one of said distributed dataset caches, at least one of said distributed datasets that is not a member of said first or second pluralities of said distributed datasets; and
selecting to drop said at least one of said distributed datasets that is not a member of said first or second pluralities of said distributed datasets.
Dependent claims: 13, 14
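The final step shared by all three independent claims is ordering-sensitive: the cache manager drops the selected datasets first and only then reduces the cache's memory allocation, so the shrunken cache never holds more than it is allowed. A toy Python sketch of that handler, with the cache layout and field names invented for illustration:

```python
class CacheManager:
    """Toy cache manager: on a drop message, evict the named datasets,
    then shrink the node's cache allocation."""

    def __init__(self, caches):
        # caches: node id -> {'size_bytes': allocation,
        #                     'datasets': {name: size_in_bytes}}
        self.caches = caches

    def handle_drop(self, node, datasets_to_drop, shrink_bytes):
        cache = self.caches[node]
        for name in datasets_to_drop:
            cache['datasets'].pop(name, None)      # drop first...
        cache['size_bytes'] = max(0, cache['size_bytes'] - shrink_bytes)
        # ...then reduce the allocation, so live contents always fit.

manager = CacheManager({
    'worker-1': {'size_bytes': 1_000_000,
                 'datasets': {'rdd_a': 400_000, 'rdd_b': 300_000}},
})
manager.handle_drop('worker-1', ['rdd_a'], shrink_bytes=400_000)
```

After the call, `rdd_a` is gone and worker-1's allocation is down to 600,000 bytes while `rdd_b` remains cached.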
Specification