
Dynamic memory tuning for in-memory data analytic platforms

  • US 10,204,175 B2
  • Filed: 05/18/2016
  • Issued: 02/12/2019
  • Est. Priority Date: 05/18/2016
  • Status: Active Grant
First Claim

1. A method comprising:

  • obtaining, at a cache manager of a directed acyclic graph-based data analytic platform, from each of a plurality of monitor components on a plurality of worker nodes of said directed acyclic graph-based data analytic platform, memory usage statistics for said worker nodes of said directed acyclic graph-based data analytic platform, said worker nodes having a plurality of tasks executing thereon, each of said tasks having at least one distributed dataset associated therewith, each of said worker nodes having a distributed dataset cache, said memory usage statistics comprising garbage collection time;

    initiating, with said plurality of monitor components, a message to a decider component to reduce a memory allocation size of a given one of said distributed dataset caches upon determining, with said plurality of monitor components, that said garbage collection time exceeds a threshold;

    when said decider component obtains said message to reduce said size of said given one of said distributed dataset caches:

    obtaining, by said decider component, from a directed acyclic graph scheduler component, a current stage directed acyclic graph;

    with said decider component, determining at least one of said distributed datasets to drop, based on said current stage of said directed acyclic graph, to avoid dropping other, needed ones of said distributed datasets; and

    initiating, with said decider component, a message to said cache manager to drop said at least one of said distributed datasets; and

    when said cache manager receives said message to drop said at least one of said distributed datasets, reducing said memory allocation size of said given one of said distributed dataset caches after dropping said at least one of said distributed datasets, wherein determining at least one of said distributed datasets to drop further comprises:

    identifying, from said current stage directed acyclic graph, a first plurality of said distributed datasets currently stored in said given one of said distributed dataset caches that are being processed by tasks in a current epoch;

    identifying, from said current stage directed acyclic graph, a second plurality of said distributed datasets currently stored in said given one of said distributed dataset caches that will be used by tasks to be processed in a next epoch;

    identifying, in said given one of said distributed dataset caches, at least one of said distributed datasets that is not a member of said first or second pluralities of said distributed datasets; and

    selecting to drop said at least one of said distributed datasets that is not a member of said first or second pluralities of said distributed datasets.
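The drop-selection steps of the claim amount to a set computation over the current stage directed acyclic graph: from the datasets resident in the given cache, retain those needed by tasks in the current epoch or the next epoch, and drop one that is needed by neither. The following is a minimal, illustrative Python sketch of that logic under assumptions not stated in the patent; the function names, the garbage collection threshold value, and the use of string dataset identifiers are hypothetical and do not come from the patent or any particular platform.

```python
# Illustrative sketch only; names and the threshold value are assumptions.

GC_TIME_THRESHOLD_MS = 500  # assumed threshold; the claim does not specify a value


def monitor_should_request_shrink(gc_time_ms: float) -> bool:
    """Monitor component: initiate a message to the decider component to
    reduce the cache's memory allocation size when garbage collection time
    exceeds the threshold."""
    return gc_time_ms > GC_TIME_THRESHOLD_MS


def select_datasets_to_drop(cached: set[str],
                            current_epoch_needs: set[str],
                            next_epoch_needs: set[str]) -> set[str]:
    """Decider component: using the current stage DAG, identify cached
    distributed datasets that are neither being processed by tasks in the
    current epoch nor used by tasks in the next epoch, so that needed
    datasets are not dropped."""
    needed = current_epoch_needs | next_epoch_needs
    return cached - needed


# Example: "rdd_c" is cached but unused in either epoch, so it is the only
# drop candidate; "rdd_a" and "rdd_b" are preserved.
cached = {"rdd_a", "rdd_b", "rdd_c"}
to_drop = select_datasets_to_drop(cached,
                                  current_epoch_needs={"rdd_a"},
                                  next_epoch_needs={"rdd_b"})
assert to_drop == {"rdd_c"}
```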
