Dynamic cache management for in-memory data analytic platforms

  • US 10,467,152 B2
  • Filed: 05/18/2016
  • Issued: 11/05/2019
  • Est. Priority Date: 05/18/2016
  • Status: Active Grant
First Claim

1. A method comprising:

    obtaining, at a cache manager of a directed acyclic graph-based data analytic platform, from each of a plurality of monitor components on a plurality of worker nodes of said directed acyclic graph-based data analytic platform, statistics for a plurality of tasks executing on said worker nodes, said statistics comprising which of said tasks have been processed and which are in a task queue, each of said tasks having at least one distributed dataset associated therewith, each of said worker nodes having a distributed dataset cache;

    obtaining, at said cache manager, from a directed acyclic graph scheduler component of said directed acyclic graph-based data analytic platform, a current stage directed acyclic graph;

    for a given one of said tasks which has been processed, and for which, based on said current stage directed acyclic graph, it is determined that no other ones of said tasks depend on said at least one distributed dataset associated with said given one of said tasks, evicting said distributed dataset associated with said given one of said tasks from a corresponding one of said distributed dataset caches;

    monitoring, with said monitor components, memory usage statistics for said worker nodes of said directed acyclic graph-based data analytic platform; and

    increasing a size of a given resilient distributed dataset cache of a plurality of resilient distributed dataset caches if said memory usage statistics indicate that corresponding ones of said tasks are using too little memory, said distributed dataset caches comprising said resilient distributed dataset caches, said memory usage statistics comprising garbage collection time.
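
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation or any platform's real API: the names `TaskStats`, `WorkerCache`, `evictable`, `apply_evictions`, and `resize_cache` are hypothetical, as are the 5% garbage-collection threshold and the 256 MB growth step. The sketch also assumes that a low share of time spent in garbage collection is the signal that tasks are using too little memory; the claim only says that the memory usage statistics comprise garbage collection time.

```python
from dataclasses import dataclass, field


@dataclass
class TaskStats:
    """Per-task statistics reported by a worker's monitor (hypothetical shape)."""
    task_id: str
    worker: str
    datasets: frozenset  # ids of the distributed datasets associated with the task
    processed: bool      # True if processed, False if still in the task queue


@dataclass
class WorkerCache:
    """One worker node's distributed dataset cache (hypothetical model)."""
    size_mb: int
    datasets: set = field(default_factory=set)


def evictable(stats, stage_dag):
    """Return {worker: dataset ids} that can be evicted.

    stage_dag maps each task id in the current stage DAG to the set of
    dataset ids that task depends on (an assumed representation).
    """
    plan = {}
    for s in stats:
        if not s.processed:
            continue  # only processed tasks are eviction candidates
        # Datasets that some *other* task in the stage DAG depends on.
        needed = set()
        for task_id, deps in stage_dag.items():
            if task_id != s.task_id:
                needed |= deps
        free = s.datasets - needed
        if free:
            plan.setdefault(s.worker, set()).update(free)
    return plan


def apply_evictions(caches, plan):
    """Drop the planned datasets from each worker's cache."""
    for worker, dataset_ids in plan.items():
        caches[worker].datasets -= dataset_ids


def resize_cache(cache, gc_time_fraction, low_gc=0.05, step_mb=256):
    """Grow the cache when GC time suggests tasks leave memory unused.

    low_gc and step_mb are illustrative values, not taken from the patent.
    """
    if gc_time_fraction < low_gc:
        cache.size_mb += step_mb
    return cache.size_mb


# Example: t1 and t2 are processed on w1; queued task t3 still depends on rdd2.
stats = [
    TaskStats("t1", "w1", frozenset({"rdd1"}), True),
    TaskStats("t2", "w1", frozenset({"rdd2"}), True),
    TaskStats("t3", "w2", frozenset({"rdd3"}), False),
]
stage_dag = {"t1": set(), "t2": set(), "t3": {"rdd2"}}
caches = {
    "w1": WorkerCache(1024, {"rdd1", "rdd2"}),
    "w2": WorkerCache(1024, {"rdd3"}),
}

plan = evictable(stats, stage_dag)
apply_evictions(caches, plan)
new_size = resize_cache(caches["w1"], gc_time_fraction=0.02)
```

In this example only `rdd1` is evicted from worker `w1`, because the queued task `t3` still depends on `rdd2`, and the `w1` cache grows by one step because the assumed garbage-collection share is below the threshold.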
