
Cache management for map-reduce applications

  • US 10,078,594 B2
  • Filed: 08/18/2015
  • Issued: 09/18/2018
  • Est. Priority Date: 08/29/2014
  • Status: Active Grant
First Claim

1. A method for optimizing a cache on a computing node for a MapReduce application on a distributed file system, the method comprising:

    training a first machine learning model to determine an optimal cache slice size on the computing node for processing a map request in a shortest processing time based on first parameters in historical records for previously executed map tasks on the computing node, the first parameters including a first total data size to be processed, a first size of each data record, and a number of map tasks that will execute simultaneously on the computing node;

    receiving, by a computer, the map request for the MapReduce application on the distributed file system that includes one or more storage medium connected to the computing node;

    receiving, by the computer, first parameters for processing the map request;

    determining, by the trained first machine learning model, the optimal cache slice size for the computing node for processing the map request corresponding to the shortest processing time of the map request, wherein the optimal cache slice size is determined based on the received first parameters for processing the map request;

    reading, by the computing node, based on the determined optimal cache slice size, data from the one or more storage medium of the distributed file system into the cache of the computing node;

    processing, by the computing node, the map request; and

    writing, by the computing node, a final result data of the map request processing to the one or more storage medium.
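The claim does not say which kind of model is trained, how the parameters are scaled, or how the node talks to the distributed file system. The sketch below is a minimal Python illustration under stated assumptions only. A nearest-neighbour lookup over historical runs stands in for the "first machine learning model" (a swapped-in technique, not necessarily the patentee's). A file-like object stands in for the storage medium. Every name (`MapTaskRecord`, `SliceSizeModel`, `read_in_slices`, `process_map_request`) is hypothetical. The sketch follows the claimed steps in order: train on historical map-task records, predict a slice size from the three first parameters, read data into the cache slice by slice, process it, and write the final result.

```python
import io
import math
from dataclasses import dataclass
from typing import BinaryIO, Callable, Iterator


@dataclass(frozen=True)
class MapTaskRecord:
    """One historical map-task run (hypothetical schema, not from the patent)."""
    total_data_size: int       # bytes processed by the map request
    record_size: int           # bytes per data record
    concurrent_map_tasks: int  # map tasks running simultaneously on the node
    cache_slice_size: int      # cache slice size used for this run, in bytes
    processing_time: float     # observed processing time, in seconds

    def features(self) -> tuple[int, int, int]:
        # The three "first parameters" named in the claim.
        return (self.total_data_size, self.record_size, self.concurrent_map_tasks)


class SliceSizeModel:
    """Assumed nearest-neighbour stand-in for the claim's first ML model."""

    def __init__(self, tolerance: float = 1e-9):
        self.tolerance = tolerance
        self.records: list[MapTaskRecord] = []

    def train(self, history: list[MapTaskRecord]) -> None:
        # Nearest-neighbour "training" simply memorises the historical records.
        self.records = list(history)

    @staticmethod
    def _distance(a: tuple[int, ...], b: tuple[int, ...]) -> float:
        # Euclidean distance on log-scaled features, so byte counts do not
        # swamp the much smaller task count.
        return math.sqrt(sum((math.log1p(x) - math.log1p(y)) ** 2 for x, y in zip(a, b)))

    def predict(self, total_data_size: int, record_size: int,
                concurrent_map_tasks: int) -> int:
        if not self.records:
            raise ValueError("model has not been trained")
        query = (total_data_size, record_size, concurrent_map_tasks)
        scored = [(self._distance(query, r.features()), r) for r in self.records]
        nearest = min(d for d, _ in scored)
        # Keep every past run of the most similar workload profile...
        candidates = [r for d, r in scored if d <= nearest + self.tolerance]
        # ...and return the slice size that gave the shortest processing time.
        return min(candidates, key=lambda r: r.processing_time).cache_slice_size


def read_in_slices(source: BinaryIO, slice_size: int) -> Iterator[bytes]:
    """Yield successive cache slices of at most slice_size bytes from storage."""
    while True:
        cache_slice = source.read(slice_size)
        if not cache_slice:
            break
        yield cache_slice


def process_map_request(source: BinaryIO, sink: BinaryIO, slice_size: int,
                        map_fn: Callable[[bytes], int]) -> int:
    """Read input slice by slice, apply map_fn to each slice, write the final result."""
    final_result = sum(map_fn(s) for s in read_in_slices(source, slice_size))
    sink.write(str(final_result).encode())
    return final_result


if __name__ == "__main__":
    # Illustrative, made-up history records (not data from the patent).
    history = [
        MapTaskRecord(1 << 30, 100, 4, 64 << 20, 42.0),
        MapTaskRecord(1 << 30, 100, 4, 128 << 20, 35.5),
        MapTaskRecord(1 << 30, 100, 4, 256 << 20, 39.0),
        MapTaskRecord(1 << 20, 10, 1, 4 << 10, 0.3),
    ]
    model = SliceSizeModel()
    model.train(history)
    print(model.predict(1 << 30, 120, 4))  # 134217728 (128 MiB)

    # Count newline-terminated records in a tiny in-memory "file".
    out = io.BytesIO()
    count = process_map_request(io.BytesIO(b"a\nb\nc\n"), out, 4,
                                lambda s: s.count(b"\n"))
    print(count, out.getvalue())
```

Restricting the choice to runs of the nearest workload profile avoids comparing the processing time of a small job against that of a large one. A fuller implementation might instead regress processing time on the parameters plus slice size and minimise over candidate sizes. The map function here counts newlines because that result stays correct even when a record straddles two slices.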

  • Assignments: 1