CACHE MANAGEMENT FOR MAP-REDUCE APPLICATIONS
First Claim
1. A method for managing a cache for a MapReduce application on a distributed file system, the method comprising:
receiving, by a computer, a map request for a MapReduce application on a distributed file system that includes one or more storage medium;
receiving, by the computer, parameters for processing the map request, the parameters including a total data size to be processed, a size of each data record, and a number of map requests executing simultaneously;
determining, by the computer, a cache size for processing the map request, wherein the cache size is determined based on the received parameters for processing the map request and a machine learning model for a map request cache size;
reading, by the computer, based on the determined cache size, data from the one or more storage medium of the distributed file system into the cache;
processing, by the computer, the map request; and
writing, by the computer, an intermediate result data of the map request processing into the cache, based on the determined cache size.
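The determining step of the claim can be illustrated with a minimal sketch. The claim names three parameters (total data size, record size, number of simultaneous map requests) and a machine learning model, but does not fix the model's form. Everything below is therefore an assumption made for illustration: the names `MapRequestParams`, `LinearCacheModel`, and `determine_cache_size` are hypothetical, a linear model stands in for whatever trained model is actually used, and the clamping and rounding to whole records are added here and are not taken from the claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapRequestParams:
    """The three parameters named in the claim (all names hypothetical)."""
    total_data_size: int      # bytes to be processed by the map request
    record_size: int          # bytes per data record
    concurrent_requests: int  # map requests executing simultaneously


@dataclass(frozen=True)
class LinearCacheModel:
    """Assumed stand-in for the trained model: a linear function of the parameters."""
    bias: float
    w_total: float
    w_record: float
    w_concurrent: float

    def predict(self, p: MapRequestParams) -> float:
        return (self.bias
                + self.w_total * p.total_data_size
                + self.w_record * p.record_size
                + self.w_concurrent * p.concurrent_requests)


def determine_cache_size(model: LinearCacheModel, p: MapRequestParams) -> int:
    """Return a cache size in bytes derived from the model's prediction.

    Assumption (not in the claim): the prediction is clamped to the range
    [one record, total data size] and rounded down to a whole number of records.
    """
    if p.record_size <= 0:
        raise ValueError("record_size must be positive")
    upper = max(p.record_size, p.total_data_size)
    bounded = min(max(model.predict(p), p.record_size), upper)
    return int(bounded) // p.record_size * p.record_size
```

In this sketch the weights would come from fitting the model on measurements of earlier map requests; how that training is done is outside what the claim text above states.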
Abstract
A computer manages a cache for a MapReduce application based on a distributed file system that includes one or more storage medium by receiving a map request and receiving parameters for processing the map request. The parameters include a total data size to be processed, a size of each data record, and a number of map requests executing simultaneously. The computer determines a cache size for processing the map request, wherein the cache size is determined based on the received parameters for processing the map request and a machine learning model for a map request cache size and reads, based on the determined cache size, data from the one or more storage medium of the distributed file system into the cache. The computer processes the map request and writes an intermediate result data of the map request processing into the cache, based on the determined cache size.
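The read, process, and write steps summarized in the abstract might look roughly like the following sketch. It is an illustration under stated assumptions, not the patented implementation: `read_into_cache` and `run_map_request` are hypothetical names, an in-memory byte stream stands in for the distributed file system, records are assumed to be fixed size, and "writing based on the determined cache size" is interpreted here as bounding the intermediate-result buffer to the number of records that fit in the cache.

```python
import io
from typing import BinaryIO, Callable, Iterator, List, Tuple

KV = Tuple[bytes, int]  # an intermediate key/value pair (shape assumed)


def read_into_cache(source: BinaryIO, cache_size: int) -> Iterator[bytes]:
    """Yield successive blocks of at most cache_size bytes from storage.

    Assumes the stream returns full blocks except at end of data, as
    io.BytesIO does; a real network stream may need a read loop.
    """
    while True:
        block = source.read(cache_size)
        if not block:
            return
        yield block


def run_map_request(source: BinaryIO, cache_size: int, record_size: int,
                    map_fn: Callable[[bytes], KV]) -> List[List[KV]]:
    """Process fixed-size records and return the flushed intermediate batches."""
    if record_size <= 0 or cache_size < record_size or cache_size % record_size:
        raise ValueError("cache_size must be a positive multiple of record_size")
    records_per_cache = cache_size // record_size
    flushed: List[List[KV]] = []   # batches handed off once the buffer is full
    buffer: List[KV] = []          # intermediate results held in the cache
    for block in read_into_cache(source, cache_size):
        for offset in range(0, len(block), record_size):
            buffer.append(map_fn(block[offset:offset + record_size]))
            if len(buffer) == records_per_cache:
                flushed.append(buffer)
                buffer = []
    if buffer:
        flushed.append(buffer)     # final, partially filled batch
    return flushed


# Usage: five 4-byte records processed with an 8-byte cache.
batches = run_map_request(io.BytesIO(b"aaaabbbbccccddddeeee"), 8, 4,
                          lambda record: (record, len(record)))
```

With these inputs the data is read in three blocks, and the intermediate results are flushed in batches of at most two records.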
35 Citations
24 Claims
1. A method for managing a cache for a MapReduce application on a distributed file system, the method comprising:
receiving, by a computer, a map request for a MapReduce application on a distributed file system that includes one or more storage medium;
receiving, by the computer, parameters for processing the map request, the parameters including a total data size to be processed, a size of each data record, and a number of map requests executing simultaneously;
determining, by the computer, a cache size for processing the map request, wherein the cache size is determined based on the received parameters for processing the map request and a machine learning model for a map request cache size;
reading, by the computer, based on the determined cache size, data from the one or more storage medium of the distributed file system into the cache;
processing, by the computer, the map request; and
writing, by the computer, an intermediate result data of the map request processing into the cache, based on the determined cache size.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer program product for managing a cache for a MapReduce application on a distributed file system, the computer program product comprising one or more computer readable storage medium and program instructions stored on at least one of the one or more computer readable storage medium, the program instructions comprising:
program instructions to receive, by a computer, a map request for a MapReduce application on a distributed file system that includes one or more storage medium;
program instructions to receive, by the computer, parameters for processing the map request, the parameters including a total data size to be processed, a size of each data record, and a number of map requests executing simultaneously;
program instructions to determine, by the computer, a cache size for processing the map request, wherein the cache size is determined based on the received parameters for processing the map request and a machine learning model for a map request cache size;
program instructions to read, by the computer, based on the determined cache size, data from the one or more storage medium of the distributed file system into the cache;
program instructions to process, by the computer, the map request; and
program instructions to write, by the computer, an intermediate result data of the map request processing into the cache, based on the determined cache size.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computer system for managing a cache for a MapReduce application on a distributed file system, the computer system comprising one or more processors, one or more computer readable memories, one or more computer readable tangible storage medium, and program instructions stored on at least one of the one or more storage medium for execution by at least one of the one or more processors via at least one of the one or more memories, the program instructions comprising:
program instructions to receive, by a computer, a map request for a MapReduce application on a distributed file system that includes one or more storage medium;
program instructions to receive, by the computer, parameters for processing the map request, the parameters including a total data size to be processed, a size of each data record, and a number of map requests executing simultaneously;
program instructions to determine, by the computer, a cache size for processing the map request, wherein the cache size is determined based on the received parameters for processing the map request and a machine learning model for a map request cache size;
program instructions to read, by the computer, based on the determined cache size, data from the one or more storage medium of the distributed file system into the cache;
program instructions to process, by the computer, the map request; and
program instructions to write, by the computer, an intermediate result data of the map request processing into the cache, based on the determined cache size.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
Specification