COMPILER-GUIDED SOFTWARE ACCELERATOR FOR ITERATIVE HADOOP JOBS
Abstract
Methods are provided for a compiler-guided software accelerator for iterative HADOOP jobs. A method includes identifying intermediate data, generated by an iterative HADOOP application, that is below a predetermined threshold size and used less than a predetermined threshold time period. The intermediate data is stored in a memory device. The method further includes minimizing input, output, and synchronization overhead for the intermediate data by selectively using at any given time any one of a Message Passing Interface and a HADOOP Distributed File System as a communication layer. The Message Passing Interface is co-located with the HADOOP Distributed File System.
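The abstract's selection rule routes intermediate data over MPI only when it is both below a size threshold and reused within a time threshold; everything else goes through HDFS. A minimal sketch of that decision follows. The threshold values, class name, and method name are illustrative assumptions, not values taken from the patent:

```java
// Sketch of the threshold test described in the abstract and claim 1:
// small, short-lived intermediate data goes over MPI; large or long-lived
// data stays on the HADOOP Distributed File System. Thresholds are assumed.
public class CommLayerSelector {
    public enum CommLayer { MPI, HDFS }

    // Hypothetical thresholds: 64 MB and a 10-second reuse window.
    static final long SIZE_THRESHOLD_BYTES = 64L * 1024 * 1024;
    static final long LIFETIME_THRESHOLD_MS = 10_000L;

    /** Pick the communication layer for one piece of intermediate data. */
    public static CommLayer select(long sizeBytes, long lifetimeMs) {
        if (sizeBytes < SIZE_THRESHOLD_BYTES && lifetimeMs < LIFETIME_THRESHOLD_MS) {
            return CommLayer.MPI;    // small and short-lived: avoid HDFS I/O
        }
        return CommLayer.HDFS;       // large or long-lived: durable storage
    }

    public static void main(String[] args) {
        System.out.println(select(1024, 500));                 // MPI
        System.out.println(select(512L * 1024 * 1024, 500));   // HDFS
    }
}
```

Because both conditions must hold, data that is small but reused over a long period still goes to HDFS under this sketch.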
Claims (18)
1. A method, comprising:

identifying intermediate data, generated by an iterative HADOOP application, below a predetermined threshold size and used less than a predetermined threshold time period, the intermediate data being stored in a memory device; and

minimizing input, output, and synchronization overhead for the intermediate data by selectively using at any given time any one of a Message Passing Interface and a HADOOP Distributed File System as a communication layer, the Message Passing Interface being co-located with the HADOOP Distributed File System.

(Dependent claims 2-7 not shown.)
8. A method, comprising:

identifying a set of map tasks and reduce tasks capable of being reused across multiple iterations of an iterative HADOOP application; and

reducing a system load imparted on a computer system executing the iterative HADOOP application by transforming a source code of the iterative HADOOP application to launch the map tasks in the set only once and keep the map tasks in the set alive for an entirety of the execution of the iterative HADOOP application.

(Dependent claims 9-16 not shown.)
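Claim 8's transformation amounts to starting each map task once and keeping it alive across iterations, instead of paying task-launch overhead every iteration. The sketch below models that with a long-lived worker thread that pulls each iteration's input from a queue; it is an illustrative model under assumed names, not the patent's actual Hadoop source-to-source transformation:

```java
// Sketch of a persistent map task: launched once, kept alive for all
// iterations, fed per-iteration input through a queue. All names are assumed.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class PersistentMapper implements Runnable {
    static final AtomicInteger launches = new AtomicInteger();
    static final String POISON = "__DONE__";   // shutdown marker
    final BlockingQueue<String> input = new LinkedBlockingQueue<>();
    final BlockingQueue<String> output = new LinkedBlockingQueue<>();

    @Override
    public void run() {
        launches.incrementAndGet();            // counted once per task, not once per iteration
        try {
            String record;
            while (!(record = input.take()).equals(POISON)) {
                output.put(record.toUpperCase());   // stand-in for the real map function
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Drive one persistent mapper through the given number of iterations. */
    static int runIterations(int iterations) throws InterruptedException {
        launches.set(0);
        PersistentMapper m = new PersistentMapper();
        Thread t = new Thread(m);
        t.start();
        for (int i = 0; i < iterations; i++) {
            m.input.put("iteration-" + i);
            m.output.take();                   // wait for this iteration's map output
        }
        m.input.put(POISON);
        t.join();
        return launches.get();                 // 1, regardless of iteration count
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("launches=" + runIterations(3));   // launches=1
    }
}
```

The launch counter makes the claimed saving visible: three iterations, one task launch.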
17. A method, comprising:

automatically transforming an iterative HADOOP application to selectively use at any given time any one of a Message Passing Interface and a HADOOP Distributed File System depending on parameters of a data transfer in the iterative HADOOP application, the Message Passing Interface being co-located with the HADOOP Distributed File System; and

enabling concurrent execution by at least one processor of a reduce task from an iteration n and map tasks from an iteration n+1 in the iterative HADOOP application, n being an integer, wherein said enabling step comprises:

replacing an invocation to a runJob( ) function in the iterative HADOOP application by an invocation to a submitJob( ) function; and

inserting a function call into the iterative HADOOP application for blocking and reading model data from a Message Passing Interface based data distribution library connected to the Message Passing Interface.

(Dependent claim 18 not shown.)
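In Hadoop's old org.apache.hadoop.mapred.JobClient API, runJob( ) blocks until the job completes, while submitJob( ) returns immediately; that difference is what lets iteration n's reduce overlap with iteration n+1's map tasks. The sketch below models the overlap with a plain thread pool standing in for JobClient, and a blocking queue standing in for the patent's MPI-based data distribution library; the task bodies and all names other than runJob/submitJob are illustrative assumptions:

```java
// Sketch of claim 17's overlap: a non-blocking submitJob()-style call lets the
// reduce of iteration n run concurrently with the maps of iteration n+1.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

public class OverlappedIterations {
    /**
     * pool.submit() returns a Future immediately, playing the role of
     * submitJob(); a blocking runJob()-style call here would have forced the
     * two stages to run back to back instead of concurrently.
     */
    static List<String> runIterationPair(int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        BlockingQueue<String> modelData = new LinkedBlockingQueue<>();

        Future<?> reduceN = pool.submit(() -> modelData.add("reduce-done-" + n));
        Future<?> mapsNext = pool.submit(() -> modelData.add("maps-done-" + (n + 1)));
        reduceN.get();
        mapsNext.get();
        pool.shutdown();

        // The inserted blocking read of model data (claim 17's second step).
        List<String> results = new ArrayList<>();
        results.add(modelData.take());
        results.add(modelData.take());
        Collections.sort(results);   // completion order is nondeterministic
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runIterationPair(0));   // [maps-done-1, reduce-done-0]
    }
}
```

The blocking take( ) calls mirror the function call claim 17 inserts for reading model data: downstream work cannot proceed until the data actually arrives, even though job submission itself did not block.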
Specification