Enabling dynamic job configuration in MapReduce
Abstract
Methods, systems, and articles of manufacture for enabling dynamic task-level configuration in MapReduce are provided herein. A method includes generating a first set of configurations for a currently executing MapReduce job, wherein said set of configurations comprises job-level configurations and task-level configurations; dynamically modifying configurations associated with a mapper component and/or a reducer component associated with at least one ongoing map task and/or ongoing reduce task of the MapReduce job based on the generated first set of configurations; and deploying said first set of configurations to the mapper component and/or the reducer component associated with the MapReduce job.
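The three steps named in the abstract (generating, dynamically modifying, deploying) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ConfigSet` and `Component` classes, the setting names such as `sort_area_mb` and `copy_threads`, the placeholder values, and the in-memory `deploy` function are hypothetical and are not taken from the patent or from any MapReduce implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConfigSet:
    """A generated set of configurations, split by scope (hypothetical layout)."""
    job_level: Dict[str, int] = field(default_factory=dict)
    # Task-level settings keyed by component kind ("mapper" or "reducer").
    task_level: Dict[str, Dict[str, int]] = field(default_factory=dict)


@dataclass
class Component:
    """A mapper or reducer with at least one ongoing task (hypothetical model)."""
    name: str
    config: Dict[str, int]


def generate() -> ConfigSet:
    # Placeholder values; a real system would derive them from the running job.
    return ConfigSet(
        job_level={"num_reducers": 4},
        task_level={
            "mapper": {"sort_area_mb": 256},
            "reducer": {"copy_threads": 10},
        },
    )


def modify(component: Component, generated: ConfigSet, kind: str) -> Dict[str, int]:
    # Overlay the generated task-level values on the component's current settings.
    return {**component.config, **generated.task_level.get(kind, {})}


def deploy(component: Component, modified: Dict[str, int]) -> None:
    # Stand-in for pushing the modified configuration to a running component.
    component.config = dict(modified)


mapper = Component("mapper-0", {"sort_area_mb": 100, "memory_mb": 1024})
reducer = Component("reducer-0", {"copy_threads": 5, "memory_mb": 2048})

generated = generate()
for comp, kind in ((mapper, "mapper"), (reducer, "reducer")):
    deploy(comp, modify(comp, generated, kind))
```

In this sketch the mapper and the reducer each receive their own task-level overlay, while settings that the generated set does not mention (here `memory_mb`) are left unchanged.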
18 Claims
1. A method comprising:
generating a first set of configurations for a currently executing MapReduce job, wherein said set of configurations comprises job-level configurations and task-level configurations;
dynamically modifying:
(i) a distinct set of task-level configurations of a mapper component associated with at least one ongoing map task of the MapReduce job, based on the generated first set of configurations, wherein said task-level configurations of the mapper component comprise a size of a map task input, resource allocation for a mapper component, central processing unit, memory, a size of a sorting area, and a number of threads when writing a map output to a local disk, and
(ii) a distinct set of task-level configurations of a reducer component associated with at least one ongoing reduce task of the MapReduce job, based on the generated first set of configurations, wherein said task-level configurations of the reducer component comprise a number of threads when copying a map output to a reducer component, a size of a reduce task input, resource allocation of a reduce task, central processing unit, memory, and a size of a sorting area; and
deploying said modified configurations to the mapper component and the reducer component associated with the MapReduce job in accordance with one of multiple temporal deployment schedules, wherein the temporal deployment schedule is based on the contents of the modified configurations;
wherein said generating, said modifying, and said deploying are carried out by one or more computing devices.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. The method of claim 1, wherein said deploying comprises deploying said modified configurations based on a task execution status associated with the mapper component and/or the reducer component.
13. An article of manufacture comprising a non-transitory computer readable storage medium having computer readable instructions tangibly embodied thereon which, when implemented, cause a computer to carry out a plurality of method steps comprising:
generating a first set of configurations for a currently executing MapReduce job, wherein said set of configurations comprises job-level configurations and task-level configurations;
dynamically modifying:
(i) a distinct set of task-level configurations of a mapper component associated with at least one ongoing map task of the MapReduce job, based on the generated first set of configurations, wherein said task-level configurations of the mapper component comprise a size of a map task input, resource allocation for a mapper component, central processing unit, memory, a size of a sorting area, and a number of threads when writing a map output to a local disk, and
(ii) a distinct set of task-level configurations of a reducer component associated with at least one ongoing reduce task of the MapReduce job, based on the generated first set of configurations, wherein said task-level configurations of the reducer component comprise a number of threads when copying a map output to a reducer component, a size of a reduce task input, resource allocation of a reduce task, central processing unit, memory, and a size of a sorting area; and
deploying said modified configurations to the mapper component and the reducer component associated with the MapReduce job in accordance with one of multiple temporal deployment schedules, wherein the temporal deployment schedule is based on the contents of the modified configurations.
Dependent claims: 14, 15, 16, 17
18. A system comprising:
at least one memory; and
at least one processor coupled to the at least one memory and configured for:
generating a first set of configurations for a currently executing MapReduce job, wherein said set of configurations comprises job-level configurations and task-level configurations;
dynamically modifying:
(i) a distinct set of task-level configurations of a mapper component associated with at least one ongoing map task of the MapReduce job, based on the generated first set of configurations, wherein said task-level configurations of the mapper component comprise a size of a map task input, resource allocation for a mapper component, central processing unit, memory, a size of a sorting area, and a number of threads when writing a map output to a local disk, and
(ii) a distinct set of task-level configurations of a reducer component associated with at least one ongoing reduce task of the MapReduce job, based on the generated first set of configurations, wherein said task-level configurations of the reducer component comprise a number of threads when copying a map output to a reducer component, a size of a reduce task input, resource allocation of a reduce task, central processing unit, memory, and a size of a sorting area; and
deploying said modified configurations to the mapper component and the reducer component associated with the MapReduce job in accordance with one of multiple temporal deployment schedules, wherein the temporal deployment schedule is based on the contents of the modified configurations.
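As an illustration only, the task-level settings enumerated in the independent claims, and a deployment schedule chosen from the contents of a modification and from task execution status, might be modeled as in the Python sketch below. The claims state only that the schedule is one of multiple temporal deployment schedules, based on the contents of the modified configurations and, in claim 12, on task execution status. The two schedules shown, the `RESOURCE_FIELDS` rule, the `TaskStatus` values, and all field names and units are hypothetical choices made for this sketch.

```python
from dataclasses import asdict, dataclass, replace
from enum import Enum


@dataclass(frozen=True)
class MapperTaskConfig:
    input_size_mb: int   # size of a map task input
    cpu_cores: int       # resource allocation: central processing unit
    memory_mb: int       # resource allocation: memory
    sort_area_mb: int    # size of a sorting area
    spill_threads: int   # threads used when writing map output to local disk


@dataclass(frozen=True)
class ReducerTaskConfig:
    copy_threads: int    # threads used when copying map output to the reducer
    input_size_mb: int   # size of a reduce task input
    cpu_cores: int       # resource allocation: central processing unit
    memory_mb: int       # resource allocation: memory
    sort_area_mb: int    # size of a sorting area


class TaskStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"


class Schedule(Enum):
    IMMEDIATE = "apply to the task now"
    NEXT_TASK_LAUNCH = "apply when the component starts its next task"


# Assumption: these settings cannot be changed safely under a running task.
RESOURCE_FIELDS = frozenset({"input_size_mb", "cpu_cores", "memory_mb"})


def changed_fields(old, new):
    """Return the names of the settings whose values differ."""
    before, after = asdict(old), asdict(new)
    return {name for name in before if before[name] != after[name]}


def choose_schedule(old, new, status):
    """Pick a schedule from what changed and from the task's execution status."""
    if status is TaskStatus.COMPLETED:
        return Schedule.NEXT_TASK_LAUNCH  # a finished task cannot be retuned
    if status is TaskStatus.RUNNING and changed_fields(old, new) & RESOURCE_FIELDS:
        return Schedule.NEXT_TASK_LAUNCH
    return Schedule.IMMEDIATE


old_map = MapperTaskConfig(input_size_mb=128, cpu_cores=1, memory_mb=1024,
                           sort_area_mb=100, spill_threads=1)
more_threads = replace(old_map, spill_threads=4)
more_memory = replace(old_map, memory_mb=2048)

old_reduce = ReducerTaskConfig(copy_threads=5, input_size_mb=512, cpu_cores=1,
                               memory_mb=2048, sort_area_mb=200)
more_copiers = replace(old_reduce, copy_threads=10)
```

Under these assumptions, `choose_schedule(old_map, more_threads, TaskStatus.RUNNING)` selects the immediate schedule, while the same call with `more_memory` defers deployment to the next task launch.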
Specification