Worldwide distributed job and tasks computational model
9 Assignments
0 Petitions
Abstract
Example embodiments of the present invention relate to a method, an apparatus, and a computer program product for performing file system activities across administrative boundaries between a plurality of file systems. The method includes receiving a worldwide job to perform on a plurality of file systems, managing the worldwide job, and receiving results of the worldwide job from the plurality of file systems.
75 Citations
20 Claims
1. A method comprising:
- receiving a worldwide job configured to be performed on a heterogeneous worldwide distributed file system across administrative boundaries between data sets stored on and associated with a respective distributed file system of a cluster of a plurality of nodes;
- splitting, by a worldwide job tracker, the worldwide job into worldwide tasks configured to be performed on respective data sets;
- assigning, by the worldwide job tracker, the worldwide tasks to worldwide task trackers at the respective clusters, wherein each worldwide task tracker maintains records of all sub-activities executed as part of the worldwide job and edit logs to capture the activities;
- submitting the worldwide tasks as jobs from the worldwide task trackers to job trackers at the respective clusters;
- splitting, by the job trackers, each of the jobs into tasks each configured to be performed on a portion of the data set stored on the distributed file system of the respective cluster;
- assigning the tasks to task trackers at the respective clusters; and
- performing parallel processing of the worldwide job on the heterogeneous worldwide distributed file system by executing the tasks across nodes of the respective clusters, wherein the worldwide task tracker is a slave of the worldwide job tracker, and further wherein the worldwide task tracker is a master of the job tracker.

Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
- one or more processors; and
- memory storing computer program code that, when executed on one or more of the one or more processors, causes the system to perform the operations of:
  - receiving a worldwide job configured to be performed on a heterogeneous worldwide distributed file system across administrative boundaries between data sets stored on and associated with a respective distributed file system of a cluster of a plurality of nodes;
  - splitting, by a worldwide job tracker, the worldwide job into worldwide tasks configured to be performed on respective data sets;
  - assigning, by the worldwide job tracker, the worldwide tasks to worldwide task trackers at the respective clusters, wherein each worldwide task tracker maintains records of all sub-activities executed as part of the worldwide job and edit logs to capture the activities;
  - submitting the worldwide tasks as jobs from the worldwide task trackers to job trackers at the respective clusters;
  - splitting, by the job trackers, each of the jobs into tasks each configured to be performed on a portion of the data set stored on the distributed file system of the respective cluster;
  - assigning the tasks to task trackers at the respective clusters; and
  - performing parallel processing of the worldwide job on the heterogeneous worldwide distributed file system by executing the tasks across nodes of the respective clusters, wherein the worldwide task tracker is a slave of the worldwide job tracker, and further wherein the worldwide task tracker is a master of the job tracker.

Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computer program product having a non-transitory computer readable storage medium with instructions encoded thereon that, when executed by a processor of a computer, cause the computer to perform the operations of:
- receiving a worldwide job configured to be performed on a heterogeneous worldwide distributed file system across administrative boundaries between data sets stored on and associated with a respective distributed file system of a cluster of a plurality of nodes;
- splitting, by a worldwide job tracker, the worldwide job into worldwide tasks configured to be performed on respective data sets;
- assigning, by the worldwide job tracker, the worldwide tasks to worldwide task trackers at the respective clusters, wherein each worldwide task tracker maintains records of all sub-activities executed as part of the worldwide job and edit logs to capture the activities;
- submitting the worldwide tasks as jobs from the worldwide task trackers to job trackers at the respective clusters;
- splitting, by the job trackers, each of the jobs into tasks each configured to be performed on a portion of the data set stored on the distributed file system of the respective cluster;
- assigning the tasks to task trackers at the respective clusters; and
- performing parallel processing of the worldwide job on the heterogeneous worldwide distributed file system by executing the tasks across nodes of the respective clusters, wherein the worldwide task tracker is a slave of the worldwide job tracker, and further wherein the worldwide task tracker is a master of the job tracker.

Dependent claims: 18, 19, 20
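The two-level hierarchy recited in the independent claims can be illustrated with a small Python sketch. It is not the patented implementation: the class and function names (`Cluster`, `WorldwideTaskTracker`, `job_tracker`, `worldwide_job_tracker`), the use of a thread pool as a stand-in for cluster nodes, the striding scheme for splitting a data set, and the final `combine` step (suggested by the abstract's "receiving results" but not recited in the claims) are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One cluster with its own distributed file system (hypothetical model)."""
    name: str
    data_set: list   # stand-in for the data set stored on this cluster
    nodes: int = 2   # number of task trackers (nodes) in the cluster


@dataclass
class WorldwideTaskTracker:
    """Slave of the worldwide job tracker, master of the cluster's job tracker."""
    cluster: Cluster
    edit_log: list = field(default_factory=list)  # records of sub-activities

    def run(self, fn):
        # Submit the worldwide task as an ordinary job to the local job tracker.
        self.edit_log.append(f"submit job to job tracker at {self.cluster.name}")
        return job_tracker(self.cluster, fn, self.edit_log)


def job_tracker(cluster, fn, log):
    """Split a job into tasks, one per portion of the cluster's data set."""
    n = cluster.nodes
    portions = [cluster.data_set[i::n] for i in range(n)]  # assumed split scheme
    with ThreadPoolExecutor(max_workers=n) as pool:  # task trackers in parallel
        partials = list(pool.map(fn, portions))
    for i in range(n):
        log.append(f"task {i} done on {cluster.name}")
    return partials


def worldwide_job_tracker(clusters, fn, combine):
    """Split a worldwide job into one worldwide task per cluster, then combine."""
    trackers = [WorldwideTaskTracker(c) for c in clusters]
    with ThreadPoolExecutor(max_workers=len(trackers)) as pool:
        per_cluster = list(pool.map(lambda t: combine(t.run(fn)), trackers))
    return combine(per_cluster), trackers


clusters = [
    Cluster("us-east", [1, 2, 3, 4]),
    Cluster("eu-west", [10, 20, 30], nodes=3),
]
total, trackers = worldwide_job_tracker(clusters, sum, sum)
print(total)  # 70
```

Each `WorldwideTaskTracker` sits between the two levels: it receives work from the worldwide job tracker and hands it to its own cluster's job tracker, logging every sub-activity along the way, which mirrors the slave/master relationship stated in the final wherein clause.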
Specification