Distributed Job Manager Recovery

  • US 20080307258A1
  • Filed: 06/11/2007
  • Published: 12/11/2008
  • Est. Priority Date: 06/11/2007
  • Status: Active Grant
First Claim

1. A method for failure recovery of a job manager within a data processing system, the method comprising:

  • instantiating a job manager on one of a plurality of nodes to manage distributed execution of jobs on the plurality of nodes, wherein management of job execution for each job comprises managing the deployment and execution of a plurality of processing elements associated with each job on the plurality of nodes in accordance with instructions from a job scheduler;

  • checkpointing a current state associated with each one of the plurality of processing elements associated with the jobs managed by the job manager; and

  • recovering the job manager subsequent to a failure of the job manager using the checkpointed processing element states.
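
The three steps of the claim (instantiate, checkpoint, recover) can be illustrated with a minimal sketch. It is not the patent's implementation: every name in it (`ProcessingElement`, `CheckpointStore`, `JobManager`, the `status` values, the node names) is hypothetical, and the in-memory dictionary merely stands in for whatever durable storage a real system would use. The job scheduler is also omitted, so `deploy` is called directly.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class ProcessingElement:
    """Hypothetical record of one processing element (PE) belonging to a job."""
    pe_id: str
    job_id: str
    node: str
    status: str = "deployed"  # assumed states: "deployed", "running", "stopped"


class CheckpointStore:
    """Stand-in for durable storage that outlives the job manager process."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def save(self, pe: ProcessingElement) -> None:
        # Serialize so the stored copy is independent of the live object.
        self._records[pe.pe_id] = json.dumps(asdict(pe))

    def load_all(self) -> Dict[str, ProcessingElement]:
        return {
            pe_id: ProcessingElement(**json.loads(raw))
            for pe_id, raw in self._records.items()
        }


class JobManager:
    """Tracks PEs across nodes and checkpoints each PE state change."""

    def __init__(self, node: str, store: CheckpointStore) -> None:
        self.node = node  # node on which this job manager instance runs
        self.store = store
        self.elements: Dict[str, ProcessingElement] = {}

    def deploy(self, pe_id: str, job_id: str, node: str) -> ProcessingElement:
        pe = ProcessingElement(pe_id, job_id, node)
        self.elements[pe_id] = pe
        self.store.save(pe)  # checkpoint the new PE's state
        return pe

    def set_status(self, pe_id: str, status: str) -> None:
        pe = self.elements[pe_id]
        pe.status = status
        self.store.save(pe)  # checkpoint the updated state

    @classmethod
    def recover(cls, node: str, store: CheckpointStore) -> "JobManager":
        # Rebuild the manager's view purely from checkpointed PE states.
        manager = cls(node, store)
        manager.elements = store.load_all()
        return manager


store = CheckpointStore()
manager = JobManager("node-a", store)
manager.deploy("pe-1", "job-1", "node-b")
manager.deploy("pe-2", "job-1", "node-c")
manager.set_status("pe-1", "running")

del manager  # simulate failure of the job manager
recovered = JobManager.recover("node-d", store)
```

In this sketch the recovered manager may start on a different node than the failed one, and it learns about every processing element only from the checkpoint store. Checkpointing on each state change is a design choice made here for simplicity; the claim itself does not specify when checkpoints are taken.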
