Methods and apparatus for resource management in cluster computing

  • US 8,640,137 B1
  • Filed: 08/30/2010
  • Issued: 01/28/2014
  • Est. Priority Date: 08/30/2010
  • Status: Active Grant
First Claim

1. A method for tracking jobs performed by computing nodes of a cluster computing system, the method comprising:

    transmitting, by a job scheduling system, a job description via a data network to a resource tracking system for the cluster computing system, wherein the job description specifies a number of resources for performing a job comprising a plurality of tasks, wherein the resource tracking system maintains a resource list comprising, for each resource of a plurality of resources, a respective availability of the resource and a respective network location identifying a respective computing node in the cluster computing system at which the resource is located;

    determining, by the resource tracking system, that a subset of resources from the plurality of resources is available in response to receiving the job description, wherein an availability of the subset of resources is determined by reference to the resource list;

    transmitting, by the resource tracking system, at least one network identifier via the data network to the job scheduling system, wherein the at least one network identifier identifies at least one computing node at which the subset of resources are located;

    generating, by the job scheduling system, a job state object for tracking a job status for the job;

    transmitting, by the job scheduling system, the job state object and the job to the at least one computing node;

    updating, by the at least one computing node, the job state object to describe an updated job status subsequent to performing at least one task of the plurality of tasks;

    transmitting, by the at least one computing node, job metadata extracted from the job state object in response to a job status query from the job scheduling system, wherein the job metadata indicates the updated job status; and

    transmitting, by the at least one computing node, the updated job state object to at least one additional computing node for performing at least one additional task associated with the job;

    wherein the at least one additional computing node updates the job state object subsequent to performing the at least one additional task and notifies the job scheduling system of the update to the job state object by the at least one additional computing node.
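
The flow recited in claim 1 can be illustrated with a small in-memory sketch. This is not the patented implementation: every class, method, and field name below (Resource, JobState, ResourceTracker, Node, Scheduler, find_available, hand_off) is hypothetical, network transmission is replaced by direct method calls, and the sketch assumes one resource and one computing node per task, which the claim does not require.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    node: str               # network location of the node hosting the resource
    available: bool = True  # availability as recorded in the resource list


@dataclass
class JobState:
    """Hypothetical job state object that travels with the job."""
    job_id: str
    completed: list = field(default_factory=list)  # (node, task) pairs

    def metadata(self):
        # Snapshot of the status, copied so later updates do not alter it.
        return {"job_id": self.job_id, "completed": list(self.completed)}


class ResourceTracker:
    def __init__(self, resources):
        self.resources = resources  # the resource list

    def find_available(self, count):
        # Determine availability by reference to the resource list and
        # return identifiers of the nodes hosting the chosen resources.
        free = [r for r in self.resources if r.available]
        if len(free) < count:
            raise RuntimeError("not enough available resources")
        return [r.node for r in free[:count]]


class Node:
    def __init__(self, name):
        self.name = name
        self.state = None

    def perform(self, state, task):
        # Perform one task, then update the job state object.
        state.completed.append((self.name, task))
        self.state = state

    def status(self):
        # Answer a job status query with metadata from the state object.
        return self.state.metadata()

    def hand_off(self, other, task, scheduler):
        # Pass the updated state object to an additional node, which
        # performs its task and notifies the scheduler of the update.
        other.perform(self.state, task)
        scheduler.notifications.append((other.name, other.status()))


class Scheduler:
    def __init__(self, tracker, nodes):
        self.tracker = tracker
        self.nodes = nodes          # node name -> Node
        self.notifications = []     # updates reported by additional nodes

    def submit(self, job_id, tasks):
        # Assumption: one resource (and node) per task.
        names = self.tracker.find_available(len(tasks))
        workers = [self.nodes[n] for n in names]
        state = JobState(job_id)                 # generate the job state object
        workers[0].perform(state, tasks[0])      # send state and job to first node
        status = workers[0].status()             # job status query
        for prev, nxt, task in zip(workers, workers[1:], tasks[1:]):
            prev.hand_off(nxt, task, self)
        return status


tracker = ResourceTracker([
    Resource("node-a"),
    Resource("node-b"),
    Resource("node-c", available=False),
])
nodes = {name: Node(name) for name in ("node-a", "node-b", "node-c")}
scheduler = Scheduler(tracker, nodes)
first_status = scheduler.submit("job-1", ["map", "reduce"])
```

In this sketch the status query is issued right after the first task, so first_status reflects only the work done by node-a, while the later update made by node-b reaches the scheduler through the notification recorded in hand_off.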
