Methods and Apparatus for Job State Tracking in Cluster Computing
First Claim
1. A method, comprising:
receiving, by a first compute node of a plurality of compute nodes on a cluster computing system, a state object for a job to be executed by a distributed application on the cluster computing system, wherein the state object includes a job metadata database that stores status and history information for the respective job as executed by the distributed application on the plurality of compute nodes of the cluster computing system, wherein the cluster computing system comprises at least the plurality of compute nodes connected together through a network;
sending, by the first compute node through the network, a location indicator for the state object on the first compute node to a state tracker at a second compute node of the plurality of compute nodes on the cluster computing system, the state tracker recording that the state object is located at the first compute node in a state object location database for the cluster computing system; and
updating the job metadata database of the state object according to execution of one or more tasks of the job on the first compute node.
Abstract
Embodiments of a state tracking technique may enable real-time tracking of jobs in a computer cluster. A state object is provided that allows a job to be implemented as a distributable database. The job may be tracked while the job is processing via the state tracking technique. Using the state tracking technique, the cluster may track the location of the state objects for jobs in a database. However, only location information for the state object, and not the job metadata itself, is stored in the central database. This reduces the amount of data stored in the central database, distributing the metadata across the cluster, thus improving database performance and reducing bandwidth requirements on the network. Information about a job may be acquired via a query to the central database to find the location of the respective state object, and then a query to the state object (or to a proxy).
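The two-step lookup described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the class names (`StateObject`, `StateTracker`, `ComputeNode`), the `job_history` helper, the node names, and the use of an in-memory SQLite database as the job metadata database are all assumptions made for illustration. The point it shows is that the central tracker holds only the location of each state object, while the status and history stay inside the state object on the compute node.

```python
import sqlite3


class StateObject:
    """Hypothetical state object that carries its own job metadata database."""

    def __init__(self, job_id):
        self.job_id = job_id
        # Assumption: in-memory SQLite stands in for the per-job metadata database.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE history (task TEXT, status TEXT)")

    def record(self, task, status):
        self.db.execute("INSERT INTO history VALUES (?, ?)", (task, status))

    def history(self):
        rows = self.db.execute("SELECT task, status FROM history ORDER BY rowid")
        return rows.fetchall()


class StateTracker:
    """Hypothetical central tracker: stores only job_id -> node name."""

    def __init__(self):
        self.locations = {}

    def register(self, job_id, node_name):
        self.locations[job_id] = node_name

    def locate(self, job_id):
        return self.locations[job_id]


class ComputeNode:
    """Hypothetical compute node that hosts state objects."""

    def __init__(self, name, tracker):
        self.name = name
        self.tracker = tracker
        self.state_objects = {}

    def receive(self, state_object):
        # Keep the state object locally and report only its location.
        self.state_objects[state_object.job_id] = state_object
        self.tracker.register(state_object.job_id, self.name)

    def run_task(self, job_id, task):
        # Task execution is elided; only the metadata update is shown.
        self.state_objects[job_id].record(task, "done")

    def query(self, job_id):
        return self.state_objects[job_id].history()


def job_history(tracker, nodes, job_id):
    # Step 1: ask the central tracker where the state object lives.
    node = nodes[tracker.locate(job_id)]
    # Step 2: ask that node's state object for the job metadata.
    return node.query(job_id)


tracker = StateTracker()
nodes = {name: ComputeNode(name, tracker) for name in ("node-a", "node-b")}
nodes["node-a"].receive(StateObject("job-1"))
nodes["node-a"].run_task("job-1", "map")
nodes["node-a"].run_task("job-1", "reduce")
result = job_history(tracker, nodes, "job-1")
```

In this sketch the tracker's dictionary never grows beyond one entry per job, regardless of how many tasks are recorded, which is the property the abstract relies on to keep the central database small.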
21 Claims
1. A method, comprising:
receiving, by a first compute node of a plurality of compute nodes on a cluster computing system, a state object for a job to be executed by a distributed application on the cluster computing system, wherein the state object includes a job metadata database that stores status and history information for the respective job as executed by the distributed application on the plurality of compute nodes of the cluster computing system, wherein the cluster computing system comprises at least the plurality of compute nodes connected together through a network;
sending, by the first compute node through the network, a location indicator for the state object on the first compute node to a state tracker at a second compute node of the plurality of compute nodes on the cluster computing system, the state tracker recording that the state object is located at the first compute node in a state object location database for the cluster computing system; and
updating the job metadata database of the state object according to execution of one or more tasks of the job on the first compute node.
Dependent claims: 2, 3, 4, 5, 6, 21

7. A cluster computing system, comprising:
a plurality of compute nodes coupled to a network;
a node coupled to the network that implements a state tracker; and
a storage that stores a state object location database for the cluster computing system;
wherein, during operation, a first compute node of the plurality of compute nodes:
receives a state object for a job to be executed by a distributed application on the cluster computing system, wherein the state object includes a job metadata database that stores status and history information for the respective job as executed by the distributed application on the cluster computing system;
sends a location indicator for the state object on the first compute node to the state tracker; and
updates the job metadata database of the state object according to execution of one or more tasks of the job on the first compute node;
wherein, during operation, the state tracker:
receives the location indicator for the state object from the first compute node; and
records the location indicator for the state object in the state object location database.
Dependent claims: 8, 9, 10, 11, 12

13. A non-transitory computer-readable storage medium storing program instructions, wherein the program instructions are computer-executable to implement:
receiving, by a first compute node of a plurality of compute nodes on a cluster computing system, a state object for a job to be executed by a distributed application on the cluster computing system, wherein the state object includes a job metadata database that stores status and history information for the respective job as executed by the distributed application on the cluster computing system;
sending, by the state object, a location indicator for the state object on the first compute node to a state tracker on a second compute node of the cluster computing system;
the state tracker recording the location indicator for the state object in a state object location database for the cluster computing system; and
updating the job metadata database of the state object according to execution of one or more tasks of the job on the first compute node.
Dependent claims: 14, 15, 16, 17, 18

19. A node of a cluster computing system, comprising:
at least one processor; and
a memory comprising program instructions, wherein the program instructions are executable by the at least one processor to:
receive, from a first compute node of a plurality of compute nodes on the cluster computing system, a location indicator for a state object for a job to be executed by a distributed application on the cluster computing system, wherein the state object includes a job metadata database that stores status and history information for the respective job as executed by the distributed application on the cluster computing system, and wherein the state object is configured to be transferred between compute nodes on the cluster computing system during execution of the job by the distributed application;
record the location indicator for the state object as the current location of the state object in a state object location database for the cluster computing system; and
respond to queries requesting the current location of the state object according to the current location of the state object in the state object location database.
Dependent claims: 20

Specification