System for partitioning batch processes
First Claim
1. A system for processing a batch job, comprising:
- a processor configured to:
receive a job name for a job submitted to execute;
receive one or more job parameters;
determine one or more nodes to run the job;
determine at least two steps;
determine a static step of the at least two steps, wherein the static step is not parallelizable by a partitioning of inputs, wherein each partition of the inputs is processed in parallel;
execute the static step, wherein an output of the static step is a set of objects;
determine a subset of the set of objects; and
execute a subsequent step of the at least two steps on the subset of the set of objects on a node of the one or more nodes, wherein in the event two or more nodes are determined to run the job and the set of objects comprises two or more subsets, the subsequent step is executed on the two or more subsets in parallel, wherein a step of the at least two steps is executed using a state of data associated with a start state of the step;
upon completion of executing the step, store a result to a durable storage, wherein the durable storage stores a state of data associated with a completion state of the step, and wherein the state of data associated with the start state of the step and the completion state of the step are accessible by other execution processes as associated with either the start state of the step or the completion state of the step;
- a memory coupled with the processor, wherein the memory is configured to provide the processor with instructions.
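The flow recited in the claim (a static step that runs once, a split of its output into subsets, then a subsequent step run on the subsets in parallel) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the function names, the round-robin split, the doubling operation, and the use of a thread pool to stand in for separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def static_step(job_params):
    """Hypothetical static step: runs once, unpartitioned, and outputs a set of objects."""
    return list(range(job_params["count"]))


def partition(objects, n_subsets):
    """Split the objects round-robin into at most n_subsets non-empty subsets (an assumed strategy)."""
    subsets = [objects[i::n_subsets] for i in range(n_subsets)]
    return [s for s in subsets if s]


def subsequent_step(subset):
    """Hypothetical subsequent step applied independently to one subset."""
    return [obj * 2 for obj in subset]


def run_job(job_name, job_params, nodes):
    """Run the static step, then the subsequent step on each subset in parallel."""
    objects = static_step(job_params)
    subsets = partition(objects, len(nodes))
    # Worker threads stand in for nodes here; a real system would dispatch to machines.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return list(pool.map(subsequent_step, subsets))


results = run_job("payroll", {"count": 6}, ["node-a", "node-b"])
print(results)
```

With a single node the split yields one subset and the subsequent step runs once; with two or more nodes each subset is processed concurrently.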
Abstract
A system for processing a batch job comprises a processor and a memory. The processor is configured to receive a job name for a job submitted to execute, to receive one or more job parameters, and to determine one or more nodes to run the job. The processor is further configured to determine one or more steps, where for each step: the step is executed on a node using a state of data associated with a start state of the step; and upon completion of executing the step, a result is stored to a durable storage. The durable storage stores the states of data associated with the start state and the completion state of the step, and these states are accessible by other execution processes as associated with either the start state of the step or the completion state of the step. The memory is coupled to the processor and configured to provide the processor with instructions.
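The start-state and completion-state bookkeeping described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the `StepStateStore` class, the table layout, the `run_step` helper, and the choice of SQLite with JSON-encoded data are all hypothetical and are not taken from the patent.

```python
import json
import sqlite3


class StepStateStore:
    """Hypothetical store keeping a step's start and completion states."""

    def __init__(self, path=":memory:"):
        # ":memory:" is used for demonstration only; pass a file path for durability.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS step_state ("
            "job TEXT, step TEXT, phase TEXT, data TEXT, "
            "PRIMARY KEY (job, step, phase))"
        )

    def save(self, job, step, phase, data):
        self.db.execute(
            "INSERT OR REPLACE INTO step_state VALUES (?, ?, ?, ?)",
            (job, step, phase, json.dumps(data)),
        )
        self.db.commit()

    def load(self, job, step, phase):
        row = self.db.execute(
            "SELECT data FROM step_state WHERE job=? AND step=? AND phase=?",
            (job, step, phase),
        ).fetchone()
        return None if row is None else json.loads(row[0])


def run_step(store, job, step, func, start_data):
    """Record the start state, execute the step, then record the completion state."""
    store.save(job, step, "start", start_data)
    result = func(start_data)
    store.save(job, step, "completion", result)
    return result


store = StepStateStore()
run_step(store, "nightly-report", "sum-totals",
         lambda d: {"total": sum(d["values"])}, {"values": [1, 2, 3]})
print(store.load("nightly-report", "sum-totals", "start"))
print(store.load("nightly-report", "sum-totals", "completion"))
```

Because each state is keyed by job, step, and phase, another process opening the same database file could read either the start state or the completion state of a step by name.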
9 Citations
21 Claims
1. A system for processing a batch job, comprising:
- a processor configured to:
receive a job name for a job submitted to execute;
receive one or more job parameters;
determine one or more nodes to run the job;
determine at least two steps;
determine a static step of the at least two steps, wherein the static step is not parallelizable by a partitioning of inputs, wherein each partition of the inputs is processed in parallel;
execute the static step, wherein an output of the static step is a set of objects;
determine a subset of the set of objects; and
execute a subsequent step of the at least two steps on the subset of the set of objects on a node of the one or more nodes, wherein in the event two or more nodes are determined to run the job and the set of objects comprises two or more subsets, the subsequent step is executed on the two or more subsets in parallel, wherein a step of the at least two steps is executed using a state of data associated with a start state of the step;
upon completion of executing the step, store a result to a durable storage, wherein the durable storage stores a state of data associated with a completion state of the step, and wherein the state of data associated with the start state of the step and the completion state of the step are accessible by other execution processes as associated with either the start state of the step or the completion state of the step;
- a memory coupled with the processor, wherein the memory is configured to provide the processor with instructions.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method for executing a job, comprising:
receiving a job name for a job submitted to execute;
receiving one or more job parameters;
determining one or more nodes to run the job;
determining at least two steps;
determining a static step of the at least two steps, wherein the static step is not parallelizable by a partitioning of inputs, wherein each partition of the inputs is processed in parallel;
executing the static step, wherein an output of the static step is a set of objects;
determining a subset of the set of objects; and
executing a subsequent step of the at least two steps on the subset of the set of objects on a node of the one or more nodes, wherein in the event two or more nodes are determined to run the job and the set of objects comprises two or more subsets, the subsequent step is executed on the two or more subsets in parallel, wherein a step of the at least two steps is executed using a state of data associated with a start state of the step;
upon completion of executing the step, storing a result to a durable storage, wherein the durable storage stores a state of data associated with a completion state of the step, and wherein the state of data associated with the start state of the step and the completion state of the step are accessible by other execution processes as associated with either the start state of the step or the completion state of the step.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product for executing a job, the computer program product being embodied in a computer readable non-transitory medium and comprising computer instructions for:
receiving a job name for a job submitted to execute;
receiving one or more job parameters;
determining one or more nodes to run the job;
determining at least two steps;
determining a static step of the at least two steps, wherein the static step is not parallelizable by a partitioning of inputs, wherein each partition of the inputs is processed in parallel;
executing the static step, wherein an output of the static step is a set of objects;
determining a subset of the set of objects; and
executing a subsequent step of the at least two steps on the subset of the set of objects on a node of the one or more nodes, wherein in the event two or more nodes are determined to run the job and the set of objects comprises two or more subsets, the subsequent step is executed on the two or more subsets in parallel, wherein a step of the at least two steps is executed using a state of data associated with a start state of the step;
upon completion of executing the step, storing a result to a durable storage, wherein the durable storage stores a state of data associated with a completion state of the step, and wherein the state of data associated with the start state of the step and the completion state of the step are accessible by other execution processes as associated with either the start state of the step or the completion state of the step.
Dependent claims: 16, 17, 18, 19, 20, 21
Specification