System and method for execution of a job in a distributed computing architecture
Abstract
The present invention provides a system and method for the execution of jobs in a distributed computing architecture that uses worker clients which are characterized by a checkpointing mechanism component for generating checkpointing information being assigned to at least one worker client, at least one failover system being assigned to the worker client, and a component (failover system selection component) for automatically assigning at least one existing or newly created failover system to the failure system being assigned to a worker client in the case that said worker client fails, wherein the assigned failover system provides all function components in order to take over the execution of the job when said assigned worker client fails, and wherein the assigned failover system further includes at least a failover monitor component for detecting failover situations of said assigned worker client.
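The checkpoint-and-take-over idea summarized in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the class names (`Checkpoint`, `FailoverSystem`, `WorkerClient`), the job (summing a list of integers), the checkpoint interval, and the simulated failure via `fail_at`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Checkpoint:
    """Hypothetical checkpoint: enough state to resume a summing job."""
    job_id: str
    next_index: int   # index of the first item not yet processed
    partial_sum: int  # result accumulated up to next_index


@dataclass
class FailoverSystem:
    """Hypothetical failover system that stores the latest checkpoint."""
    name: str
    last_checkpoint: Optional[Checkpoint] = None

    def receive(self, checkpoint: Checkpoint) -> None:
        self.last_checkpoint = checkpoint

    def take_over(self, items: List[int]) -> int:
        # Resume from the last checkpoint, or from scratch if none arrived.
        cp = self.last_checkpoint
        start, total = (cp.next_index, cp.partial_sum) if cp else (0, 0)
        for value in items[start:]:
            total += value
        return total


@dataclass
class WorkerClient:
    """Hypothetical worker that checkpoints to its assigned failover system."""
    name: str
    failover: FailoverSystem

    def run(self, job_id: str, items: List[int],
            checkpoint_every: int = 2, fail_at: Optional[int] = None) -> int:
        total = 0
        for i, value in enumerate(items):
            if fail_at is not None and i == fail_at:
                raise RuntimeError(f"{self.name} failed at item {i}")  # simulated crash
            total += value
            if (i + 1) % checkpoint_every == 0:
                self.failover.receive(Checkpoint(job_id, i + 1, total))
        return total


def run_with_failover(items: List[int], fail_at: Optional[int] = None) -> int:
    """Run the job on a worker; on failure, let the failover system finish it."""
    failover = FailoverSystem("failover-1")
    worker = WorkerClient("worker-1", failover)
    try:
        return worker.run("job-1", items, fail_at=fail_at)
    except RuntimeError:
        return failover.take_over(items)
```

In this sketch, work done after the last checkpoint is simply repeated by the failover system, so the result is the same whether or not the worker fails. A real deployment would detect the failure remotely rather than through an exception in the same process.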
52 Citations
17 Claims
1. A data processing method implemental by a worker client in a distributed computing architecture having a designated computer for splitting processing tasks into smaller jobs, a computer network for transmitting each of said jobs to one of a plurality of worker clients in order to execute said assigned jobs, each of said worker clients having:
a checkpointing component for generating checkpointing information assigned to at least one of said worker clients;
at least one failover system assigned to said at least one worker client;
a failover system selection component for automatically assigning at least one existing or newly created failover system to said failure system being assigned at least to said one worker client in the case that one of said worker clients fails;
wherein said assigned failover system provides all function components in order to take over the execution of the job when said assigned worker client fails; and
wherein said assigned failover system further includes at least a failover monitor component for detecting failover situations of said assigned worker client.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A data processing method for implementing a failover system in a distributed computing infrastructure having a designated computer for splitting processing tasks into smaller jobs, a computer network for transmitting each of said jobs to one of a plurality of worker clients in order to execute said assigned jobs, said method comprising the step of:
assigning said failover system to at least a specific worker client having assigned a checkpointing mechanism component for generating checkpointing information of at least of said worker client, and a failover system selection component for automatically assigning at least one existing or newly created failover system to said failure system being assigned said worker client in the case said worker clients fails, wherein said failover system further includes all function components in order to take over the execution of the job when said assigned worker client fails, and at least a failover monitor component for detecting failover situations of said assigned worker client.
Dependent claims: 9, 10, 11, 12
13. A distributed computing infrastructure having a distributed management server for receiving processing tasks, splitting them into smaller jobs, and selecting worker clients for execution of said jobs, comprising:
a plurality of worker clients;
a computer network for transmitting each of said jobs to one of a plurality of worker clients in order to execute said assigned jobs;
wherein at least one of said worker clients includes;
a checkpointing component for generating checkpointing information, wherein said checkpointing component is assigned to at least one of said worker clients;
at least one failover system assigned to said worker client, wherein said assigned failover system provides all function components in order to take over the execution of the job when said assigned worker client fails;
a failover system selection component for automatically assigning at least one existing or newly created failover system to said failure system in the case that said worker client is not assigned to each worker client; and
wherein said assigned failover system further includes at least a failover monitor component for detecting failover situations of said assigned worker client.
14. A method for executing work jobs in a distributed computing infrastructure having a distributed management server and worker clients, wherein said distributed management server gets requests to perform a task, divides the task into smaller jobs, selects worker clients for each job and sends said jobs to said selected worker clients, wherein the method at said worker client comprises the steps of:
determining at least one assigned failover system for said worker client executing a job;
providing checkpointing information generated by said worker client to said failover system; and
monitoring of said worker client in order to detect a failover, wherein said failover system takes over and continues execution of said job, and automatically assigns an existing or a newly created failover system to said failover in the case said worker client fails.
Dependent claims: 16
15. A method for executing jobs in a distributed computing infrastructure having a distributed management server, worker clients, and systems selectable as failover systems, wherein said distributed management server gets requests to perform a task, divides the task into smaller jobs, selects worker clients for each job and sends said jobs to said selected worker clients, said method at said systems being selectable as failover systems, said method comprising the steps of:
allowing selection as failover system by worker client;
receiving checkpointing information from said assigned worker client;
monitoring of said assigned worker client in order to detect a failure;
taking over and continuing execution of said job by said assigned failover system by using said checkpointing information in the case a failure is detected; and
assigning at least one existing or a newly created failover system to said failover system continuing execution of said job.
Dependent claims: 17
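The monitoring and re-assignment steps recited in claim 15 can be sketched as follows. This is only an illustrative reading, not the patented implementation: the heartbeat-timeout failure test, the idle pool of existing systems, and all names (`Node`, `FailoverMonitor`, `FailoverSelector`, `handle_tick`) are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node: a worker client or a system selectable as failover."""
    name: str
    last_heartbeat: float = 0.0
    failover: Optional["Node"] = None


class FailoverMonitor:
    """Assumed detection rule: a node has failed if its heartbeat is too old."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout

    def has_failed(self, node: Node, now: float) -> bool:
        return now - node.last_heartbeat > self.timeout


class FailoverSelector:
    """Picks an existing idle system if one is available, else creates one."""

    def __init__(self, idle_pool: List[Node]) -> None:
        self.idle_pool = list(idle_pool)
        self.created = 0

    def select(self) -> Node:
        if self.idle_pool:
            return self.idle_pool.pop(0)   # existing failover system
        self.created += 1
        return Node(f"new-failover-{self.created}")  # newly created one


def handle_tick(active: Node, monitor: FailoverMonitor,
                selector: FailoverSelector, now: float) -> Node:
    """Return the node that executes the job after this monitoring tick."""
    if not monitor.has_failed(active, now):
        return active
    successor = active.failover
    if successor is None:
        raise RuntimeError(f"{active.name} failed with no failover assigned")
    successor.last_heartbeat = now            # successor is alive and takes over
    successor.failover = selector.select()    # it gets its own failover system
    return successor
```

A short usage example under the same assumptions: with a 5-second timeout, a worker whose last heartbeat was at time 0 is replaced at time 6 by its failover system, which in turn is assigned the next system from the pool.

```python
worker = Node("worker-1", last_heartbeat=0.0, failover=Node("failover-1"))
selector = FailoverSelector([Node("failover-2")])
monitor = FailoverMonitor(timeout=5.0)
active = handle_tick(worker, monitor, selector, now=6.0)
```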
Specification