| US 7,516,360 B2 | ||
| System and method for execution of a job in a distributed computing architecture | ||
| Utz Bacher, Tuebingen (Germany); Oliver Benke, Leinfelden-Echterdingen (Germany); Boas Betzler, Magstadt (Germany); Thomas Lumpp, Reutlingen (Germany); and Eberhard Pasch, Tuebingen (Germany) | ||
| Assigned to International Business Machines Corporation, Armonk, N.Y. (US) | ||
| Filed on Sep. 09, 2004, as Appl. No. 10/937,682. | ||
| Claims priority of application No. 03103377 (EP), filed on Sep. 12, 2003. | ||
| Prior Publication US 2005/0081097 A1, Apr. 14, 2005 | ||
| Int. Cl. G06F 11/00 (2006.01); G06F 11/20 (2006.01) | ||
| U.S. Cl. 714—12 [714/15; 714/43] | 17 Claims |

| 16. A method for executing jobs in a distributed computing infrastructure having a distributed management server, worker clients, and systems selectable as failover systems, wherein said distributed management server gets requests to perform a task, divides the task into smaller jobs, selects worker clients for each job and sends said jobs to said selected worker clients, said method at said systems being selectable as failover systems, said method comprising the steps of:
selecting a failover system by at least one worker client;
receiving checkpointing information from said at least one worker client;
monitoring said worker client in order to detect a failure;
taking over and continuing execution of said job by said failover system using said checkpointing information in case of a failure being detected; and
assigning at least one existing or a newly created failover system to the failover system which is continuing execution of said job.
|
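
The steps recited in claim 16 can be illustrated with a minimal single-process sketch. It is not the patented implementation. The names `System`, `select_failover`, `run_job`, `failure_detected` and `take_over`, the toy job (summing step indices), and the checkpoint layout `(next_step, partial_sum)` are all assumptions made for illustration. Real systems would exchange checkpoints and heartbeats over a network.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical checkpoint layout: (next step to run, partial sum so far).
Checkpoint = Tuple[int, int]


@dataclass(eq=False)
class System:
    """A node that can act as a worker client or as a failover system."""
    name: str
    alive: bool = True
    failover: Optional["System"] = None    # system backing up this node
    received: Optional[Checkpoint] = None  # checkpoint held for a monitored worker


def select_failover(worker: System, pool: List[System]) -> System:
    """The worker picks an existing live system, or a new one is created."""
    for candidate in pool:
        if candidate is not worker and candidate.alive:
            worker.failover = candidate
            return candidate
    created = System(f"{worker.name}-failover")
    pool.append(created)
    worker.failover = created
    return created


def run_job(worker: System, steps: int, start: Checkpoint = (0, 0),
            crash_at: Optional[int] = None) -> Optional[int]:
    """Sum the step indices, sending a checkpoint to the failover after each step."""
    step, total = start
    while step < steps:
        if step == crash_at:       # simulated failure of the worker
            worker.alive = False
            return None
        total += step
        step += 1
        worker.failover.received = (step, total)
    return total


def failure_detected(worker: System) -> bool:
    """Stand-in for monitoring; a real system might use heartbeats."""
    return not worker.alive


def take_over(failed: System, pool: List[System], steps: int) -> int:
    """Failover resumes from the last checkpoint and gets a failover of its own."""
    backup = failed.failover
    select_failover(backup, pool)  # existing or newly created failover system
    return run_job(backup, steps, start=backup.received or (0, 0))


if __name__ == "__main__":
    pool = [System("worker"), System("standby")]
    worker = pool[0]
    select_failover(worker, pool)
    result = run_job(worker, steps=5, crash_at=3)
    if failure_detected(worker):
        result = take_over(worker, pool, steps=5)
    print(result)  # prints 10, the same as an uninterrupted run
```

In this sketch the standby resumes at step 3 with partial sum 3 instead of restarting the job. Because no other live system remains in the pool, a new failover system is created for the standby, which corresponds to the final step of the claim.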