×

Method and system for automatic failover of distributed query processing using distributed shared memory

  • US 8,874,961 B2
  • Filed: 05/28/2010
  • Issued: 10/28/2014
  • Est. Priority Date: 03/22/2010
  • Status: Active Grant
First Claim
Patent Images

1. A method for implementing automatic recovery from failure of resources in a grid-based distributed database, the grid comprising a plurality of multi-cast subgroup of nodes, wherein each subgroup of nodes comprises one or more worker nodes and one or more idle nodes, the method comprising:

  • determining a category for each node in the subgroup of nodes, wherein each node is categorized as a worker node or an idle node;

    saving a set of data structures corresponding to each step involved in execution of a task assigned to each worker node at pre-determined time intervals in a shared memory distributed across the sub-group of nodes, wherein the saved set of data structures is assigned a first value;

    monitoring each worker node by the one or more idle nodes in each sub-group by polling the shared memory for detecting changes to the saved set of data structures corresponding to each worker node at pre-determined time intervals, wherein a second value is assigned to the saved set of data structures corresponding to each worker node if a change is detected;

    raising a failure notification by the one or more idle nodes if the second value is not detected after a pre-determined period of time for at least one of the worker nodes, wherein the detection of no change from the first value to the second value indicates interruption in the execution of at least one of the steps involved in the execution of the task, thereby the saved set of data structures acts as a checkpoint for determining the saved set of data structures that are to be reloaded to the one or more idle nodes;

    reloading the saved set of data structures corresponding to each step involved in execution of the task of the at least one worker node to at least one of the one or more idle nodes after the detection of no change from the first value to the second value; and

    resuming, by the idle node, execution from the interrupted step involved in the execution of the task.

View all claims
  • 2 Assignments
Timeline View
Assignment View
    ×
    ×