
Method and system for proactively reducing the outage time of a computer system

  • US 6,978,398 B2
  • Filed: 08/15/2001
  • Issued: 12/20/2005
  • Est. Priority Date: 08/15/2001
  • Status: Active Grant
First Claim

1. A method of reducing a time for a computer system to recover from a degradation of performance in a hardware or a software in at least one first node of said computer system, comprising:

  • monitoring a state of said at least one first node;

  • predicting an outage of said hardware or said software based on monitoring;

  • based on said monitoring, transferring a state of said at least one first node to a second node prior to said degradation in performance of said hardware or said software of said at least one first node;

  • proactively invoking a state migration functionality to reduce said recovery time, wherein said proactively invoking includes migrating a dynamic state to stable storage of said second node, said second node being accessible to a recovering agent, to reduce an amount of time required by said recovering agent; and

  • connecting said at least one first node and said second node to a shared memory containing a stale state of the at least one first node and a redo log, wherein said shared memory includes at least one of a shared storage medium, a shared storage disk and a shared network,

    wherein said degradation of performance comprises one of an outage and a failure,

    wherein said second node selectively includes an application running corresponding to an application failing on said at least one first node while the at least one first node is still operational,

    wherein said state transfer from said at least one first node to said second node occurs while the at least one first node is still operational, and

    wherein said predicting comprises providing a failure predictor on at least one of said at least one first node and said second node, for commanding the at least one first node to start an application if not already running while the at least one first node is still operational, and commanding the second node to begin reading a state of said at least one node and redo log from the shared memory while the at least one first node is still operational.
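Claim 1 does not say how the failure predictor decides that an outage is coming, nor how the stale state or redo log is laid out. The Python sketch below is one hypothetical reading of the claimed flow, not the patentee's implementation. The first node logs each update to shared memory before applying it and periodically checkpoints its state (the "stale state"). An assumed predictor extrapolates a monitored health metric with a least-squares line. When it predicts an outage, the second node rebuilds the state from the stale checkpoint plus the redo log while the first node keeps running, then copies the result to its own stable storage. All names (`SharedStorage`, `Node`, `predict_outage`, `monitor_step`, `stable_storage`) and the linear-trend rule are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SharedStorage:
    # Hypothetical stand-in for the claimed shared memory: a stale
    # (checkpointed) state of the first node plus a redo log of later updates.
    stale_state: Dict[str, int] = field(default_factory=dict)
    redo_log: List[Tuple[str, int]] = field(default_factory=list)


@dataclass
class Node:
    name: str
    state: Dict[str, int] = field(default_factory=dict)
    # Stands in for the node's stable storage (e.g. a local disk).
    stable_storage: Dict[str, int] = field(default_factory=dict)

    def write(self, shared: SharedStorage, key: str, value: int) -> None:
        # Append the update to the shared redo log before applying it
        # locally, so another node can reproduce it by replaying the log.
        shared.redo_log.append((key, value))
        self.state[key] = value

    def checkpoint(self, shared: SharedStorage) -> None:
        # Publish a copy of the current state and truncate the redo log;
        # the copy becomes "stale" as soon as new writes arrive.
        shared.stale_state = dict(self.state)
        shared.redo_log.clear()

    def catch_up(self, shared: SharedStorage) -> None:
        # Rebuild the state from the stale checkpoint, then replay the log.
        self.state = dict(shared.stale_state)
        for key, value in shared.redo_log:
            self.state[key] = value


def predict_outage(samples: List[float], limit: float, horizon: int) -> bool:
    """Assumed predictor: fit a least-squares line to the monitored samples
    and report an outage if the line reaches `limit` within `horizon` more
    sampling intervals."""
    n = len(samples)
    if n < 2:
        return False
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den
    projected = mean_y + slope * (n - 1 + horizon - mean_x)
    return projected >= limit


def monitor_step(standby: Node, shared: SharedStorage, samples: List[float],
                 limit: float, horizon: int) -> bool:
    """If an outage is predicted, let the standby read the stale state and
    redo log from shared storage while the first node is still running,
    then persist the result to the standby's stable storage."""
    if not predict_outage(samples, limit, horizon):
        return False
    standby.catch_up(shared)
    standby.stable_storage = dict(standby.state)
    return True
```

For example, after one checkpoint and two further writes on the first node, a steadily rising memory-use series such as `[0.60, 0.68, 0.75, 0.83]` with `limit=1.0` and `horizon=3` makes `predict_outage` return `True` (the fitted line reaches about 1.06), and the second node ends up holding the same state as the first. Replaying only the short redo log on top of a checkpoint, rather than copying the full state at failure time, is what lets the standby catch up before the first node actually goes down.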

  • 1 Assignment