Method and system for transparent time-based selective software rejuvenation

US 6,594,784 B1
Filed: 11/17/1999
Issued: 07/15/2003
Est. Priority Date: 11/17/1999
Status: Expired due to Term

Abstract

A method of enhancing software dependability includes measuring an elapsed time in a software system running on a computer, determining whether the elapsed time matches a threshold, and, when the elapsed time matches the threshold, rejuvenating at least a portion of the software system to reduce the likelihood of an outage and without modifying an application running in the software system.
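
The time-based trigger described in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the class name `RejuvenationTimer`, the `poll` method, and the injectable `clock` are hypothetical, and "matches a threshold" is assumed here to mean "elapsed time has reached or passed the threshold". The application being rejuvenated is represented only by a callback, so the application itself is left unmodified.

```python
import time
from typing import Callable


class RejuvenationTimer:
    """Hypothetical helper: invokes a rejuvenation callback once the
    elapsed time reaches a threshold, then restarts the measurement."""

    def __init__(self, threshold_s: float, rejuvenate: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold_s = threshold_s
        self.rejuvenate = rejuvenate      # e.g. a planned restart of one subsystem
        self.clock = clock                # injectable so the logic can be tested
        self.started_at = clock()

    def elapsed(self) -> float:
        """Time elapsed since the last rejuvenation (or since start-up)."""
        return self.clock() - self.started_at

    def poll(self) -> bool:
        """Rejuvenate if the threshold has been reached; return True if it fired."""
        if self.elapsed() < self.threshold_s:
            return False
        self.rejuvenate()
        self.started_at = self.clock()    # begin the next inter-rejuvenation interval
        return True
```

A monitoring loop outside the application would call `poll()` periodically. Passing a fake `clock` makes the threshold behaviour easy to check without waiting in real time.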

28 Claims

1. A method of enhancing software dependability, comprising:
- measuring a time elapsed in a software system running on a computer;
  
  determining whether said elapsed time matches a threshold; and
  
  when said elapsed time matches said threshold, rejuvenating at least a portion of said software system to reduce a likelihood of an outage and without modifying an application running in said software system.

2. A method for software rejuvenation, comprising:
- waiting for a selected inter-rejuvenation interval to expire in a software system;
  
  determining whether a fail-to node has adequate resources to accept a failover workload;
  
  if said determining determines that the fail-to node cannot accept the failover workload, then sending an alert that adequate resources do not exist to support fault tolerance requirements;
  
  suspending rejuvenation until an operator acknowledges and corrects the deficiency; and
  
  rejuvenating said software without modifying an application running in said software system.
3. The method according to claim 2, further comprising:
4. The method according to claim 2, further comprising:
- if the determining determines the fail-to node can accept the failover workload, then a rejuvenation agent on a node instructing a cluster manager to shut down an open application in a pre-planned manner on the node; and
  
  subsequently restarting the application on said node.
5. The method according to claim 2, wherein said software rejuvenation is performed at an application software level.
6. The method according to claim 3, wherein said first node comprises a primary node and said second node comprises a secondary node, said method further comprising:
- designating, by the cluster manager, the secondary node as a new primary node, and the primary node as a new secondary node.
7. The method according to claim 2, wherein said rejuvenation is performed in a clustered environment.
8. The method according to claim 2, wherein said rejuvenation is devoid of changing an application running on said system.
9. The method according to claim 2, further comprising:
- automatically performing selective software rejuvenation, on a periodic basis, without operator intervention, and at a time which is deemed least disruptive to system operation.
10. The method according to claim 9, wherein said rejuvenation is performed based on one of a time elapsed since a last rejuvenation, and said system having completed a particular workload.
11. The method according to claim 10, wherein said rejuvenation is performed for one of a portion of said system and an entirety of said system.
12. The method according to claim 2, wherein said rejuvenation is performed transparently to an application program running on said system, such that no changes to an application software of said software system are required.
13. The method according to claim 2, wherein said rejuvenation is invoked within a cluster environment, and wherein cluster management failover services are used to controllably terminate one of an offending subsystem and an application software, and to restart said one of said subsystem and application software on a same or another node in the cluster.
14. The method according to claim 2, further comprising:
- prior to invoking rejuvenation in the cluster, checking a fail-to node of the cluster to confirm whether said fail-to node has adequate resources to accept the failed-over workload.
15. The method according to claim 14, further comprising:
- if the resource check fails, then informing a system operator that the failover cannot occur, and alerting the operator of the system's inability to perform rejuvenation.
16. The method according to claim 15, wherein said operator takes corrective action to restore the system's fault resilience by at least one of adding processors, adding memory, adding input/output (I/O) devices, adding storage, and rejuvenating the fail-to node to free up resources consumed by aging on the fail-to node.
17. The method according to claim 2, wherein said rejuvenation is performed, transparently to an application software of said system, based on measuring elapsed time, and by signaling to one of an operator and cluster management software to perform a planned rejuvenation.
18. The method according to claim 2, further comprising:
- scheduling said rejuvenation to occur at a time of least system workload.
19. The method according to claim 2, further comprising:
- selectively rejuvenating said system such that only that part of the system that is causing aging is rejuvenated.
20. The method according to claim 2, further comprising:
- performing said rejuvenation without modifying an application software of said software system.

21. A method for software rejuvenation, comprising:
- waiting for a selected inter-rejuvenation interval to expire in a software system;
  
  determining whether a fail-to node has adequate resources to accept a failover workload;
  
  if said determining determines that the fail-to node can accept the failover workload, then a rejuvenation agent on a primary node instructing a cluster manager to shut down an open application in a pre-planned manner on the primary node without modifying an application running in said software system; and
  
  restarting the application on one of the primary node and a secondary node.
22. The method according to claim 21, further comprising:

23. A system for increasing software dependability, comprising:
- a timer for measuring an elapsed time in a software system running on a computer; and
  
  a management interface, coupled to said timer, for determining whether said elapsed time matches a threshold, wherein when said elapsed time matches said threshold, said management interface rejuvenates at least a portion of said software system to reduce the likelihood of an outage and without modifying an application running in said software system.

24. A system for software rejuvenation, comprising:
- a determiner for determining whether a fail-to node has adequate resources to accept a failover workload, upon expiration of an inter-rejuvenation interval; and
  
  a rejuvenation agent on a primary node instructing a cluster manager to shut down an open application in a pre-planned manner on the primary node, when said determiner determines that said fail-to node can accept the failover workload, said rejuvenation agent restarting the application on one of the primary node and a secondary node without modifying the application running on said primary node.

25. A system for enhancing software dependability, comprising:
- means for measuring a time elapsed in a software system running on a computer;
  
  means for determining whether said elapsed time matches a threshold; and
  
  means for rejuvenating at least a portion of said software system, when said elapsed time matches said threshold, to reduce a likelihood of an outage and without modifying an application running in said software system.

26. A signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform a method for computer-implemented dependability of software, said method comprising:
- measuring an elapsed time in a software system running on a computer;
  
  determining whether said elapsed time matches a threshold; and
  
  when said elapsed time matches said threshold, rejuvenating at least a portion of said software system to reduce the likelihood of an outage and without modifying an application running in said software system.

27. A signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform a method for computer-implemented dependability of software, said method comprising:
- waiting for a selected inter-rejuvenation interval to expire in a software system;
  
  determining whether a fail-to node has adequate resources to accept a failover workload;
  
  if said determining determines that the fail-to node cannot accept the failover workload, then sending an alert that adequate resources do not exist to support fault tolerance requirements;
  
  suspending rejuvenation until an operator acknowledges and corrects the deficiency; and
  
  rejuvenating said software without modifying an application running in said software system.

28. A signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform a method for computer-implemented dependability of software, said method comprising:
- waiting for a selected inter-rejuvenation interval to expire in a software system;
  
  determining whether a fail-to node has adequate resources to accept a failover workload;
  
  if said determining determines that the fail-to node can accept the failover workload, then a rejuvenation agent on a primary node instructing a cluster manager to shut down an open application in a pre-planned manner on the primary node without modifying the application running on said primary node; and
  
  restarting the application on one of the primary node and a secondary node.
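
The cluster workflow recited in claims 2, 21, 27 and 28 (check the fail-to node when the interval expires, alert and suspend if resources are inadequate, otherwise shut down and restart in a planned manner) can be sketched as follows. This is an illustrative reading under stated assumptions, not the patented implementation: `Node`, `RejuvenationAgent`, `free_memory_mb`, and the two callbacks are hypothetical names, free memory is used as a stand-in for "adequate resources" (the claims do not define a metric), and the application is always restarted on the fail-to node although the claims also allow restarting on the primary node.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    name: str
    free_memory_mb: int  # assumed resource metric; not specified by the claims


@dataclass
class RejuvenationAgent:
    fail_to: Node
    required_memory_mb: int
    shutdown_app: Callable[[], None]     # stand-in for the cluster manager's planned shutdown
    restart_app: Callable[[str], None]   # restarts the application on the named node
    alerts: List[str] = field(default_factory=list)
    suspended: bool = False

    def on_interval_expired(self) -> str:
        """Run once each time the inter-rejuvenation interval expires."""
        if self.suspended:
            return "suspended"           # waiting for the operator to acknowledge
        if self.fail_to.free_memory_mb < self.required_memory_mb:
            # Fail-to node cannot accept the failover workload: alert and suspend.
            self.alerts.append(f"{self.fail_to.name}: inadequate resources for failover")
            self.suspended = True
            return "alerted"
        # Fail-to node can accept the workload: planned shutdown, then restart.
        self.shutdown_app()
        self.restart_app(self.fail_to.name)
        return "rejuvenated"

    def acknowledge(self) -> None:
        """Operator acknowledges the alert after correcting the deficiency."""
        self.suspended = False
```

Keeping the shutdown and restart steps as injected callbacks mirrors the claims' requirement that the application itself is not modified: the agent only coordinates with whatever cluster management service performs the failover.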

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Harper, Richard Edwin; Hunter, Steven Wade
Primary Examiner(s): Baderman, Scott
Assistant Examiner(s): LOHN, JOSHUA A
Application Number: US09/442,003
Time in Patent Office: 1,336 Days
Field of Search: 714/47, 714/38, 714/15, 714/13
US Class Current: 714/47.2
CPC Class Codes:
G06F 11/008 Reliability or availability...
G06F 11/1438 Restarting or rejuvenating
