Techniques for maintaining fault tolerance for software programs in a clustered computer system
Abstract
A method for maintaining a predefined acceptable fault tolerance level for a plurality of software modules implementing a software program running on a first plurality of computers coupled together in a cluster configuration in a first cluster in a clustered computer system. The first plurality of computers is coupled to a first intelligent director agent. The method includes tracking, using the first intelligent director agent, status of the software modules running on the first plurality of computers. The method also includes ascertaining a fault tolerance level associated with the software program by examining the status of the software modules running on the first plurality of computers. If the fault tolerance level is below the predefined acceptable fault tolerance level, the method also includes searching for a first suitable computer among the first plurality of computers on which to load another module of the software program. The first suitable computer represents a computer of the first plurality of computers that does not have a module of the software program running thereon. The first suitable computer is compatible to execute the another module of the software program. If the first suitable computer is available, the method further includes loading the another module of the software program on the first suitable computer, registering the first suitable computer as a computer capable of servicing transaction requests pertaining to the software program after the another module of the software program is loaded onto the first suitable computer, and routing the transaction requests pertaining to the software program to the first suitable computer after the registering.
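The steps recited in the abstract (track status, ascertain the fault tolerance level, search for a suitable computer, load, register, route) can be illustrated with a minimal sketch. It is not the patented implementation. The class names `Computer` and `DirectorAgent`, their fields, and in particular the definition of the fault tolerance level as the count of healthy computers running a module of the program are assumptions made only for illustration; the text above does not define how the level is computed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Computer:
    """Hypothetical model of one computer in the first cluster."""
    name: str
    compatible: Set[str]                      # programs this computer can execute
    running: Set[str] = field(default_factory=set)
    healthy: bool = True


@dataclass
class DirectorAgent:
    """Hypothetical stand-in for the intelligent director agent."""
    computers: List[Computer]
    # program -> names of computers registered to service its requests
    routes: Dict[str, List[str]] = field(default_factory=dict)

    def fault_tolerance_level(self, program: str) -> int:
        # Assumption: the level is the number of healthy computers
        # currently running a module of the program.
        return sum(1 for c in self.computers
                   if c.healthy and program in c.running)

    def find_suitable(self, program: str) -> Optional[Computer]:
        # A suitable computer has no module of the program running on it
        # and is compatible to execute one.
        for c in self.computers:
            if c.healthy and program not in c.running and program in c.compatible:
                return c
        return None

    def maintain(self, program: str, required_level: int) -> Optional[str]:
        """Load one more module if the level is below the required level."""
        if self.fault_tolerance_level(program) >= required_level:
            return None
        target = self.find_suitable(program)
        if target is None:
            return None
        target.running.add(program)                               # load
        self.routes.setdefault(program, []).append(target.name)   # register
        return target.name

    def route(self, program: str) -> Optional[str]:
        # Route only to registered computers that are still healthy.
        by_name = {c.name: c for c in self.computers}
        for name in self.routes.get(program, []):
            if by_name[name].healthy:
                return name
        return None
```

In this sketch a request is routed to a computer only after it appears in `routes`, which mirrors the ordering in the abstract: loading first, then registering, then routing.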
12 Claims
1. A method for maintaining a predefined acceptable fault tolerance level for a plurality of software modules implementing a software program running on a first plurality of computers coupled together in a cluster configuration in a first cluster in a clustered computer system, said first plurality of computers being coupled to a first intelligent director agent, said method comprising:
tracking, using said first intelligent director agent, status of said software modules running on said first plurality of computers;
ascertaining a fault tolerance level associated with said software program, said ascertaining being ascertained by examining said status of said software modules running on said first plurality of computers;
if said fault tolerance level is below said predefined acceptable fault tolerance level, searching for a first suitable computer among said first plurality of computers to load another module of said software program thereon, said first suitable computer representing a computer of said first plurality of computers that does not have a module of said software program running thereon, said first suitable computer being compatible to execute said another module of said software program; and
if said first suitable computer is available, loading said another module of said software program on said first suitable computer, registering said first suitable computer as a computer capable of servicing transaction requests pertaining to said software program after said another module of said software program is loaded onto said first suitable computer, and routing said transaction requests pertaining to said software program to said first suitable computer after said registering.
searching, if said fault tolerance level is below said predefined acceptable fault tolerance level and said first suitable computer is not available, for a second suitable computer among said second plurality of computers to load another module of said software program thereon, said second plurality of computers being coupled together in a second cluster configuration at a geographic site remote from said first plurality of computers, said second suitable computer representing a computer of said second plurality of computers that does not have a module of said software program running thereon and being compatible to execute said another copy of said computer program;
if said second suitable computer is available, loading said another module of said software program on said second suitable computer, registering said second suitable computer as a computer capable of servicing transaction requests pertaining to said software program after said another module of said software program is loaded onto said second suitable computer, and routing said transaction requests pertaining to said software program to said second suitable computer after said registering.
7. The method of claim 6 wherein said loading another module of said software program on said second suitable computer is performed responsive to instructions from said first intelligent director agent.
8. The method of claim 1 wherein said software program represents a software program that implements business logic in said clustered computer system.
9. The method of claim 1 further comprising issuing a warning to an operator of said clustered computer system if said fault tolerance level associated with said software program is ascertained to be below said predefined acceptable fault tolerance level.
10. The method of claim 1 further comprising removing a first software module from said first suitable computer to allow said first suitable computer to have sufficient processing capability to be loaded with said another module of said software program.
11. A method for maintaining a predefined acceptable fault tolerance level for a plurality of software modules implementing a software program running on a first plurality of computers coupled together in a cluster configuration in a first cluster in a clustered computer system, said first plurality of computers being coupled to a first intelligent director agent, wherein said clustered computer system includes a second plurality of computers coupled together in a cluster configuration in a second cluster, said second cluster being located in a geographic site that is remote from a geographic site implementing said first cluster, said method comprising:
tracking, using said first intelligent director agent, status of said software modules running on said first plurality of computers;
ascertaining a fault tolerance level associated with said software program, said ascertaining being ascertained by examining said status of said software modules running on said first plurality of computers;
if said fault tolerance level is below said predefined acceptable fault tolerance level, searching for a first suitable computer among said first plurality of computers to load another module of said software program thereon, said first suitable computer representing a computer of said first plurality of computers that does not have a module of said software program running thereon, said first suitable computer being compatible to execute said another module of said software program;
if said first suitable computer is available, loading said another module of said software program on said first suitable computer, registering said first suitable computer as a computer capable of servicing transaction requests pertaining to said software program after said another module of said software program is loaded onto said first suitable computer, and routing said transaction requests pertaining to said software program to said first suitable computer after said registering;
searching, if said fault tolerance level is below said predefined acceptable fault tolerance level and said first suitable computer is not available, for a second suitable computer among said second plurality of computers to load another module of said software program thereon, said second plurality of computers being coupled together in a second cluster configuration at a geographic site remote from said first plurality of computers, said second suitable computer representing a computer of said second plurality of computers that does not have a module of said software program running thereon and being compatible to execute said another module of said software program, wherein said searching for said second suitable computer employs software module-specific information stored at a second intelligent director agent associated with said second plurality of computers at said second site; and
if said second suitable computer is available, loading said another module of said software program on said second suitable computer, registering said second suitable computer as a computer capable of servicing transaction requests pertaining to said software program after said another module of said software program is loaded onto said second suitable computer, and routing said transaction requests pertaining to said software program to said second suitable computer after said registering.
12. A method for maintaining a predefined acceptable fault tolerance level for a plurality of software modules implementing a software program running on a first plurality of computers coupled together in a cluster configuration in a first cluster in a clustered computer system, said first plurality of computers being coupled to a first intelligent director agent, wherein said clustered computer system includes a second plurality of computers coupled together in a cluster configuration in a second cluster, said second cluster being located in a geographic site that is remote from a geographic site implementing said first cluster, said method comprising:
tracking, using said first intelligent director agent, status of said software modules running on said first plurality of computers;
ascertaining a fault tolerance level associated with said software program, said ascertaining being ascertained by examining said status of said software modules running on said first plurality of computers;
if said fault tolerance level is below said predefined acceptable fault tolerance level, searching for a first suitable computer among said first plurality of computers to load another module of said software program thereon, said first suitable computer representing a computer of said first plurality of computers that does not have a module of said software program running thereon, said first suitable computer being compatible to execute said another module of said software program;
if said first suitable computer is available, loading said another module of said software program on said first suitable computer, registering said first suitable computer as a computer capable of servicing transaction requests pertaining to said software program after said another module of said software program is loaded onto said first suitable computer, and routing said transaction requests pertaining to said software program to said first suitable computer after said registering;
searching, if said fault tolerance level is below said predefined acceptable fault tolerance level and said first suitable computer is not available, for a second suitable computer among said second plurality of computers to load another module of said software program thereon, said second plurality of computers being coupled together in a second cluster configuration at a geographic site remote from said first plurality of computers, said second suitable computer representing a computer of said second plurality of computers that does not have a module of said software program running thereon and being compatible to execute said another module of said software program; and
if said second suitable computer is available, loading said another module of said software program on said second suitable computer, registering said second suitable computer as a computer capable of servicing transaction requests pertaining to said software program after said another module of said software program is loaded onto said second suitable computer, and routing said transaction requests pertaining to said software program to said second suitable computer after said registering, wherein said loading another module of said software program on said second suitable computer is performed responsive to instructions from said second intelligent director agent.
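The two-cluster fallback recited in claims 11 and 12 (search the first cluster, and only if no suitable computer is available there, search the geographically remote second cluster) can be sketched as follows. This is an illustrative outline under stated assumptions, not the claimed implementation: the `Site` class, its `hosts` mapping, and the `restore` function are hypothetical names, and each `Site` object stands in for a cluster together with the director agent that holds its module-specific information.

```python
from typing import Dict, List, Optional, Set, Tuple


class Site:
    """Hypothetical cluster plus the director agent that tracks it."""

    def __init__(self, name: str,
                 hosts: Dict[str, Tuple[Set[str], Set[str]]]) -> None:
        self.name = name
        # host name -> (programs it can execute, programs it is running)
        self.hosts = hosts
        # program -> hosts registered to service its transaction requests
        self.registered: Dict[str, List[str]] = {}

    def find_suitable(self, program: str) -> Optional[str]:
        # Suitable: compatible with the program and not already running it.
        for host, (compatible, running) in self.hosts.items():
            if program in compatible and program not in running:
                return host
        return None

    def load_and_register(self, program: str, host: str) -> None:
        self.hosts[host][1].add(program)                       # load
        self.registered.setdefault(program, []).append(host)   # then register


def restore(program: str, local: Site,
            remote: Site) -> Optional[Tuple[str, str]]:
    """Try the local cluster first, then the remote one.

    Returns (site name, host name) for the newly loaded module, or None
    when neither cluster has a suitable computer.
    """
    for site in (local, remote):
        host = site.find_suitable(program)
        if host is not None:
            site.load_and_register(program, host)
            return site.name, host
    return None
```

Here each site performs its own search and load. That is closest to claim 12, where loading on the second suitable computer is performed responsive to instructions from the second intelligent director agent. In claim 7 the instructions come from the first intelligent director agent instead.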
Specification