Method and system for detecting a dead server
Abstract
Method and system for detecting a dead server in a multi-server environment. A virtual ring structure is used in which each server in a server pool is only required to monitor the status of two other servers in the server pool. Thus, a server need only transmit ping signals to two other servers (its buddies) in the server pool at any given time. Because each server maintains the status of only two other servers at any given time, the size of the server pool is not limited by the ability of each server to send and process ping signals. The two servers which are monitored by any given server in the server pool are referred to as the “buddy A” server and the “buddy B” server. When the monitoring server determines that one of its buddy servers is down, the monitoring server reports the status of the down server to a SQL server that maintains a server table. The server table maintains a list of each “live” server and the buddy servers assigned to that server. Down servers are removed from the server table. When a server determines that one of its buddies is down, the report to the SQL server results in a buddy reassignment. The buddies of the down server are made buddies of one another and the virtual server ring is once more intact. The SQL server then knows not to route any client to the down server.
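The buddy reassignment described in the abstract can be sketched in a few lines. This is a minimal illustration only, not the patented implementation: the `ServerTable` class, its method names, and the in-memory dictionary are assumptions standing in for the server table that the abstract says is kept by a SQL server.

```python
class ServerTable:
    """Hypothetical in-memory stand-in for the server table (assumption:
    the real table lives in a SQL server; this dict only mirrors its shape)."""

    def __init__(self, server_ids):
        ids = list(server_ids)
        n = len(ids)
        if n < 3:
            raise ValueError("this sketch assumes a pool of at least three servers")
        # Arrange the servers in a virtual ring: each one monitors its two
        # neighbours, stored as [buddy A, buddy B].
        self.buddies = {
            sid: [ids[(i - 1) % n], ids[(i + 1) % n]] for i, sid in enumerate(ids)
        }

    def report_down(self, dead):
        """Remove a down server and make its two buddies monitor one another."""
        entry = self.buddies.pop(dead, None)
        if entry is None:
            return  # already removed, e.g. both buddies reported the same server
        buddy_a, buddy_b = entry
        self._replace(buddy_a, dead, buddy_b)
        self._replace(buddy_b, dead, buddy_a)

    def _replace(self, sid, old, new):
        self.buddies[sid] = [new if b == old else b for b in self.buddies[sid]]

    def live_servers(self):
        """Servers that clients may still be routed to."""
        return sorted(self.buddies)


table = ServerTable(["s1", "s2", "s3", "s4"])
table.report_down("s2")          # reported by s1
table.report_down("s2")          # duplicate report from s3 is a no-op
print(table.buddies)
print(table.live_servers())
```

After `s2` is reported down, `s1` and `s3` hold each other as buddies, so the ring is closed again and `s2` no longer appears among the servers a client could be routed to.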
19 Claims
1. A system for removing an inoperative computer from a pool of computers, comprising:
a first computer associated with a first monitored computer and a second monitored computer, the first computer being operative to monitor the status of the first monitored computer and the second monitored computer;
wherein being operative to monitor the status of the first monitored computer and the second monitored computer comprises being operative to initiate transmission of a first signal to the first monitored computer and to initiate transmission of a second signal to the second monitored computer and to await receipt of a first responsive signal from the first monitored computer and to await receipt of a second responsive signal from the second monitored computer;
a computer list operative to maintain the association between the first computer and the first monitored computer and to maintain the association between the first computer and the second monitored computer;
wherein the first computer is further operative to send a first monitored computer inoperative signal to the computer list, in response to a determination that the first monitored computer is inoperative; and
wherein the computer list is further operative to disassociate the first computer from the first monitored computer and to associate the first computer with a third monitored computer in response to the receipt of the first monitored computer inoperative signal.
Dependent claims: 2, 3, 4, 5.
6. A computer-implemented method for creating a virtual computer ring, the method comprising:
storing an entry in a computer table, wherein the computer table identifies a plurality of computers in a computer pool;
wherein each entry comprises a computer identification of a computer in the computer pool, a first monitored computer, and a second monitored computer; and
causing at least one of the plurality of computers to monitor its first monitored computer and its second monitored computer to determine whether one of its monitored computers is inoperative;
wherein monitoring the status of the first monitored computer and the second monitored computer comprises being operative to initiate transmission of a first signal to the first monitored computer and to initiate transmission of a second signal to the second monitored computer and to await receipt of a first responsive signal from the first monitored computer and to await receipt of a second responsive signal from the second monitored computer;
a computer list operative to maintain the association between the one of the plurality of computers and the first monitored computer and to maintain the association between the one of the plurality of computers and the second monitored computer;
wherein the one of the plurality of computers is further operative to send a first monitored computer inoperative signal to the computer list, in response to a determination that the first monitored computer is inoperative; and
wherein the computer list is further operative to disassociate the one of the plurality of computers from the first monitored computer and to associate the one of the plurality of computers with a third monitored computer in response to the receipt of the first monitored computer inoperative signal.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15.
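As a rough illustration of the table entries recited in claim 6, the sketch below stores one entry per computer (its identification plus its first and second monitored computers) and checks that the entries form a closed ring. The names `TableEntry`, `build_table`, and `is_ring` are hypothetical, and neighbour-based assignment is an assumption chosen only because it yields a ring.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TableEntry:
    """One row of the (hypothetical) computer table."""
    computer_id: str
    first_monitored: str
    second_monitored: str


def build_table(computer_ids: Iterable[str]) -> List[TableEntry]:
    """Create one entry per computer, monitoring its two ring neighbours."""
    ids = list(computer_ids)
    n = len(ids)
    if n < 3 or len(set(ids)) != n:
        raise ValueError("this sketch assumes at least three distinct computers")
    return [TableEntry(ids[i], ids[(i - 1) % n], ids[(i + 1) % n]) for i in range(n)]


def is_ring(table: List[TableEntry]) -> bool:
    """True if monitoring is mutual: whenever X monitors Y, Y also monitors X."""
    by_id = {e.computer_id: e for e in table}
    for e in table:
        for monitored in (e.first_monitored, e.second_monitored):
            other = by_id.get(monitored)
            if other is None:
                return False
            if e.computer_id not in (other.first_monitored, other.second_monitored):
                return False
    return True


table = build_table(["a", "b", "c", "d"])
for entry in table:
    print(entry)
print(is_ring(table))
```

A check like `is_ring` is one way to confirm, after any reassignment, that no computer has been left unmonitored.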
16. A computer-implemented method for monitoring the status of a plurality of computers in a pool of computers, the method comprising:
assigning each of the plurality of computers a first monitored computer and a second monitored computer within the computer pool;
causing each of the plurality of computers to monitor the status of its first monitored computer and its second monitored computer by initiating transmission of a first signal to the first monitored computer and initiating transmission of a second signal to the second monitored computer and waiting to receive a first responsive signal from the first monitored computer and waiting to receive a second responsive signal from the second monitored computer;
if one of the plurality of computers determines that one of its monitored computers is inoperative, then causing a monitoring computer to notify a central computer list that stores associations between computers that one of its monitored computers is inoperative;
a computer list operative to maintain the association between the one of the plurality of computers and the first monitored computer and to maintain the association between the one of the plurality of computers and the second monitored computer;
wherein the one of the plurality of computers is further operative to send a first monitored computer inoperative signal to the computer list, in response to a determination that the first monitored computer is inoperative; and
wherein the computer list is further operative to disassociate the one of the plurality of computers from the first monitored computer and to associate the one of the plurality of computers with a third monitored computer in response to the receipt of the first monitored computer inoperative signal.
Dependent claims: 17, 18.
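The monitoring step of claim 16 (send a signal to each monitored computer, wait for a responsive signal, and notify the central list on failure) might look roughly like the following. Everything named here is an assumption for illustration: the claim does not specify a transport, a timeout, or a retry count, so `send_and_wait`, `notify_list`, `timeout_s`, and `attempts` are hypothetical parameters supplied by the caller.

```python
from typing import Callable, Iterable, List


def check_monitored(
    monitored: Iterable[str],
    send_and_wait: Callable[[str, float], bool],
    notify_list: Callable[[str], None],
    timeout_s: float = 2.0,   # assumed value, not from the source
    attempts: int = 3,        # assumed value, not from the source
) -> List[str]:
    """Signal each monitored computer and report the ones that never answer.

    send_and_wait(target, timeout_s) should return True if a responsive
    signal arrived within the timeout. notify_list(target) stands in for
    the "inoperative" signal sent to the central computer list.
    """
    inoperative = []
    for target in monitored:
        # any() stops at the first successful response.
        answered = any(send_and_wait(target, timeout_s) for _ in range(attempts))
        if not answered:
            notify_list(target)
            inoperative.append(target)
    return inoperative


# Usage with a fake transport in which "buddy_b" never responds.
reported = []
down = check_monitored(
    ["buddy_a", "buddy_b"],
    send_and_wait=lambda target, timeout: target != "buddy_b",
    notify_list=reported.append,
)
print(down, reported)
```

Passing the transport in as a callable keeps the sketch independent of any particular ping mechanism.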
19. A computer-readable medium having stored thereon computer-executable instructions which, when executed by a computer, cause the computer to:
assign each of the plurality of computers a first monitored computer and a second monitored computer within the computer pool;
cause each of the plurality of computers to monitor the status of its first monitored computer and its second monitored computer by initiating transmission of a first signal to the first monitored computer and initiating transmission of a second signal to the second monitored computer and waiting to receive a first responsive signal from the first monitored computer and waiting to receive a second responsive signal from the second monitored computer;
if one of the plurality of computers determines that one of its monitored computers is inoperative, then cause a monitoring computer to notify a central computer list that stores associations between computers that one of its monitored computers is inoperative;
cause the central computer list to remove the inoperative monitored computer from the central computer list and reassign the inoperative computer's other monitored computer to be associated with the monitoring computer when notification is received that the monitored computer is inoperative;
cause the central computer list to connect a client connected to the central computer list to one of the plurality of operative computers in the computer pool;
if one of the plurality of computers shuts down normally, then cause the computer to report its identity to the central computer table and remove said computer from the computer table and cause the normally shutdown computer's first monitored computer and second monitored computer to monitor one another; and
if a new computer is added to the plurality of computers in the computer pool, cause the central computer list to randomly choose one of the plurality of computers and the randomly chosen computer's first monitored computer, reassign the new computer as the randomly chosen computer's first monitored computer, and reassign the new computer as either the first monitored computer or second monitored computer of the randomly chosen computer's first monitored computer.
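The last element of claim 19, adding a new computer to the ring, can be sketched as below. This is an illustration under assumptions rather than the claimed implementation: the table is modelled as a plain dictionary mapping each computer to `[first monitored, second monitored]`, the function name `add_computer` is hypothetical, and assigning the chosen pair as the new computer's own monitored computers is inferred from the ring structure rather than stated in the claim.

```python
import random


def add_computer(table, new_id, rng=random):
    """Splice new_id into the ring between a random computer and its first buddy.

    table maps computer id -> [first monitored, second monitored].
    Returns the randomly chosen computer.
    """
    if new_id in table:
        raise ValueError(f"{new_id} is already in the table")
    chosen = rng.choice(sorted(table))
    first = table[chosen][0]

    # The new computer becomes the chosen computer's first monitored computer.
    table[chosen][0] = new_id

    # In the first monitored computer's entry, replace the chosen computer
    # with the new one, in whichever slot (first or second) it occupied.
    slot = table[first].index(chosen)
    table[first][slot] = new_id

    # Assumption: the new computer monitors the two computers it sits between.
    table[new_id] = [first, chosen]
    return chosen


ring = {"a": ["c", "b"], "b": ["a", "c"], "c": ["b", "a"]}
chosen = add_computer(ring, "d", random.Random(0))
print(chosen, ring)
```

Because only the link between the chosen computer and its first monitored computer is rewired, every other entry in the table is left untouched.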
Specification