Method and system for managing servers in a server cluster

US 7,990,847 B1
Filed: 04/15/2005
Issued: 08/02/2011
Est. Priority Date: 04/15/2005
Status: Active Grant

Abstract

A method of managing servers in a server cluster is disclosed. The health of servers is detected through passive return traffic monitoring. Server failure can be detected through TCP information or HTTP return codes. Various settings affecting number of failure thresholds and the time period to detect failures can be configured. Servers can be mapped to URLs such that passive health monitoring can be performed for URLs instead of server clusters.
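
The abstract's last sentence, mapping servers to URLs so that health is tracked per URL, can be illustrated with a minimal sketch. Everything named here (`UrlHealthTable`, longest-prefix matching, a single failure threshold per table) is a hypothetical illustration; the record does not say how the mapping is stored or matched.

```python
from collections import defaultdict


class UrlHealthTable:
    """Hypothetical per-URL failure tracking keyed by (server, URL prefix)."""

    def __init__(self, url_prefixes, threshold):
        # Longest prefixes first so the most specific mapping wins (assumption).
        self.prefixes = sorted(url_prefixes, key=len, reverse=True)
        self.threshold = threshold
        self.failures = defaultdict(int)

    def match(self, path):
        """Return the configured prefix covering this path, or None."""
        for prefix in self.prefixes:
            if path.startswith(prefix):
                return prefix
        return None

    def record_failure(self, server, path):
        """Count one observed failure; return True once the count exceeds the threshold."""
        prefix = self.match(path)
        if prefix is None:
            return False
        self.failures[(server, prefix)] += 1
        return self.failures[(server, prefix)] > self.threshold

    def is_healthy(self, server, path):
        """A server is unhealthy only for the URL prefix whose count exceeded the threshold."""
        prefix = self.match(path)
        return prefix is None or self.failures[(server, prefix)] <= self.threshold
```

With this shape, a server that fails repeatedly for one URL prefix can stay eligible for requests to other prefixes, which is the practical difference from removing it from the whole cluster.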

187 Citations

20 Claims

1. A method of passively monitoring servers in a server cluster comprising machine-implemented steps of:
- receiving request traffic that is sent from clients to the server cluster;
  
  routing the request traffic to a server in the server cluster;
  
  receiving response traffic from the server in the server cluster;
  
  wherein the response traffic is returned from the server to the clients, the response traffic corresponding to the request traffic;
  
  detecting, within a configured retry time period, whether a number of abnormal end sessions in the response traffic exceeds a first configured failure threshold;
  
  wherein the response traffic includes packets from all of a plurality of connections between the clients and the server;
  
  wherein the number of abnormal end sessions in the response traffic is determined across all of the plurality of connections between the clients and the server;
  
  in response to detecting, within the configured retry time period, that the number of abnormal end sessions in the response traffic exceeds the first configured failure threshold, performing the steps of:
  
  changing a state of the server to a first state that indicates that the server is at least temporarily removed from the server cluster, and starting a first state time clock;
  
  sending the response traffic to the clients;
  
  when the first state time clock expires, changing the state of the server to a second state that indicates that the server is included in the server cluster;
  
  receiving further response traffic from the server in the server cluster;
  
  wherein the further response traffic corresponds to further request traffic that was sent from the clients to the server cluster;
  
  detecting, within the configured retry time period, whether a number of abnormal end sessions in the further response traffic exceeds a second configured failure threshold;
  
  wherein the further response traffic includes packets from all of the plurality of connections between the clients and the server;
  
  wherein the number of abnormal end sessions in the further response traffic is determined across all of the plurality of connections between the clients and the server;
  
  in response to detecting, within the configured retry time period, that the number of abnormal end sessions in the further response traffic exceeds the second configured failure threshold, changing the state of the server to a third state that indicates that the server is removed from the server cluster;
  
  wherein said second configured failure threshold is less than said first configured failure threshold;
  
  sending the further response traffic to the clients;
  
  wherein the method is performed by one or more network devices.
  - 2. The method as recited in claim 1, wherein the first state is the same as the third state.
  - 3. The method as recited in claim 1, wherein the first configured failure threshold is a number of total failures threshold.
  - 4. The method as recited in claim 1, wherein the first configured failure threshold is a number of consecutive failures threshold.
  - 5. The method as recited in claim 1, wherein the response traffic includes TCP header information, and detecting an abnormal end session comprises detecting a TCP RST in the response traffic.
  - 6. The method as recited in claim 1, wherein the response traffic includes TCP header information, and detecting an abnormal end session comprises not receiving a TCP ACK in the response traffic within a timeout period.
  - 7. The method as recited in claim 1, wherein the response traffic includes HTTP return codes, and detecting an abnormal end session comprises detecting a HTTP return code in the response traffic matching a configured return code.
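
The state transitions recited in claim 1 can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the patented implementation: the names (`PassiveMonitor`, `State`, `on_probation`, `removal_period`) are hypothetical, timestamps are passed in by the caller, and the sketch assumes the lower second threshold stays in force indefinitely after the server is reinstated, which the claim does not specify.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    IN_SERVICE = "in_service"      # server is included in the cluster (second state)
    TEMP_REMOVED = "temp_removed"  # at least temporarily removed (first state)
    REMOVED = "removed"            # removed from the cluster (third state)


@dataclass
class PassiveMonitor:
    first_threshold: int    # failure threshold before the first removal
    second_threshold: int   # lower threshold applied after reinstatement
    retry_period: float     # window (seconds) in which failures are counted
    removal_period: float   # duration of the first-state time clock
    state: State = State.IN_SERVICE
    on_probation: bool = False
    _failures: deque = field(default_factory=deque)
    _removed_at: float = 0.0

    def __post_init__(self):
        if self.second_threshold >= self.first_threshold:
            raise ValueError("second threshold must be less than the first")

    def tick(self, now):
        """Reinstate the server once the first-state time clock has expired."""
        if self.state is State.TEMP_REMOVED and now - self._removed_at >= self.removal_period:
            self.state = State.IN_SERVICE
            self.on_probation = True
            self._failures.clear()

    def record_abnormal_end(self, now):
        """Count one abnormal end session seen on any connection to this server."""
        self.tick(now)
        if self.state is not State.IN_SERVICE:
            return self.state
        self._failures.append(now)
        # Keep only failures that fall inside the retry time period.
        while self._failures and now - self._failures[0] > self.retry_period:
            self._failures.popleft()
        threshold = self.second_threshold if self.on_probation else self.first_threshold
        if len(self._failures) > threshold:  # the claim says "exceeds"
            if self.on_probation:
                self.state = State.REMOVED
            else:
                self.state = State.TEMP_REMOVED
                self._removed_at = now
            self._failures.clear()
        return self.state
```

The monitor only observes: it never drops or rewrites traffic, matching the claim's requirement that response traffic is still sent to the clients. A single counter per server reflects the claim language that failures are counted across all connections between the clients and that server.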

8. An apparatus operable to passively monitor servers in a server cluster, the apparatus comprising:
- one or more processors;
  
  a network interface communicatively coupled to the one or more processors and configured to communicate one or more packet flows among the one or more processors in a network; and
  
  a computer readable medium comprising one or more sequences of instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of:
  
  receiving request traffic that is sent from clients to the server cluster;
  
  routing the request traffic to a server in the server cluster;
  
  receiving response traffic from the server in the server cluster;
  
  wherein the response traffic is returned from the server to the clients, the response traffic corresponding to the request traffic;
  
  detecting, within a configured retry time period, whether a number of abnormal end sessions in the response traffic exceeds a first configured failure threshold;
  
  wherein the response traffic includes packets from all of a plurality of connections between the clients and the server;
  
  wherein the number of abnormal end sessions in the response traffic is determined across all of the plurality of connections between the clients and the server;
  
  in response to detecting, within the configured retry time period, that the number of abnormal end sessions in the response traffic exceeds the first configured failure threshold, performing the steps of:
  
  changing a state of the server to a first state that indicates that the server is at least temporarily removed from the server cluster, and starting a first state time clock;
  
  sending the response traffic to the clients;
  
  when the first state time clock expires, changing the state of the server to a second state that indicates that the server is included in the server cluster;
  
  receiving further response traffic from the server in the server cluster;
  
  wherein the further response traffic corresponds to further request traffic that was sent from the clients to the server cluster;
  
  detecting, within the configured retry time period, whether a number of abnormal end sessions in the further response traffic exceeds a second configured failure threshold;
  
  wherein the further response traffic includes packets from all of the plurality of connections between the clients and the server;
  
  wherein the number of abnormal end sessions in the further response traffic is determined across all of the plurality of connections between the clients and the server;
  
  in response to detecting, within the configured retry time period, that the number of abnormal end sessions in the further response traffic exceeds the second configured failure threshold, changing the state of the server to a third state that indicates that the server is removed from the server cluster;
  
  wherein said second configured failure threshold is less than said first configured failure threshold;
  
  sending the further response traffic to the clients.
  - 9. The apparatus of claim 8, wherein:
    - the response traffic includes TCP header information; and
      
      the instructions that cause the one or more processors to perform the step of detecting the number of abnormal end sessions comprise instructions which, when executed by the one or more processors, cause the one or more processors to perform one or more of the steps of:
      
      detecting a TCP RST in the response traffic; and
      
      detecting that a TCP ACK is not received in the response traffic within a timeout period.
  - 10. The apparatus of claim 8, wherein the response traffic includes HTTP return codes, and wherein the instructions that cause the one or more processors to perform the step of detecting the number of abnormal end sessions comprise instructions which, when executed by the one or more processors, cause the one or more processors to perform the step of detecting a HTTP return code in the response traffic matching a configured return code.
  - 11. The apparatus of claim 8, wherein the first state is the same as the third state.
  - 12. The apparatus of claim 8, wherein the first configured failure threshold is a number of total failures threshold.
  - 13. The apparatus of claim 8, wherein the first configured failure threshold is a number of consecutive failures threshold.
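
The dependent claims name three signals for an abnormal end session (a TCP RST, a TCP ACK not received within a timeout, an HTTP return code matching a configured code) and two ways of counting failures (total and consecutive). The sketch below shows one way these could fit together. All names, the default timeout, and the example return codes are hypothetical, and the observation record stands in for whatever packet inspection a real device performs.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ResponseObservation:
    """What the monitor saw for one session (hypothetical record)."""
    tcp_rst: bool = False               # server sent a TCP RST
    ack_delay: Optional[float] = 0.0    # seconds until the ACK arrived; None if never seen
    http_status: Optional[int] = None   # HTTP return code, if the traffic is HTTP


@dataclass(frozen=True)
class DetectionConfig:
    ack_timeout: float = 2.0                              # assumed default
    failure_codes: FrozenSet[int] = frozenset({500, 503}) # assumed configured codes


def is_abnormal_end(obs: ResponseObservation, cfg: DetectionConfig) -> bool:
    """Return True if any of the three configured failure signals is present."""
    if obs.tcp_rst:
        return True
    if obs.ack_delay is None or obs.ack_delay > cfg.ack_timeout:
        return True
    return obs.http_status in cfg.failure_codes


class FailureCounter:
    """Counts failures as a running total or as a consecutive streak."""

    def __init__(self, threshold: int, consecutive: bool):
        self.threshold = threshold
        self.consecutive = consecutive
        self.count = 0

    def observe(self, abnormal: bool) -> bool:
        """Record one session; return True once the count exceeds the threshold."""
        if abnormal:
            self.count += 1
        elif self.consecutive:
            self.count = 0  # a normal session breaks the streak
        return self.count > self.threshold
```

A total-failures counter trips on intermittent errors that accumulate, while a consecutive-failures counter trips only on an unbroken run, so the same traffic can cross one threshold and not the other.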

14. A non-transitory computer-readable medium storing one or more sequences of instructions for passively monitoring servers in a server cluster, which instructions, when executed by one or more processors, cause the one or more processors to perform the steps of:
- receiving request traffic that is sent from clients to the server cluster;
  
  routing the request traffic to a server in the server cluster;
  
  receiving response traffic from the server in the server cluster;
  
  wherein the response traffic is returned from the server to the clients, the response traffic corresponding to the request traffic;
  
  detecting, within a configured retry time period, whether a number of abnormal end sessions in the response traffic exceeds a first configured failure threshold;
  
  wherein the response traffic includes packets from all of a plurality of connections between the clients and the server;
  
  wherein the number of abnormal end sessions in the response traffic is determined across all of the plurality of connections between the clients and the server;
  
  in response to detecting, within the configured retry time period, that the number of abnormal end sessions in the response traffic exceeds the first configured failure threshold, performing the steps of:
  
  changing a state of the server to a first state that indicates that the server is at least temporarily removed from the server cluster, and starting a first state time clock;
  
  sending the response traffic to the clients;
  
  when the first state time clock expires, changing the state of the server to a second state that indicates that the server is included in the server cluster;
  
  receiving further response traffic from a server in the server cluster;
  
  wherein the further response traffic corresponds to further request traffic that was sent from the clients to the server cluster;
  
  detecting, within the configured retry time period, whether a number of abnormal end sessions in the further response traffic exceeds a second configured failure threshold;
  
  wherein the further response traffic includes packets from all of the plurality of connections between the clients and the server;
  
  wherein the number of abnormal end sessions in the further response traffic is determined across all of the plurality of connections between the clients and the server;
  
  in response to detecting, within the configured retry time period, that the number of abnormal end sessions in the further response traffic exceeds the second configured failure threshold, changing the state of the server to a third state that indicates that the server is removed from the server cluster;
  
  wherein said second configured failure threshold is less than said first configured failure threshold;
  
  sending the further response traffic to the clients.
  - 15. The non-transitory computer-readable medium of claim 14, wherein the first state is the same as the third state.
  - 16. The non-transitory computer-readable medium of claim 14, wherein the first configured failure threshold is a number of total failures threshold.
  - 17. The non-transitory computer-readable medium of claim 14, wherein the first configured failure threshold is a number of consecutive failures threshold.
  - 18. The non-transitory computer-readable medium of claim 14, wherein the response traffic includes TCP header information, and the instructions that cause the one or more processors to perform the step of detecting the number of abnormal end sessions comprise instructions which, when executed by the one or more processors, cause the one or more processors to perform the step of detecting a TCP RST in the response traffic.
  - 19. The non-transitory computer-readable medium of claim 14, wherein the response traffic includes TCP header information, and the instructions that cause the one or more processors to perform the step of detecting the number of abnormal end sessions comprise instructions which, when executed by the one or more processors, cause the one or more processors to perform the step of detecting that a TCP ACK is not received in the response traffic within a timeout period.
  - 20. The non-transitory computer-readable medium of claim 14, wherein the response traffic includes HTTP return codes, and wherein the instructions that cause the one or more processors to perform the step of detecting the number of abnormal end sessions comprise instructions which, when executed by the one or more processors, cause the one or more processors to perform the step of detecting a HTTP return code in the response traffic matching a configured return code.

Current Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Original Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Inventors: Chou, Wesley; Nguyen, Anh Tien; Leroy, David James; Kahol, Anurag
Primary Examiner(s): Ryman, Daniel J
Assistant Examiner(s): Lee, Jae Young
Application Number: US11/106,801
Time in Patent Office: 2,300 Days
Field of Search: 370/218, 370/219, 370/220, 709/227, 714/4, 714/5, 718/105
US Class Current: 370/216
CPC Class Codes:
- H04L 43/0817: by checking functioning
- H04L 43/16: Threshold monitoring
- H04L 67/1001: for accessing one among a p...
- H04L 67/564: Enhancement of application ...
- H04L 69/40: for recovering from a failu...
