METHOD FOR PERFORMING A CORRECTIVE ACTION UPON A SUB-SYSTEM

US 20080215924A1
Filed: 04/10/2008
Published: 09/04/2008
Est. Priority Date: 02/22/2002
Status: Active Grant

Abstract

A server self health monitor (SHM) system monitors the health of the server it resides on. The health of a server is determined by the health of all of the server's sub-systems and deployed applications. The SHM may make health check inquiries to server sub-systems periodically or based on external trigger events. The sub-systems perform self health checks and provide sub-system health information to requesting entities such as the SHM. Sub-system self health updates may be based on internal events such as counters or changes in status, or on external entity requests. Corrective action may be performed upon sub-systems by the SHM depending on their health status or the health status of the server. Corrective action may also be performed by a sub-system upon itself.
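
The polling and aggregation described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The class and method names (`SelfHealthMonitor`, `register`, `poll`, `server_health`), the `WARN` level name, and the aggregation rule are assumptions made for illustration. The rule used here (the server is failed if a critical sub-system is failed or all sub-systems are failed) is borrowed from the conditions named in claims 8 and 9, and a missing reply is treated as a failure as in claim 7.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, Optional


class Health(Enum):
    GOOD = "good"
    WARN = "warn"      # hypothetical name for a level between good and failed
    FAILED = "failed"


class SelfHealthMonitor:
    """Illustrative monitor that asks each sub-system for its health."""

    def __init__(self, critical: Iterable[str] = ()):
        # Each sub-system is modelled as a callable returning a Health value,
        # or None when it gives no answer (an assumption of this sketch).
        self._subsystems: Dict[str, Callable[[], Optional[Health]]] = {}
        self._critical = set(critical)

    def register(self, name: str, check: Callable[[], Optional[Health]]) -> None:
        self._subsystems[name] = check

    def poll(self) -> Dict[str, Health]:
        """Request health information from every registered sub-system."""
        report: Dict[str, Health] = {}
        for name, check in self._subsystems.items():
            try:
                result = check()
            except Exception:
                result = None
            # No usable reply is treated as a failed sub-system.
            report[name] = result if isinstance(result, Health) else Health.FAILED
        return report

    def server_health(self, report: Dict[str, Health]) -> Health:
        """Derive one server-level status from the sub-system report."""
        failed = {name for name, h in report.items() if h is Health.FAILED}
        if failed & self._critical:
            return Health.FAILED
        if report and len(failed) == len(report):
            return Health.FAILED
        if all(h is Health.GOOD for h in report.values()):
            return Health.GOOD
        return Health.WARN
```

A caller would register one check per sub-system, call `poll()` when a health check event occurs (a timer expiry or an external request), and pass the report to `server_health()` to decide whether corrective action is needed.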

20 Claims

1. A method for monitoring the health of a server comprising:
- maintaining a server having a sub-system and a server self health monitor;
- detecting the occurrence of a health check event by the server self health monitor;
- transmitting a request by the server self health monitor to the sub-system for sub-system health information;
- determining the health of the server by the server self health monitor using the sub-system health information; and
- performing a corrective action upon the sub-system, by the server health monitor or the sub-system, wherein the corrective action is based on the health of the sub-system; and
- wherein a first parameter specifies the maximum number of times a server can be restarted within a period of time specified by a second parameter.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20):
  - 2. The method as claimed in claim 1 wherein the health check event is expiration of a period of time.
  - 3. The method as claimed in claim 1 wherein the health check event is a request from an administration server.
  - 4. The method as claimed in claim 1 wherein the health check event is an event occurring external to the server, and wherein occurrence of the event is communicated to the server self health monitor.
  - 5. The method as claimed in claim 1 wherein said transmitting a request includes transmitting a request from the server self health monitor to all sub-systems in a server, to request each sub-system's health information.
  - 6. The method as claimed in claim 1 wherein the server provides its health information to requesting entities.
  - 7. The method as claimed in claim 1 wherein said determining sub-system health information includes the server self health monitor failing to receive sub-system health information from a sub-system, and the server self health monitor determining the sub-system has failed as a result of the failure to receive sub-system health information from the sub-system.
  - 8. The method as claimed in claim 1 wherein said determining the health of the server includes determining whether all the sub-systems are in a failed state.
  - 9. The method as claimed in claim 1 wherein said determining the health of the server includes determining whether a critical sub-system is in a failed state.
  - 10. The method as claimed in claim 1 further comprising:
    - processing the server health information.
  - 11. The method as claimed in claim 10 wherein said processing includes restarting a failed sub-system.
  - 12. The method as claimed in claim 10 wherein said processing includes restarting all sub-systems if a critical sub-system is failed.
  - 13. The method as claimed in claim 10 wherein said processing includes storing the sub-system health information.
  - 14. The method as claimed in claim 10 wherein said processing includes determining if a condition is met.
  - 15. The method as claimed in claim 14 wherein said processing includes determining if the sub-system's health status has changed.
  - 16. The method of claim 1, wherein a health update indicates that the sub-system is at one of multiple pre-defined health levels.
  - 17. The method of claim 16, wherein health levels correspond to conditions, the conditions including good, failed, and between good and failed.
  - 19. The method of claim 1, wherein the sub-system performs a health check upon itself and provides sub-system health information to requesting entities.
  - 20. The method of claim 1, wherein sub-system health updates are triggered by external entity requests, internal events such as counters, or changes in status.

18. The method of claim 18, wherein the sub-system is set to a critical level if a minimum number of transactions have timed out.
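
The final limitation of claim 1 (a first parameter capping the number of restarts within a period set by a second parameter) can be illustrated with a sliding-window counter. The following Python sketch is only one possible reading of that limitation. The names `RestartLimiter`, `max_restarts`, `period_seconds`, and `try_restart` are hypothetical, and the injectable `clock` argument is an assumption added so the behavior can be checked without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class RestartLimiter:
    """Allow at most max_restarts restarts within any period_seconds window."""

    def __init__(
        self,
        max_restarts: int,        # the "first parameter" of claim 1
        period_seconds: float,    # the "second parameter" of claim 1
        clock: Callable[[], float] = time.monotonic,
    ):
        self._max = max_restarts
        self._period = period_seconds
        self._clock = clock
        self._stamps: Deque[float] = deque()  # times of recent restarts

    def try_restart(self) -> bool:
        """Record a restart and return True if the limit allows it."""
        now = self._clock()
        # Forget restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._period:
            self._stamps.popleft()
        if len(self._stamps) >= self._max:
            return False
        self._stamps.append(now)
        return True
```

A monitor would call `try_restart()` before restarting the server and fall back to another action, such as leaving the server in a failed state for an administrator, when it returns `False`. That fallback is an assumption of this sketch and is not stated in the claims.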

Current Assignee: Oracle International Corporation (Oracle Corporation)
Original Assignee: BEA Systems Incorporated (Oracle Corporation)
Inventors: Srivastava, Rahul; Halpern, Eric M.
Granted Patent: US 7,849,367 B2
US Class Current: 714/39
CPC Class Codes:
G06F 11/0709   in a distributed system con...
G06F 11/0757   by exceeding a time limit, ...
G06F 11/0793   Remedial or corrective acti...
G06F 11/3006   where the computing system ...
G06F 11/3055   Monitoring arrangements for...
G06F 11/3082   the data filtering being ac...
