Method and apparatus for monitoring the health of a computer system

US 7,668,696 B2
Filed: 04/16/2007
Issued: 02/23/2010
Est. Priority Date: 04/16/2007
Status: Active Grant

Abstract

A system that monitors the health of a computer system is presented. During operation, the system receives a first-difference function for the variance of a time series for a monitored telemetry variable within the computer system. The system then determines whether the first-difference function indicates that the computer system is at the onset of degradation. If so, the system performs a remedial action.

19 Claims

1. A method for monitoring the health of a computer system, comprising:
- receiving a variance of a time series for a monitored telemetry variable within the computer system;
  
  calculating a residual function of the variance of the time series for the monitored telemetry variable;
  
  calculating a first-difference function of the variance of the time series from the residual function;
  
  determining whether the first-difference function indicates that the computer system is at the onset of degradation; and
  
  if so, performing a remedial action.
  - 2. The method of claim 1, wherein prior to receiving the variance for the time series for the monitored telemetry variable, the method further comprises:
    - receiving the time series for the monitored telemetry variable; and
      
      calculating the variance of the time series for the monitored telemetry variable.
  - 3. The method of claim 1, wherein calculating the first-difference function of the time series involves, for each time point within the time series, subtracting a value of the time series at a previous time point from the value of the time series at a present time point.
  - 4. The method of claim 3, further comprising dividing the result of the subtraction by the value of a length of a time interval between the previous time point and the present time point.
  - 5. The method of claim 1, wherein calculating the residual function for a time series involves:
    - for each time interval in the time series, calculating a running average of values for the time series up to and including a present time interval; and
      
      subtracting the running average from a value of the time series at the present time interval.
  - 6. The method of claim 1, wherein determining whether the first-difference function indicates that the computer system is at the onset of degradation involves determining whether the first-difference function exceeds a specified threshold.
  - 7. The method of claim 1, wherein determining whether the first-difference function indicates that the computer system is at the onset of degradation involves:
    - performing a Sequential Probability Ratio Test (SPRT) on the first-difference function; and
      
      determining whether the SPRT generates an alarm.
  - 8. The method of claim 7, wherein the SPRT can include one or more of:
    - a positive variance first-difference test, which generates an alarm if the first-difference function for the variance of the time series for the monitored telemetry variable is increasing; and
      
      a negative variance first-difference test, which generates an alarm if the first-difference function for the variance of the time series for the monitored telemetry variable is decreasing.
  - 9. The method of claim 1, wherein performing the remedial action can involve performing one or more of:
    - recording a time when the onset of degradation occurred;
      
      notifying a system administrator that the computer system is at the onset of degradation;
      
      shutting down the computer system;
      
      backing up data stored on the computer system;
      
      failing-over to a redundant computer system; and
      
      replacing one or more components which are at the onset of degradation.
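
The steps recited in claims 1 through 6 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the sliding-window variance, and the window size are assumptions introduced here, since the claims do not say how the variance is computed. The residual follows claim 5 (value minus running average), the first difference follows claims 3 and 4 (present minus previous, divided by the interval length), and the check follows claim 6 (exceeds a specified threshold).

```python
from statistics import pvariance


def sliding_variance(series, window):
    """Variance of the telemetry series over a sliding window.
    The windowing scheme is an assumption; the claims only say 'variance'."""
    return [pvariance(series[i - window + 1 : i + 1])
            for i in range(window - 1, len(series))]


def residual(values):
    """Claim 5: value minus the running average up to and including that point."""
    out, total = [], 0.0
    for n, v in enumerate(values, start=1):
        total += v
        out.append(v - total / n)
    return out


def first_difference(values, dt=1.0):
    """Claims 3-4: present value minus previous value, divided by interval length dt."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def onset_detected(diffs, threshold):
    """Claim 6: onset is flagged when any first-difference value exceeds the threshold."""
    return any(d > threshold for d in diffs)


def monitor(series, window, threshold, dt=1.0):
    """Hypothetical helper chaining the steps in the order recited in claim 1."""
    var = sliding_variance(series, window)
    return onset_detected(first_difference(residual(var), dt), threshold)
```

Under these assumptions, a telemetry signal that stays flat yields zero variance and no detection, while a signal that begins to oscillate produces a jump in the first difference of the variance residual and trips the threshold.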

10. A computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for monitoring the health of a computer system, wherein the method comprises:
- receiving a variance of a time series for a monitored telemetry variable within the computer system;
  
  calculating a residual function of the variance of the time series for the monitored telemetry variable;
  
  calculating a first-difference function of the variance of the time series from the residual function;
  
  determining whether the first-difference function indicates that the computer system is at the onset of degradation; and
  
  if so, performing a remedial action.
  - 11. The computer-readable storage medium of claim 10, wherein prior to receiving the variance for the time series for the monitored telemetry variable, the method further comprises:
    - receiving the time series for the monitored telemetry variable; and
      
      calculating the variance of the time series for the monitored telemetry variable.
  - 12. The computer-readable storage medium of claim 10, wherein calculating the first-difference function of the time series involves, for each time point within the time series, subtracting a value of the time series at a previous time point from the value of the time series at a present time point.
  - 13. The computer-readable storage medium of claim 12, wherein the method further comprises dividing the result of the subtraction by the value of a length of a time interval between the previous time point and the present time point.
  - 14. The computer-readable storage medium of claim 10, wherein calculating the residual function for a time series involves:
    - for each time interval in the time series, calculating a running average of values for the time series up to and including a present time interval; and
      
      subtracting the running average from a value of the time series at the present time interval.
  - 15. The computer-readable storage medium of claim 10, wherein determining whether the first-difference function indicates that the computer system is at the onset of degradation involves determining whether the first-difference function exceeds a specified threshold.
  - 16. The computer-readable storage medium of claim 10, wherein determining whether the first-difference function indicates that the computer system is at the onset of degradation involves:
    - performing a Sequential Probability Ratio Test (SPRT) on the first-difference function; and
      
      determining whether the SPRT generates an alarm.
  - 17. The computer-readable storage medium of claim 16, wherein the SPRT can include one or more of:
    - a positive variance first-difference test, which generates an alarm if the first-difference function for the variance of the time series for the monitored telemetry variable is increasing; and
      
      a negative variance first-difference test, which generates an alarm if the first-difference function for the variance of the time series for the monitored telemetry variable is decreasing.
  - 18. The computer-readable storage medium of claim 10, wherein performing the remedial action can involve performing one or more of:
    - recording a time when the onset of degradation occurred;
      
      notifying a system administrator that the computer system is at the onset of degradation;
      
      shutting down the computer system;
      
      backing up data stored on the computer system;
      
      failing-over to a redundant computer system; and
      
      replacing one or more components which are at the onset of degradation.
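
The SPRT recited in claims 7, 8, 16 and 17 is not defined in the claims themselves. As an illustration only, the sketch below uses Wald's classic sequential probability ratio test for a shift in the mean of Gaussian samples, applied to the first-difference values. The Gaussian model, the parameters `shift`, `sigma`, `alpha` and `beta`, and the function name `sprt_alarms` are all assumptions made here, not details taken from the patent. A positive `shift` plays the role of the positive first-difference test and a negative `shift` the role of the negative one.

```python
import math


def sprt_alarms(samples, shift, sigma, alpha=0.01, beta=0.01):
    """Return the indices at which a Gaussian mean-shift SPRT raises an alarm.

    H0: samples ~ N(0, sigma^2)      (no degradation)
    H1: samples ~ N(shift, sigma^2)  (first difference has drifted by `shift`)
    alpha and beta are the assumed false-alarm and missed-alarm probabilities.
    """
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this bound
    upper = math.log((1.0 - beta) / alpha)   # accept H1 (alarm) at or above this bound
    llr = 0.0                                # cumulative log-likelihood ratio
    alarms = []
    for i, x in enumerate(samples):
        # Log-likelihood ratio increment for N(shift, sigma^2) versus N(0, sigma^2).
        llr += (shift / sigma ** 2) * (x - shift / 2.0)
        if llr >= upper:
            alarms.append(i)
            llr = 0.0                        # restart the test after a decision
        elif llr <= lower:
            llr = 0.0
    return alarms
```

In this sketch, the test restarts after every decision, so a sustained drift produces repeated alarms rather than a single one; whether the patented system behaves this way is not stated in the claims.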

19. An apparatus that monitors the health of a computer system, comprising:
- a receiving mechanism configured to receive a variance of a time series for a monitored telemetry variable within the computer system;
  
  a residual-calculation mechanism configured to calculate a residual function of the variance of the time series for the monitored telemetry variable;
  
  a difference-calculation mechanism configured to calculate a first-difference function of the variance of the time series from the residual function;
  
  a degradation-detection mechanism configured to determine whether the first-difference function indicates that the computer system is at the onset of degradation; and
  
  a remedial-action mechanism, wherein if the degradation-detection mechanism determines that the computer system is at the onset of degradation, the remedial-action mechanism is configured to perform a remedial action.
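
One way to picture the apparatus of claim 19 in software is as a small object that wires the recited mechanisms together. The sketch below is a hypothetical arrangement, not the patented design: the class name `HealthMonitor`, its field names, and the choice to model each mechanism as a plain callable are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class HealthMonitor:
    """Hypothetical composition of the mechanisms recited in claim 19."""
    residual_calc: Callable[[Sequence[float]], List[float]]
    difference_calc: Callable[[Sequence[float]], List[float]]
    degradation_detector: Callable[[Sequence[float]], bool]
    remedial_action: Callable[[], None]

    def receive(self, variance_series: Sequence[float]) -> bool:
        """Receiving mechanism: accept a variance series and run the chain."""
        residuals = self.residual_calc(variance_series)
        diffs = self.difference_calc(residuals)
        degraded = self.degradation_detector(diffs)
        if degraded:
            # The remedial action runs only when onset of degradation is detected.
            self.remedial_action()
        return degraded
```

Keeping each mechanism pluggable means the detector could be a simple threshold check or an SPRT, and the remedial action could be any of the options listed in claims 9 and 18, without changing the surrounding structure.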

Current Assignee: Oracle America, Inc. (Oracle Corporation)
Original Assignee: Sun Microsystems Incorporated (Oracle Corporation)
Inventors: McElfresh, David; Gross, Kenny C.; Vacar, Dan
Primary Examiner(s): Raymond, Edward
Assistant Examiner(s): Desta, Elias
Application Number: US11/787,719
Publication Number: US 20080255807A1
Time in Patent Office: 1,044 Days
Field of Search: 702/185, 702/186, 714/37-39, 714/47, 714/48
US Class Current: 702/186
CPC Class Codes: G06F 11/30 Monitoring
