Method and apparatus for monitoring the health of a computer system

US 20080255807A1
Filed: 04/16/2007
Published: 10/16/2008
Est. Priority Date: 04/16/2007
Status: Active Grant

Abstract

A system that monitors the health of a computer system is presented. During operation, the system receives a first-difference function for the variance of a time series for a monitored telemetry variable within the computer system. The system then determines whether the first-difference function indicates that the computer system is at the onset of degradation. If so, the system performs a remedial action.

21 Claims

1. A method for monitoring the health of a computer system, comprising:
- receiving a first-difference function for the variance of a time series for a monitored telemetry variable within the computer system;
  
  determining whether the first-difference function indicates that the computer system is at the onset of degradation; and
  
  if so, performing a remedial action.
  - 2. The method of claim 1, wherein prior to receiving the first-difference function, the method further comprises:
    - receiving the variance for the time series for the monitored telemetry variable;
      
      calculating a residual function of the variance of the time series for the monitored telemetry variable; and
      
      calculating the first-difference function from the residual function of the variance.
  - 3. The method of claim 2, wherein prior to receiving the variance for the time series for the monitored telemetry variable, the method further comprises:
    - receiving the time series for the monitored telemetry variable; and
      
      calculating the variance of the time series for the monitored telemetry variable.
  - 4. The method of claim 2, wherein calculating the first-difference function of the time series involves, for each time point within the time series, subtracting a value of the time series at a previous time point from the value of the time series at a present time point.
  - 5. The method of claim 4, further comprising dividing the result of the subtraction by the value of a length of a time interval between the previous time point and the present time point.
  - 6. The method of claim 2, wherein calculating the residual function for a time series involves:
    - for each time interval in the time series, calculating a running average of values for the time series up to and including a present time interval; and
      
      subtracting the running average from a value of the time series at the present time interval.
  - 7. The method of claim 1, wherein determining whether the first-difference function indicates that the computer system is at the onset of degradation involves determining whether the first-difference function exceeds a specified threshold.
  - 8. The method of claim 1, wherein determining whether the first-difference function indicates that the computer system is at the onset of degradation involves:
    - performing a Sequential Probability Ratio Test (SPRT) on the first-difference function; and
      
      determining whether the SPRT generates an alarm.
  - 9. The method of claim 8, wherein the SPRT can include one or more of:
    - a positive variance first-difference test, which generates an alarm if the first-difference function for the variance of the time series for the monitored telemetry variable is increasing; and
      
      a negative variance first-difference test, which generates an alarm if the first-difference function for the variance of the time series for the monitored telemetry variable is decreasing.
  - 10. The method of claim 1, wherein performing the remedial action can involve performing one or more of:
    - recording a time when the onset of degradation occurred;
      
      notifying a system administrator that the computer system is at the onset of degradation;
      
      shutting down the computer system;
      
      backing up data stored on the computer system;
      
      failing-over to a redundant computer system;
      
      replacing one or more components which are at the onset of degradation; and
      
      performing other remedial actions.
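
The preprocessing chain in claims 2 through 6 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the trailing-window population variance in `sliding_variance` is an assumption, since the claims do not state how the variance series is computed.

```python
from typing import List, Sequence


def sliding_variance(values: Sequence[float], window: int) -> List[float]:
    """Population variance over each trailing window of `window` samples.

    Assumption: the claims do not say how the variance series is obtained;
    a trailing window is just one plausible choice.
    """
    if window < 2 or window > len(values):
        raise ValueError("window must be between 2 and len(values)")
    out = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        mean = sum(chunk) / window
        out.append(sum((v - mean) ** 2 for v in chunk) / window)
    return out


def residual(series: Sequence[float]) -> List[float]:
    """Claim 6: each value minus the running average up to and including it."""
    out = []
    total = 0.0
    for count, value in enumerate(series, start=1):
        total += value
        out.append(value - total / count)
    return out


def first_difference(series: Sequence[float], times: Sequence[float]) -> List[float]:
    """Claims 4 and 5: (x[i] - x[i-1]) / (t[i] - t[i-1]) for each i >= 1."""
    if len(series) != len(times):
        raise ValueError("series and times must have the same length")
    return [
        (series[i] - series[i - 1]) / (times[i] - times[i - 1])
        for i in range(1, len(series))
    ]


# Chain the steps in the order of claims 3, 2, and 4-5.
telemetry = [1.0, 3.0, 1.0, 3.0, 0.0, 6.0]   # hypothetical sensor readings
variance = sliding_variance(telemetry, window=2)
timestamps = list(range(len(variance)))       # hypothetical unit-spaced sample times
fd = first_difference(residual(variance), timestamps)
```

With unit-spaced timestamps the division in claim 5 has no effect; it matters only when telemetry is sampled at irregular intervals.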

11. A computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for monitoring the health of a computer system, wherein the method comprises:
- receiving a first-difference function for the variance of a time series for a monitored telemetry variable within the computer system;
  
  determining whether the first-difference function indicates that the computer system is at the onset of degradation; and
  
  if so, performing a remedial action.
  - 12. The computer-readable storage medium of claim 11, wherein prior to receiving the first-difference function, the method further comprises:
    - receiving the variance for the time series for the monitored telemetry variable;
      
      calculating a residual function of the variance of the time series for the monitored telemetry variable; and
      
      calculating the first-difference function from the residual function of the variance.
  - 13. The computer-readable storage medium of claim 12, wherein prior to receiving the variance for the time series for the monitored telemetry variable, the method further comprises:
    - receiving the time series for the monitored telemetry variable; and
      
      calculating the variance of the time series for the monitored telemetry variable.
  - 14. The computer-readable storage medium of claim 12, wherein calculating the first-difference function of the time series involves, for each time point within the time series, subtracting a value of the time series at a previous time point from the value of the time series at a present time point.
  - 15. The computer-readable storage medium of claim 14, wherein the method further comprises dividing the result of the subtraction by the value of a length of a time interval between the previous time point and the present time point.
  - 16. The computer-readable storage medium of claim 12, wherein calculating the residual function for a time series involves:
    - for each time interval in the time series, calculating a running average of values for the time series up to and including a present time interval; and
      
      subtracting the running average from a value of the time series at the present time interval.
  - 17. The computer-readable storage medium of claim 11, wherein determining whether the first-difference function indicates that the computer system is at the onset of degradation involves determining whether the first-difference function exceeds a specified threshold.
  - 18. The computer-readable storage medium of claim 11, wherein determining whether the first-difference function indicates that the computer system is at the onset of degradation involves:
    - performing a Sequential Probability Ratio Test (SPRT) on the first-difference function; and
      
      determining whether the SPRT generates an alarm.
  - 19. The computer-readable storage medium of claim 18, wherein the SPRT can include one or more of:
    - a positive variance first-difference test, which generates an alarm if the first-difference function for the variance of the time series for the monitored telemetry variable is increasing; and
      
      a negative variance first-difference test, which generates an alarm if the first-difference function for the variance of the time series for the monitored telemetry variable is decreasing.
  - 20. The computer-readable storage medium of claim 11, wherein performing the remedial action can involve performing one or more of:
    - recording a time when the onset of degradation occurred;
      
      notifying a system administrator that the computer system is at the onset of degradation;
      
      shutting down the computer system;
      
      backing up data stored on the computer system;
      
      failing-over to a redundant computer system;
      
      replacing one or more components which are at the onset of degradation; and
      
      performing other remedial actions.
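
The SPRT step in claims 8, 9, 18, and 19 could look roughly like the sketch below. It uses Wald's textbook SPRT for a shift in the mean of Gaussian samples, run once for an upward shift and once for a downward shift. Modeling "increasing" and "decreasing" as mean shifts of the first-difference values is an assumption, as are the function name `sprt_alarms` and the parameters `shift`, `sigma`, `alpha`, and `beta`; the claims do not specify any of them.

```python
import math
from typing import Iterable, List, Tuple


def sprt_alarms(
    samples: Iterable[float],
    shift: float,
    sigma: float,
    alpha: float = 0.01,
    beta: float = 0.01,
) -> List[Tuple[int, str]]:
    """Run a positive and a negative Gaussian mean-shift SPRT side by side.

    H0: samples have mean 0; H1: mean +shift (positive test) or -shift
    (negative test), both with standard deviation sigma. alpha and beta are
    the target false-alarm and missed-alarm probabilities.
    Returns (sample index, test name) for every alarm raised.
    """
    if shift <= 0 or sigma <= 0:
        raise ValueError("shift and sigma must be positive")
    upper = math.log((1 - beta) / alpha)   # crossing it accepts H1: alarm
    lower = math.log(beta / (1 - alpha))   # crossing it accepts H0: no alarm
    llr = {"positive": 0.0, "negative": 0.0}
    sign = {"positive": 1.0, "negative": -1.0}
    alarms: List[Tuple[int, str]] = []
    for index, x in enumerate(samples):
        for name in llr:
            m = sign[name] * shift
            # Log-likelihood ratio increment of N(m, sigma^2) against N(0, sigma^2).
            llr[name] += (m / sigma ** 2) * (x - m / 2)
            if llr[name] >= upper:
                alarms.append((index, name))
                llr[name] = 0.0            # restart the test after a decision
            elif llr[name] <= lower:
                llr[name] = 0.0
    return alarms
```

Each test restarts after either decision, so the monitor keeps running on a continuous telemetry stream instead of stopping at the first verdict.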

21. An apparatus that monitors the health of a computer system, comprising:
- a receiving mechanism configured to receive a first-difference function for the variance of a time series for a monitored telemetry variable within the computer system;
  
  a degradation-detection mechanism configured to determine whether the first-difference function indicates that the computer system is at the onset of degradation; and
  
  a remedial-action mechanism, wherein if the degradation-detection mechanism determines that the computer system is at the onset of degradation, the remedial-action mechanism is configured to perform a remedial action.
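
One way to picture the three mechanisms of claim 21 is as a small class, shown here as a sketch under stated assumptions. The class name `HealthMonitor`, its method names, and the callback signature are hypothetical, and the detection step uses the simple threshold comparison of claim 7 rather than anything specific to the apparatus claim.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class HealthMonitor:
    threshold: float                                # claim 7 style threshold (assumed)
    remedial_action: Callable[[int, float], None]   # hypothetical callback

    def receive(self, first_difference: Sequence[float]) -> List[float]:
        """Receiving mechanism: accept the first-difference values."""
        return list(first_difference)

    def detect_onset(self, first_difference: Sequence[float]) -> Optional[int]:
        """Degradation-detection mechanism: index of the first value above the threshold."""
        for index, value in enumerate(first_difference):
            if value > self.threshold:
                return index
        return None

    def run(self, first_difference: Sequence[float]) -> bool:
        """Wire the mechanisms together; return True if a remedial action ran."""
        data = self.receive(first_difference)
        onset = self.detect_onset(data)
        if onset is None:
            return False
        self.remedial_action(onset, data[onset])    # remedial-action mechanism
        return True


# Usage: record where the onset occurred, one of the actions listed in claim 10.
onsets = []
monitor = HealthMonitor(threshold=0.5, remedial_action=lambda i, v: onsets.append((i, v)))
monitor.run([0.1, 0.2, 0.9])
```

Passing the remedial action in as a callback keeps the detector independent of whichever response is chosen, such as notifying an administrator or failing over.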

Current Assignee: Oracle America, Inc. (Oracle Corporation)
Original Assignee: Oracle America, Inc. (Oracle Corporation)
Inventors: Gross, Kenny C.; Vacar, Dan; McElfresh, David
Granted Patent: US 7,668,696 B2
US Class Current: 702/186
CPC Class Codes: G06F 11/30 Monitoring
