Screening methodology for operating system error reporting

US 6,098,181 A
Filed: 04/10/1997
Issued: 08/01/2000
Est. Priority Date: 04/10/1997
Status: Expired due to Fees

Abstract

A method (201) and implementing system (101) are provided in which information processing system resources are monitored for variant conditions which may exceed acceptable tolerances. A continued count of failed readings is maintained (207), and only when the number of failed condition readings exceeds a predetermined failure repetition constant (209) is the operating system notified to take appropriate action. When a resource previously identified as a failing resource subsequently yields "good" readings (303), a resource identifier is moved from a "failing" status to an "intermittent failure" status (309). Thereafter, when successive "good" readings exceed a predetermined success repetition constant (319), the resource is removed from the "intermittent" status (321) and the monitoring process (201) is continued.
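
The lifecycle described in the abstract can be read as a small per-resource state machine. The following is a minimal sketch under stated assumptions, not the patented implementation: the names `Status`, `step`, and `trace` and both constant values are hypothetical, the failure test uses "meets or exceeds" as in claim 1 (the abstract itself says "exceed"), and the first "good" reading after a failure is assumed to count toward the success run.

```python
from enum import Enum

# Hypothetical values; the abstract only says the constants are "predetermined".
FAILURE_REPETITION = 3
SUCCESS_REPETITION = 2


class Status(Enum):
    OK = "ok"
    FAILING = "failing"
    INTERMITTENT = "intermittent failure"


def step(status, bad_run, good_run, reading_ok):
    """Advance one resource by one reading.

    Returns (status, bad_run, good_run, notify_os).
    """
    if not reading_ok:
        bad_run, good_run = bad_run + 1, 0
        # Assumption: notify once, when the run of bad readings first
        # reaches the failure repetition constant.
        if status is not Status.FAILING and bad_run >= FAILURE_REPETITION:
            return Status.FAILING, bad_run, good_run, True
        return status, bad_run, good_run, False

    bad_run = 0
    if status is Status.FAILING:
        # First good reading after a failure: demote, do not clear yet.
        return Status.INTERMITTENT, bad_run, 1, False
    if status is Status.INTERMITTENT:
        good_run += 1
        if good_run > SUCCESS_REPETITION:
            return Status.OK, bad_run, 0, False
        return status, bad_run, good_run, False
    return Status.OK, bad_run, 0, False


def trace(readings):
    """Run a sequence of booleans (True = good reading) through step()."""
    status, bad_run, good_run = Status.OK, 0, 0
    out = []
    for ok in readings:
        status, bad_run, good_run, notify = step(status, bad_run, good_run, ok)
        out.append((status.name, notify))
    return out
```

With these values, `trace([False, False, False, True, True, True])` notifies on the third bad reading only, and a pattern with isolated bad readings such as `[False, True, False, False, True]` never notifies. How a bad reading should be treated while a resource is in the "intermittent failure" status is not stated in the abstract; the sketch simply restarts the failure count.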

19 Claims

1. A method for monitoring selected computer resources in a computer system, said method comprising:
- scanning resource measurements taken from selected ones of said computer resources;
  
  detecting a number of continuing failures of one of said computer resources;
  
  determining that said number of continuing failures meets or exceeds a predetermined failure repetition constant; and
  
  notifying an operating system of said detected continuing failures only after said number of failures meets or exceeds said predetermined repetition constant, whereby said operating system is not interrupted by mere spurious failure detections of selectively monitored computer resources.
  - 2. The method as set forth in claim 1 wherein said notifying occurs only after a number of successive readings indicating said out-of-tolerance condition exceeds a first predetermined number of consecutive readings associated with one of said computer resources.
  - 3. The method as set forth in claim 1 wherein said notifying is provided to an operating system within an information processing system.
  - 4. The method as set forth in claim 3 wherein said computer system comprises a plurality of processing units coupled together in a network configuration.
  - 5. The method as set forth in claim 4 wherein said network includes a host unit, said notifying being provided to an operating system operating from said host unit.
  - 6. The method as set forth in claim 3 wherein said information processing system includes a plurality of computer resources, said method further including:
    - taking readings from selected ones of said computer resources in a cyclic manner so as to maintain a continuing monitoring function for said selected computer resources.
  - 7. The method as set forth in claim 6 and further including:
    - making a failing resource table for storing an identification of said computer resources which have been detected to have continuing failures.
  - 8. The method as set forth in claim 7 and further including:
    - identifying one of said computer resources as a good resource if a first reading indicates a second predetermined condition of said one computer resource; and
      
      moving said one computer resource from said failing resource table to an intermittent resource table if said one computer resource is listed in said failing resource table.
  - 9. The method as set forth in claim 8 and further including:
    - removing said one computer resource from said intermittent resource table if a number of successive readings indicating said second predetermined condition exceeds a second predetermined constant associated with said one computer resource.
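
The failure side of claims 1, 6, and 7 (cyclic readings, a count of continuing failures, a failing resource table, and a single notification once the count meets or exceeds the constant) can be sketched as follows. This is an illustrative sketch only: `scan_cycle`, `run_demo`, the resource names, the tolerance limits, and the constant of 3 are all hypothetical, and the recovery handling of claims 8 and 9 is left out.

```python
from typing import Callable, Dict, Set, Tuple


def scan_cycle(
    readings: Dict[str, float],
    limits: Dict[str, Tuple[float, float]],
    fail_counts: Dict[str, int],
    failing_table: Set[str],
    failure_constant: int,
    notify_os: Callable[[str, int], None],
) -> None:
    """Process one pass over the monitored resources."""
    for name, value in readings.items():
        low, high = limits[name]
        if low <= value <= high:
            fail_counts[name] = 0  # a good reading ends the run of failures
            continue
        fail_counts[name] = fail_counts.get(name, 0) + 1
        # "Meets or exceeds" (claim 1). Assumption: report each resource once.
        if fail_counts[name] >= failure_constant and name not in failing_table:
            failing_table.add(name)
            notify_os(name, fail_counts[name])


def run_demo():
    """Five scan cycles over two hypothetical resources."""
    limits = {"cpu_temp_c": (0, 85), "fan_rpm": (1500, 6000)}
    cycles = [
        {"cpu_temp_c": 90, "fan_rpm": 3000},  # temperature spike, fan fine
        {"cpu_temp_c": 70, "fan_rpm": 1000},  # temperature recovers, fan low
        {"cpu_temp_c": 91, "fan_rpm": 900},
        {"cpu_temp_c": 92, "fan_rpm": 800},   # third consecutive fan failure
        {"cpu_temp_c": 60, "fan_rpm": 700},   # fan still failing, already reported
    ]
    fail_counts: Dict[str, int] = {}
    failing_table: Set[str] = set()
    notices = []
    for readings in cycles:
        scan_cycle(readings, limits, fail_counts, failing_table, 3,
                   lambda name, count: notices.append((name, count)))
    return notices, failing_table
```

In this run the temperature sensor is out of tolerance three times but never three times in a row, so only the fan is reported. That is the screening effect the "whereby" clause of claim 1 describes.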

10. A method for monitoring resource conditions of a resource in an information processing system, said information processing system being arranged whereby said information processing system is not interrupted by spurious failure detections from said resource, said system including a failing resource table identifying at least one resource from which consecutive readings have been taken indicating a failing condition of said resource, said method comprising:
- taking a reading from said one resource;
  
  identifying said one resource as a good resource if said reading indicates a predetermined favorable condition of said one resource;
  
  determining that said one resource is listed in said failing resource table; and
  
  moving said one resource from said failing resource table after said reading indicative of said predetermined favorable condition, said predetermined favorable condition being representative of an operable state of said one resource.
  - 11. The method as set forth in claim 10 and further including:
    - creating an intermittent resource table, said one resource being moved to said intermittent resource table from said failing resource table after said reading indicative of said predetermined favorable condition.
  - 12. The method as set forth in claim 11 and further including:
    - removing said one of said processing resources from said intermittent resource table only after a number of successive readings indicate a continuance of said favorable condition for said one resource.
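
The recovery path of claims 10 to 12 amounts to bookkeeping over two tables. A minimal sketch, assuming sets for the tables and a dictionary for the run of favorable readings: `on_good_reading`, `good_runs`, and `success_constant` are hypothetical names. Claim 12 does not fix the comparison, so the strict "exceeds" of claim 9 is borrowed here, and the reading that triggers the move is assumed to start the run.

```python
def on_good_reading(name, failing_table, intermittent_table, good_runs,
                    success_constant):
    """Handle one favorable reading for `name`; return the action taken."""
    if name in failing_table:
        # Claims 10 and 11: a good reading moves the resource out of the
        # failing table and into the intermittent table.
        failing_table.remove(name)
        intermittent_table.add(name)
        good_runs[name] = 1  # assumption: this reading starts the run
        return "moved to intermittent"
    if name in intermittent_table:
        good_runs[name] = good_runs.get(name, 0) + 1
        # Claim 12: remove only after successive good readings.
        if good_runs[name] > success_constant:
            intermittent_table.remove(name)
            del good_runs[name]
            return "removed from intermittent"
        return "still intermittent"
    return "no change"
```

For example, with `failing_table = {"disk0"}` and `success_constant = 2`, four successive calls for `"disk0"` return "moved to intermittent", "still intermittent", "removed from intermittent", and "no change". A caller would also need to reset `good_runs[name]` on an unfavorable reading. The claims do not say how that case is handled, so it is left out of the sketch.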

13. A storage medium including machine readable indicia, said storage medium being selectively coupled to a reading device, said reading device being selectively coupled to processing circuitry within a processing system, said reading device being selectively operable to read said machine readable indicia and provide program signals representative thereof, said program signals being effective to cause said processing circuitry to monitor resource conditions of system resources coupled to said processing system and provide output signals representative of said resource conditions by performing the steps of:
- scanning resource measurements taken from selected ones of said system resources;
  
  detecting a number of continuing failures of one of said system resources;
  
  determining that said number of continuing failures meets or exceeds a predetermined failure repetition constant; and
  
  notifying an operating system of said detected continuing failures only after said number of failures meets or exceeds said predetermined repetition constant, whereby said operating system is not interrupted by mere spurious failure detections of selectively monitored system resources.

14. A storage medium including machine readable indicia, said storage medium being selectively coupled to a reading device, said reading device being selectively coupled to processing circuitry within a processing system, said reading device being selectively operable to read said machine readable indicia and provide program signals representative thereof, said program signals being effective to cause said processing circuitry to monitor resource conditions of said processing system resources and provide output signals representative of said resource conditions, said processing system being arranged whereby said processing system is not interrupted by spurious failure detections from said processing resource, said processing system including a failing resource table identifying processing resources from which consecutive readings have been taken indicating a failing condition, said processing system being responsive to said programming signals to accomplish the steps of:
- taking a reading from a first of said processing resources;
  
  identifying said first processing resource as a good resource if said reading indicates a predetermined favorable condition of said first processing resource;
  
  determining that said first processing resource is listed in said failing resource table; and
  
  moving said first processing resource from said failing resource table after said reading indicative of said predetermined favorable condition, said predetermined favorable condition being representative of an operable state of said first processing resource.
  - 15. The medium as set forth in claim 14 wherein said processing system is further responsive to said program signals for:
    - creating an intermittent resource table, said first resource being moved to said intermittent resource table from said failing resource table after said reading indicative of said predetermined favorable condition.
  - 16. The medium as set forth in claim 15 wherein said processing system is further responsive to said program signals for:
    - removing said first processing resource from said intermittent resource table only after a number of successive readings indicate a continuance of said favorable condition for said first processing resource.

17. An information processing system comprising:
- a processing device;
  
  a memory unit;
  
  a bus connecting said processing device and said memory unit; and
  
  a system resource monitoring interface connected to said bus, said interface being selectively operable for providing readings taken from system resources, said readings being indicative of an operating condition of said system resources, said processing device being selectively operable for executing a program from said memory for:
  
  scanning resource measurements taken from selected ones of said system resources;
  
  detecting a number of continuing failures of one of said system resources;
  
  determining that said number of continuing failures meets or exceeds a predetermined failure repetition constant; and
  
  notifying an operating system of said detected continuing failures only after said number of failures meets or exceeds said predetermined repetition constant, whereby said operating system is not interrupted by mere spurious failure detections of selectively monitored system resources.

18. An information processing system comprising:
- a processing device;
  
  a memory unit;
  
  a bus connecting said processing device and said memory unit; and
  
  a system resource monitoring interface connected to said bus, said interface being selectively operable for providing readings taken from system resources, said readings being indicative of an operating condition of said system resources, said information processing system being arranged whereby said information processing system is not interrupted by spurious failure detections from said system resources, said information processing system including a failing resource table identifying system resources from which consecutive readings have been taken indicating a failing condition, said processing device being selectively operable for executing a program from said memory for:
  
  taking a reading from one of said system resources;
  
  identifying said one system resource as a good resource if said reading indicates a predetermined favorable condition of said one system resource;
  
  determining that said one system resource is listed in said failing resource table; and
  
  moving said one system resource from said failing resource table after said reading indicative of said predetermined favorable condition, said predetermined favorable condition being representative of an operable state of said one resource.

19. A method for monitoring operating status of at least one resource within an information processing system, said method comprising:
- receiving readings from said one resource, said readings being representative of detected operational states of said one resource;
  
  providing indicia representative of detected in-tolerance and out-of-tolerance operational states of said one resource;
  
  maintaining a memory of said indicia;
  
  changing said memory automatically between said first and second states in accordance with said operational states; and
  
  providing notice of an out-of-tolerance condition of said one resource only after a succession of at least two detections of said out-of-tolerance condition, whereby said information processing system is not interrupted by mere spurious failure detections of selectively monitored system resources.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Hamilton, Rick Allen II; Upton, John Daniel; Mehta, Chet
Primary Examiner(s): Beausoliel, Jr., Robert W.
Assistant Examiner(s): Elisca, Pierre E.
Application Number: US08/827,746
Time in Patent Office: 1,209 Days
Field of Search: 395/183.01, 395/183.02, 395/183.06, 395/187.07, 395/183.15, 395/185.01, 395/185.05, 371/5.1, 371/37.7, 371/51.1, 714/718, 714/25, 714/47, 714/26, 702/182
US Class Current: 714/25
CPC Class Codes

G06F 11/0706   the processing taking place...

G06F 11/0751   Error or fault detection no...

G06F 11/076   by exceeding a count or rat...

G06F 11/3055   Monitoring arrangements for...

G06F 11/3079   the data filtering being ac...

G06F 2201/81   Threshold

G06F 2201/88   Monitoring involving counting
