System abstraction layer, processor abstraction layer, and operating system error handling
1 Assignment
0 Petitions
Abstract
Systems and methods for error handling are disclosed. The systems and methods may be used in single-processor or multiprocessor computer systems to handle errors in a coordinated manner between hardware and any firmware or software layers. A computer system includes a non-volatile memory and at least one processor. A firmware error handling routine for handling errors is stored in the non-volatile memory. Each of the at least one processor detects errors and, on detecting an error, executes the firmware error handling routine. The executed routine handles the error and logs error information to a log.
The systems and methods provide coordinated error handling that enhances error recovery, provides error containment, and maintains system availability.
121 Citations
52 Claims
-
1. A method executing in a computer system comprising:
-
detecting an error by a detecting processor;
executing error handling code of a first layer of software, by the detecting processor, to perform the following:
saving state information;
attempting to correct the error;
after failure to correct the error, executing error handling code of a second layer of software by the detecting processor to perform the following:
determining the severity of the error by analyzing the state information and the error received from the first layer;
saving additional state information; and
halting the computer system if necessary; and
after failure to correct the error by the second layer of software, executing error handling code of an operating system by the detecting processor to perform the following:
checking the state information and the error to determine whether processing can continue;
halting the computer system unless processing can continue; and
attempting to correct the error.
-
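The three-layer escalation in claim 1 can be illustrated with a minimal Python sketch. The layer functions, the `Outcome` enum, the `ErrorContext` fields, and the numeric thresholds that decide whether a layer can correct an error or whether processing can continue are all hypothetical stand-ins chosen for illustration; they are not taken from the patent or from any real firmware interface.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Optional


class Outcome(Enum):
    CORRECTED = auto()  # the layer fixed the error
    ESCALATE = auto()   # the layer could not fix it; pass it to the next layer
    HALT = auto()       # the layer decided the system must stop


@dataclass
class ErrorContext:
    """Hypothetical record passed up the layers with the error."""
    error_code: int
    state: List[str] = field(default_factory=list)  # saved state information
    severity: Optional[str] = None


Layer = Callable[[ErrorContext], Outcome]


def first_layer(ctx: ErrorContext) -> Outcome:
    ctx.state.append("first-layer state")  # save state information
    # Assumption: codes below 0x100 are correctable at this level.
    return Outcome.CORRECTED if ctx.error_code < 0x100 else Outcome.ESCALATE


def second_layer(ctx: ErrorContext) -> Outcome:
    # Determine severity from the error passed up by the first layer.
    ctx.severity = "fatal" if ctx.error_code >= 0x800 else "recoverable"
    ctx.state.append("second-layer state")  # save additional state information
    return Outcome.HALT if ctx.severity == "fatal" else Outcome.ESCALATE


def os_layer(ctx: ErrorContext) -> Outcome:
    # Assumption: processing can continue only for codes below 0x400.
    can_continue = ctx.severity != "fatal" and ctx.error_code < 0x400
    if not can_continue:
        return Outcome.HALT  # halt unless processing can continue
    return Outcome.CORRECTED  # assumption: the OS correction attempt succeeds


LAYERS: List[Layer] = [first_layer, second_layer, os_layer]


def handle_error(ctx: ErrorContext, layers: List[Layer]) -> str:
    """Run each layer in order until one corrects the error or halts."""
    for layer in layers:
        outcome = layer(ctx)
        if outcome is Outcome.CORRECTED:
            return f"corrected by {layer.__name__}"
        if outcome is Outcome.HALT:
            return f"halted by {layer.__name__}"
    return "halted: no layer corrected the error"
```

For example, `handle_error(ErrorContext(0x200), LAYERS)` escalates past the first two layers and returns `"corrected by os_layer"`. Representing each layer as a callable keeps the escalation order in one list, which mirrors how the claim orders the layers.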
-
2. A method comprising:
-
detecting an error;
sending a signal of the error to a processor abstraction layer, a system abstraction layer, and an operating system;
interrupting processing if necessary;
attempting to correct the error by the processor abstraction layer and informing the system abstraction layer of success or failure in correcting the error;
after success by the processor abstraction layer, informing the operating system of the correction;
after failure by the processor abstraction layer, attempting to correct the error by the system abstraction layer and informing the operating system of success or failure;
after failure by the system abstraction layer, attempting to correct the error by the operating system; and
after failure by the operating system, initiating a system reboot.
-
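The ordering in claim 2, where each layer reports success or failure upward and a reboot is the last resort, can be traced with a small Python sketch. The function name, the message strings, and the choice to pass each layer's correction attempt in as a callable are assumptions for illustration only; the sketch also assumes that interrupting processing is always necessary.

```python
from typing import Callable, List, Tuple

# Hypothetical signature for one layer's correction attempt.
Attempt = Callable[[str], bool]


def run_claim2_flow(error: str, pal: Attempt, sal: Attempt,
                    os_fix: Attempt) -> Tuple[str, List[str]]:
    """Return the final status and the trace of notifications."""
    trace: List[str] = [f"signal {error!r} to PAL, SAL, OS",
                        "interrupt processing"]  # assumed always necessary
    if pal(error):
        trace.append("PAL -> SAL: corrected")
        trace.append("OS informed: corrected by PAL")
        return "resumed", trace
    trace.append("PAL -> SAL: failed")
    if sal(error):
        trace.append("SAL -> OS: corrected")
        return "resumed", trace
    trace.append("SAL -> OS: failed")
    if os_fix(error):
        trace.append("OS: corrected")
        return "resumed", trace
    trace.append("OS failed: initiate reboot")  # last resort
    return "reboot", trace
```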
-
4. A method, where:
-
a processor detects an error;
a processor abstraction layer (PAL) error handler creates an entry in an error log, saves state information, and attempts to correct the error within the processor hardware;
if the PAL error handler fails to correct the error, a system abstraction layer (SAL) error handler accesses the error log, determines a severity of the error, and attempts to correct the error within the system hardware;
if the SAL error handler fails to correct the error, an operating system (OS) error handler accesses the error log, and in response to the severity either attempts to correct the error within the system software or terminates a software process.
-
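In claim 4 the error log is what ties the three handlers together: the PAL handler creates the entry, the SAL handler records a severity in it, and the OS handler reads that severity to choose between correcting the error and terminating a process. The sketch below assumes hypothetical error kinds (`cache_parity`, `memory_ecc`, `bus_error`) and a plain in-memory list as the log; neither is specified by the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical error classes; the claims do not name any.
PAL_CORRECTABLE = {"cache_parity"}  # fixable within the processor hardware
SAL_CORRECTABLE = {"memory_ecc"}    # fixable within the system hardware
SEVERE = {"bus_error"}              # classified as severe by the SAL


@dataclass
class LogEntry:
    error_kind: str
    created_by: str
    severity: Optional[str] = None  # filled in by the SAL handler


def pal_handler(log: List[LogEntry], kind: str, state: Dict[str, str]) -> bool:
    log.append(LogEntry(kind, created_by="PAL"))  # create the log entry
    state["pal"] = f"processor state for {kind}"  # save state information
    return kind in PAL_CORRECTABLE


def sal_handler(log: List[LogEntry]) -> bool:
    entry = log[-1]  # access the error log
    entry.severity = "severe" if entry.error_kind in SEVERE else "recoverable"
    return entry.error_kind in SAL_CORRECTABLE


def os_handler(log: List[LogEntry], pid: int) -> str:
    entry = log[-1]  # access the error log and act on the severity
    if entry.severity == "severe":
        return f"terminated process {pid}"
    return "corrected by OS"  # assumption: the software correction succeeds


def handle(kind: str, pid: int = 42) -> Tuple[str, List[LogEntry]]:
    log: List[LogEntry] = []
    state: Dict[str, str] = {}
    if pal_handler(log, kind, state):
        return "corrected by PAL", log
    if sal_handler(log):
        return "corrected by SAL", log
    return os_handler(log, pid), log
```

Because severity lives in the shared log entry rather than in a return value, the OS handler can decide what to do without re-diagnosing the hardware error itself.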
-
18. A computer-readable storage medium containing instructions to execute, on a computer, a method where:
-
a processor detects an error;
a processor abstraction layer (PAL) error handler creates an entry in an error log, saves state information, and attempts to correct the error within the processor hardware;
if the PAL error handler fails to correct the error, a system abstraction layer (SAL) error handler accesses the error log, determines a severity of the error, and attempts to correct the error within the system hardware;
if the SAL error handler fails to correct the error, an operating-system (OS) error handler accesses the error log, and in response to the severity either attempts to correct the error within the system software or terminates a software process.
-
-
19. A data processing system, comprising:
-
a processor to execute software processes and to detect an error in the system, and including a processor abstraction layer (PAL) to present a consistent interface from any of a number of different processor models;
an error log having at least one entry to record the error;
a PAL error handler to save state information, to create the error-log entry, and to diagnose and/or correct errors within the PAL, the PAL error handler including a set of PAL error-handling routines responsive to the state information and to the error-log entry;
system hardware including a system abstraction layer (SAL) to present a consistent interface from any of a number of different system hardware models;
a SAL error handler, responsive to failure of the PAL error handler, to the saved state information, and to the error-log entry to diagnose and/or correct errors within the SAL and to produce a severity indication;
an operating system (OS) error handler, responsive to failure of the SAL error handler, to diagnose and/or correct errors within one of the software processes executing on the system, and responsive to the severity indication to terminate a software process;
an error log having at least one entry in a standard format accessed by the PAL, SAL, and OS error handlers to employ in diagnosing and/or correcting the error.
-
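Claim 19 requires an error log whose entries use a standard format that the PAL, SAL, and OS error handlers all access. One way to picture that is a fixed binary record layout, sketched below with Python's `struct` module. The field order, widths, and code values are assumptions made for illustration, not the record format defined by the patent or by any firmware specification.

```python
import struct
from typing import List, NamedTuple

# Hypothetical fixed 16-byte record, little-endian, no padding:
#   B  source    (1 = PAL, 2 = SAL, 3 = OS)
#   B  severity  (0 = corrected, 1 = recoverable, 2 = fatal)
#   H  reserved  (always written as 0)
#   I  error code
#   Q  timestamp
RECORD = struct.Struct("<BBHIQ")
PAL, SAL, OS = 1, 2, 3


class ErrorRecord(NamedTuple):
    source: int
    severity: int
    error_code: int
    timestamp: int


def append_record(log: bytearray, rec: ErrorRecord) -> None:
    """Serialize one record onto the end of the shared log buffer."""
    log.extend(RECORD.pack(rec.source, rec.severity, 0,
                           rec.error_code, rec.timestamp))


def read_records(log: bytes) -> List[ErrorRecord]:
    """Decode every record; each handler reads the log the same way."""
    return [ErrorRecord(src, sev, code, ts)
            for src, sev, _reserved, code, ts in RECORD.iter_unpack(log)]
```

A PAL handler might call `append_record(log, ErrorRecord(PAL, 1, 0x2A, 1000))`, after which the SAL and OS handlers decode the same bytes with `read_records(log)`. A fixed layout is what lets code written for different processor and system models agree on the contents of an entry.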
-
39. A method, where:
-
any one of multiple processors detects an error;
a PAL error handler executes within the one processor that detected the error to create an entry in an error log, save state information, and attempt to diagnose and/or correct the error;
if the one PAL error handler fails to correct the error, a system abstraction layer (SAL) error handler attempts to correct the error in response to the error log entry and the saved state information; if the SAL error handler fails to correct the error, an operating system (OS) error handler attempts to correct the error in response to the error log entry and the saved state information;
the OS error handler terminates a software process if the error is severe.
-
-
40. A method, comprising:
-
executing error handlers in various ones of multiple processors to detect errors;
detecting an error in the system by one of the error handlers executing in one of the processors;
determining in the one error handler whether the error has a certain characteristic;
if the error has the characteristic, placing the system in a rendezvous state;
after the system enters the rendezvous state, performing error handling in a designated one of the processors and idling others of the processors.
a PAL error handler;
a SAL error handler.
-
-
45. The method of claim 44 where performing error handling in the one processor comprises executing the SAL error handler in the one processor.
-
46. The method of claim 44 where the PAL error handler includes multiple routines.
-
47. The method of claim 46 where both the PAL and the SAL error handlers access the multiple routines.
-
48. The method of claim 46 where the error handlers further comprise an operating-system (OS) error handler.
-
49. The method of claim 48 where the SAL error handler hands off to the OS error handler if the SAL error handler fails to correct the error.
-
50. The method of claim 40 further comprising determining whether the error is global.
-
51. The method of claim 50 further comprising determining whether the error is severe, and where the certain characteristic is that the error is global and severe.
-
52. The method of claim 40 further comprising detecting multiple errors, and where the certain characteristic is that a certain number of errors occur within a fixed amount of time.
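Claims 40, 51, and 52 describe when the system enters a rendezvous state (an error that is global and severe, or a certain number of errors within a fixed amount of time) and what happens once it does. The sketch below assumes a hypothetical `RendezvousPolicy` class, a sliding time window measured in seconds, and a caller-supplied designated processor; none of these names or parameters come from the patent, and real firmware would coordinate processors through interrupts rather than a dictionary.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class DetectedError:
    cpu: int          # processor whose error handler detected the error
    is_global: bool
    is_severe: bool
    timestamp: float  # seconds


class RendezvousPolicy:
    """Hypothetical test for the 'certain characteristic' of claims 51 and 52."""

    def __init__(self, max_errors: int, window_s: float) -> None:
        self.max_errors = max_errors
        self.window_s = window_s
        self.recent: Deque[float] = deque()  # timestamps inside the window

    def needs_rendezvous(self, err: DetectedError) -> bool:
        # Track the error for the rate test, dropping ones outside the window.
        self.recent.append(err.timestamp)
        while err.timestamp - self.recent[0] > self.window_s:
            self.recent.popleft()
        # Claim 51: the error is both global and severe.
        if err.is_global and err.is_severe:
            return True
        # Claim 52: a certain number of errors within a fixed amount of time.
        return len(self.recent) >= self.max_errors


def rendezvous(cpus: List[int], designated: int) -> Dict[int, str]:
    """The designated processor handles the error; the others idle."""
    return {cpu: ("handling" if cpu == designated else "idle") for cpu in cpus}
```

With `RendezvousPolicy(max_errors=3, window_s=1.0)`, three local errors at 0.0 s, 0.4 s, and 0.8 s trigger a rendezvous on the third, while errors at 0.0 s, 0.5 s, and 2.0 s do not, because the first two have aged out of the window by the time the third arrives.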
Specification