Method and apparatus for locating a faulty device in a computer system
Abstract
A computer system comprises a processor (2), memory (4) and a plurality of devices (6, 8, 12), the processor (2) and the memory (4) being operable to effect the operation of a fault response processor (AFR), and a device driver (GRAPHICS, NETWORK, H2IO, IO2L, SERIAL) for each of the devices. The fault response processor (AFR) is operable to generate a model which represents the processor (2), the memory (4) and the devices (6, 8, 12) of the computer system and the inter-connection of the processor (2), memory (4) and the devices (GRAPHICS, NETWORK, H2IO, IO2L, SERIAL). The device driver (GRAPHICS, NETWORK, H2IO, IO2L, SERIAL) for each of the devices (6, 8, 12) is arranged, consequent upon a change of operational status of the device, to generate fault report data indicating whether the change of status was caused internally within the device or externally by another connected device. The devices of the computer system may be formed as a plurality of Field Replaceable Units (FRU). The fault response processor (AFR) is operable, consequent upon receipt of the fault reports from the device drivers (GRAPHICS, NETWORK, H2IO, IO2L, SERIAL), to estimate the location of a FRU containing a faulty device by applying the fault indication to the model. In other embodiments, the fault report data includes direction information indicating a connection between the device and the other connected device which caused the external fault. Having identified the faulty device, the FRU may be replaced, thereby minimizing down time of the computer system.
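By way of illustration only, the fault report data and the mapping from devices to Field Replaceable Units described in the abstract might be sketched as follows. The class names, the status strings and the device-to-FRU table are hypothetical and are not taken from the specification; only the driver names (H2IO, IO2L, NETWORK, SERIAL) and the internal/external fault indication come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    """Fault indication carried in each report (internal vs. external)."""
    INTERNAL = "internal"
    EXTERNAL = "external"


@dataclass(frozen=True)
class FaultReport:
    device: str   # name of the reporting device driver
    status: str   # operational status; the strings used here are assumed
    cause: Cause  # was the change caused inside the device or by a neighbour?


# Hypothetical model fragment: which FRU carries which device.
FRU_OF = {
    "H2IO": "motherboard",
    "IO2L": "motherboard",
    "NETWORK": "io_card",
    "SERIAL": "io_card",
}


def suspect_frus(reports):
    """Return the FRUs holding devices that reported an internal fault."""
    return sorted({FRU_OF[r.device] for r in reports if r.cause is Cause.INTERNAL})


reports = [
    FaultReport("NETWORK", "down", Cause.EXTERNAL),
    FaultReport("IO2L", "degraded", Cause.INTERNAL),
]
print(suspect_frus(reports))
```

In this simplified sketch a report flagged as external does not implicate the FRU of the reporting device; only internally caused changes do. The claimed system additionally uses the inter-connection of the devices, which this fragment does not model.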
47 Claims
-
1. A computer system comprising:
-
a plurality of devices, a plurality of device drivers, each device driver operable to monitor an operational status of one of said plurality of devices, and a fault response processor operable to generate a model which represents the monitored devices of the computer system and an inter-connection of said monitored devices, wherein said device driver for each of said monitored devices further being operable, consequent upon a change of operational status of said monitored device, to generate fault report data including the operational status of the monitored device and a fault indication of whether the change of operational status of the monitored device was caused internally within the monitored device or externally by another connected device, wherein said fault response processor is operable, consequent upon receipt of said fault report data from said device drivers, to estimate a location of a faulty device by applying the operational status of one or more of the monitored devices and the fault indication corresponding to one or more of the monitored devices to said model, wherein said fault response processor is operable to pre-process said model by comparing the operational status information from fault report data associated with successively connected devices in a data path. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
-
-
14. A fault response processor for use in estimating a location of at least one of a plurality of devices of a system which is faulty, said fault response processor being operable to:
-
generate a data model having a structure which represents said plurality of devices and the inter-connection of said devices, receive fault report data generated by device drivers following a change in the operational status of one or more of the devices, wherein said fault report data including the operational status of the device and a fault indication of whether the change in the operational status was caused internally within the device or externally by another connected device, pre-process said model by comparing the operational status information from fault report data associated with successively connected devices in a data path, and estimate a location of a faulty device, within said model, by applying the operational status of one or more of the devices and the fault indication corresponding to one or more of the devices to the model. - View Dependent Claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
-
-
26. A method of locating faulty devices in a system including a plurality of devices, said method comprising:
-
monitoring an operational status of one or more of the plurality of devices; generating a model of said system, wherein the model includes a structure representing the plurality of monitored devices and the inter-connection of the monitored devices via at least one data path; generating fault report data consequent upon a change of operational status of at least one of said monitored devices, wherein said fault report data including the operational status of the monitored device and a fault indication of whether the change of operational status of the monitored device was caused internally within the monitored device or externally by another connected device; pre-processing said model by comparing the operational status information from fault report data associated with successively connected devices in a data path; estimating a location of a faulty device, within said model, by applying the operational status of one or more of the monitored devices and the fault indication corresponding to one or more of the monitored devices to the model. - View Dependent Claims (27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38)
-
-
39. A computer readable storage medium comprising program instructions, wherein the program instructions are executable by a processor to:
-
monitor an operational status of a plurality of devices; generate a model of a system, wherein the model includes a structure representing the plurality of monitored devices included in the system and the inter-connection of the monitored devices via at least one data path; generate fault report data consequent upon a change of operational status of at least one of said monitored devices, wherein said fault report data including the operational status of the monitored device and a fault indication of whether the change of operational status of the monitored device was caused internally within the monitored device or externally by another connected device; pre-process said model by comparing the operational status information from fault report data associated with successively connected devices in a data path; and estimate a location of a faulty device, within said model, by applying the operational status of one or more of the monitored devices and the fault indication corresponding to one or more of the monitored devices to the model.
-
-
40. A computer system comprising:
-
a plurality of devices; a plurality of device drivers, each device driver operable to monitor an operational status of one of said plurality of devices; and a fault response processor operable to generate a model which represents the monitored devices of the computer system and an inter-connection of said monitored devices; wherein said device driver for each of said monitored devices being further operable, consequent upon a change of operational status of said monitored device, to generate fault report data including the operational status of the monitored device and a fault indication of whether the change of operational status was caused internally within the monitored device or externally by another connected device; wherein said fault response processor is operable, consequent upon receipt of said fault report data from said device drivers, to estimate a location of a faulty device by applying the operational status of one or more of the monitored devices and the fault indication corresponding to one or more of the monitored devices to said model; wherein said fault response processor is operable to pre-process said model by comparing the operational status information from fault report data associated with successively connected devices in a data path, wherein if the operational status for a preceding device on the data path has changed, fault direction information is generated for the preceding device indicating that a fault is internal, and wherein if the operational status for a succeeding device on the data path has changed, fault direction information is generated for the succeeding device indicating that a fault is external, wherein the fault report data associated with said succeeding device is disregarded in said estimation of the location of said faulty device.
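The pre-processing recited in claim 40 can be illustrated with a short sketch under one possible reading: when two successively connected devices on a data path have both changed status, the preceding device is marked internal and the succeeding device is marked external and its report is disregarded. The function name, the list representation of the data path and the set of changed devices are assumptions made for this example, and a device that changed status with an unchanged predecessor is treated here as internal.

```python
def preprocess(path, changed):
    """Assign fault direction information along one data path.

    path    -- device names ordered from the start of the data path
    changed -- set of devices whose operational status has changed
    Returns (direction, disregarded): a dict of device -> "internal" or
    "external", and the set of devices whose reports are to be ignored.
    """
    direction = {}
    disregarded = set()
    for i, dev in enumerate(path):
        if dev not in changed:
            continue
        if i > 0 and path[i - 1] in changed:
            # The preceding device also changed: treat this fault as external.
            direction[dev] = "external"
            disregarded.add(dev)
        else:
            # No changed predecessor: treat the fault as internal.
            direction[dev] = "internal"
    return direction, disregarded


direction, disregarded = preprocess(["H2IO", "IO2L", "NETWORK"], {"IO2L", "NETWORK"})
print(direction, disregarded)
```

With the hypothetical path H2IO, IO2L, NETWORK and status changes on IO2L and NETWORK, IO2L is marked internal, and NETWORK is marked external and dropped from the later estimation.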
-
-
41. A computer system comprising:
-
a plurality of devices; a plurality of device drivers, each device driver operable to monitor an operational status of one of said plurality of devices; and a fault response processor operable to generate a model which represents the monitored devices of the computer system and an inter-connection of said monitored devices; wherein said device driver for each of said monitored devices being further operable, consequent upon a change of operational status of said monitored device, to generate fault report data including the operational status of the monitored device and a fault indication of whether the change of operational status was caused internally within the monitored device or externally by another connected device; wherein said fault response processor is operable, consequent upon receipt of said fault report data from said device drivers, to estimate a location of a faulty device by applying the operational status of one or more of the monitored devices and the fault indication corresponding to one or more of the monitored devices to said model; wherein said fault response processor is further operable to generate fault probability measures for one or more monitored devices in the model, wherein each fault probability measure is representative of a perceived likelihood that the monitored device is faulty, wherein the fault probability measures being generated by applying fault direction information and the operational status information to the model, wherein said fault response processor is operable to compare the fault probability measures for the monitored devices in the model with a predetermined threshold, and consequent upon the comparison, to estimate the location of the faulty device from the result of the comparison; wherein, for each monitored device represented in said model having a plurality of fault probability measures associated with the monitored device, said fault response processor is operable to combine the fault probability measures for the monitored device, wherein the combined fault probability measure being compared with said predetermined threshold to provide an estimated location of the faulty device. - View Dependent Claims (42)
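The combination and threshold comparison recited in claim 41 might be sketched as follows. The claim does not state how the fault probability measures are combined; this example assumes a noisy-OR combination (one minus the product of the complements) purely for illustration, and the function names, the numeric measures and the threshold value are likewise hypothetical.

```python
from math import prod


def combine(measures):
    """Combine several fault probability measures for one device.

    Assumption: noisy-OR, i.e. 1 - product(1 - p). The claim leaves the
    combination rule open, so this is only one plausible choice.
    """
    return 1.0 - prod(1.0 - p for p in measures)


def locate_faulty(measures_by_device, threshold):
    """Return devices whose combined measure meets or exceeds the threshold."""
    return sorted(
        dev
        for dev, measures in measures_by_device.items()
        if combine(measures) >= threshold
    )


# Hypothetical measures: IO2L has two measures, NETWORK has one.
measures_by_device = {"IO2L": [0.5, 0.5], "NETWORK": [0.2]}
print(locate_faulty(measures_by_device, threshold=0.7))
```

Here the two measures for IO2L combine to 0.75, which meets the assumed threshold of 0.7, while the single measure for NETWORK does not, so only IO2L is reported as the estimated location.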
-
-
43. A fault response processor for use in estimating a location of at least one of a plurality of devices of a system which is faulty, said fault response processor being operable to:
-
generate a data model having a structure which represents said plurality of devices and the inter-connection of said devices; receive fault report data generated by device drivers following a change in the operational status of one or more of the devices, wherein said fault report data including the operational status of the device and a fault indication of whether the change in the operational status was caused internally within the device or externally by another connected device; and estimate a location of a faulty device, within said model, by applying the operational status of one or more of the devices and the fault indication corresponding to one or more of the devices to the model; wherein said fault response processor is operable to pre-process said model by comparing the operational status information from fault report data associated with successively connected devices in a data path, wherein if the operational status indicates that a preceding device on the data path is degraded or down, fault direction information is generated for the preceding device indicating that a fault is internal, and wherein if the operational status indicates that a succeeding device on the data path is down or degraded, fault direction information is generated for the succeeding device indicating that a fault is external, wherein the fault report data associated with the succeeding device is disregarded in said estimation of the location of said faulty device.
-
-
44. A fault response processor for use in estimating a location of at least one of a plurality of devices of a system which is faulty, said fault response processor being operable to:
-
generate a data model having a structure which represents said plurality of devices and the inter-connection of said devices; receive fault report data generated by device drivers following a change in the operational status of one or more of the devices, wherein said fault report data including the operational status of the device and a fault indication of whether the change in the operational status was caused internally within the device or externally by another connected device; estimate a location of a faulty device, within said model, by applying the operational status of one or more of the devices and the fault indication corresponding to one or more of the devices to the model; generate fault probability measures for one or more monitored devices in the model, wherein each fault probability measure is representative of a perceived likelihood that the monitored device is faulty, wherein the fault probability measures being generated by applying fault direction information and the operational status information to the model; compare the fault probability measures for the monitored devices in the model with a predetermined threshold, and consequent upon the comparison, to estimate the location of the faulty device from the result of the comparison; and for each monitored device represented in said model having a plurality of fault probability measures associated with the monitored device, combine the fault probability measures for the monitored device, and compare the combined fault probability measure with said predetermined threshold to provide an estimated location of the faulty device. - View Dependent Claims (45)
-
-
46. A method of locating faulty devices in a system including a plurality of devices, said method comprising:
-
monitoring an operational status of one or more of the plurality of devices; generating a model of said system, wherein the model includes a structure representing the plurality of monitored devices and the inter-connection of the monitored devices via at least one data path; generating fault report data consequent upon a change of operational status of at least one of said devices, wherein said fault report data including the operational status of the monitored device and a fault indication of whether the change of operational status of the monitored device was caused internally within the monitored device or externally by another connected device; and estimating a location of a faulty device, within said model, by applying the operational status of one or more of the monitored devices and the fault indication corresponding to one or more of the monitored devices to the model; generating fault probability measures for one or more monitored devices in the model, wherein each fault probability measure is representative of a perceived likelihood that the monitored device is faulty, wherein the fault probability measures being generated by applying fault direction information and the operational status information to the model; comparing the fault probability measures for the monitored devices in the model with a predetermined threshold, and consequent upon the comparison, to estimate the location of the faulty device from the result of the comparison; and for each monitored device represented in said model having a plurality of fault probability measures associated with the monitored device, combining the fault probability measures for the monitored device, and then comparing the combined fault probability measure with said predetermined threshold to provide an estimated location of the faulty device. - View Dependent Claims (47)
-
Specification