Error reporting to diagnostic engines based on their diagnostic capabilities
First Claim
1. A computer network system having a fault management architecture configured for use in a computer network system, the computer network system comprising:
a plurality of nodes interconnected in a network; and
a fault manager mounted at a node on the network and configured to diagnose and resolve faults occurring at said node, wherein the fault manager is suitable for interfacing with diagnostic engines and fault correction agents, the fault manager being suitable for receiving error information and passing this information to diagnostic engines that have subscribed to receive the error information, and wherein the fault manager publishes error reports; and
wherein each diagnostic engine subscribes to selected error reports associated with fault diagnosis capabilities of said diagnostic engine so that when the fault manager publishes error reports only subscribing diagnostic engines receive the selected error reports;
at least one diagnostic engine for receiving error information and identifying a set of fault possibilities associated with errors contained in the error information;
at least one fault correction agent for receiving the set of fault possibilities from the at least one diagnostic engine and then selecting a diagnosed fault, and then taking appropriate fault resolution action concerning the selected diagnosed fault; and
logs for tracking a status of the error information, a status of the fault management exercises, and a fault status of the resources of the computer system.
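The subscription mechanism in claim 1 can be illustrated with a minimal Python sketch. It assumes that error reports are keyed by an error-class string and that each diagnostic engine advertises the classes it can diagnose. The names `ErrorReport`, `FaultPossibility`, `MemoryEngine`, `RetireAgent`, the error-class strings, the likelihood scores and the three log structures are hypothetical and are not taken from the patent. The "pick the most likely fault" rule is only an example selection policy.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorReport:
    error_class: str  # hypothetical error category, e.g. "mem.ce"
    resource: str     # hypothetical resource identifier, e.g. "dimm0"


@dataclass(frozen=True)
class FaultPossibility:
    fault: str
    resource: str
    likelihood: float  # hypothetical ranking score


class MemoryEngine:
    """Hypothetical diagnostic engine that can only diagnose memory errors."""
    capabilities = {"mem.ce", "mem.ue"}

    def diagnose(self, report):
        # Return the set of fault possibilities associated with this error.
        return {
            FaultPossibility("dimm-fault", report.resource, 0.8),
            FaultPossibility("memory-controller-fault", "mc0", 0.2),
        }


class RetireAgent:
    """Hypothetical fault correction agent."""

    def __init__(self, fault_log):
        self.fault_log = fault_log  # shared fault-status log

    def resolve(self, possibilities):
        # Select one diagnosed fault (here: the most likely one) and act on it.
        chosen = max(possibilities, key=lambda p: p.likelihood)
        self.fault_log[chosen.resource] = "faulted; retired"  # stand-in action
        return chosen


class FaultManager:
    def __init__(self):
        self.subscribers = defaultdict(list)  # error class -> subscribed engines
        self.error_log = []     # status of error information
        self.exercise_log = []  # status of fault management exercises
        self.fault_log = {}     # fault status of resources

    def subscribe(self, engine):
        # An engine subscribes only to the error classes it can diagnose.
        for error_class in engine.capabilities:
            self.subscribers[error_class].append(engine)

    def publish(self, report, agent):
        # Only engines subscribed to this error class receive the report.
        self.error_log.append((report, "received"))
        engines = self.subscribers.get(report.error_class, [])
        if not engines:
            self.error_log.append((report, "no subscriber"))
            return []
        diagnosed = []
        for engine in engines:
            possibilities = engine.diagnose(report)
            chosen = agent.resolve(possibilities)
            self.exercise_log.append((report.error_class, chosen.fault, "resolved"))
            diagnosed.append(chosen)
        return diagnosed


fm = FaultManager()
agent = RetireAgent(fm.fault_log)
fm.subscribe(MemoryEngine())
print(fm.publish(ErrorReport("mem.ce", "dimm0"), agent))
print(fm.publish(ErrorReport("cpu.ce", "cpu0"), agent))
```

In this sketch a `mem.ce` report reaches only `MemoryEngine`, while a `cpu.ce` report finds no subscriber and is merely logged. Registering another engine is a single `subscribe` call, which is one simple way to support adding engines without restarting the manager.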
2 Assignments
0 Petitions
Abstract
A method, apparatus, and computer program product for diagnosing and resolving faults is disclosed. The disclosed fault management architecture includes a fault manager suitable for interfacing with diagnostic engines and fault correction agents. The diagnostic engines receive error information and identify associated fault possibilities. The fault possibility information is passed to fault correction agents, which diagnose and resolve the associated faults. The architecture uses logs to track the status of error information, the status of fault management exercises, and the fault status of system resources. Additionally, a soft error rate discriminator can be employed to track and resolve soft (correctable) errors in the system. The architecture is extensible, allowing additional diagnostic engines and agents to be plugged into the architecture without interrupting the normal operational flow of the computer system.
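The abstract does not say how the soft error rate discriminator decides when correctable errors become a fault. One common scheme, shown here purely as an assumption, is a sliding-window rate check: a resource is flagged once it records more than `threshold` soft errors within `window` seconds. The class name and both parameters are hypothetical; resolving a flagged resource would be left to the fault manager and its agents.

```python
from collections import defaultdict, deque


class SoftErrorRateDiscriminator:
    """Hypothetical sketch: flag a resource once it records more than
    `threshold` correctable errors within a sliding `window` of seconds."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.events = defaultdict(deque)  # resource -> timestamps of soft errors

    def record(self, resource, timestamp):
        """Record one soft error; return True if its rate now indicates a fault."""
        q = self.events[resource]
        q.append(timestamp)
        # Discard errors that have aged out of the sliding window.
        while q and timestamp - q[0] > self.window:
            q.popleft()
        return len(q) > self.threshold


d = SoftErrorRateDiscriminator(threshold=2, window=60)
for t in (0, 10, 20):
    print(t, d.record("dimm0", t))
```

With `threshold=2`, the first two errors inside one minute are tolerated as isolated soft errors and the third is flagged; an error arriving long after the burst starts a fresh count because the older timestamps have aged out.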
83 Citations
24 Claims
Specification