Identifying likely failure points in a digital data processing system
2 Assignments
0 Petitions
Abstract
An expert system for determining the likelihood of failure of a unit in a computer system. The operating system of the computer system maintains a log of the errors occurring for each unit in the computer system. If a predetermined number of errors have been entered in the log for a specific unit, the expert system retrieves the error entries relating to that unit and processes them to determine whether a failure is likely to occur. The expert system orders this processing so that tests relating to more general components are performed first, and tests relating to components of increasing particularity, and decreasing generality, are performed after them.
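The threshold trigger and the general-to-particular test ordering described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `ErrorEntry` fields, the `ERROR_THRESHOLD` value of 3, and the two diagnostic rules (`controller_test`, `head_test`) are all hypothetical. The abstract fixes only the ordering; stopping at the first test that reports a likely failure is an additional assumption of this sketch.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ErrorEntry:
    unit: str  # hypothetical: name of the unit that logged the error
    kind: str  # hypothetical: error category, e.g. "read", "seek", "write"


# Hypothetical value; the abstract says only "a predetermined number".
ERROR_THRESHOLD = 3

# A test inspects one unit's entries and returns a diagnosis or None.
Test = Callable[[List[ErrorEntry]], Optional[str]]


def units_to_analyze(log: List[ErrorEntry], threshold: int = ERROR_THRESHOLD) -> List[str]:
    """Return the units whose logged error count has reached the threshold."""
    counts = Counter(entry.unit for entry in log)
    return sorted(unit for unit, n in counts.items() if n >= threshold)


def analyze_unit(log: List[ErrorEntry], unit: str, ordered_tests: List[Test]) -> Optional[str]:
    """Run tests from most general to most particular; stop at the first diagnosis."""
    entries = [entry for entry in log if entry.unit == unit]
    for test in ordered_tests:
        diagnosis = test(entries)
        if diagnosis is not None:
            return diagnosis
    return None


def controller_test(entries: List[ErrorEntry]) -> Optional[str]:
    # Hypothetical general rule: many distinct error kinds point at shared electronics.
    return "controller" if len({e.kind for e in entries}) >= 3 else None


def head_test(entries: List[ErrorEntry]) -> Optional[str]:
    # Hypothetical particular rule: repeated read errors point at a single head.
    return "head" if sum(e.kind == "read" for e in entries) >= 3 else None


# General tests first, more particular tests after them.
ORDERED_TESTS: List[Test] = [controller_test, head_test]

if __name__ == "__main__":
    log = [ErrorEntry("disk0", "read")] * 3 + [ErrorEntry("disk1", "seek")]
    for unit in units_to_analyze(log):
        print(unit, analyze_unit(log, unit, ORDERED_TESTS))  # prints: disk0 head
```

Placing the general test first means a unit showing several kinds of errors is attributed to the shared component before any single-part rule gets a chance to claim it.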
20 Claims
1. A method of detecting one of a plurality of likely failures of components in a digital data processing system, comprising the steps of
storing a plurality of error entries, each error entry containing a plurality of indicia pertaining to an error event in said digital data processing system,
analyzing, through use of a digital data processing system, said error entries containing indicia pertaining to error events, to identify a pattern of differing indicia pertaining to said error events that corresponds with one of a plurality of failure theories,
and, based on said failure theory, identifying a said likely failure of a said component and initiating recovery operations to avoid loss of data.
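Claim 1 turns on matching a pattern of differing indicia across error entries to one of several failure theories, then initiating recovery. The Python sketch below illustrates that idea for a disk unit under loudly stated assumptions: the indicia (`head`, `cylinder`, `sector`), the three theories, their matching rules, and the recovery actions are hypothetical examples chosen for illustration, not taken from the patent. Theories are checked in list order and the first match wins.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class ErrorEntry:
    # Hypothetical indicia recorded for each error event on a disk unit.
    head: int
    cylinder: int
    sector: int


def differs(entries: List[ErrorEntry], indicium: str) -> bool:
    """True if the named indicium takes more than one value across the entries."""
    return len({getattr(e, indicium) for e in entries}) > 1


# Hypothetical failure theories, each a predicate over which indicia differ.
FAILURE_THEORIES: List[Tuple[str, Callable[[List[ErrorEntry]], bool]]] = [
    # One head, many cylinders: the head itself is suspect.
    ("head failure", lambda es: not differs(es, "head") and differs(es, "cylinder")),
    # Errors repeat at one physical location: a local media defect is suspect.
    ("media defect", lambda es: not any(differs(es, f) for f in ("head", "cylinder", "sector"))),
    # Errors spread across heads and cylinders: shared drive electronics are suspect.
    ("drive electronics", lambda es: differs(es, "head") and differs(es, "cylinder")),
]

# Hypothetical recovery operations intended to protect data under each theory.
RECOVERY = {
    "head failure": "copy all data off the unit",
    "media defect": "relocate data from the failing sector",
    "drive electronics": "migrate data to another unit",
}


def identify_theory(entries: List[ErrorEntry]) -> Optional[str]:
    """Return the first failure theory whose pattern matches, or None."""
    if len(entries) < 2:  # a single event shows no pattern
        return None
    for name, matches in FAILURE_THEORIES:
        if matches(entries):
            return name
    return None


def diagnose_and_recover(entries: List[ErrorEntry]) -> Optional[Tuple[str, str]]:
    """Identify a likely failure and the recovery operation to initiate for it."""
    theory = identify_theory(entries)
    return None if theory is None else (theory, RECOVERY[theory])


if __name__ == "__main__":
    same_head = [ErrorEntry(head=2, cylinder=c, sector=5) for c in (10, 20, 30)]
    print(diagnose_and_recover(same_head))
    # prints: ('head failure', 'copy all data off the unit')
```

The point of the sketch is that the diagnosis depends on which indicia vary and which stay fixed across events, not on any single entry in isolation.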
11. A system for detecting one of a plurality of likely failures of components in a digital data processing system, comprising
a collector module means for collecting a plurality of stored error entries, each error entry containing a plurality of indicia pertaining to an error event in said digital data processing system,
an analyzer module means for analyzing said error entries containing indicia pertaining to error events, identifying a pattern of differing indicia pertaining to said error events that corresponds with one of a plurality of failure theories, and, based on said failure theory, identifying a said likely failure of a said component,
and a recovery module means for initiating recovery operations, based on said failure theory, to avoid loss of data,
said collector module means, said analyzer module means, and said recovery module means being adapted for implementation by a digital data processing system.
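Claim 11 recasts the method as three cooperating modules. One way to wire a collector, an analyzer, and a recovery module together is sketched below in Python. The class names, the in-memory list standing in for the stored error entries, the two analysis rules, and the recorded recovery action are all hypothetical; a real recovery module would act on the unit rather than append a description to a list.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ErrorEntry:
    unit: str      # hypothetical: unit that logged the error
    head: int      # hypothetical indicium
    cylinder: int  # hypothetical indicium


class Collector:
    """Collects the stored error entries for one unit (here, from an in-memory list)."""

    def __init__(self, stored: Iterable[ErrorEntry]):
        self.stored = list(stored)

    def collect(self, unit: str) -> List[ErrorEntry]:
        return [e for e in self.stored if e.unit == unit]


class Analyzer:
    """Matches the pattern of differing indicia to a failure theory (hypothetical rules)."""

    def analyze(self, entries: List[ErrorEntry]) -> Optional[str]:
        if len(entries) < 2:  # a single event shows no pattern
            return None
        heads = {e.head for e in entries}
        cylinders = {e.cylinder for e in entries}
        if len(heads) == 1 and len(cylinders) > 1:
            return "head failure"
        if len(heads) > 1 and len(cylinders) > 1:
            return "drive electronics"
        return None


@dataclass
class Recovery:
    """Records the recovery operations it initiates (a stand-in for real actions)."""

    actions: List[str] = field(default_factory=list)

    def initiate(self, unit: str, theory: str) -> None:
        self.actions.append(f"copy data off {unit} ({theory})")


def run(collector: Collector, analyzer: Analyzer, recovery: Recovery, unit: str) -> Optional[str]:
    """Collect, analyze, and recover for one unit; return the failure theory, if any."""
    theory = analyzer.analyze(collector.collect(unit))
    if theory is not None:
        recovery.initiate(unit, theory)
    return theory


if __name__ == "__main__":
    log = [ErrorEntry("disk0", 3, c) for c in (1, 2)] + [ErrorEntry("disk1", 0, 5)]
    rec = Recovery()
    print(run(Collector(log), Analyzer(), rec, "disk0"))  # prints: head failure
    print(rec.actions)  # prints: ['copy data off disk0 (head failure)']
```

Keeping the three modules separate mirrors the claim's structure: each can be replaced (for example, a collector that reads an operating-system error log) without changing the others.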
Specification