SYSTEM AND METHOD FOR USING FAILURE CASTING TO MANAGE FAILURES IN COMPUTER SYSTEMS
Abstract
A system and method for using failure casting to manage failures in a computer system. In accordance with an embodiment, the system uses a failure casting hierarchy to cast failures of one type into failures of another type. In doing so, the system allows incidents, problems, or failures to be cast into a (typically smaller) set of failures that the system knows how to handle. In accordance with a particular embodiment, failures can be cast into a category that is considered reboot-curable: if a failure is reboot-curable, then rebooting the system will likely cure the problem. Examples include hardware failures, and reboot-specific methods that can be applied to disk failures and to failures within clusters of databases. The system can even be used to handle previously unforeseen failures, which can be cast into known failures based on the failure symptoms rather than any underlying cause.
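The casting idea in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the patent's implementation: the failure type names, the parent-pointer representation of the hierarchy, and the `cast_failure` and `system_manager` helpers are all hypothetical. The sketch walks a detected failure up the hierarchy until it reaches a type the manager knows how to handle.

```python
from typing import Dict, Optional, Set

# Hypothetical failure casting hierarchy: each failure type points to the
# more general type it can be cast into (None marks a root).
HIERARCHY: Dict[str, Optional[str]] = {
    "disk_io_timeout": "disk_failure",
    "disk_failure": "reboot_curable",
    "db_node_unresponsive": "cluster_failure",
    "cluster_failure": "reboot_curable",
    "reboot_curable": None,
}

# Hypothetical set of failure types the system manager knows how to handle.
HANDLED: Set[str] = {"reboot_curable"}


def cast_failure(
    failure_type: str,
    hierarchy: Dict[str, Optional[str]] = HIERARCHY,
    handled: Set[str] = HANDLED,
) -> Optional[str]:
    """Cast a failure upward until a handled type is found, else None."""
    current: Optional[str] = failure_type
    seen: Set[str] = set()  # guards against accidental cycles in the mapping
    while current is not None and current not in seen:
        if current in handled:
            return current
        seen.add(current)
        current = hierarchy.get(current)
    return None


def system_manager(failure_type: str) -> str:
    """Pick an action by treating the failure as its cast (second) type."""
    if cast_failure(failure_type) == "reboot_curable":
        return "reboot"
    return "escalate"
```

Under these assumptions, `system_manager("disk_io_timeout")` selects a reboot because the failure is cast to `reboot_curable`, while a type that is absent from the hierarchy falls through to escalation. A symptom-based lookup for unforeseen failures, as the abstract mentions, would need an additional mapping from observed symptoms to known types, which this sketch does not attempt.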
36 Claims
1. A system for managing failures in a computer system using failure casting comprising:
a system manager that performs actions on a computer system to address failures that occur within the computer system;
a failure casting logic that detects failures as they occur in the computer system; and
a failure casting hierarchy that defines a plurality of the failures that can occur within the computer system, and which is used by the failure casting logic upon detecting the occurrence of a failure to cast the failure from a first failure type to a second failure type, wherein the second failure type is then communicated to the system manager to allow the system manager to treat the failure as if it were the second failure type.
Dependent claims: 2-10.
11. A method for managing failures in a computer system using failure casting comprising the steps of:
detecting the occurrence of failures in the computer system;
referring to a failure casting hierarchy that defines a plurality of failures that can occur within the computer system;
using the failure casting hierarchy to cast the failure from a first failure type to a second failure type;
communicating the second failure type to a system manager; and
performing an action by the system manager on the computer system to address the failure including treating the failure as if it were the second failure type.
Dependent claims: 12-20.
21. A system for managing failures in a computer system using failure casting comprising:
a system manager that performs actions on a computer system to address failures that occur within the computer system; and
a script that detects the occurrence of a failure in the computer system and then uses a failure casting hierarchy within the script to cast the failure from a first failure type to a second failure type, wherein the second failure type is then communicated to the system manager to allow the system manager to treat the failure as if it was the second failure type.
Dependent claims: 22-24.
25. A method for managing failures in a computer system using failure casting comprising the steps of:
providing a system manager that performs actions on a computer system to address failures that occur within the computer system; and
executing a script that detects the occurrence of a failure in the computer system and then uses a failure casting hierarchy within the script to cast the failure from a first failure type to a second failure type, wherein the second failure type is then communicated to the system manager to allow the system manager to treat the failure as if it was the second failure type.
Dependent claims: 26-28.
29. A system readable medium, including instructions stored thereon, which when executed by a system cause the system to perform the steps of:
detecting the occurrence of failures in the system;
using a failure casting hierarchy to cast the failure from a first failure type to a second failure type, wherein the failure casting hierarchy defines a plurality of failures that can occur within the system;
performing an action on the system to address the failure, including treating the failure as if it were the second failure type.
Dependent claims: 30-32.
33. A system readable medium, including instructions stored thereon, which when executed by a system cause the system to perform the steps of:
executing a script that detects the occurrence of a failure in the computer system; and
using a failure casting hierarchy within the script to cast the failure from a first failure type to a second failure type, wherein the second failure type is then communicated to the system manager to allow the system manager to treat the failure as if it was the second failure type.
Dependent claims: 34-36.
Specification