SYSTEM AND METHOD FOR USING FAILURE CASTING TO MANAGE FAILURES IN A COMPUTER SYSTEM
Abstract
A system and method for using failure casting to manage failures in a computer system. In accordance with an embodiment, the system uses a failure casting hierarchy to cast failures of one type into failures of another type. In doing this, the system allows incidents, problems, or failures to be cast into a (typically smaller) set of failures that the system knows how to handle. In accordance with a particular embodiment, failures can be cast into a category that is considered reboot-curable. If a failure is reboot-curable, then rebooting the system will likely cure the problem. Examples include hardware failures, and reboot-specific methods that can be applied to disk failures and to failures within clusters of databases. The system can even be used to handle failures that were hitherto unforeseen: failures can be cast into known failures based on the failure symptoms, rather than any underlying cause.
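The casting idea described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation; the names FailureType, FailureCaster, reboot_curable, disk_timeout, and unknown_hang are hypothetical and chosen only for illustration. The sketch assumes each failure type has at most one parent in the hierarchy and that a recovery action is registered per type.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class FailureType:
    """A node in a (hypothetical) failure casting hierarchy."""
    name: str
    parent: Optional["FailureType"] = None  # the type this failure can be cast to


class FailureCaster:
    """Casts a failure up the hierarchy until a type with a known recovery is found."""

    def __init__(self) -> None:
        self._recoveries: Dict[str, Callable[[], bool]] = {}

    def register_recovery(self, failure_type: FailureType,
                          action: Callable[[], bool]) -> None:
        self._recoveries[failure_type.name] = action

    def cast(self, failure_type: FailureType) -> Optional[FailureType]:
        # Walk toward the root; a type that already has its own recovery
        # is returned unchanged, otherwise it is cast to an ancestor.
        current: Optional[FailureType] = failure_type
        while current is not None:
            if current.name in self._recoveries:
                return current
            current = current.parent
        return None

    def handle(self, failure_type: FailureType) -> bool:
        target = self.cast(failure_type)
        if target is None:
            return False  # no known recovery anywhere in the hierarchy
        return self._recoveries[target.name]()


# Usage: two distinct symptoms are both cast to the reboot-curable category.
actions: List[str] = []


def reboot() -> bool:
    actions.append("reboot")  # stand-in for an actual reboot
    return True


REBOOT_CURABLE = FailureType("reboot_curable")
DISK_TIMEOUT = FailureType("disk_timeout", parent=REBOOT_CURABLE)
UNKNOWN_HANG = FailureType("unknown_hang", parent=REBOOT_CURABLE)

caster = FailureCaster()
caster.register_recovery(REBOOT_CURABLE, reboot)
handled = caster.handle(UNKNOWN_HANG)
```

In this sketch the recovery is chosen from the symptom's position in the hierarchy rather than from its underlying cause, which is why a previously unseen failure such as the hypothetical unknown_hang can still be handled.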
45 Claims
-
1-36. (canceled)
-
37. A method of managing failures in a computing system, wherein the method is implemented at least partly by a device, and wherein the method comprises:
-
detecting a failure of a first failure type in the computing system; casting the first failure type to a second failure type, different than the first failure type, wherein the second failure type has an associated failure recovery; and attempting to resolve the first failure type by using the failure recovery associated with the second failure type.
-
-
38. The method of claim 37, wherein the attempting to resolve the first failure type by using the failure recovery associated with the second failure type occurs at boot and/or start-up time.
-
39. The method of claim 37, wherein the computing system includes an array of devices and the first and second failure types are associated with failures of the array of devices.
-
40. The method of claim 37, wherein the method further comprises:
- using a failure casting hierarchy in a script that includes a set of non-reboot-curable failures that are checked at boot time, and if a device exhibits a failure upon bootup within the set of non-reboot-curable failures, then the disk is not added to the array of devices.
-
41. A device that includes one or more processors configured to manage failures in a computing system at least by:
-
detecting a failure of a first failure type in the computing system; casting the first failure type to a second failure type, different than the first failure type, wherein the second failure type has an associated failure recovery; and attempting to resolve the first failure type by using the failure recovery associated with the second failure type.
Dependent claims: 42, 43, 44
-
-
45. A non-transitory computer readable storage medium storing at least executable code for managing failures in a computing system, wherein the executable code when executed at least:
-
detects a failure of a first failure type in the computing system; casts the first failure type to a second failure type, different than the first failure type, wherein the second failure type has an associated failure recovery; and attempts to resolve the first failure type by using the failure recovery associated with the second failure type.
-
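The boot-time check recited in claim 40 can be sketched as follows. This is an illustrative assumption, not the script the claims refer to: the failure names in NON_REBOOT_CURABLE, the probe callback, and the function build_array are all hypothetical. The sketch assumes that probing a device at boot returns the set of failure names it exhibits.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical failure names; a reboot is assumed not to cure any of these.
NON_REBOOT_CURABLE: Set[str] = {"media_defect", "controller_dead"}


def build_array(devices: Iterable[str],
                probe: Callable[[str], Set[str]]) -> List[str]:
    """Return the devices admitted to the array at boot time.

    A device is left out if any failure it exhibits on bootup falls
    within the non-reboot-curable set.
    """
    array: List[str] = []
    for device in devices:
        failures = probe(device)
        if failures & NON_REBOOT_CURABLE:
            continue  # rebooting again would not help, so skip the device
        array.append(device)
    return array


# Usage with a fake probe that looks up canned results.
observed: Dict[str, Set[str]] = {
    "disk0": set(),
    "disk1": {"media_defect"},
    "disk2": {"slow_spinup"},  # not in the non-reboot-curable set
}
admitted = build_array(["disk0", "disk1", "disk2"],
                       lambda d: observed.get(d, set()))
```

Here only the hypothetical disk1 is excluded, because its failure belongs to the set that a reboot cannot cure.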
Specification