Increasing software fault tolerance by employing surprise-removal paths
Abstract
The subject invention relates to systems and methods for automatic recovery from errors in a computing environment. A system is provided to facilitate failure recovery in the computing system. The system includes at least one driver component that enumerates at least one layer of a driver stack. A module associated with the driver component requests re-enumeration of the driver stack upon detection of an error in the computing system. When an error is detected by a driver or operating system component, a protocol can be established whereby a new copy of the driver's stack or system resources is re-enumerated in parallel to existing resources that may be in an unknown or error state. The new copy of the stack may allow the driver to become operational in lieu of the previous stack, which can be reclaimed for other system uses over time.
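The recovery flow described in the abstract can be illustrated with a small model. This is only a sketch under stated assumptions, not the patented implementation or any real operating system API: the names `Framework`, `DriverStack`, `StackState`, `report_error`, and `reclaim` are hypothetical and chosen for illustration. It shows a new stack being enumerated while the failed stack still exists, with the failed stack reclaimed later.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from itertools import count
from typing import List


class StackState(Enum):
    ACTIVE = auto()
    FAILED = auto()      # unknown or error state; kept until reclaimed
    RECLAIMED = auto()


_ids = count(1)  # hypothetical id source, only to tell stacks apart


@dataclass
class DriverStack:
    layers: List[str]
    state: StackState = StackState.ACTIVE
    stack_id: int = field(default_factory=lambda: next(_ids))


class Framework:
    """Hypothetical stand-in for the operating system framework."""

    def __init__(self, layers: List[str]) -> None:
        self.layers = list(layers)
        self.active = DriverStack(list(self.layers))
        self.pending_reclaim: List[DriverStack] = []

    def report_error(self) -> DriverStack:
        """Driver-side request to re-enumerate after an error is detected."""
        failed = self.active
        # Simulated surprise removal: the old stack is treated as if its
        # module had been removed, but it is not torn down yet.
        failed.state = StackState.FAILED
        self.pending_reclaim.append(failed)
        # The replacement stack is enumerated while the failed one still exists.
        self.active = DriverStack(list(self.layers))
        return self.active

    def reclaim(self) -> List[DriverStack]:
        """Release failed stacks later, once nothing depends on them."""
        reclaimed, self.pending_reclaim = self.pending_reclaim, []
        for stack in reclaimed:
            stack.state = StackState.RECLAIMED
        return reclaimed
```

In this model, calling `report_error()` leaves two stacks alive at once: the failed one, parked in `pending_reclaim`, and the newly enumerated one in `active`. Deferring `reclaim()` mirrors the abstract's point that the previous stack can be reclaimed for other uses over time rather than immediately.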
67 Citations
19 Claims
1. A system to facilitate failure recovery in a computing environment, comprising:
a computer processor that executes the following software components;
at least one driver component that enumerates at least one layer of a driver stack;
a module associated with the driver component that facilitates re-enumeration of the driver stack upon detection of an error in the computing system;
an operating system framework that interacts with the at least one driver component to facilitate operations with hardware and software components of the computing system;
the framework, via a message protocol, initiates a surprise removal sequence which simulates in software the conditions of the module being removed from the computing system, the simulation causes the at least one driver component to create a subsequent driver stack from which to operate; and
the subsequent driver stack is created in parallel with the previous stack that encountered the error in order to attempt to resume normal operations of the at least one driver component.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
13. A system for detecting and correcting errors in a computer, the system is recorded on a computer storage medium and executed by a processor, comprising:
means for creating a driver stack;
means for interfacing to the stack;
means for detecting errors with the stack;
means for re-creating a new stack upon detecting the errors; and
means for initiating a surprise removal sequence which simulates in software the conditions of a module being removed from a computing system, the simulation causes the re-creation of the new stack from which to operate.
14. A method for failure recovery in computer systems, comprising:
creating an instance of a stack to enable driver functionality;
detecting a fault with the driver functionality;
automatically negotiating with a framework component when detecting the fault;
automatically creating a new instance of the stack after detecting the fault in order to facilitate recovery of the driver functionality;
initiating, via a message protocol, a surprise removal sequence which simulates in software the conditions of a module being removed from a computing system, the simulation causes the creation of the new instance of the stack from which to operate; and
creating the new instance of the stack in parallel with the previous instance of the stack that encountered an error in order to attempt to resume normal operations of the driver functionality.
Dependent claims: 15, 16, 17, 18, 19.
Specification