Method and apparatus for monitoring computer system objects to improve system reliability
Abstract
Computer system reliability is improved using various techniques to monitor objects (e.g., processes, threads, DLLs, etc.) executing on the system. Such techniques include active techniques, in which information is continually communicated from the object to the monitor, and passive techniques, in which the object does not need to repeatedly provide information to the monitor. The monitor determines when an object in the computer system has failed, and initiates appropriate recovery action when such a failure is detected.
63 Claims
-
1. A system comprising:
a bus;
a processor coupled to the bus; and
a memory, coupled to the bus, including a plurality of instructions to be executed by the processor, the plurality of instructions including,
an object to generate a registration request, including a registration type identifying a way failure of the object is to be determined and a recovery type identifying a recovery action to initiate in the event failure of the object is detected, and
a monitor to receive the registration request, to monitor the object for failure in accordance with the registration type, and if failure of the object is detected then to initiate the recovery action in accordance with the recovery type.
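As an illustration only, the registration request and monitor of claim 1 can be sketched in Python. The enum members, class names, and the `has_failed` callback are hypothetical stand-ins chosen for this sketch; the claim does not name any particular registration or recovery types or prescribe a data layout.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, List, Tuple


class RegistrationType(Enum):
    """Way failure is to be determined (member names are assumptions)."""
    ACTIVE_HEARTBEAT = auto()   # object keeps notifying the monitor
    PASSIVE_PRESENCE = auto()   # monitor checks the object still exists


class RecoveryType(Enum):
    """Recovery action to initiate (member names are assumptions)."""
    LOG_ONLY = auto()
    RESTART_OBJECT = auto()
    REBOOT_SYSTEM = auto()


@dataclass(frozen=True)
class RegistrationRequest:
    object_id: str
    registration_type: RegistrationType
    recovery_type: RecoveryType


class Monitor:
    def __init__(self, has_failed: Callable[[RegistrationRequest], bool]) -> None:
        # has_failed is a placeholder for detection logic that would be
        # chosen according to request.registration_type.
        self._has_failed = has_failed
        self._registry: Dict[str, RegistrationRequest] = {}

    def register(self, request: RegistrationRequest) -> None:
        """Receive a registration request from an object."""
        self._registry[request.object_id] = request

    def poll(self) -> List[Tuple[str, RecoveryType]]:
        """Return (object_id, recovery_type) for every object found failed."""
        return [
            (req.object_id, req.recovery_type)
            for req in self._registry.values()
            if self._has_failed(req)
        ]
```

In this sketch the object, not the monitor, chooses both the detection method and the recovery action at registration time, which mirrors the division of responsibility the claim describes.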
-
8. A method in a system, the method comprising:
receiving a registration request from an object, the registration request including an indication of a way failure of the object is to be determined and a type of recovery to be attempted in the event of a detected failure of the object;
detecting, in accordance with the indication of the way failure of the object is to be determined, whether the object has failed; and
initiating recovery of the object in accordance with the indication of the type of recovery to be attempted in response to detecting that the object has failed.
the receiving comprises receiving the registration request from a first thread associated with the object;
the detecting comprises detecting the object has failed if a notification is not received from a second thread associated within the object within the specified amount of time; and
the second thread is different from the first thread.
-
14. A method as recited in claim 8, wherein the detecting comprises:
repeatedly checking whether the object is still running in the system; and
determining the object has failed if the object is no longer present running in the system.
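The repeated check in claim 14 might look like the following minimal sketch. The function name, the `is_running` probe, and the `wait` hook are hypothetical; a real monitor would query the operating system for the process or thread and would sleep between checks.

```python
from typing import Callable, Optional


def watch_until_gone(
    is_running: Callable[[], bool],
    max_checks: int,
    wait: Callable[[], None] = lambda: None,
) -> Optional[int]:
    """Repeatedly check whether the object is still running.

    Returns the 1-based number of the check at which the object was
    found missing (that is, judged failed), or None if it was still
    running after max_checks checks.
    """
    for check in range(1, max_checks + 1):
        if not is_running():
            return check
        wait()  # placeholder for a delay between checks
    return None
```

Bounding the loop with `max_checks` is a convenience for the example; the claim itself places no limit on how long the checking continues.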
-
15. A method as recited in claim 8, wherein the initiating recovery of the object comprises logging the detected failure.
-
16. A method as recited in claim 8, wherein the initiating recovery of the object comprises terminating the object and restarting the object.
-
17. A method as recited in claim 16, wherein the restarting the object comprises restarting a process corresponding to the object.
-
18. A method as recited in claim 16, wherein the registration request further includes a command line, and wherein the restarting the object comprises executing the command line.
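A rough sketch of the terminate-and-restart recovery of claims 16 to 18, assuming the object is an operating-system process and that the command line supplied at registration is stored as an argument list. `RestartableObject` and `DEMO_COMMAND` are names invented for this example.

```python
import subprocess
import sys
from typing import List, Optional

# Hypothetical command line an object might supply when registering.
DEMO_COMMAND: List[str] = [sys.executable, "-c", "import time; time.sleep(30)"]


class RestartableObject:
    def __init__(self, command_line: List[str]) -> None:
        self.command_line = command_line
        self.process: Optional[subprocess.Popen] = None

    def start(self) -> None:
        """Execute the registered command line."""
        self.process = subprocess.Popen(self.command_line)

    def restart(self) -> None:
        """Terminate the current process if it is alive, then start again."""
        if self.process is not None and self.process.poll() is None:
            self.process.terminate()
            self.process.wait(timeout=5)
        self.start()
```

Storing the command line with the registration lets the monitor restart the object without knowing anything else about how the object is launched.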
-
19. A method as recited in claim 8, wherein the initiating recovery of the object comprises rebooting the system.
-
20. A method as recited in claim 8, wherein the initiating recovery of the object comprises rebooting the system after a subsequent event occurs.
-
21. A method as recited in claim 20, wherein the subsequent event comprises an ignition coupled to the system being turned off.
-
22. A method as recited in claim 20, wherein the subsequent event comprises activation of a power off button.
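The deferred reboot of claims 20 to 22 can be pictured as a pending flag that is acted on only when a later event arrives. The class name, the event strings, and the `reboot` callback below are assumptions made for the sketch.

```python
from typing import Callable, Iterable


class DeferredReboot:
    def __init__(
        self,
        reboot: Callable[[], None],
        trigger_events: Iterable[str] = ("ignition_off", "power_button"),
    ) -> None:
        self._reboot = reboot
        self._triggers = frozenset(trigger_events)
        self.pending = False

    def request(self) -> None:
        """Record that a reboot is wanted, without rebooting yet."""
        self.pending = True

    def on_event(self, event: str) -> bool:
        """Reboot if one is pending and this event is a trigger.

        Returns True when the reboot callback was invoked.
        """
        if self.pending and event in self._triggers:
            self.pending = False
            self._reboot()
            return True
        return False
```

Waiting for an event such as the ignition being turned off means the reboot happens at a moment when the user is unlikely to be relying on the system.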
-
23. A method as recited in claim 8, further comprising sending, prior to the initiating recovery of the object, a message to the object to verify that the object is to be recovered.
-
24. A method as recited in claim 23, further comprising aborting the recovery in response to an indication being received from the object that the object is not to be recovered.
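The verification step of claims 23 and 24 could be sketched as follows. The function and parameter names are hypothetical, and treating a missing reply (`None`) as permission to proceed is an assumption of this example rather than something the claims state.

```python
from typing import Callable, Optional


def recover_with_confirmation(
    object_id: str,
    confirm: Callable[[str], Optional[bool]],
    recover: Callable[[str], None],
) -> bool:
    """Ask the object whether it should be recovered before recovering it.

    confirm returns False if the object refuses recovery, True if it
    agrees, or None if no reply was received (assumed here to mean the
    object really is hung). Returns True when recovery was initiated.
    """
    if confirm(object_id) is False:
        return False  # object indicated it is not to be recovered: abort
    recover(object_id)
    return True
```

An object that is merely busy can therefore veto a restart, while an object that cannot answer at all is still recovered.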
-
25. At least one computer-readable memory containing a computer program that is executable by a processor to perform the method recited in claim 8.
-
26. An automobile computer programmed to perform the method as recited in claim 8.
-
27. A method in an object, the method comprising:
generating a registration request, the registration request including an indication of a way failure of the object is to be determined and a type of recovery to be attempted in the event of a detected failure of the object; and
transmitting the registration request to a monitor.
receiving an indication from the monitor that the object was detected as having failed; and
providing a response to the monitor indicating whether recovery of the object is refused.
-
31. A method as recited in claim 27, further comprising generating notification messages and repeatedly sending the notification messages to the monitor within a specified amount of time after the previous notification message was sent.
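To make the timing in claim 31 concrete, here is a minimal sketch of how a monitor might track such notification messages. `HeartbeatTracker` and its methods are hypothetical, and timestamps are passed in explicitly so the example does not depend on a real clock.

```python
class HeartbeatTracker:
    """Tracks notification messages from one object (illustrative only)."""

    def __init__(self, timeout_s: float, now: float) -> None:
        self.timeout_s = timeout_s   # the specified amount of time
        self.last_seen = now         # time of the previous notification

    def notify(self, now: float) -> None:
        """Called when a notification message arrives from the object."""
        self.last_seen = now

    def has_failed(self, now: float) -> bool:
        """True if no notification arrived within timeout_s of the last one."""
        return now - self.last_seen > self.timeout_s
```

An object that sends a notification every 2 seconds against a 5 second timeout would never be flagged, while one that stops sending would be flagged once more than 5 seconds have passed since its last message.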
-
32. At least one computer-readable memory containing a computer program that is executable by a processor to perform the method recited in claim 27.
-
33. A system comprising:
a plurality of objects running in the system; and
a monitor to, monitor the plurality of objects, detect when at least one of the plurality of objects has failed, and initiate an appropriate recovery action when an object fails, wherein a type of the recovery action is previously identified by the object.
-
35. A system comprising:
a plurality of objects running in the system; and
a monitor to, passively monitor the plurality of objects by repeatedly comparing a list of currently running objects to a list of objects that should be running, detect when at least one of the plurality of objects has failed, and initiate an appropriate recovery action when an object fails.
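The list comparison in claim 35 amounts to finding the expected objects that are absent from the running list. A minimal sketch, with a hypothetical function name and plain strings standing in for object identifiers:

```python
from typing import Iterable, List


def find_missing(
    should_be_running: Iterable[str],
    currently_running: Iterable[str],
) -> List[str]:
    """Return expected objects that are absent from the running list.

    The result keeps the order of should_be_running; objects that are
    running but were never registered are ignored.
    """
    running = set(currently_running)
    return [name for name in should_be_running if name not in running]
```

A monitor would call something like this on each pass and initiate recovery for every name returned; the objects themselves send nothing, which is what makes the monitoring passive.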
-
43. A method comprising:
determining that an object in a system has failed in response to not receiving a notification from the object within a specified amount of time;
checking whether a test object is being scheduled to execute by a system scheduler; and
initiating recovery of the object if the test object is being scheduled, otherwise reversing the determination that the object has failed if the test object is not being scheduled.
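The decision in claim 43 can be sketched as a small pure function. Here "being scheduled" is approximated by the test object having recorded a run within a recent window; that approximation, the `Verdict` names, and all parameter names are assumptions of this example.

```python
from enum import Enum


class Verdict(Enum):
    HEALTHY = "healthy"    # notification arrived in time
    RECOVER = "recover"    # object failed and the scheduler is working
    REVERSED = "reversed"  # failure determination reversed: system is starved


def evaluate(
    now: float,
    last_notification: float,
    last_test_run: float,
    timeout_s: float,
    test_window_s: float,
) -> Verdict:
    """Decide what to do about one monitored object."""
    if now - last_notification <= timeout_s:
        return Verdict.HEALTHY
    # The object missed its deadline. If the test object is not getting
    # CPU time either, the missed notification is probably due to
    # scheduling starvation rather than a fault in the object.
    test_object_scheduled = now - last_test_run <= test_window_s
    return Verdict.RECOVER if test_object_scheduled else Verdict.REVERSED
```

The test object acts as a control: if even it is not being run, restarting the monitored object would not help and might make the overload worse.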
-
48. A computer-readable storage medium comprising computer-executable instructions that implement interface methods, the interface methods performing a function comprising:
recording a way failure of an object is to be determined and a type of recovery to be attempted in the event of a detected failure of the object, wherein the way failure of the object is to be determined is identified by the object.
-
52. A computer-readable storage medium comprising computer-executable instructions that implement interface methods, the interface methods performing a function comprising:
recording a way failure of an object is to be determined and a type of recovery to be attempted in the event of a detected failure of the object; and
further performing a function comprising informing the object that a recovery process has been initiated for the object and providing the object with an opportunity to abort the recovery process.
-
53. A method comprising:
recording a way failure of an object is to be determined and a type of recovery to be attempted in the event of a detected failure of the object, wherein the type of recovery to be attempted is identified by the object.
-
57. A system comprising:
a database; and
a monitor configured to record, in the database, a way failure of an object is to be determined and a type of recovery to be attempted in the event of a detected failure of the object, wherein the way failure of the object is to be determined and the type of recovery to be attempted are both identified by the object.
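One way to picture the database of claim 57 is a single table keyed by object. The sketch below uses an in-memory SQLite database purely for illustration; the table name, column names, and string values are hypothetical, and the claim does not specify any particular database technology.

```python
import sqlite3
from typing import Optional, Tuple


def open_registry() -> sqlite3.Connection:
    """Create an in-memory registry of monitored objects."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE registrations ("
        " object_id TEXT PRIMARY KEY,"
        " failure_detection TEXT NOT NULL,"
        " recovery_type TEXT NOT NULL)"
    )
    return conn


def record_registration(
    conn: sqlite3.Connection,
    object_id: str,
    failure_detection: str,
    recovery_type: str,
) -> None:
    """Record what the object itself asked for, replacing any earlier entry."""
    conn.execute(
        "INSERT OR REPLACE INTO registrations VALUES (?, ?, ?)",
        (object_id, failure_detection, recovery_type),
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, object_id: str) -> Optional[Tuple[str, str]]:
    """Return (failure_detection, recovery_type), or None if unregistered."""
    return conn.execute(
        "SELECT failure_detection, recovery_type FROM registrations"
        " WHERE object_id = ?",
        (object_id,),
    ).fetchone()
```

Both recorded values come from the object's own registration, so the monitor needs no built-in knowledge of individual objects.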
-
60. A system comprising:
a processor;
a memory, coupled to the processor, including a plurality of instructions to be executed by the processor, the plurality of instructions including, a monitor to determine that an object in a system has failed in response to not receiving a notification from the object within a specified amount of time, to check whether a test object is being scheduled to execute by a system scheduler, to initiate recovery of the object if the test object is being scheduled, and to reverse the determination that the object has failed if the test object is not being scheduled.
-
Specification