Method and apparatus for automating the root cause analysis of system failures
First Claim
1. A method for analyzing the root cause of system failures on one or more computers, comprising:
generating an event when a computer system detects a system failure;
determining the cause of the system failure;
transmitting the event, including the determined cause, from the computer system to a central repository;
analyzing the system failure event in the central repository;
storing the event in a local repository located on the computer system; and
synchronizing the local repository and the central repository, wherein the synchronizing step comprises;
transmitting missing events in the central repository from the computer systems, wherein the missing events correspond to system failure events for which causes were still being determined by the computer system at a time when the central repository made a request for event information to be sent thereto, and for which the causes have subsequently been determined by the computer system.
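The synchronizing step of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the names `FailureEvent`, `LocalRepository`, `CentralRepository`, and `request_events`, the in-memory dictionaries, and the `(host, event_id)` key are all assumptions made for illustration. It shows only that an event whose cause was still being determined when the central repository asked for events is picked up by a later request, once its cause is known.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (host, event_id); a hypothetical way to identify events


@dataclass
class FailureEvent:
    host: str
    event_id: int
    description: str
    cause: Optional[str] = None  # None while the cause is still being determined


class LocalRepository:
    """Hypothetical in-memory stand-in for the repository on the computer system."""

    def __init__(self) -> None:
        self.events: Dict[Key, FailureEvent] = {}

    def store(self, event: FailureEvent) -> None:
        self.events[(event.host, event.event_id)] = event

    def resolved_events(self) -> List[FailureEvent]:
        # Only events whose cause has been determined are eligible to be sent.
        return [e for e in self.events.values() if e.cause is not None]


class CentralRepository:
    """Hypothetical in-memory stand-in for the central repository."""

    def __init__(self) -> None:
        self.events: Dict[Key, FailureEvent] = {}

    def request_events(self, local: LocalRepository) -> int:
        """Receive resolved events not yet held centrally; return how many arrived."""
        missing = [
            e for e in local.resolved_events()
            if (e.host, e.event_id) not in self.events
        ]
        for e in missing:
            self.events[(e.host, e.event_id)] = e
        return len(missing)


local = LocalRepository()
central = CentralRepository()
local.store(FailureEvent("hostA", 1, "kernel panic", cause="bad driver"))
local.store(FailureEvent("hostA", 2, "unexpected reboot"))  # cause still pending

first = central.request_events(local)    # event 2 is skipped: no cause yet
local.events[("hostA", 2)].cause = "power loss"
second = central.request_events(local)   # event 2 is now a "missing event"
```

In this sketch the second request transmits only the event that the first one missed, which is the behavior the wherein clause describes.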
Abstract
A method for analyzing the root cause of system failures on one or more computers. An event is generated when a computer system detects a system failure. The cause of the failure is determined. The event, including the cause, is transmitted from the computer system to a central repository, and the system failure is analyzed in the central repository.
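The abstract does not say what form the analysis in the central repository takes. As one hedged possibility, the sketch below ranks determined causes by how often they occur across computers. The function name `rank_causes` and the `(host, cause)` pair format are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_causes(events: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Given (host, cause) pairs, return causes ordered from most to least frequent."""
    return Counter(cause for _host, cause in events).most_common()


# Assumed sample data, not taken from the patent.
sample = [("hostA", "disk failure"), ("hostB", "power loss"), ("hostC", "disk failure")]
ranking = rank_causes(sample)
```

A frequency ranking is only one choice; the same central store could equally be grouped by host or by time window.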
21 Claims
1. A method for analyzing the root cause of system failures on one or more computers, comprising:
generating an event when a computer system detects a system failure;
determining the cause of the system failure;
transmitting the event, including the determined cause, from the computer system to a central repository;
analyzing the system failure event in the central repository;
storing the event in a local repository located on the computer system; and
synchronizing the local repository and the central repository, wherein the synchronizing step comprises;
transmitting missing events in the central repository from the computer systems, wherein the missing events correspond to system failure events for which causes were still being determined by the computer system at a time when the central repository made a request for event information to be sent thereto, and for which the causes have subsequently been determined by the computer system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. An apparatus for analyzing the root cause of system failures on one or more computers, comprising:
a network;
a local support computer coupled to said network;
a computer system coupled to the network, the computer system programmed to monitor itself and another computer system for system failures, to determine the cause of the system failure, and to transmit system failure events to the local support computer;
storing the event in a local repository located on the computer system; and
synchronizing the local repository and a repository of the local support computer, wherein the synchronizing step comprises;
transmitting missing events in the repository of the local support computer from the computer systems, wherein the missing events correspond to system failure events for which causes were still being determined by the computer system at a time when the repository of the local support computer made a request for event information to be sent thereto, and for which the causes have subsequently been determined by the computer system.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
20. A means for analyzing the root cause of system failures on one or more computers, comprising:
a means for transmitting data from one computer to another,
a local support computer coupled to the means for transmitting data,
a computer system coupled to the means for transmitting data,
a means for the computer system to monitor itself or another computer system for system failures and determining the causes of the failures,
a means for transmitting the causes of the failures to the local support computer;
a local repository located on the computer system for storing the event; and
a means for synchronizing the local repository and a repository of the local support computer, wherein the synchronizing means comprises;
a means for transmitting missing events in the repository of the local support computer from the computer system, wherein the missing events correspond to system failure events for which causes were still being determined by the computer system at a time when the repository of the local support computer made a request for event information to be sent thereto, and for which the causes have subsequently been determined by the computer system.
Dependent claims: 21
Specification