Triaging computing systems
Abstract
Methods and systems are provided for automatically triaging a server cluster of the type including a plurality of linked servers each running a plurality of processes. The method includes: detecting at least one failed process; automatically transmitting an electronic alert message embodying a first error code indicative of the failed process to a unified triage module including a processor and an updatable index table; and applying, by the processor, the first error code to the index table. If a matching error code corresponding to the first error code is found in the index table, a solution code associated with the matching error code is retrieved from the index table, and the failed process is automatically restarted using the solution code without human intervention.
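The lookup-and-restart flow described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Alert`, `TriageModule`, and `handle`, the `restart` callback, and the sample error and solution codes are all hypothetical and do not come from the patent, which does not specify any data format or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert message: error code plus the failed process and its server."""
    error_code: str
    process: str
    server: str


class TriageModule:
    """Hypothetical stand-in for the unified triage module."""

    def __init__(self, index_table: Dict[str, str],
                 restart: Callable[[str, str, str], None]) -> None:
        self.index_table = index_table  # error code -> solution code (updatable)
        self.restart = restart          # assumed hook: (server, process, solution_code)

    def handle(self, alert: Alert) -> Optional[str]:
        # Apply the error code to the index table.
        solution_code = self.index_table.get(alert.error_code)
        if solution_code is None:
            return None  # no matching error code, so nothing is restarted here
        # A match was found: restart the failed process without human intervention.
        self.restart(alert.server, alert.process, solution_code)
        return solution_code


restarted = []
triage = TriageModule(
    {"E_OOM": "RESTART_WITH_MORE_HEAP"},
    lambda server, process, code: restarted.append((server, process, code)),
)
result = triage.handle(Alert("E_OOM", "indexer", "node-7"))
```

A plain dictionary is used for the index table because the abstract only requires that it be updatable and searchable by error code; a real deployment would presumably back it with persistent storage.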
20 Claims
1. A method of operating a server cluster of the type including a plurality of linked servers each running a plurality of processes, the method comprising:
detecting at least one failed process;
automatically transmitting an electronic alert message embodying a first error code indicative of the failed process to a unified triage module including a processor and an updatable index table, wherein the alert message identifies the at least one failed process and the linked server running the at least one failed process;
applying, by the processor, the first error code to the index table;
if a matching error code corresponding to the first error code is found in the index table, retrieving a solution code from the index table associated with the matching error code; and
automatically restarting the failed process using the solution code without human intervention.
Dependent claims: 2 to 12 and 19.
13. A processing system for triaging failures in an on-demand computing environment, comprising:
a database system configured to run a plurality of storage processes and to record associated log data;
a unified triage (UT) module that includes an index table and a set of proposed solutions to one or more failed storage processes, the index table configured to identify the one or more proposed solutions;
a monitoring module configured to listen to the database system and to detect a failed storage process, the monitoring module further configured to transmit a corresponding alert to the UT module when a failed storage process is detected; and
an analytics module connected to the database system and configured to generate a log file based on the log data, wherein the log file comprises operational data temporally coincident with the failed storage process, and to transmit the log file to the UT module upon receipt by the UT module of the alert;
wherein the UT module is configured to retrieve a proposed solution when it receives the alert from the monitoring module, wherein the proposed solution is retrieved based on data stored in the log file and by using the index table to access a corresponding solution.
Dependent claims: 14 to 18.
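The analytics module in claim 13 builds a log file from operational data "temporally coincident" with the failure. The claim does not define that phrase, so the sketch below simply assumes a fixed time window around the failure timestamp. The names `LogEntry` and `build_log_file`, the 30-second default, and the sample log lines are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogEntry:
    timestamp: float  # seconds on an arbitrary clock (assumption)
    message: str


def build_log_file(log_data: List[LogEntry], failure_time: float,
                   window_seconds: float = 30.0) -> List[LogEntry]:
    """Return the entries within window_seconds of the failure, oldest first."""
    nearby = [e for e in log_data
              if abs(e.timestamp - failure_time) <= window_seconds]
    return sorted(nearby, key=lambda e: e.timestamp)


log_data = [
    LogEntry(100.0, "compaction started"),
    LogEntry(995.0, "disk latency high"),
    LogEntry(1001.0, "write failed"),
    LogEntry(2000.0, "heartbeat"),
]
log_file = build_log_file(log_data, failure_time=1000.0)
```

Only the two entries near the failure survive the filter, which is the kind of trimmed context the UT module would receive alongside the alert.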
20. A non-transitory computer readable medium comprising computer readable instructions that, when executed by a processor, perform the steps comprising:
detecting a failed process in a server cluster of the type including a plurality of linked servers;
automatically transmitting an electronic alert message and a log file each corresponding to the failed process to a unified triage module, wherein the electronic alert message identifies the failed process and the linked server running the failed process;
searching an index table for a solution to the failed process;
if a solution is found in the index table, automatically restarting the failed process using the solution;
if a solution is not found in the index table, transmitting the alert message and the log file to a user interface for manually triaging the failed process; and
updating the index table using the unified triage module to reflect the results of the manual triaging.
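Claim 20 adds a manual path and a feedback step to the automatic restart. The sketch below assumes that the manual path is taken when the index table has no solution, and that the table is keyed by an error code as in claim 1. The functions `triage` and `record_manual_result`, the list used as a stand-in for the user interface, and all sample values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

ManualItem = Tuple[str, str]  # (alert message, log file contents)


def triage(error_code: str, alert: str, log_file: str,
           index_table: Dict[str, str],
           restart: Callable[[str], None],
           manual_queue: List[ManualItem]) -> bool:
    """Restart automatically if a solution exists; otherwise queue for a human."""
    solution = index_table.get(error_code)
    if solution is not None:
        restart(solution)
        return True
    # No solution known: hand the alert and log file to manual triage.
    manual_queue.append((alert, log_file))
    return False


def record_manual_result(index_table: Dict[str, str],
                         error_code: str, solution: str) -> None:
    """Update the index table so the same failure is handled automatically next time."""
    index_table[error_code] = solution


index_table: Dict[str, str] = {}
restarts: List[str] = []
manual_queue: List[ManualItem] = []
alert = "db-writer failed on node-3"

first = triage("E_DISK_FULL", alert, "write failed", index_table,
               restarts.append, manual_queue)
record_manual_result(index_table, "E_DISK_FULL", "PURGE_TMP_AND_RESTART")
second = triage("E_DISK_FULL", alert, "write failed", index_table,
                restarts.append, manual_queue)
```

The first occurrence of the failure is routed to a person; once the manual result is recorded, the second occurrence is resolved from the index table without intervention.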
Specification