ADDING SCALABILITY AND FAULT TOLERANCE TO GENERIC FINITE STATE MACHINE FRAMEWORKS FOR USE IN AUTOMATED INCIDENT MANAGEMENT OF CLOUD COMPUTING INFRASTRUCTURES
Abstract
A scalable and fault-tolerant finite state machine engine, for example for use in an automated incident management system, executes a finite state machine instance that encodes policies for handling incidents on a given type of information technology element. While processing an event associated with such an element, and the actions taken in response to that event, the engine logs data in persistent storage at different points or levels of its internal processing. If the engine is shut down during processing, it can pick up where it left off for each abnormally terminated finite state machine instance, using the data logged in persistent storage to determine the point from which execution should continue.
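The recovery behavior described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the step names in `STEPS`, the `action_log` table, and the `crash_after` parameter are all hypothetical, and SQLite merely stands in for whatever persistent storage a real engine would use. The idea shown is only that each internal step is logged under the finite state machine instance's identifier, so a restarted engine can skip what was already completed.

```python
import sqlite3

# Hypothetical internal processing steps for one event (not taken from the patent).
STEPS = ["event_received", "action_chosen", "workflow_scheduled", "event_closed"]


def open_log(path=":memory:"):
    """Open the persistent log; a file path would survive a real restart."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS action_log (fsm_id TEXT, event_id TEXT, step TEXT)"
    )
    db.commit()
    return db


def completed_steps(db, fsm_id, event_id):
    """Return the steps already logged for this FSM instance and event."""
    rows = db.execute(
        "SELECT step FROM action_log WHERE fsm_id = ? AND event_id = ?",
        (fsm_id, event_id),
    )
    return {row[0] for row in rows}


def process_event(db, fsm_id, event_id, crash_after=None):
    """Run the remaining steps, logging each one as soon as it completes.

    crash_after is a test hook that simulates the engine being stopped.
    """
    done = completed_steps(db, fsm_id, event_id)
    executed = []
    for step in STEPS:
        if step in done:
            continue  # already handled before the shutdown
        # The real work for the step would happen here.
        db.execute(
            "INSERT INTO action_log VALUES (?, ?, ?)", (fsm_id, event_id, step)
        )
        db.commit()
        executed.append(step)
        if step == crash_after:
            raise RuntimeError("simulated engine shutdown")
    return executed
```

Calling `process_event` with `crash_after="action_chosen"` and then calling it again on the same log runs only the two remaining steps on the second call. A real engine would also need each step to be safe to repeat, since a shutdown can occur between doing the work and logging it.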
25 Claims
1. A system for executing finite state machines in a scalable and fault tolerant manner, comprising:
- one or more processors;
- persistent storage for storing data associated with handling of an event associated with an information technology element;
- a finite state machine engine operable to receive the event and execute a finite state machine instance representing the information technology element, and operable to log in the persistent storage, a plurality of internal actions associated with processing of the event, wherein if the finite state machine engine is stopped while in execution, the finite state machine engine continues to process the event based on the persistent data logged in persistent storage indexed by the finite state machine instance to complete the handling of the event, after the finite state machine engine is restarted.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method for implementing scalable, fault-tolerant finite state machines, comprising:
- receiving an event associated with an information technology element;
- identifying an entry associated with the information technology element in a finite state machine states table stored in persistent storage;
- updating the entry as locked in the finite state machine states table and storing the updated finite state machine states table in the persistent storage;
- creating a finite state machine instance to process the event and initializing it with information from the finite state machine states table;
- invoking an execute operation of the finite state machine instance, wherein the finite state machine instance executes state transitions associated with the information technology element, starting from the point where it finished execution when it was last invoked;
- logging data in the persistent storage indicating that a workflow is to be scheduled before submitting the workflow;
- logging data in the persistent storage indicating that the workflow is submitted after scheduling the workflow; and
- after the finite state machine instance finishes, updating the entry as unlocked, and also storing current state information to be used in the next execution, in the finite state machine states table.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
19. A method for executing finite state machines in a scalable and fault-tolerant manner, comprising:
- an event queue polling thread polling an event queue in persistent storage;
- in response to receiving an event in the event queue, the event queue polling thread associating the event with a finite state machine instance representing an information technology infrastructure element based on information stored in a finite state machine states table;
- the event queue polling thread inserting a work item including an identifier associated with the information technology element, the type of element, the current state of the finite state machine and the event identifier and event name, in an in-memory thread pool queue;
- the thread pool further allocating a thread to run the finite state machine;
- the thread creating a finite state machine instance associated with the information technology element, initializing it with values set in the work item, and executing the finite state machine instance until the finite state machine quiesces;
- the finite state machine instance retrieving data from an event-action history table to determine one or more corrective actions to be taken associated with the event;
- the finite state machine instance logging the one or more corrective actions to be taken and a workflow unique identifier associated with the one or more corrective actions to be taken in a workflow unique identifier table;
- the finite state machine instance scheduling the one or more corrective actions with a workflow system module for executing corrective actions, the finite state machine performing a validation protocol with each scheduled workflow before permitting it to continue execution when scheduled,
wherein in response to a processor executing the steps of the method being shut down and restarted, the finite state machine can continue to process the event from a point of processing before the shutdown based on logs stored in the persistent storage during the steps.
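The threading structure recited in claim 19 (a polling thread that turns events into work items, and a thread pool that runs each finite state machine until it quiesces) can be sketched as below. Everything named here is an assumption made for illustration: `WorkItem`, `TRANSITIONS`, `run_fsm`, `poll_events`, and `demo` are hypothetical, a `queue.Queue` and a dictionary stand in for the persistent event queue and the states table, and the event-action history lookup, workflow identifier logging, and validation protocol are omitted.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class WorkItem:
    element_id: str
    element_type: str
    current_state: str
    event_id: str
    event_name: str


# Hypothetical transition table: (current state, event name) -> next state.
TRANSITIONS = {("healthy", "disk_full"): "remediating"}


def run_fsm(item):
    """Runs on a pool thread: step the FSM from the work item until it quiesces."""
    state, event = item.current_state, item.event_name
    while (state, event) in TRANSITIONS:
        state = TRANSITIONS[(state, event)]
        event = None  # event consumed; with no further input the machine quiesces
    return item.element_id, state


def poll_events(event_queue, states_table, pool, futures, stop):
    """Polling thread: match each event to its states-table entry and enqueue a work item."""
    while not stop.is_set():
        try:
            event = event_queue.get(timeout=0.05)
        except queue.Empty:
            continue
        entry = states_table[event["element_id"]]
        item = WorkItem(
            event["element_id"], entry["type"], entry["state"], event["id"], event["name"]
        )
        futures.append(pool.submit(run_fsm, item))  # executor's queue is in memory
        event_queue.task_done()


def demo():
    event_queue = queue.Queue()  # stand-in for the persistent event queue
    states_table = {"server-42": {"type": "server", "state": "healthy"}}
    stop, futures = threading.Event(), []
    with ThreadPoolExecutor(max_workers=2) as pool:
        poller = threading.Thread(
            target=poll_events, args=(event_queue, states_table, pool, futures, stop)
        )
        poller.start()
        event_queue.put({"id": "evt-1", "name": "disk_full", "element_id": "server-42"})
        event_queue.join()  # wait until the poller has handed the event to the pool
        stop.set()
        poller.join()
    return [f.result() for f in futures]
```

Keeping only the work-item queue in memory fits the recovery clause of the claim: the events and the states table are the durable records, so work items lost in a shutdown can be rebuilt from them.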
20. A computer readable storage medium storing a program of instructions executable by a machine to perform a method for scalable, fault-tolerant finite state machine, comprising:
- receiving an event associated with an information technology element;
- identifying an entry associated with the information technology element in a finite state machine states table stored in persistent storage;
- updating the entry as locked in the finite state machine states table and storing the updated finite state machine states table in the persistent storage;
- creating a finite state machine instance to process the event and initializing it with information from the finite state machine states table;
- invoking an execute operation of the finite state machine instance, wherein the finite state machine instance executes state transitions associated with the information technology element, starting from the point where it finished execution when it was last invoked;
- logging data in the persistent storage indicating that a workflow is to be scheduled before submitting the workflow;
- logging data in the persistent storage indicating that the workflow is submitted after scheduling the workflow; and
- after the finite state machine instance finishes, updating the entry as unlocked, and also storing current state information to be used in the next execution, in the finite state machine states table.
Dependent claims: 21, 22, 23, 24, 25
Specification