Detection, remediation and inference rule development for multi-layer information technology (“IT”) structures
First Claim
1. An apparatus for detection, remediation and inference rule development for multi-layer information technology (“IT”) structures, wherein layers in the multi-layer IT structure consists of a network layer, a virtual layer, an operating system layer, a middle ware (“MW”) layer and a data layer, the apparatus comprising;
an event generator that monitors for, retrieves, and pools a plurality of error events and a plurality of performance events from a collection of alerting sources, said alerting sources for providing alerts from the network layer, the virtual layer, the operating system layer, the MW layer and the data layer;
an event parser that provides a system status; and
an analytics engine that detects a plurality of patterns and a plurality of relationships in the retrieved error events, the patterns including an identification of a specific one of the network layer, the virtual layer, the operating system layer, the MW layer and the data layer, within which the error occurred, and an identification of performance events and system status, and an identification of model event hierarchies based on the detected patterns and relationships, wherein said detecting a plurality of patterns comprises indexing the error events into one of the memory-related issues, DSN-related issues, resource-related issues attributed to infrastructure resources and “String not found” related issues that were followed by application failure, said indexing further comprising self-constructively cataloguing the error events based on the indexing;
wherein said analytics engine is configured to invoke one or more auto-remediation processes in response to pre-determined indexed error events, said one or more auto-remediation processes based, at least in part, on the detected patterns, relationships and hierarchies, and wherein said analytics engine is for detecting a threshold number of occurrences relating to a sequence of a pre-determined number of memory errors over a pre-determined period of time;
wherein the analytics engine invokes at least one auto-remediation process in response to the detection of the pre-determined number of memory errors over a pre-determined period of time, said auto-remediation process comprising at least one of restart, bounce service, kill present application and exit current routine.
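The final clause of the claim (a pre-determined number of memory errors within a pre-determined period triggering a remediation) can be illustrated with a sliding time window. This is a minimal sketch under stated assumptions, not the patented implementation: the class name `MemoryErrorMonitor`, its parameters, and the callback-based remediation hook are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque


@dataclass
class MemoryErrorMonitor:
    """Hypothetical sliding-window counter for memory errors."""
    threshold: int                      # pre-determined number of memory errors
    window_seconds: float               # pre-determined period of time
    remediate: Callable[[str], None]    # assumed hook that performs the action
    action: str = "restart"             # or "bounce service", "kill present application", ...
    _timestamps: Deque[float] = field(default_factory=deque)

    def record(self, timestamp: float) -> bool:
        """Record one memory error; return True if remediation was invoked."""
        self._timestamps.append(timestamp)
        # Discard errors that fell out of the time window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.threshold:
            self.remediate(self.action)
            self._timestamps.clear()    # start counting afresh after remediation
            return True
        return False


if __name__ == "__main__":
    monitor = MemoryErrorMonitor(threshold=3, window_seconds=60.0, remediate=print)
    for t in (0.0, 10.0, 20.0):
        monitor.record(t)               # the third call invokes the "restart" action
```

Clearing the window after a remediation is a design choice made here so that one burst of errors does not trigger the same action repeatedly; the claim itself does not specify this.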
Abstract
An apparatus for detection, remediation and inference rule development for multi-layer information technology (“IT”) structures is provided. Certain embodiments of the apparatus may include an event generator. The event generator may monitor for, retrieve, and pool error events and performance events from alerting sources. The alerting sources may provide event information from one or more of multiple layers. The apparatus may also include an event parser that provides a system status. The apparatus may include an analytics engine that detects patterns and relationships in the retrieved error events, performance events and system status, and models event hierarchies based on the detected patterns and relationships. The analytics engine may invoke auto-remediation processes in response to pre-determined error events. In some embodiments, the engine may detect a pre-determined number of resource-related events. Based on the detecting, the analytics engine may attribute the resource-related events to infrastructure resources.
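The first two components described in the abstract, an event generator that pools events from alerting sources and an event parser that provides a system status, might be sketched as follows. Everything named here (`Event`, `pool_events`, `system_status`, and the "ok"/"degraded" status labels) is an illustrative assumption; the source does not define a data model or a status format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# The five layers named in the claims.
LAYERS = ("network", "virtual", "operating system", "MW", "data")


@dataclass(frozen=True)
class Event:
    layer: str    # one of LAYERS
    kind: str     # "error" or "performance" (assumed encoding)
    message: str


# An alerting source is modelled as a callable that returns its pending events.
AlertingSource = Callable[[], Iterable[Event]]


def pool_events(sources: Iterable[AlertingSource]) -> List[Event]:
    """Event generator: retrieve and pool events from every alerting source."""
    pooled: List[Event] = []
    for source in sources:
        pooled.extend(source())
    return pooled


def system_status(events: Iterable[Event]) -> Dict[str, str]:
    """Event parser: summarise pooled events as a per-layer status."""
    errors = Counter(e.layer for e in events if e.kind == "error")
    return {layer: "degraded" if errors[layer] else "ok" for layer in LAYERS}


if __name__ == "__main__":
    network_source = lambda: [Event("network", "error", "link down")]
    data_source = lambda: [Event("data", "performance", "slow query")]
    print(system_status(pool_events([network_source, data_source])))
```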
23 Claims
1. An apparatus for detection, remediation and inference rule development for multi-layer information technology (“IT”) structures, wherein layers in the multi-layer IT structure consists of a network layer, a virtual layer, an operating system layer, a middle ware (“MW”) layer and a data layer, the apparatus comprising;
an event generator that monitors for, retrieves, and pools a plurality of error events and a plurality of performance events from a collection of alerting sources, said alerting sources for providing alerts from the network layer, the virtual layer, the operating system layer, the MW layer and the data layer;
an event parser that provides a system status; and
an analytics engine that detects a plurality of patterns and a plurality of relationships in the retrieved error events, the patterns including an identification of a specific one of the network layer, the virtual layer, the operating system layer, the MW layer and the data layer, within which the error occurred, and an identification of performance events and system status, and an identification of model event hierarchies based on the detected patterns and relationships, wherein said detecting a plurality of patterns comprises indexing the error events into one of the memory-related issues, DSN-related issues, resource-related issues attributed to infrastructure resources and “String not found” related issues that were followed by application failure, said indexing further comprising self-constructively cataloguing the error events based on the indexing;
wherein said analytics engine is configured to invoke one or more auto-remediation processes in response to pre-determined indexed error events, said one or more auto-remediation processes based, at least in part, on the detected patterns, relationships and hierarchies, and wherein said analytics engine is for detecting a threshold number of occurrences relating to a sequence of a pre-determined number of memory errors over a pre-determined period of time;
wherein the analytics engine invokes at least one auto-remediation process in response to the detection of the pre-determined number of memory errors over a pre-determined period of time, said auto-remediation process comprising at least one of restart, bounce service, kill present application and exit current routine.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
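The indexing step recited in claim 1 sorts each error event into one of four issue categories and builds a catalogue from the result. One possible reading is sketched below. The keyword table is entirely hypothetical (the claim does not say how events are matched to categories), as are the names `ErrorEvent`, `index_event` and `build_catalogue` and the "uncategorised" fallback bucket.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Hypothetical keyword table; the four category names follow the claim.
CATEGORY_KEYWORDS = {
    "memory": ("out of memory", "heap"),
    "DSN": ("dsn",),
    "resource": ("cpu", "disk"),
    "string not found": ("string not found",),
}


@dataclass(frozen=True)
class ErrorEvent:
    layer: str      # layer within which the error occurred
    message: str


def index_event(event: ErrorEvent) -> Optional[str]:
    """Return the first category whose keyword appears in the message, if any."""
    text = event.message.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None


def build_catalogue(events: Iterable[ErrorEvent]) -> Dict[str, List[ErrorEvent]]:
    """Group events by category; the catalogue grows as events are indexed."""
    catalogue: Dict[str, List[ErrorEvent]] = defaultdict(list)
    for event in events:
        catalogue[index_event(event) or "uncategorised"].append(event)
    return dict(catalogue)


if __name__ == "__main__":
    sample = [
        ErrorEvent("operating system", "Out of memory: heap exhausted"),
        ErrorEvent("MW", "String not found in response"),
    ]
    for category, members in build_catalogue(sample).items():
        print(category, [e.layer for e in members])
```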
9. A method for detection, remediation and inference rule development for multi-layer information technology (“IT”) structures, wherein layers in the multi-layer IT structure consist of a network layer, a virtual layer, an operating system layer, a middle ware (“MW”) layer and a data layer, the method comprising;
monitoring, retrieving, and pooling error events and performance events from a collection of alerting sources, said alerting sources for providing alerts from the network layer, the virtual layer, the operating system layer, the MW layer and the data layer;
parsing the error events and performance events to provide a current system status; and
detecting patterns and relationships in the retrieved error events, performance events and system status, the patterns including an identification of which specific one of the network layer, the virtual layer, the operating system layer, the MW layer and the data layer, the error occurred within;
wherein said detecting comprises indexing the error events into one of the memory-related issues, DSN-related issues, resource-related issues attributed to infrastructure resources and “String not found” related issues that were followed by application failure, said indexing further comprising self-constructively cataloguing the error events based on the indexing;
modeling event hierarchies based on the detected patterns and relationships; and
invoking at least one auto-remediation process in response to pre-determined indexed error events, said auto-remediation process based, at least in part, on the detected patterns, relationships and hierarchies, and wherein said auto-remediation process detects a predetermined number of occurrences of “string not found” over a pre-determined period and also detects an outage following every occurrence of “string not found” over the pre-determined time period;
inferring a rule for the occurrence of the outage following the occurrence of “string not found”.
- View Dependent Claims (10, 11, 12, 13, 14, 15)
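The rule-inference step at the end of claim 9 can be illustrated with a small check: if “string not found” occurs at least a pre-determined number of times in a period and every occurrence is followed by an outage, a rule linking the two is inferred. The sketch below is one interpretation only. The names `InferredRule` and `infer_outage_rule` are hypothetical, and the `follow_within` parameter (how soon an outage must follow to count) is an added assumption, since the claim does not define "following".

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class InferredRule:
    trigger: str        # the event that precedes the outage
    consequence: str    # what is expected to follow
    support: int        # number of occurrences backing the rule


def infer_outage_rule(
    trigger_times: Sequence[float],
    outage_times: Sequence[float],
    min_occurrences: int,
    period: Tuple[float, float],
    follow_within: float,
) -> Optional[InferredRule]:
    """Infer "string not found -> outage" if every occurrence in the period was followed by one."""
    start, end = period
    in_period = [t for t in trigger_times if start <= t <= end]
    if len(in_period) < min_occurrences:
        return None     # too few occurrences to support a rule
    for t in in_period:
        # Assumption: an outage "follows" if it starts within follow_within seconds.
        if not any(t < o <= t + follow_within for o in outage_times):
            return None
    return InferredRule("string not found", "outage", len(in_period))


if __name__ == "__main__":
    rule = infer_outage_rule([10, 50, 90], [12, 55, 95], 3, (0, 100), 10)
    print(rule)
```

Once such a rule exists, a remediation could in principle be invoked on the trigger itself rather than waiting for the outage, which appears to be the purpose of the inference step.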
16. An apparatus for detection, remediation and inference rule development for multi-layer information technology (“IT”) structures, wherein layers in the multi-layer IT structure comprise a network layer, a virtual layer, an operating system layer, a middle ware (“MW”) layer and a data layer, the apparatus comprising;
an event generator that monitors for, retrieves, and pools error events and performance events from a collection of alerting sources, said alerting sources for providing alerts from the network layer, the virtual layer, the operating system layer, the MW layer and the data layer;
an event parser that provides a system status; and
an analytics engine that detects patterns and relationships in the retrieved error events, performance events and system status, and models event hierarchies based on the detected patterns and relationships, the patterns including an identification of a specific one of the network layer, the virtual layer, the operating system layer, the MW layer and the data layer, within which the error occurred;
wherein said analytics engine is configured to invoke auto-remediation processes in response to pre-determined indexed error events, said auto-remediation processes based, at least in part on the detected patterns, relationships and hierarchies, and wherein said engine is for detecting a pre-determined number of resource related events over a pre-determined amount of time and, based on the detecting a pre-determined number of resource-related events over a pre-determined amount of time, the analytics engine attributes the resource-related events to infrastructure resources.
- View Dependent Claims (17, 18, 19, 20, 21, 22, 23)
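The attribution step in claim 16 can be read as: when at least a pre-determined number of resource-related events fall within a pre-determined amount of time, those events are attributed to infrastructure resources. A minimal sketch of that reading follows. The names `ResourceEvent` and `attribute_resource_events`, the "unattributed" default, and the choice to mark every event in a qualifying window are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ResourceEvent:
    timestamp: float
    message: str
    attributed_to: str = "unattributed"   # assumed default label


def attribute_resource_events(
    events: Iterable[ResourceEvent],
    threshold: int,
    window_seconds: float,
) -> List[ResourceEvent]:
    """Mark events as infrastructure-related when enough of them cluster in time."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    flagged: Set[int] = set()
    start = 0
    for end in range(len(ordered)):
        # Shrink the window from the left until it spans at most window_seconds.
        while ordered[end].timestamp - ordered[start].timestamp > window_seconds:
            start += 1
        if end - start + 1 >= threshold:
            flagged.update(range(start, end + 1))
    return [
        replace(event, attributed_to="infrastructure") if i in flagged else event
        for i, event in enumerate(ordered)
    ]


if __name__ == "__main__":
    burst = [ResourceEvent(t, "cpu saturation") for t in (0.0, 5.0, 10.0, 500.0)]
    for event in attribute_resource_events(burst, threshold=3, window_seconds=60.0):
        print(event.timestamp, event.attributed_to)
```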
Specification