Failure recognition, notification, and prevention for learning and self-healing capabilities in a monitored system
First Claim
1. A method for failure recognition, comprising:
monitoring a system to collect monitoring data;
detecting a failure of the system, wherein the failure includes an abnormal system state;
identifying a failure point for the detected failure in a data space defined by the monitoring data;
associating at least one predefined action with the identified failure point;
identifying values of the monitoring data in a neighborhood of the identified failure point;
dividing the neighborhood of the identified failure point into pre-failure data and post-failure data, wherein the pre-failure data and post-failure data identify pre-failure and post-failure states of the system, respectively;
dividing the pre-failure data into a plurality of ranges and associating a unique predefined action with each of said ranges;
determining if a state of the system reaches a point within a space defined by one of the plurality of ranges;
in response to said state of the system reaching a point within a space defined by one of the plurality of ranges, performing said unique predefined action associated with said one of the plurality of ranges;
wherein each of the unique predefined actions comprises an automatic self-healing preventive action and at least one of the unique predefined actions further comprises a notification, wherein the notification comprises an indicator of system unavailability and a period of unavailability; and
outputting a representation of the performed predefined action.
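The range-based steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes one-dimensional monitoring data given as (timestamp, value) samples, equal-width ranges over the pre-failure values, and hypothetical names (`ActionRange`, `split_neighborhood`, `build_ranges`, `respond`) that do not appear in the claim. Each action is a callable that returns a text representation of what was performed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical sample type: (timestamp, metric value). The claim does not
# fix the dimensionality or encoding of the monitoring data.
Sample = Tuple[float, float]


@dataclass
class ActionRange:
    low: float
    high: float
    action: Callable[[], str]  # returns a representation of the action


def split_neighborhood(
    samples: Sequence[Sample], failure_time: float
) -> Tuple[List[Sample], List[Sample]]:
    """Divide the neighborhood of a failure into pre- and post-failure data."""
    pre = [s for s in samples if s[0] < failure_time]
    post = [s for s in samples if s[0] >= failure_time]
    return pre, post


def build_ranges(
    pre: Sequence[Sample], actions: Sequence[Callable[[], str]]
) -> List[ActionRange]:
    """Divide pre-failure values into equal-width ranges, one action each.

    Equal-width splitting is an assumption made for this sketch.
    """
    values = [value for _, value in pre]
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(actions)
    return [
        ActionRange(lo + i * width, lo + (i + 1) * width, action)
        for i, action in enumerate(actions)
    ]


def respond(value: float, ranges: Sequence[ActionRange]) -> Optional[str]:
    """Perform the action of the range containing `value`, if any."""
    for i, r in enumerate(ranges):
        is_last = i == len(ranges) - 1
        # Ranges are half-open, except the last one, which includes its top.
        if r.low <= value < r.high or (is_last and value == r.high):
            return r.action()
    return None
```

In this sketch, a value that falls outside every pre-failure range triggers nothing, and the string returned by `respond` stands in for the claimed output of a representation of the performed action. An action that also carries a notification could return, for example, a message naming the unavailable system and the expected period of unavailability.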
Abstract
The present invention provides failure recognition, notification, and prevention for learning and self-healing capabilities in a monitored system. A system is monitored to collect monitoring data. A failure of the system is detected. A failure point for the detected failure is identified in a data space defined by the monitoring data, and at least one predefined action is associated with the identified failure point. This process is repeated for a plurality of system failures. When a state of the system is determined to be approaching an identified failure point, the at least one predefined action associated with that identified failure point is performed.
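The learning loop described in the abstract might be sketched as follows. The abstract does not define what "approaching" a failure point means, so this sketch assumes a Euclidean distance within a fixed radius in the monitoring data space. The class name `FailureRegistry` and its methods are hypothetical.

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, ...]  # one coordinate per monitored metric


class FailureRegistry:
    """Stores learned failure points and their predefined actions."""

    def __init__(self, radius: float) -> None:
        # Assumed notion of "approaching": within `radius` of a failure point.
        self.radius = radius
        self._known: List[Tuple[Point, Callable[[], str]]] = []

    def learn(self, failure_point: Point, action: Callable[[], str]) -> None:
        """Record a detected failure point and associate an action with it."""
        self._known.append((failure_point, action))

    def check(self, state: Point) -> Optional[str]:
        """Perform the action of the nearest failure point within the radius."""
        nearest_action: Optional[Callable[[], str]] = None
        nearest_dist = self.radius
        for point, action in self._known:
            dist = math.dist(state, point)
            if dist <= nearest_dist:
                nearest_action, nearest_dist = action, dist
        return nearest_action() if nearest_action is not None else None
```

Calling `learn` once per detected failure corresponds to repeating the process for a plurality of system failures. Calling `check` on each new system state returns a representation of the performed action, or `None` when the state is not near any learned failure point.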
35 Citations
7 Claims
Specification