Anomaly classification, analytics and resolution based on annotated event logs
First Claim
1. A machine-implemented method of separately dealing with emerging and possibly not routine anomalies of a data processing system that could be of significance to continuing operations of the data processing system, the data processing system being subdivided into a plurality of sections with each section comprising intercoupled local resources including one or more local data processing units and one or more local data storage units, wherein two or more of the plural sections each respectively includes a respective section behaviors logging subsystem configured to automatically log monitored behaviors within the respective section and a respective section alarming subsystem configured to automatically generate alarms for alarm worthy events within the respective section, wherein said routine anomalies and said emerging and possibly not routine anomalies are not catastrophic failures, the method comprising:
running a first section among said plural sections of the data processing system where the first section includes as its respective section alarming subsystem, a first section alarming subsystem and includes as its respective section behaviors logging subsystem, a first section behaviors logging subsystem, the first section alarming subsystem being configured to generate alarms for non-catastrophic alarm-worthy events detected within the first section, the section behaviors logging subsystem being configured to generate a log of monitored behaviors within the first section;
logically co-associating recently logged behaviors of the generated log produced by the first section behaviors logging subsystem with substantially cotemporaneous alarms generated by the first section alarming subsystem;
building an annotated log comprised of the logically co-associated logged behaviors and the substantially cotemporaneous alarms;
using the annotated log to update a corresponding anomalies versus parameters first mapping space populated by sample points representing previously identified as routine anomalies of the first section of the data processing system by adding recent, alarm-including sample point entries from the annotated log into the first mapping space as recently logged ones of alarmed sample points (ASP's);
determining if the recently logged ASP's map into a first region of the first mapping space occupied by older ASP's associated with the identified as routine anomalies or if the recently logged ASP's map into a different region of the first mapping space, where the ASP's which map into the different region can represent newly emerging and possibly non-routine anomalies;
automatically repeating said logically co-associating step, said building step, said using step and said determining step while the first section of the data processing system continues to run; and
automatically responding to said determining that the recently logged ASP's map into the different region and can thus represent newly emerging and possibly non-routine anomalies that could be of significance to operations of the data processing system, the automatic responding being separate from responses to known-to-be-routine anomalies and separate from responses to detected catastrophic failures.
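The determining step of claim 1 asks whether newly logged alarmed sample points (ASP's) fall into the region already occupied by older, routine ASP's or into a different region. The claim does not say how a "region" is defined, so the following is only a minimal Python sketch under an assumed rule: a recent ASP counts as routine if it lies within a fixed Euclidean `radius` of at least one older routine ASP. The names `classify_asps`, `nearest_distance`, `respond`, and `radius`, and the sample coordinates, are hypothetical and introduced purely for illustration.

```python
import math
from typing import List, Sequence, Tuple

# One alarmed sample point (ASP): its coordinates in the anomalies versus
# parameters mapping space (hypothetical axes, e.g. CPU load, queue depth).
Point = Tuple[float, ...]


def nearest_distance(p: Point, refs: Sequence[Point]) -> float:
    """Euclidean distance from p to the closest reference point (inf if none)."""
    return min((math.dist(p, r) for r in refs), default=math.inf)


def classify_asps(
    recent: Sequence[Point],
    routine: Sequence[Point],
    radius: float,
) -> Tuple[List[Point], List[Point]]:
    """Split recent ASPs into (inside routine region, in a different region).

    Assumed rule, not taken from the claim: a recent ASP falls in the routine
    region if it lies within `radius` of at least one older routine ASP.
    """
    in_region: List[Point] = []
    elsewhere: List[Point] = []
    for p in recent:
        if nearest_distance(p, routine) <= radius:
            in_region.append(p)
        else:
            elsewhere.append(p)
    return in_region, elsewhere


def respond(in_region: List[Point], elsewhere: List[Point]) -> List[str]:
    """Hypothetical handlers: possibly emerging anomalies take a separate path."""
    actions = [f"routine handler: {p}" for p in in_region]
    actions += [f"EMERGING anomaly, escalate: {p}" for p in elsewhere]
    return actions


if __name__ == "__main__":
    routine = [(0.2, 10.0), (0.25, 12.0), (0.3, 11.0)]  # older, routine ASPs
    recent = [(0.22, 11.0), (0.9, 40.0)]                 # newly logged ASPs
    inside, outside = classify_asps(recent, routine, radius=2.0)
    for line in respond(inside, outside):
        print(line)
```

In the example, the first recent point sits next to the routine cluster and goes to the routine handler, while the second lands far away and is escalated separately, mirroring the claim's separation of responses. With axes in very different units, the parameters would normally be normalized before measuring distance; a density- or cluster-based region test could replace the fixed radius.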
1 Assignment
0 Petitions
Abstract
Operational event loggings and operational alarm productions within a running multiserver data processing system are automatically and repeatedly sampled and co-associated with one another to build annotated logs. Post-process analytics can use these annotated logs to fill in mappings into an anomalies versus parameters mapping space and to keep track of unusual changes in the mappings or in their rates, where such unusual changes can indicate emerging new problems of significance within the system.
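The co-association of logged behaviors with cotemporaneous alarms can be pictured with a small Python sketch. It assumes, purely for illustration, that "cotemporaneous" means an alarm timestamp within a fixed `window` of seconds of the log entry, and that each log entry carries a dictionary of numeric metrics; the classes `LogEntry`, `Alarm`, `AnnotatedEntry` and the functions `build_annotated_log` and `to_asps` are hypothetical names, not the patent's implementation. Entries with no matching alarm are kept with an empty alarm list, so the annotated log records both the generating and the non-generating of alarms.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class LogEntry:
    t: float                    # timestamp in seconds (assumed unit)
    metrics: Dict[str, float]   # logged behavior parameters, e.g. {"cpu": 0.7}


@dataclass
class Alarm:
    t: float
    code: str                   # hypothetical alarm identifier


@dataclass
class AnnotatedEntry:
    entry: LogEntry
    alarms: List[str] = field(default_factory=list)  # empty: no alarm generated

    @property
    def alarmed(self) -> bool:
        return bool(self.alarms)


def build_annotated_log(
    logs: Sequence[LogEntry], alarms: Sequence[Alarm], window: float
) -> List[AnnotatedEntry]:
    """Attach to each log entry every alarm raised within +/- `window` seconds."""
    alarms_sorted = sorted(alarms, key=lambda a: a.t)
    times = [a.t for a in alarms_sorted]  # sorted, so bisect can find the window
    annotated: List[AnnotatedEntry] = []
    for entry in sorted(logs, key=lambda x: x.t):
        lo = bisect_left(times, entry.t - window)
        hi = bisect_right(times, entry.t + window)
        annotated.append(AnnotatedEntry(entry, [a.code for a in alarms_sorted[lo:hi]]))
    return annotated


def to_asps(annotated: Sequence[AnnotatedEntry], keys: Sequence[str]) -> List[Tuple[float, ...]]:
    """Alarm-including entries become points in a parameter space with axes `keys`."""
    return [tuple(a.entry.metrics[k] for k in keys) for a in annotated if a.alarmed]


if __name__ == "__main__":
    logs = [
        LogEntry(0.0, {"cpu": 0.40, "queue": 3.0}),
        LogEntry(10.0, {"cpu": 0.95, "queue": 40.0}),
        LogEntry(20.0, {"cpu": 0.50, "queue": 4.0}),
    ]
    alarms = [Alarm(11.0, "HIGH_CPU")]
    annotated = build_annotated_log(logs, alarms, window=2.0)
    print([a.alarms for a in annotated])
    print(to_asps(annotated, ["cpu", "queue"]))
```

Sorting the alarm timestamps once and using `bisect` keeps each lookup logarithmic in the number of alarms; the points returned by `to_asps` are what a post-process analytics stage would then place into the anomalies versus parameters mapping space.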
56 Citations
20 Claims
15. A data processing system configured to separately deal with emerging and possibly not routine anomalies of the data processing system that could be of significance to continuing operations of the data processing system, wherein said routine anomalies and said emerging and possibly not routine anomalies are not catastrophic failures, the data processing system being subdivided into a plurality of sections with each section comprising intercoupled local resources including one or more local data processing units and one or more local data storage units, wherein two or more of the plural sections each respectively includes a respective section behaviors logging subsystem configured to automatically log monitored behaviors within the respective section and generate a respective local log and each of the two or more sections respectively includes a respective section alarming subsystem configured to automatically generate alarms for alarm worthy events within the respective section, the data processing system comprising:
an annotated logs storing database storing one or more respective annotated logs that respectively indicate correlations for respective ones of the system sections between recently logged behaviors of the respective system sections as recently recorded in the respective local log of the respective section and temporally corresponding generatings and non-generatings of alarms by the respective section alarming subsystems of the respective system sections;
an annotated logs builder, coupled to the database and configured to automatically repeatedly for respective ones of the system sections, add to the respective stored and annotated logs of the respective system sections additional samples of correlations between recently logged behaviors logged in the respective local logs and temporally corresponding generatings and non-generatings of alarms by the respective section alarming subsystems of the respective sections; and
a post-process analytics portion of the data processing system that is operatively coupled to respective ones of the annotated logs stored in the database for the respective sections and is configured to automatically repeatedly map into respective anomalies versus parameters mapping spaces of respective ones of the system sections, sample point indicators indicative of respective coordinates in the respective mapping space corresponding to plural parameters associated with each generating and non-generating of alarms by the respective section alarming subsystem of the respective one among the sections and corresponding to substantially cotemporaneous, recently logged behaviors of the respective local log produced by the section behaviors logging subsystem of that respective section;
wherein the post-process analytics portion is configured to flag out abnormal changes over time in the automatically repeatedly made mappings of the sample point indicators into the respective anomalies versus parameters mapping spaces, where the flagged out abnormal changes include those representing emerging and possibly not routine anomalies that are not catastrophic failures.
18. A machine-implemented method for adaptively and separately responding to emerging and potentially non-routine anomalies of a running data processing system that also experiences routine anomalies, wherein said routine anomalies and said emerging and possibly not routine anomalies are not catastrophic failures, the method comprising:
running each of plural and inter-coupled sections of the data processing system where each respective section includes a respective section alarming subsystem and a respective section behaviors logging subsystem, the respective section alarming subsystem being configured to generate alarms for non-catastrophic alarm-worthy events within the respective section, the respective section behaviors logging subsystem being configured to generate a log of monitored behaviors within the respective section;
for each respective one of the running sections which has not experienced a catastrophic failure and thus is running, respectively co-associating recently logged behaviors of the respectively generated log produced by the respective section behaviors logging subsystem with substantially contemporaneous alarms generated by the respective section alarming subsystem;
for each respective one of the running sections, building a respective annotated log that includes the co-associations made between the recently logged respective behaviors and the respective temporally corresponding generatings and non-generatings of alarms by the respective section alarming subsystem;
for each respective one of the running sections, using the respective annotated log to update a respective anomalies versus parameters mapping space by adding to that respective mapping space recent, alarm-including sample point entries from the respective annotated log as recently logged ones of alarmed sample points (ASP's) of the respective section of the data processing system;
for each respective one of the running sections, determining if the recently logged ASP's map into a region of the respective mapping space occupied by older ASP's associated with identified as routine anomalies or if the recently logged ASP's map into a different region of the respective mapping space, where the ASP's which map into the different region can represent newly emerging and possibly non-routine anomalies; and
automatically repeating for each respective one of the running sections, said respective co-associating step, said respective building step, said respective using step and said respective determining step;
keeping track in each of the respective anomalies versus parameters mapping spaces of changes over time in the mapping locations of and/or the rates of sample point additions by the respective alarm-including sample point entries to the respective anomalies versus parameters mapping spaces; and
adaptively reallocating resources to respective ones of the running sections based on the tracked changes within the respective anomalies versus parameters mapping spaces.
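Claims 15 and 18 go beyond single-point classification: they track changes over time in where and how fast sample points are added to each section's mapping space, flag abnormal changes, and (in claim 18) adaptively reallocate resources among sections. The Python sketch below illustrates one possible reading under loudly stated assumptions: each section reports, per time window, how many ASP's landed outside its routine region; a window is "abnormal" if that count exceeds the mean plus `k` standard deviations of earlier windows; and a fixed budget of resource units is split in proportion to weights that grow for flagged sections. The functions `is_abnormal`, `flag_sections`, `reallocate` and the parameters `k`, `base`, and `budget` are hypothetical; the claims specify neither the statistic nor the allocation rule.

```python
import statistics
from typing import Dict, List, Set


def is_abnormal(history: List[int], current: int, k: float = 3.0) -> bool:
    """Flag `current` if it exceeds mean + k * stdev of earlier window counts.

    Assumed rule; at least two earlier windows are needed to estimate spread.
    """
    if len(history) < 2:
        return False
    return current > statistics.mean(history) + k * statistics.stdev(history)


def flag_sections(
    histories: Dict[str, List[int]], current: Dict[str, int], k: float = 3.0
) -> Set[str]:
    """Sections whose latest count of new-region ASPs is abnormal."""
    return {s for s, c in current.items() if is_abnormal(histories.get(s, []), c, k)}


def reallocate(
    budget: int, current: Dict[str, int], flagged: Set[str], base: int = 1
) -> Dict[str, int]:
    """Split `budget` integer resource units across sections.

    Every section keeps a base weight; flagged sections add their current
    new-region count. Units lost to integer division go to the heaviest
    weights first (ties keep the sections' original order).
    """
    if base < 1:
        raise ValueError("base must be >= 1 so every section has a positive weight")
    weights = {s: base + (c if s in flagged else 0) for s, c in current.items()}
    total = sum(weights.values())
    shares = {s: budget * w // total for s, w in weights.items()}
    leftover = budget - sum(shares.values())
    for s in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        shares[s] += 1
    return shares


if __name__ == "__main__":
    # Per-section counts of ASPs that mapped outside the routine region,
    # one entry per earlier time window (hypothetical data).
    histories = {"A": [0, 1, 0, 1], "B": [0, 1, 0, 1], "C": [1, 1, 2, 1]}
    current = {"A": 0, "B": 6, "C": 1}
    flagged = flag_sections(histories, current)
    print(flagged)
    print(reallocate(18, current, flagged))
```

In this example only section B jumps well above its own history, so it is flagged and receives most of the 18 units while A and C keep a baseline share. A real system would likely also track shifts in mapping location (for example, centroid drift of new points), which this rate-only sketch leaves out.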
Specification