Automated event ID field analysis on heterogeneous logs
First Claim
1. A method performed in a network having network devices, including computers, that generate heterogeneous logs which include a plurality of event sequences, the method comprising:
identifying, by a processor from the heterogeneous logs, pattern fields comprised of a plurality of event identifiers;
generating, by the processor, an automata model by profiling event behaviors of the plurality of event sequences, the plurality of event sequences grouped in the automata model by combinations of one or more pattern fields and one or more event identifiers from among the plurality of event identifiers, wherein for a given combination, the one or more event identifiers therein must be respectively comprised in a same one of the one or more pattern fields with which it is combined;
detecting, by the processor, an anomaly in one of the plurality of event sequences using the automata model; and
controlling, by the processor, an anomaly-initiating one of the network devices based on the anomaly,
wherein the identifying pattern fields comprises:
performing a tokenization process on the heterogeneous logs to generate tokens;
performing a log similarity process on the heterogeneous logs based on the tokens to identify log similarities amongst the heterogeneous logs; and
clustering the heterogeneous logs based on the log similarities to obtain clustered heterogeneous logs; and
wherein the identifying pattern fields further comprises:
aligning the clustered heterogeneous logs to preserve unknown layouts;
discovering a log motif to find a most representative layout from the unknown layouts and a plurality of log fields;
organizing the plurality of log fields into a hierarchical data structure based on the most representative layout; and
assigning field identifiers to the plurality of log fields in the hierarchical data structure based on a corresponding hierarchical layer in the hierarchical data structure.
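The first three sub-steps of identifying pattern fields (tokenization, log similarity, clustering) can be illustrated with a minimal Python sketch. The claim does not name a tokenizer, a similarity measure, or a clustering algorithm, so everything concrete here is an assumption made for illustration: the delimiter set, the digit-masking rule, the use of Jaccard similarity, the greedy single-pass clustering, the 0.6 threshold, and the sample log lines are all hypothetical.

```python
import re

def tokenize(line):
    # Assumed tokenizer: split on whitespace and a few common delimiters.
    return [t for t in re.split(r"[\s,;=:\[\]()]+", line) if t]

def mask(token):
    # Assumed normalization: any token containing a digit becomes a wildcard,
    # so variable values (timestamps, IDs) do not reduce similarity.
    return "<*>" if any(c.isdigit() for c in token) else token

def similarity(tokens_a, tokens_b):
    # Jaccard similarity over masked token sets (a stand-in metric,
    # not one named in the claim).
    set_a, set_b = set(map(mask, tokens_a)), set(map(mask, tokens_b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

def cluster(lines, threshold=0.6):
    # Greedy one-pass clustering: a line joins the first cluster whose
    # representative (its first member) is similar enough, else starts a new one.
    clusters = []  # list of (representative tokens, member lines)
    for line in lines:
        tokens = tokenize(line)
        for representative, members in clusters:
            if similarity(representative, tokens) >= threshold:
                members.append(line)
                break
        else:
            clusters.append((tokens, [line]))
    return [members for _, members in clusters]

# Hypothetical heterogeneous log lines from two different sources.
logs = [
    "2016-01-02 10:00:01 login user=alice id=42",
    "2016-01-02 10:00:05 login user=bob id=43",
    "disk sda1 usage 91 percent",
    "disk sdb1 usage 17 percent",
]
groups = cluster(logs)
```

With these sample lines the two login records fall into one cluster and the two disk records into another. The later sub-steps (alignment, motif discovery, hierarchical field identifiers) would operate on each cluster separately and are not sketched here.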
Abstract
A system, program, and method are provided for anomaly detection in heterogeneous logs. The system has a processor configured to identify pattern fields comprised of a plurality of event identifiers. The processor is further configured to generate an automata model by profiling event behaviors of the plurality of event sequences, the plurality of event sequences grouped in the automata model by combinations of one or more pattern fields and one or more event identifiers from among the plurality of event identifiers, wherein for a given combination, the one or more event identifiers therein must be respectively comprised in a same one of the one or more pattern fields with which it is combined. The processor is additionally configured to detect an anomaly in one of the plurality of event sequences using the automata model. The processor is also configured to control an anomaly-initiating one of the network devices based on the anomaly.
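As a rough illustration of generating an automata model by profiling event sequences, the following Python sketch groups parsed logs by the content of one identifier field and records which patterns start a sequence, which end it, and which transitions occur. The parsed-log representation (a pattern name plus a field dictionary), the field name `txn`, the pattern names, and the model layout are all hypothetical; the abstract does not prescribe any of them.

```python
from collections import defaultdict

def build_model(parsed_logs, id_field):
    """Profile event sequences that share the same value of id_field.

    parsed_logs: list of (pattern_name, fields) tuples in time order,
    where fields maps field names to values (an assumed representation).
    """
    # Group pattern names by identifier content, preserving order.
    sequences = defaultdict(list)
    for pattern, fields in parsed_logs:
        if id_field in fields:
            sequences[fields[id_field]].append(pattern)

    # Record the observed start states, end states, and transitions.
    model = {"start": set(), "end": set(), "transitions": set()}
    for seq in sequences.values():
        model["start"].add(seq[0])
        model["end"].add(seq[-1])
        model["transitions"].update(zip(seq, seq[1:]))
    return model

# Hypothetical training data: two interleaved, well-formed transactions.
training = [
    ("OPEN", {"txn": "a1"}),
    ("OPEN", {"txn": "b2"}),
    ("WRITE", {"txn": "a1"}),
    ("WRITE", {"txn": "b2"}),
    ("CLOSE", {"txn": "a1"}),
    ("CLOSE", {"txn": "b2"}),
]
model = build_model(training, "txn")
```

Keying the grouping on a single field keeps the sketch short. A fuller version would likely also track per-state occurrence counts and timing bounds, which this example omits.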
12 Claims
1. Independent method claim, recited in full under First Claim above. Dependent claims: 2, 3, 4, 6.
5. A method performed in a network having network devices, including computers, that generate heterogeneous logs which include a plurality of event sequences, the method comprising:
identifying, by a processor from the heterogeneous logs, pattern fields comprised of a plurality of event identifiers;
generating, by the processor, an automata model by profiling event behaviors of the plurality of event sequences, the plurality of event sequences grouped in the automata model by combinations of one or more pattern fields and one or more event identifiers from among the plurality of event identifiers, wherein for a given combination, the one or more event identifiers therein must be respectively comprised in a same one of the one or more pattern fields with which it is combined;
detecting, by the processor, an anomaly in one of the plurality of event sequences using the automata model; and
controlling, by the processor, an anomaly-initiating one of the network devices based on the anomaly,
wherein the detecting the anomaly comprises initializing a hash table for active event automata instances, performing a log grouping based on an identifier content process, and performing an event automata matching process.
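The three detection sub-steps recited in claim 5 (a hash table of active automata instances, log grouping by identifier content, and automata matching) might look like the following Python sketch. It is an illustration under stated assumptions, not the patented implementation: the model layout, the field name `txn`, the pattern names, and the anomaly labels are all hypothetical.

```python
def detect(parsed_logs, model, id_field):
    # Hash table of active automata instances:
    # identifier content -> last state seen for that instance.
    active = {}
    anomalies = []
    for pattern, fields in parsed_logs:
        key = fields.get(id_field)  # group logs by identifier content
        if key is None:
            continue
        if key not in active:
            # A new instance must begin in a known start state.
            if pattern not in model["start"]:
                anomalies.append((key, "unexpected start", pattern))
        else:
            # An existing instance must follow a known transition.
            previous = active[key]
            if (previous, pattern) not in model["transitions"]:
                anomalies.append((key, "unexpected transition", (previous, pattern)))
        active[key] = pattern
        if pattern in model["end"]:
            del active[key]  # instance finished; drop it from the table
    # Instances still open at the end of the stream never reached an end state.
    for key, state in active.items():
        anomalies.append((key, "incomplete sequence", state))
    return anomalies

# Hypothetical model and log stream (names are illustrative only).
model = {
    "start": {"OPEN"},
    "end": {"CLOSE"},
    "transitions": {("OPEN", "WRITE"), ("WRITE", "CLOSE")},
}
stream = [
    ("OPEN", {"txn": "a1"}),
    ("OPEN", {"txn": "b2"}),
    ("WRITE", {"txn": "a1"}),
    ("CLOSE", {"txn": "b2"}),  # skips WRITE
    ("CLOSE", {"txn": "a1"}),
    ("OPEN", {"txn": "c3"}),   # never closed
]
anomalies = detect(stream, model, "txn")
```

Here transaction `b2` is flagged for an unseen transition and `c3` for never completing. The controlling step of the claim, acting on the device that produced the anomalous sequence, would consume this list and is left out of the sketch.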
7. A computer program product for automata model formation for a network having a plurality of network devices that generate heterogeneous logs which include a plurality of event sequences, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising:
identifying, by a processor from the heterogeneous logs, pattern fields comprised of a plurality of event identifiers;
generating, by the processor, an automata model by profiling event behaviors of the plurality of event sequences, the plurality of event sequences in the automata model grouped by combinations of one or more pattern fields and one or more event identifiers from among the plurality of event identifiers, wherein for a given combination, the one or more event identifiers therein must be respectively comprised in a same one of the one or more pattern fields with which it is combined;
detecting, by the processor, an anomaly in one of the plurality of event sequences using the automata model; and
controlling, by the processor, an anomaly-initiating one of the network devices based on the anomaly;
wherein the detecting the anomaly comprises initializing a hash table for active event automata instances, performing a log grouping based on an identifier content process, and performing an event automata matching process.
Dependent claims: 8, 9, 10, 11, 12.
Specification