Apparatus for and method of implementing system log message ranking via system behavior analysis
Abstract
A novel and useful method for enabling system logs to be effectively and efficiently monitored by ranking the system log messages by their estimated value to administrators and generating a log view that displays the most important messages. The ranking process uses a dataset of system logs from many computer systems to score messages. For better scoring, unsupervised clustering is used to identify sets of systems that behave similarly. The expected distribution of messages in a given system is estimated using the resulting clusters, and log messages are scored using this estimation.
20 Claims
1. A method of analyzing system logs, said method comprising the steps of:
creating, on a computer in a preprocessing training phase, at least one system profile representing a type of system based on an expected frequency each message type appears in a training set of system logs derived from a plurality of computers;
matching, in an operation phase, a new input system log from a computer to be analyzed to the most similar system profile created previously based on determination of a vector representing an observed frequency each message type in said new input system log appears therein;
calculating, in said operation phase, a score for each system log message in said new input system log that is related to the probability that a corresponding message type would appear in said system profile, wherein said score represents a measure of deviation of said observed frequency from said expected frequency; and
ranking, in said operation phase, said plurality of scored system log message types in order to identify atypical deviations of observed frequency from expected frequency for system log messages, whereby higher ranked message types have higher observed frequencies in said system log as compared to expected frequencies in said system profile generated during said preprocessing training phase.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method of defining one or more system profiles for use in the analysis of new input system logs, said method comprising the steps of:
collecting a plurality of training system logs derived from a plurality of computers;
preprocessing during a training phase, on a computer, messages from said plurality of training system logs into a canonical form;
creating a vector for each training system log representing a frequency that each said preprocessed message type appears therein;
clustering said vectors into said one or more system profiles;
calculating an average vector for said one or more system profiles representing an expected frequency that each said preprocessed message type appears in said one or more system profiles; and
wherein said one or more system profiles are used during an operation phase to score and rank system log messages in a new input system log to be analyzed, whereby said ranking is carried out utilizing a score representing for a particular message type a measure of deviation of an observed frequency from said expected frequency in accordance with a corresponding system profile.
Dependent claims: 10
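The training steps recited in claim 9 can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the patent: the canonicalization rule (lowercasing and replacing hexadecimal and decimal fields with placeholders), the use of relative frequencies, plain k-means with naive seeding as the clustering method, and the hypothetical names `canonicalize`, `frequency_vector`, and `build_profiles`.

```python
import re
from collections import Counter


def canonicalize(message):
    """Map a raw message to a message type (assumed rule: mask variable fields)."""
    message = message.strip().lower()
    message = re.sub(r"0x[0-9a-f]+", "<hex>", message)
    return re.sub(r"\d+", "<num>", message)


def frequency_vector(log, vocabulary):
    """Relative frequency of each message type in one system log."""
    counts = Counter(canonicalize(m) for m in log)
    total = sum(counts.values()) or 1
    return [counts[t] / total for t in vocabulary]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_profiles(training_logs, k, iterations=20):
    """Cluster per-log vectors with plain k-means; each centroid is one profile."""
    vocabulary = sorted({canonicalize(m) for log in training_logs for m in log})
    vectors = [frequency_vector(log, vocabulary) for log in training_logs]
    centroids = [list(v) for v in vectors[:k]]  # naive seeding (assumption)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(v, centroids[i]))
            clusters[nearest].append(v)
        # The average vector of each cluster is the expected frequency per type;
        # an empty cluster keeps its previous centroid.
        centroids = [mean_vector(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return vocabulary, centroids
```

In this sketch each returned centroid plays the role of the "average vector" of a system profile, indexed by the sorted vocabulary of message types. A production system would likely use a more robust clustering method and seeding strategy.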
11. A method of ranking new input system log messages, said method comprising the steps of:
processing during an operational phase, on a computer, said new input system log messages generated by a computer into a canonical form;
creating a vector from said preprocessed new input system log messages in canonical form representing an observed frequency that each said preprocessed message type appears therein;
matching said vector to a system profile created a priori during a preprocessing training phase representing an expected frequency that each message type appears in a set of training system logs; and
calculating, during said operational phase, a score for each system log message type based on the probability that a corresponding message type would appear in said system profile, wherein said score represents a measure of deviation of observed frequency from expected frequency, whereby rankings of message types are determined based on said scores, higher ranked message types having higher observed frequencies in said system log as compared to expected frequencies in said system profile generated during said preprocessing training phase.
Dependent claims: 12, 13, 14
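The operational steps recited in claim 11 can be sketched in the same spirit. The claim does not give a scoring formula here, so the score below (observed frequency minus expected frequency) is only a placeholder assumption, as are the squared Euclidean distance used for matching and the hypothetical names `observed_frequencies`, `match_profile`, and `rank_message_types`. Profiles are assumed to be dictionaries mapping a canonical message type to its expected frequency.

```python
from collections import Counter


def observed_frequencies(message_types):
    """Relative frequency of each canonical message type in the new log."""
    counts = Counter(message_types)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def match_profile(observed, profiles):
    """Name of the profile closest to the observed vector (squared Euclidean)."""
    def distance(expected):
        types = set(observed) | set(expected)
        return sum((observed.get(t, 0.0) - expected.get(t, 0.0)) ** 2
                   for t in types)
    return min(profiles, key=lambda name: distance(profiles[name]))


def rank_message_types(message_types, profiles):
    """Match the log to a profile, then rank its message types by deviation."""
    observed = observed_frequencies(message_types)
    name = match_profile(observed, profiles)
    expected = profiles[name]
    # Placeholder score (assumption): how far observed exceeds expected.
    scores = {t: observed[t] - expected.get(t, 0.0) for t in observed}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return name, [(t, scores[t]) for t in ranked]
```

With this placeholder score, a message type that is frequent in the new log but rare or absent in the matched profile rises to the top of the ranking, which matches the ordering the claim describes. Message types absent from the new log are not scored.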
15. A computer program product, comprising:
a non-transitory computer usable machine-readable data storage medium having computer usable program code embodied therein for analyzing system log messages;
said computer program product including;
computer usable program code for creating, in a preprocessing training phase, at least one system profile representing a type of system based on an expected frequency each message type appears in a training set of system logs derived from a plurality of computers;
computer usable program code for matching, in an operation phase, a new input system log, generated by a computer, to be analyzed to the most similar system profile created previously based on determination of a vector representing an observed frequency each message type in said new input system log appears therein;
computer usable program code for calculating, in an operation phase, a score for each system log message from said new input system log that is related to the probability that a corresponding message type would appear in said system profile, wherein said score represents a measure of deviation of said observed frequency from said expected frequency; and
computer usable program code for ranking, in said operation phase, said scored system log message types to identify any atypical deviations of observed frequency from expected frequency for system log messages, whereby higher ranked message types have higher observed frequencies in said new input system log as compared to expected frequencies in said system profile generated during said preprocessing training phase.
Dependent claims: 16, 17, 18, 19, 20
Specification