Automatic log record segmentation
Abstract
Methods and arrangements for segmenting log records. A log file is received. Candidates for a sequential pattern within the log file are automatically discerned, and, for each candidate, a likelihood is estimated that it represents a boundary within the log file. Other variants and embodiments are broadly contemplated herein.
18 Claims
1. A method of segmenting log files, said method comprising:
utilizing at least one processor to execute computer code configured to perform the steps of:
receiving a log file;
generating a plurality of tokens from the log file, wherein a token comprises a term contained within the log file;
generalizing the plurality of tokens, wherein the generalizing comprises replacing a token with a generic annotation corresponding to the token;
identifying a plurality of patterns, wherein each of the plurality of patterns contain a predetermined sequence of generalized tokens;
discerning, using a sequential pattern miner, at least one sequential pattern candidate, wherein each of the at least one sequential pattern candidates comprises a candidate for a boundary pattern;
determining at least one match to the sequential pattern candidate within the log file; and
identifying, based upon the at least one match, a boundary candidate within the log file.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
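The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the annotation rules in `RULES`, the function names, and the sample log are all hypothetical, and a simple frequent n-gram count stands in for the sequential pattern miner. The sketch also assumes that a match only yields a boundary candidate when it occurs at the start of a line, which the claim itself does not require.

```python
import re
from collections import Counter

# Hypothetical generalization rules (annotation, regex); not taken from the patent.
RULES = [
    ("<DATE>", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("<TIME>", re.compile(r"\d{2}:\d{2}:\d{2}")),
    ("<NUM>", re.compile(r"\d+")),
    ("<LEVEL>", re.compile(r"INFO|WARN|ERROR|DEBUG")),
]


def tokenize(line):
    """Split a log line into whitespace-separated terms."""
    return line.split()


def generalize(token):
    """Replace a token with a generic annotation; unmatched tokens become <WORD>."""
    for annotation, pattern in RULES:
        if pattern.fullmatch(token):
            return annotation
    return "<WORD>"


def mine_candidates(lines, n=3, min_support=2):
    """Stand-in for a sequential pattern miner: keep contiguous n-grams of
    generalized tokens that appear in at least min_support lines."""
    counts = Counter()
    for line in lines:
        gen = [generalize(t) for t in tokenize(line)]
        # A set counts each n-gram at most once per line.
        counts.update({tuple(gen[i:i + n]) for i in range(len(gen) - n + 1)})
    return [p for p, c in counts.items() if c >= min_support]


def boundary_candidates(lines, pattern):
    """Return indices of lines whose generalized tokens begin with the pattern
    (assumption: only line-initial matches count as boundary candidates)."""
    n = len(pattern)
    return [
        i for i, line in enumerate(lines)
        if tuple(generalize(t) for t in tokenize(line))[:n] == tuple(pattern)
    ]


# Hypothetical sample log with multi-line records.
LOG = [
    "2015-03-01 12:00:01 ERROR disk failure on node 7",
    "    at storage.Disk.read",
    "    at storage.Pool.fetch",
    "2015-03-01 12:00:05 INFO retry scheduled",
    "2015-03-01 12:00:09 WARN latency high",
    "    at net.Client.send",
]

candidates = mine_candidates(LOG)
starts = boundary_candidates(LOG, ("<DATE>", "<TIME>", "<LEVEL>"))
```

With this sample, the timestamp-plus-level pattern is among the mined candidates, and the lines it matches mark where each multi-line record begins. A real miner would typically also find non-contiguous patterns, which this n-gram stand-in cannot.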
10. An apparatus for segmenting log records, said apparatus comprising:
at least one processor; and
a computer readable storage medium having computer readable program code embodied therewith and executable by the at least one processor, the computer readable program code comprising:
computer readable program code configured to receive a log record;
computer readable code configured to generate a plurality of tokens from the log file, wherein a token comprises a term contained within the log file;
computer readable code configured to generalize the plurality of tokens, wherein the generalizing comprises replacing a token with a generic annotation corresponding to the token;
computer readable code configured to identify a plurality of patterns, wherein each of the plurality of patterns contain a predetermined sequence of generalized tokens;
computer readable program code configured to discern, using a sequential pattern miner, at least one sequential pattern candidate, wherein each of the at least one sequential pattern candidates comprises a candidate for a boundary pattern;
computer readable program code configured to determine at least one match to the sequential pattern candidate within the log file; and
computer readable program code configured to identify, based upon the at least one match, a boundary candidate within the log file.
11. A computer program product for segmenting log records, said computer program product comprising:
a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising:
computer readable program code configured to receive a log record;
computer readable code configured to generate a plurality of tokens from the log file, wherein a token comprises a term contained within the log file;
computer readable code configured to generalize the plurality of tokens, wherein the generalizing comprises replacing a token with a generic annotation corresponding to the token;
computer readable code configured to identify a plurality of patterns, wherein each of the plurality of patterns contain a predetermined sequence of generalized tokens;
computer readable program code configured to discern, using a sequential pattern miner, at least one sequential pattern candidate, wherein each of the at least one sequential pattern candidates comprises a candidate for a boundary pattern;
computer readable program code configured to determine at least one match to the sequential pattern candidate within the log file; and
computer readable program code configured to identify, based upon the at least one match, a boundary candidate within the log file.
Dependent claims: 12, 13, 14, 15, 16, 17
18. A method comprising:
utilizing at least one processor to execute computer code configured to perform the steps of:
receiving a log file comprising a plurality of records;
tokenizing at least a portion of the log file to produce a plurality of tokens from the log file, wherein a token comprises a term contained within the log file;
generating a sequence from the tokens;
automatically discerning, using a sequential pattern miner, candidates for a sequential pattern within the log file by matching the generated sequence from the tokens within the log file; and
for each candidate, determining a likelihood that the candidate represents a boundary between log records within the log file, wherein the boundary is an ordered occurrence of the sequential pattern.
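Claim 18 adds a per-candidate likelihood but this section does not say how it is computed. The sketch below uses one assumed heuristic, purely for illustration: the share of a pattern's occurrences that fall at the start of a line. The function names, the scoring rule, and the sample data are hypothetical and are not drawn from the patent.

```python
def occurrences(seq, pattern):
    """Count contiguous occurrences of pattern inside one token sequence."""
    n = len(pattern)
    return sum(
        1 for i in range(len(seq) - n + 1) if tuple(seq[i:i + n]) == tuple(pattern)
    )


def boundary_likelihood(gen_lines, pattern):
    """Assumed heuristic: fraction of all occurrences of the pattern that
    sit at the beginning of a line. Returns 0.0 if the pattern never occurs."""
    total = sum(occurrences(line, pattern) for line in gen_lines)
    if total == 0:
        return 0.0
    n = len(pattern)
    at_start = sum(1 for line in gen_lines if tuple(line[:n]) == tuple(pattern))
    return at_start / total


# Hypothetical generalized token sequences, one list per log line.
GEN_LINES = [
    ["<DATE>", "<TIME>", "<LEVEL>", "<WORD>", "<WORD>"],
    ["<WORD>", "<WORD>"],
    ["<DATE>", "<TIME>", "<LEVEL>", "<WORD>", "<WORD>", "<WORD>"],
    ["<WORD>", "<WORD>", "<WORD>"],
]

header_score = boundary_likelihood(GEN_LINES, ("<DATE>", "<TIME>", "<LEVEL>"))
filler_score = boundary_likelihood(GEN_LINES, ("<WORD>", "<WORD>"))
```

Under this heuristic, a pattern that only ever appears at the start of a line scores higher than one that also appears mid-line, so candidates can be ranked by score before boundaries are chosen.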
Specification