Streaming and sampling in real-time log analysis
Abstract
Technologies are described herein for incorporating streaming and/or sampling in real-time log analysis. Representative samples of log data are extracted from the log files on a number of monitored hosts and streamed in real-time to log processors for processing. The log processors accumulate and process the representative samples of log data, and track a data completeness value representing an indication of a proportion of total log data represented by the representative samples received. The representative samples of log data are merged and collated. Estimated metrics are calculated from the merged and collated representative samples and the data completeness, and the estimated metrics are published to consumers in near real-time.
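As a rough illustration of how an estimated metric might be derived from sampled data and a data completeness value, the following minimal sketch scales a count observed in the samples by the proportion of total log data those samples represent. The function name and the simple division are assumptions made for illustration; the abstract does not state the estimation formula.

```python
def estimate_metric(sampled_value: float, data_completeness: float) -> float:
    """Scale a value observed in sampled log data up to an estimate for all log data.

    data_completeness is the proportion (0 < p <= 1) of total log data that the
    samples represent. Dividing by it is an assumed, simple estimator.
    """
    if not 0.0 < data_completeness <= 1.0:
        raise ValueError("data_completeness must be in the range (0, 1]")
    return sampled_value / data_completeness


# 50 errors seen in samples covering 25% of the log data
# suggest roughly 200 errors overall.
estimated_errors = estimate_metric(50, 0.25)
```

An estimator like this is only as good as the samples are representative, which is why the claims tie sample selection to a hashed value rather than to arrival order.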
31 Claims
1. A computer-implemented method of providing real-time log analysis comprising:
hashing, by one or more monitored hosts, a value in log messages comprising log files on the one or more monitored hosts;
tagging, by the one or more monitored hosts, each of the log messages with the hashed value;
extracting, by the one or more monitored hosts, representative samples of log data from the log files, each of the representative samples of log data comprising at least a portion of a log message extracted from the log files based on the tagged hashed value;
streaming, by the one or more monitored hosts, the representative samples of log data to a plurality of log processors;
processing, by the plurality of log processors, the representative samples of log data;
determining, by the plurality of log processors, a data completeness of the representative samples of log data processed, the data completeness comprising an indication of a proportion of total log data represented by the representative samples of log data;
merging and collating, by a data accumulation computer, the representative samples of log data;
generating, by the data accumulation computer, an estimated metric value from the merged and collated representative samples of log data based on the data completeness; and
publishing, by the data accumulation computer, the estimated metric value to consumers.
Dependent claims: 2, 3, 4, 5, 6
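The hashing, tagging, and extracting steps of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the choice of SHA-256, the `request_id` field, the `hash_tag` key, and the modulo-100 selection rule are all assumptions introduced here.

```python
import hashlib


def tag_message(message: dict, key: str = "request_id") -> dict:
    """Hash one value in a log message and tag the message with the result.

    The field name and the use of SHA-256 are hypothetical choices.
    """
    digest = hashlib.sha256(str(message[key]).encode("utf-8")).hexdigest()
    # Keep the first 32 bits of the digest as an integer tag.
    return {**message, "hash_tag": int(digest[:8], 16)}


def is_sampled(tagged: dict, sample_percent: int) -> bool:
    """Select a message when its tag falls into the first sample_percent of 100 buckets."""
    return tagged["hash_tag"] % 100 < sample_percent


def extract_samples(messages: list, sample_percent: int) -> list:
    """Tag every message, then keep only those selected by the tagged hash value."""
    tagged = (tag_message(m) for m in messages)
    return [t for t in tagged if is_sampled(t, sample_percent)]


# Hypothetical log messages from one monitored host.
logs = [{"request_id": f"req-{i}", "level": "INFO"} for i in range(1000)]
samples = extract_samples(logs, sample_percent=10)
```

Because selection depends only on the hashed value, every host makes the same decision for the same value, so all messages sharing that value are either sampled together or skipped together.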
7. A computer-implemented method of providing real-time log analysis comprising:
sampling, at one or more monitored hosts, log data from log files on the one or more monitored hosts;
streaming, by the one or more monitored hosts, the sampled log data to at least one log processor;
processing, by the at least one log processor, the sampled log data;
determining, by the at least one log processor, a data completeness of the sampled log data processed, the data completeness comprising an indication of a proportion of total log data represented by the sampled log data;
merging and collating, by a data accumulation computer, the sampled log data;
generating, by the data accumulation computer, an estimated metric value from the merged and collated sampled log data based on the data completeness; and
publishing, by the data accumulation computer, the estimated metric value to consumers.
Dependent claims: 8, 9, 10, 11, 12, 13, 14
15. A computer-readable storage medium having computer-executable instructions stored thereon that, when executed by a host computer, cause the host computer to:
extract representative samples of log data from log files on the host computer; and
periodically transmit the representative samples of log data from the host computer to one or more log processors, wherein the one or more log processors are configured to process the representative samples of log data and to determine a data completeness of the representative samples of log data processed, and wherein the processed representative samples of log data and the data completeness are utilized to generate an estimated metric value that is published to consumers in near real-time.
Dependent claims: 16, 17, 18, 19, 20, 21, 22
23. A system for incorporating streaming in real-time log analysis, the system comprising:
a host computer;
an agent executing on the host computer and configured to tag log messages in log files on the host computer with a tag value, extract log data from the log files, the extracted log data comprising at least a portion of a log message extracted from the log files, and stream the extracted log data to a log processor;
at least one server computer;
the log processor executing on the at least one server computer and configured to retrieve the extracted log data from the stream, process the extracted log data by accumulating the log data over a configured interval, and determine a data completeness of the processed log data, the data completeness indicating a proportion of total log data for the configured interval retrieved by the log processor; and
an accumulation task executing on the at least one server computer and configured to merge and collate the processed log data, generate an estimated metric value from the merged and collated log data based on the data completeness; and publish the estimated metric value to consumers in near real-time.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31
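The division of work between the log processor and the accumulation task in claim 23 might look something like the sketch below. It assumes, purely for illustration, that each streamed batch reports how many messages the host wrote in total (`total_on_host`), and that the metric of interest is an error count. The class names, fields, and the 60-second interval are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class IntervalBucket:
    error_count: int = 0  # errors seen in the retrieved log data
    received: int = 0     # log messages retrieved by this processor
    expected: int = 0     # total messages the hosts reported writing (assumed field)


class LogProcessor:
    """Accumulates streamed log data over a configured interval."""

    def __init__(self, interval_seconds: int = 60):
        self.interval_seconds = interval_seconds
        self.buckets = defaultdict(IntervalBucket)

    def ingest_batch(self, timestamp: float, records: list, total_on_host: int) -> None:
        # Align the batch to the start of its interval.
        start = int(timestamp // self.interval_seconds) * self.interval_seconds
        bucket = self.buckets[start]
        bucket.received += len(records)
        bucket.expected += total_on_host
        bucket.error_count += sum(1 for r in records if r.get("level") == "ERROR")


def merge_and_estimate(processors: list, interval_start: int):
    """Merge per-processor buckets and scale the error count by data completeness."""
    merged = IntervalBucket()
    for processor in processors:
        bucket = processor.buckets.get(interval_start)
        if bucket is None:
            continue
        merged.error_count += bucket.error_count
        merged.received += bucket.received
        merged.expected += bucket.expected
    if merged.received == 0 or merged.expected == 0:
        return None  # nothing retrieved for this interval, so no estimate
    completeness = merged.received / merged.expected
    return merged.error_count / completeness


p1, p2 = LogProcessor(), LogProcessor()
p1.ingest_batch(125.0, [{"level": "ERROR"}, {"level": "INFO"}], total_on_host=10)
p2.ingest_batch(130.0, [{"level": "ERROR"}, {"level": "ERROR"}, {"level": "INFO"}], total_on_host=10)
estimate = merge_and_estimate([p1, p2], interval_start=120)
```

Here the two processors together retrieved 5 of 20 messages for the interval starting at 120 seconds, a completeness of 0.25, so the 3 observed errors scale to an estimate of 12.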
Specification