Session-based processing method and system
Abstract
A log file processing system sorts records from large log files and groups them by session without making a complete copy of the log files. It captures a subset of the log files in a sliding memory window and identifies all records in the window that form a complete user session. Records belonging to a complete session are output for analysis, and the remaining records are output as raw log data for additional processing. The sliding memory window is implemented with a ring buffer, and data structures are used to group records by session, to identify completed sessions, and to index into the ring buffer to retrieve records for completed sessions that are to be directly analyzed. Any records remaining in the ring buffer at the end of sliding window processing may be output as raw log file data and processed as incomplete or malformed session records. An embodiment of the log file processing system provides a significant improvement in the speed of data extraction from log files into analyzable session data.
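The ring buffer arrangement described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The class name `RingWindow`, the `(timestamp, session id, line)` record layout, and the rule that a session counts as complete once it has been idle for `idle_timeout` seconds are all assumptions made for illustration; the abstract does not say how completeness is detected. When the buffer is full, the sketch pushes the oldest session out as raw data and remembers it so that later records of that session also go to the raw output.

```python
from collections import defaultdict


class RingWindow:
    """Sliding window over log records, backed by a fixed-size ring buffer."""

    def __init__(self, capacity, idle_timeout):
        self.capacity = capacity           # window size, adjustable by the caller
        self.idle_timeout = idle_timeout   # ASSUMPTION: idle seconds that end a session
        self.buf = [None] * capacity       # ring buffer; None marks a freed slot
        self.head = 0                      # slot holding the oldest record
        self.count = 0                     # slots in use from head, freed ones included
        self.slots = defaultdict(list)     # session id -> ring buffer slot numbers
        self.last_seen = {}                # session id -> newest timestamp
        self.spilled = set()               # sessions already pushed out as raw data

    def _trim(self):
        # Advance head past freed slots so they can be reused.
        while self.count and self.buf[self.head] is None:
            self.head = (self.head + 1) % self.capacity
            self.count -= 1

    def _take(self, sid):
        # Remove and return every buffered record of one session.
        records = []
        for i in self.slots.pop(sid):
            records.append(self.buf[i])
            self.buf[i] = None
        del self.last_seen[sid]
        self._trim()
        return records

    def push(self, ts, sid, line):
        """Buffer one record; return any records forced out as raw data."""
        raw = []
        if sid not in self.spilled and self.count == self.capacity:
            # Window is full: spill the whole session that owns the oldest record.
            oldest_sid = self.buf[self.head][1]
            self.spilled.add(oldest_sid)
            raw = self._take(oldest_sid)
        if sid in self.spilled:
            # Records of a spilled session bypass the window entirely.
            return raw + [(ts, sid, line)]
        tail = (self.head + self.count) % self.capacity
        self.buf[tail] = (ts, sid, line)
        self.count += 1
        self.slots[sid].append(tail)
        self.last_seen[sid] = ts
        return raw

    def pop_complete(self, now):
        """Return {session id: records} for sessions idle for idle_timeout or more."""
        done = [s for s, ts in self.last_seen.items() if now - ts >= self.idle_timeout]
        return {sid: self._take(sid) for sid in done}
```

A caller would feed records to `push` in roughly chronological order, call `pop_complete` with the newest timestamp seen, and treat everything returned by `push` as raw log data for later processing. Freed slots in the middle of the ring are reclaimed only when the head passes them, and the `spilled` set grows without bound, which keeps the sketch short at the cost of some capacity and memory.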
15 Claims
1. A method for processing web server logs a session at a time, comprising:
receiving a stream of raw log file data comprising a plurality of substantially chronologically ordered web server requests from different user sessions;
storing a data subset of the raw log file data as a first plurality of web server requests from different user sessions in a sliding memory window having a size that may be programmably controlled and adjusted for processing in a memory-efficient manner;
identifying and grouping any complete user sessions within the first plurality of web server requests stored in the sliding memory by processing the first plurality of web server requests with the sliding memory window so that at any one time a fraction of the first plurality of web server requests is loaded into the sliding memory window without requiring indexing of the entire stream of raw log file data;
identifying any incomplete user sessions within the first plurality of web server requests stored in the sliding memory;
outputting log file entries from the data subset for each complete user session identified in the first plurality of web server requests; and
processing log file entries from any incomplete user sessions identified within the first plurality of web server requests for combination with log file entries from any incomplete user sessions identified in the raw log file data.
Dependent claims: 2, 3, 4, 5, 6, 7
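The final step of claim 1, combining entries from incomplete sessions found in different parts of the raw log data, could look something like the following sketch. The function name `combine_incomplete` and the `(timestamp, session id, line)` record layout are hypothetical, and the claim does not specify how the combination is carried out; this version simply merges leftover records by session id and restores chronological order within each session.

```python
from collections import defaultdict
from operator import itemgetter


def combine_incomplete(leftover_batches):
    """Merge leftover (timestamp, session_id, line) records by session id.

    leftover_batches is an iterable of record lists, one list per window
    pass, holding the entries that did not form a complete session.
    """
    merged = defaultdict(list)
    for batch in leftover_batches:
        for record in batch:
            merged[record[1]].append(record)
    for records in merged.values():
        # Restore chronological order within each session.
        records.sort(key=itemgetter(0))
    return dict(merged)
```

Sessions reassembled this way can then be analyzed like any other session, while session ids that still look truncated can be flagged as incomplete or malformed.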
8. An article of manufacture comprising a computer readable storage medium having stored thereon executable instructions and data which, when executed by at least one processing device, cause the at least one processing device to process web server logs a session at a time by:
receiving a stream of raw log file data comprising a plurality of substantially chronologically ordered web server requests from different user sessions;
storing a data subset of the raw log file data as a first plurality of web server requests from different user sessions in a sliding memory window having a size that may be programmably controlled and adjusted for processing in a memory-efficient manner;
identifying and grouping any complete user sessions within the first plurality of web server requests by processing the first plurality of web server requests with the sliding memory window so that at any one time a fraction of the first plurality of web server requests is loaded into the sliding memory window without requiring indexing of the entire stream of raw log file data;
identifying any incomplete user sessions within the first plurality of web server requests;
outputting log file entries from the data subset for each complete user session identified in the first plurality of web server requests; and
processing log file entries from any incomplete user sessions identified within the first plurality of web server requests for combination with log file entries from any incomplete user sessions identified in the raw log file data.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system for processing web server logs a session at a time using a data processing system and network session data collected from one or more users, the system comprising:
a log file collection system for receiving a stream of raw log file data comprising a plurality of web server requests from different user sessions from a file system and storing a data subset of the raw log file data as a first plurality of web server requests from different user sessions in local memory for processing in a memory-efficient manner; and
a processing engine to identify and group any complete user sessions within the first plurality of web server requests stored in local memory without needing to build an index on the file system by processing the first plurality of web server requests with a sliding window having a size that may be programmably controlled and adjusted so that at any one time a fraction of the first plurality of web server requests is loaded into the sliding window without requiring indexing of the entire stream of raw log file data, where processing the first plurality of web server requests with the sliding window comprises:
reading each log file entry loaded into the sliding window to identify user session information for said log file entry;
indexing each log file entry to a corresponding user session based on the identified user session information for said log file entry; and
upon detecting that all log file entries for a complete user session are present in the sliding window, grouping said log file entries for the complete user session within the first plurality of web server requests.
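The reading and indexing steps recited in claim 15 can be sketched as follows. The log layout is an assumption: the claim does not say where session information sits in an entry, so this example pretends each entry carries a hypothetical `sid=<token>` field. The names `SESSION_RE`, `session_key`, and `index_entries` are likewise illustrative only.

```python
import re

# ASSUMPTION: each log entry carries a hypothetical "sid=<token>" field.
SESSION_RE = re.compile(r"\bsid=([A-Za-z0-9]+)")


def session_key(line):
    """Return the session id found in a log entry, or None if there is none."""
    match = SESSION_RE.search(line)
    return match.group(1) if match else None


def index_entries(lines):
    """Map each session id to the positions of its entries in the window.

    Returns (index, unmatched), where unmatched lists the positions of
    entries with no recognizable session information.
    """
    index = {}
    unmatched = []
    for position, line in enumerate(lines):
        key = session_key(line)
        if key is None:
            unmatched.append(position)
        else:
            index.setdefault(key, []).append(position)
    return index, unmatched
```

Storing positions instead of copies of the entries keeps the index small, and entries that land in `unmatched` are natural candidates for the raw output reserved for incomplete or malformed records.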
Specification