Session-Based Processing Method and System
First Claim
1. A method for processing web server logs a session at a time, comprising:
receiving a stream of raw log file data that is substantially chronologically ordered;
storing a data subset of the raw log file data for processing in a memory-efficient manner;
identifying and grouping any complete sessions within the data subset;
identifying any incomplete sessions within the data subset; and
outputting log file entries from the data subset for each complete session identified in the data subset.
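For illustration only, the steps of claim 1 can be sketched in Python. The claim does not state how a session is judged complete, so this sketch assumes an inactivity timeout: a session counts as complete when its last entry is at least `timeout` seconds older than the newest entry in the data subset. The names `LogEntry`, `split_sessions`, the `session_id` field, and the 1800-second default are all hypothetical and do not come from the patent.

```python
from typing import Dict, List, NamedTuple, Tuple


class LogEntry(NamedTuple):
    timestamp: float  # seconds; assumed already parsed from the log line
    session_id: str   # hypothetical key, e.g. taken from a session cookie
    line: str         # the raw log line, kept unchanged for output


Groups = Dict[str, List[LogEntry]]


def split_sessions(subset: List[LogEntry],
                   timeout: float = 1800.0) -> Tuple[Groups, Groups]:
    """Group a data subset by session and split it into
    (complete, incomplete) sessions under the assumed timeout rule."""
    groups: Groups = {}
    for entry in subset:
        groups.setdefault(entry.session_id, []).append(entry)
    if not groups:
        return {}, {}

    # The stream is only "substantially" ordered, so take the maximum
    # timestamp rather than trusting the last entry.
    newest = max(e.timestamp for e in subset)

    complete: Groups = {}
    incomplete: Groups = {}
    for sid, entries in groups.items():
        last_seen = max(e.timestamp for e in entries)
        if newest - last_seen >= timeout:
            complete[sid] = entries    # idle long enough: assumed finished
        else:
            incomplete[sid] = entries  # may still receive more entries
    return complete, incomplete
```

Only the entries in the first returned mapping would be output as complete sessions; the second mapping holds the sessions identified as incomplete.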
Abstract
A log file processing system sorts records from large log files and groups them by session without making a complete copy of the log files. It captures a subset of the log files in a sliding memory window and identifies all records in the window that form a complete user session. Records belonging to a complete session are output for analysis, and the remaining records are output as raw log data for additional processing. The sliding memory window is implemented with a ring buffer, and data structures are used to group records by session, to identify completed sessions, and to index into the ring buffer to retrieve records for completed sessions that are to be directly analyzed. Any records remaining in the ring buffer at the end of sliding window processing may be output as raw log file data and processed as incomplete or malformed session records. An embodiment of the log file processing system provides a significant improvement in the speed of data extraction from log files into analyzable session data.
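A minimal sketch of the ring-buffer window described in the abstract follows. It is not the patented implementation. It assumes that a session is complete once it has been idle for `timeout` seconds, and that a session which has already lost a record to eviction is treated as raw data rather than as a complete session. The class `SessionWindow`, its methods, and the `(timestamp, session_id, raw_line)` record layout are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Record = Tuple[float, str, str]  # (timestamp, session_id, raw_line)


class SessionWindow:
    """Fixed-size ring buffer plus a per-session index into its slots."""

    def __init__(self, capacity: int, timeout: float) -> None:
        self.capacity = capacity
        self.timeout = timeout
        self.slots: List[Optional[Record]] = [None] * capacity
        self.written = 0                        # total records pushed so far
        self.index: Dict[str, List[int]] = {}   # session_id -> slot numbers
        self.last_seen: Dict[str, float] = {}   # session_id -> newest timestamp
        self.truncated: Set[str] = set()        # sessions that lost a record

    def _take(self, sid: str) -> List[Record]:
        """Remove and return a session's records, freeing their slots."""
        records = []
        for slot in self.index.pop(sid):
            records.append(self.slots[slot])
            self.slots[slot] = None
        del self.last_seen[sid]
        return records

    def push(self, record: Record) -> Tuple[List[List[Record]], List[Record]]:
        """Add one record; return (completed sessions, raw records)."""
        ts, sid, _ = record
        sessions: List[List[Record]] = []
        raw: List[Record] = []

        slot = self.written % self.capacity
        self.written += 1
        old = self.slots[slot]
        if old is not None:
            # The window is full here: the oldest record slides out as raw
            # data and its session can no longer be emitted as complete.
            raw.append(old)
            old_sid = old[1]
            self.index[old_sid].remove(slot)
            self.truncated.add(old_sid)
            if not self.index[old_sid] and old_sid != sid:
                del self.index[old_sid]
                del self.last_seen[old_sid]
                self.truncated.discard(old_sid)

        self.slots[slot] = record
        self.index.setdefault(sid, []).append(slot)
        self.last_seen[sid] = ts

        # Assumed completion rule: idle for at least `timeout` seconds.
        idle = [s for s, seen in self.last_seen.items()
                if ts - seen >= self.timeout]
        for done in idle:
            was_truncated = done in self.truncated
            self.truncated.discard(done)
            records = self._take(done)
            if was_truncated:
                raw.extend(records)
            else:
                sessions.append(records)
        return sessions, raw

    def flush(self) -> List[Record]:
        """At end of input, return everything left, oldest first, as raw."""
        raw = []
        for i in range(self.capacity):
            slot = (self.written + i) % self.capacity
            if self.slots[slot] is not None:
                raw.append(self.slots[slot])
        self.__init__(self.capacity, self.timeout)
        return raw
```

Storing slot numbers in the index lets a completed session be pulled out of the middle of the buffer without scanning it, and the freed slots are reused when the write position comes around again. A session that is fully evicted and later reappears is treated as a new session in this sketch.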
16 Citations
58 Claims
1. A method for processing web server logs a session at a time, comprising:
receiving a stream of raw log file data that is substantially chronologically ordered;
storing a data subset of the raw log file data for processing in a memory-efficient manner;
identifying and grouping any complete sessions within the data subset;
identifying any incomplete sessions within the data subset; and
outputting log file entries from the data subset for each complete session identified in the data subset.
Dependent claims: 40, 41, 42, 43, 44, 45, 46, 47
2-20. (canceled)
48. An article of manufacture comprising a computer readable storage medium having stored thereon executable instructions and data which, when executed by at least one processing device, cause the at least one processing device to process web server logs a session at a time by:
receiving a stream of raw log file data that is substantially chronologically ordered;
storing a data subset of the raw log file data for processing in a memory-efficient manner;
identifying and grouping any complete sessions within the data subset;
identifying any incomplete sessions within the data subset; and
outputting log file entries from the data subset for each complete session identified in the data subset.
Dependent claims: 49, 50, 51, 52, 53, 54, 55, 56
57. A system for processing web server logs a session at a time using a data processing system and network session data collected from one or more users, the system comprising:
a log file collection system for receiving a stream of raw log file data from a file system and storing a data subset of the raw log file data in local memory for processing in a memory-efficient manner; and
a processing engine to identify and group any complete sessions within the data subset stored in local memory without needing to build an index on the file system by processing the data subset with a sliding window so that at any one time a large fraction of the data subset is loaded into the sliding window.
Dependent claims: 58
Specification