Multi-level data staging for low latency data access
First Claim
1. A method, comprising:
producing, at a plurality of front end servers, log data based on real-time user activities;
in an event that an aggregating server is currently unavailable;
staging the log data at a front end staging area in at least one of the plurality of front end servers for providing a back end server real-time access to the log data at the at least one of the plurality of front end servers; and
in an event that the aggregating server is currently available;
transmitting the log data from the at least one of the plurality of front end servers to the aggregating server;
aggregating the log data at the aggregating server;
staging the aggregated log data at the aggregating server for providing the back end server with access to the aggregated log data at the aggregating server;
transmitting the aggregated log data from the aggregating server to a data warehouse; and
processing the aggregated log data at the data warehouse so that the data warehouse can respond to a data query based on the processed aggregated log data.
Abstract
Techniques for facilitating and accelerating log data processing are disclosed herein. The front-end clusters generate a large amount of log data in real time and transfer the log data to an aggregating cluster. When the aggregating cluster is not available, the front-end clusters write the log data to local filers and send the data when the aggregating cluster recovers. The aggregating cluster is designed to aggregate incoming log data streams from different front-end servers and clusters. The aggregating cluster further sends the aggregated log data stream to centralized NFS filers or a data warehouse cluster. The local filers and the aggregating cluster stage the log data for access by applications, so that the applications do not wait until the data reach the centralized NFS filers or data warehouse cluster.
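The fallback behavior described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names Aggregator and FrontEnd, the in-memory lists standing in for local filers and staging areas, and the use of ConnectionError to signal unavailability are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregator:
    """Hypothetical stand-in for the aggregating cluster."""
    available: bool = True
    # Staging area that applications could read before data reaches the warehouse.
    staged: list = field(default_factory=list)

    def receive(self, entries):
        if not self.available:
            raise ConnectionError("aggregating cluster unavailable")
        self.staged.extend(entries)


@dataclass
class FrontEnd:
    """Hypothetical front-end server with a local staging area (the 'local filer')."""
    aggregator: Aggregator
    local_stage: list = field(default_factory=list)

    def emit(self, entry):
        # Queue the entry locally first so any backlog is sent in order.
        self.local_stage.append(entry)
        self.flush()

    def flush(self):
        if not self.local_stage:
            return
        try:
            self.aggregator.receive(self.local_stage)
        except ConnectionError:
            # Aggregator is down: keep entries staged locally, where
            # applications can still read them.
            return
        self.local_stage = []
```

In this sketch, entries emitted while the aggregator is unavailable accumulate in local_stage, and the next successful emit or flush sends the whole backlog once the aggregator recovers.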
16 Claims
11. A computer-implemented system, comprising:
a plurality of front end servers configured to produce log data based on real-time user activities;
at least one aggregating server configured to aggregate the log data received from at least some of the front end servers, the aggregating server being connected with at least some of the front end servers via a network, wherein the aggregating server includes a data staging area configured to stage the aggregated log data at the aggregating server and providing a back end server with real-time access to the aggregated log data from the data staging area;
wherein the aggregating server is further configured to periodically send the aggregated log data to a data warehouse after providing the back end server real-time access to the aggregated log data, and the data warehouse is configured to process the aggregated log data and to respond to a data query based on the processed aggregated log data; and
at least one second level aggregating server configured for further aggregating the aggregated log data received from the aggregating server, the second level aggregating server being connected with the aggregating server, wherein the second level aggregating server includes a second level data staging area configured for staging the further aggregated log data so that the back end server can access the further aggregated log data in real time from the second level data staging area, wherein the second level aggregating server transmits the further aggregated log data to the data warehouse.
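The multi-level arrangement in claim 11 can be pictured with a rough Python sketch in which each level keeps its own staging area that a back end reader can inspect before the data moves on. The class name StagingAggregator, its methods, the chaining of the levels, and the choice to clear the staging area on forwarding are hypothetical simplifications, not details taken from the claim.

```python
class StagingAggregator:
    """Hypothetical aggregation level with its own staging area."""

    def __init__(self, name, downstream=None):
        self.name = name
        # Next level (or the warehouse); None means this is the final sink.
        self.downstream = downstream
        self.staging_area = []

    def receive(self, entries):
        # Incoming log data is staged immediately so readers see it right away.
        self.staging_area.extend(entries)

    def read_staged(self):
        # Back end access: returns a copy, so readers cannot mutate the stage.
        return list(self.staging_area)

    def forward(self):
        # Models the periodic send to the next level. Clearing the stage
        # afterwards is an assumption of this sketch.
        batch, self.staging_area = self.staging_area, []
        if self.downstream is not None:
            self.downstream.receive(batch)
        return len(batch)


# Usage: first level -> second level -> warehouse (all in memory here).
warehouse = StagingAggregator("warehouse")
second_level = StagingAggregator("second-level", downstream=warehouse)
first_level = StagingAggregator("first-level", downstream=second_level)
```

A back end process could call read_staged() on either level without waiting for forward() to push the batch to the warehouse, which is the latency benefit the claim is directed at.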
15. An aggregating server, comprising:
a processor;
a network interface, coupled to the processor, through which the aggregating server can communicate with a plurality of front end servers;
a data storage including a data staging area; and
a memory storing instructions which, when executed by the processor, cause the aggregating server to perform a process including;
receiving log data from the front end servers, wherein the front end servers produce the log data based on real-time user activities,
aggregating the log data,
staging the aggregated log data at the data staging area of the aggregating server to provide at least one back end server with real-time access to the aggregated log data,
splitting the aggregated log data into multiple log data streams based on hash values calculated based on entries of the aggregated log data, and feeding the multiple log data streams in parallel to the at least one back end server,
after providing the at least one back end server real-time access to the aggregated log data, sending the aggregated log data from the aggregating server to a data warehouse, wherein the data warehouse processes the aggregated log data and responds to data queries based on the processed aggregated log data.
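The hash-based splitting step in claim 15 can be sketched in Python as follows. The claim does not name a hash function or say which part of an entry is hashed, so the use of CRC32, the function name split_by_hash, and the key parameter are assumptions chosen only to make the idea concrete.

```python
import zlib


def split_by_hash(entries, num_streams, key=lambda entry: entry):
    """Partition log entries into num_streams lists by a hash of each entry's key.

    CRC32 is used because it is deterministic across runs; any stable hash
    would serve the same purpose in this sketch.
    """
    streams = [[] for _ in range(num_streams)]
    for entry in entries:
        bucket = zlib.crc32(key(entry).encode("utf-8")) % num_streams
        streams[bucket].append(entry)
    return streams


# Usage: hash on a hypothetical user-id prefix so one user's entries stay together.
log_entries = ["u1 click", "u2 view", "u1 scroll"]
streams = split_by_hash(log_entries, 4, key=lambda entry: entry.split()[0])
```

Because entries with the same key always land in the same stream, each stream could then be handed to a separate consumer in parallel without splitting one key's entries across consumers.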
Specification