Collecting and aggregating log data with fault tolerance
First Claim
1. A method for collecting and aggregating datasets with fault tolerance, the method comprising:
collecting a dataset from a data source on a machine where the dataset is generated, wherein the dataset is collected by an agent node executed on the machine;
generating a batch comprising messages from the dataset;
assigning a tag to the batch and computing a checksum for the batch;
writing the tag, the batch comprising the messages, and the checksum to an entry in a write-ahead-log (WAL) in a storage;
sending the batch comprising the messages to a receiving location;
verifying the checksum of the batch comprising the messages at the receiving location; and
in response to determining that receiving location has failed, storing, by the agent node, the dataset in a persistent storage of the machine until the receiving location is repaired or until another destination is identified.
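The agent-side steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not specify a checksum algorithm, tag scheme, or WAL layout, so the CRC32 checksum, the UUID tag, the JSON-lines WAL file, and all function names below are assumptions chosen for illustration. The sketch also assumes that the local WAL file doubles as the persistent storage that holds data while the receiving location is down.

```python
import json
import os
import uuid
import zlib


def make_batches(messages, batch_size):
    """Split a list of messages into consecutive batches of at most batch_size."""
    return [messages[i:i + batch_size] for i in range(0, len(messages), batch_size)]


def make_entry(batch):
    """Build a WAL entry: a tag, the batch of messages, and a checksum."""
    payload = json.dumps(batch).encode("utf-8")
    return {
        "tag": uuid.uuid4().hex,           # assumed tag scheme
        "messages": batch,
        "checksum": zlib.crc32(payload),   # assumed checksum algorithm
    }


def verify(entry):
    """Recompute the checksum, as a receiving location might, and compare."""
    payload = json.dumps(entry["messages"]).encode("utf-8")
    return zlib.crc32(payload) == entry["checksum"]


def append_to_wal(path, entry):
    """Append one entry to the WAL and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as wal:
        wal.write(json.dumps(entry) + "\n")
        wal.flush()
        os.fsync(wal.fileno())


def replay(path):
    """Read back every WAL entry, e.g. to resend once a destination is available."""
    with open(path, encoding="utf-8") as wal:
        return [json.loads(line) for line in wal if line.strip()]


def deliver(entry, send, wal_path):
    """Log the entry first, then attempt to send it.

    `send` is a caller-supplied function that raises ConnectionError when the
    receiving location has failed. Returns True if the send succeeded; on
    failure the entry simply remains in the WAL on local disk.
    """
    append_to_wal(wal_path, entry)
    try:
        send(entry)
        return True
    except ConnectionError:
        return False
```

Writing to the log before sending means a failed or interrupted send never loses the batch: the agent can later call `replay` and resend each entry to the repaired receiver or to another destination.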
Abstract
Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes one or more machines that generate log data, each machine being associated with an agent node that collects the log data, wherein the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the batch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device associated with a collector tier having a collector node to which the agent node sends the log data, wherein the collector node determines the checksum for the batch of multiple messages received from the agent node.
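The collector-side check described in the abstract can be sketched as follows. This is an illustrative sketch only: the `Collector` class, its `receive` method, the newline-joined payload, and the CRC32 checksum are all assumptions, since the abstract does not name a checksum algorithm or wire format. Storing accepted batches under their tag is likewise an assumed design, chosen so that a resent batch does not create a duplicate.

```python
import zlib


class Collector:
    """Hypothetical collector node that validates batches sent by agent nodes."""

    def __init__(self):
        self.accepted = {}  # tag -> list of messages

    def receive(self, tag, messages, checksum):
        """Recompute the checksum; keep the batch and return True only on a match."""
        computed = zlib.crc32("\n".join(messages).encode("utf-8"))
        if computed != checksum:
            return False  # corrupted in transit; the agent can resend
        # Keyed by tag, so receiving the same batch twice overwrites
        # the earlier copy instead of duplicating it.
        self.accepted[tag] = list(messages)
        return True
```

An agent using this sketch would compute `zlib.crc32("\n".join(messages).encode("utf-8"))` before sending and treat a `False` return as a signal to retry.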
22 Claims
1. A method for collecting and aggregating datasets with fault tolerance, the method comprising:
collecting a dataset from a data source on a machine where the dataset is generated, wherein the dataset is collected by an agent node executed on the machine;
generating a batch comprising messages from the dataset;
assigning a tag to the batch and computing a checksum for the batch;
writing the tag, the batch comprising the messages, and the checksum to an entry in a write-ahead-log (WAL) in a storage;
sending the batch comprising the messages to a receiving location;
verifying the checksum of the batch comprising the messages at the receiving location; and
in response to determining that receiving location has failed, storing, by the agent node, the dataset in a persistent storage of the machine until the receiving location is repaired or until another destination is identified.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. An apparatus for collecting and aggregating datasets for storage in a file system with fault tolerance, the apparatus including a memory storing instructions that, when executed by a processor of the apparatus, cause the apparatus to perform a method comprising:
collecting a dataset from a data source on a machine where the dataset is generated, wherein the dataset is collected by an agent node executed on the machine;
generating a batch comprising messages from the dataset;
assigning a tag to the batch and computing a checksum for the batch;
writing the tag, the batch comprising the messages, and the checksum to an entry in a write-ahead-log (WAL) in a storage;
sending the batch comprising the messages to a receiving location;
verifying the checksum of the batch comprising the messages at the receiving location; and
in response to determining that receiving location has failed, storing, by the agent node, the dataset in a persistent storage of the machine until the receiving location is repaired or until another destination is identified.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
Specification