Configuring a system to collect and aggregate datasets
First Claim
1. A method for configuring a system to collect and aggregate datasets, wherein the system comprises agent nodes to collect the datasets, collector nodes to receive the datasets from the agent nodes, and master nodes configured to dynamically change topology among the nodes in the system, the method being executed by a master node and comprising:
identifying a data source in the system from which a dataset is to be collected;
configuring a machine in the system that generates the dataset to send the dataset to the data source;
identifying an arrival location where the dataset is to be aggregated or written;
dynamically configuring, based on system changes, an agent node in an agent tier by;
specifying a source for the agent node as the identified data source in the system;
specifying a sink for the agent node as a collector source of a collector node in a collector tier; and
dynamically configuring, based on the system changes, a collector node in a collector tier by;
specifying the collector source of the collector node in the collector tier as the identified arrival location; and
specifying a collector sink of the collector node in the collector tier as a distributed file system;
wherein the distributed file system is in a storage tier;
wherein the agent node and collector node function as peers in a peer-to-peer network.
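The source and sink wiring recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The `NodeConfig` and `Master` classes, the method names, and every node name, address, and path below are hypothetical and chosen only for illustration. The sketch assumes that a "system change" is a collector being replaced, in which case the master re-points the agent's sink at the new collector's source.

```python
from dataclasses import dataclass, field


@dataclass
class NodeConfig:
    """Source/sink pair assigned to one node (hypothetical structure)."""
    name: str
    tier: str    # "agent" or "collector"
    source: str
    sink: str


@dataclass
class Master:
    """Toy master node holding the current configuration of every node."""
    configs: dict = field(default_factory=dict)

    def configure_flow(self, agent, collector, data_source, arrival_location, dfs_path):
        # Agent tier: source is the identified data source,
        # sink is the collector source (the arrival location).
        self.configs[agent] = NodeConfig(agent, "agent", data_source, arrival_location)
        # Collector tier: source is the identified arrival location,
        # sink is the distributed file system in the storage tier.
        self.configs[collector] = NodeConfig(collector, "collector", arrival_location, dfs_path)

    def on_system_change(self, agent, new_collector, new_arrival_location, dfs_path):
        # Assumed change: the agent must be re-pointed at a different collector.
        data_source = self.configs[agent].source
        self.configure_flow(agent, new_collector, data_source, new_arrival_location, dfs_path)


# All names, addresses, and paths here are made up for the example.
master = Master()
master.configure_flow("agent1", "collector1", "tail:/var/log/app.log",
                      "collector1:35853", "hdfs://namenode/logs/")
master.on_system_change("agent1", "collector2", "collector2:35853",
                        "hdfs://namenode/logs/")
```

After the change, the agent's sink and the new collector's source name the same arrival location, which is the relationship the claim describes between the two tiers.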
Abstract
Methods for configuring a system to collect and aggregate datasets are disclosed. One embodiment includes identifying a data source in the system from which a dataset is to be collected, configuring a machine in the system that generates the dataset to send the dataset to the data source, identifying an arrival location where the collected dataset is to be aggregated or written, and/or configuring an agent node by specifying the data source in the system as the source for the agent node and the arrival location as the sink for the agent node.
30 Claims
1. (Text identical to the First Claim reproduced above.)
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A method for configuring a system having multiple machines to collect datasets from the multiple machines and to perform analytics on the datasets, wherein the system comprises agent nodes to collect the datasets, collector nodes to receive the datasets from the agent nodes, and master nodes configured to dynamically change topology among the nodes in the system, the method being executed by a master node and comprising:
identifying data sources on the multiple machines wherein datasets are to be collected from;
configuring the multiple machines in the system that generate the datasets to send the datasets to the data sources;
identifying an arrival location where dataset is to be logged; and
accessing a master through a web page to dynamically specify, based on the system changes, configurations for the multiple machines simultaneously, wherein the specifying of configurations comprises;
specifying sources for agent nodes in an agent tier as the identified data sources;
wherein each agent node is associated with one of the multiple machines;
specifying a sink for each of the agent nodes in the agent tier as a collector source of a collector node in a collector tier;
specifying the collector source of the collector node in the collector tier as the identified arrival location;
specifying a collector sink of the collector node as a distributed file system in a storage tier;
wherein the agent nodes and collector nodes function as peers in a peer-to-peer network; and
wherein at least one of the agent nodes of the agent tier or the collector node of the collector tier performs analytics on the datasets.
Dependent claims: 22, 23, 24, 25, 26, 27, 28
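As a rough illustration of specifying configurations for several machines at once, the Python sketch below builds one agent configuration per machine plus a shared collector configuration from a single request, and adds a toy analytics step. It is a sketch under stated assumptions, not the claimed method itself: the function names `build_configs` and `count_events_per_machine`, the dictionary layout, and all machine names, addresses, and paths are hypothetical, and the web page through which the master is accessed is left out.

```python
from collections import Counter


def build_configs(machines, arrival_location, dfs_path):
    """Return source/sink settings for one agent per machine plus one collector."""
    configs = {}
    for machine, data_source in machines.items():
        # Each agent is associated with one machine: its source is that
        # machine's data source, its sink is the collector source.
        configs[f"agent@{machine}"] = {"source": data_source, "sink": arrival_location}
    # The collector reads from the arrival location and writes to the
    # distributed file system in the storage tier.
    configs["collector"] = {"source": arrival_location, "sink": dfs_path}
    return configs


def count_events_per_machine(events):
    """Toy analytics a node might perform: number of events seen per machine."""
    return Counter(machine for machine, _payload in events)


# Made-up inputs standing in for what a single web form submission might carry.
machines = {
    "web1": "tail:/var/log/httpd/access.log",
    "web2": "tail:/var/log/httpd/access.log",
}
configs = build_configs(machines, "collector-host:35853", "hdfs://namenode/weblogs/")

events = [("web1", "GET /"), ("web2", "GET /a"), ("web1", "GET /b")]
counts = count_events_per_machine(events)
```

Building every configuration in one call mirrors the idea of configuring the machines simultaneously rather than one at a time. The event count is only a stand-in for whatever analytics an agent or collector node would perform.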
29. A non-transitory machine-readable storage medium having stored thereon instructions which, when executed causes a method for configuring a system having multiple machines to collect datasets from the multiple machines and to perform analytics on the datasets to be performed by a master node, wherein the system comprises agent nodes to collect the datasets, collector nodes to receive the datasets from the agent nodes, and master nodes configured to dynamically change topology among the nodes in the system, the method comprising:
identifying data sources on the multiple machines wherein datasets are to be collected from;
configuring the multiple machines in the system that generate the datasets to send the datasets to the data sources;
identifying an arrival location where dataset is to be logged; and
accessing a master through a web page to dynamically specify, based on the system changes, configurations for the multiple machines simultaneously, wherein the specifying of the configurations comprises;
specifying sources for agent nodes in an agent tier as the identified data sources;
wherein each agent node is associated with one of the multiple machines;
specifying a sink for each of the agent nodes in the agent tier as a collector source of a collector node in a collector tier;
specifying the collector source of the collector node in the collector tier as the identified arrival location; and
specifying a collector sink of the collector node as a distributed file system in a storage tier;
wherein the agent nodes and collector nodes function as peers in a peer-to-peer network; and
wherein at least one of the agent nodes of the agent tier or the collector node of the collector tier performs analytics on the datasets.
Dependent claims: 30
Specification