Collecting and aggregating datasets for analysis
First Claim
1. A method of facilitating collection and aggregation of machine or user generated dataset for analysis, the method comprising:
collecting, by a compute node, the dataset from a data source on a machine, wherein the dataset is received or generated on the machine;
transmitting the dataset from the data source toward a receiving location by steps including;
recording the dataset as an event using a data model;
extracting a timestamp from the dataset;
specifying, based on the timestamp, a priority of the event in a priority field included in the data model;
specifying, based on the priority, the event in the data model with an attribute in a metadata table included in the data model, wherein the attribute includes a map that directs how the event is to be streamed to a subsequent machine, wherein the metadata table included in the data model is extensible to add additional attributes to the event by the subsequent machine which is configured to further process the dataset as the event is streamed from the data source to the receiving location;
aggregating the dataset collected from the data source at the receiving location, wherein the receiving location is dynamically updated by the compute node responsive to receiving the configuration information from a master node; and
performing analytics on the dataset responsive to collecting or aggregating the dataset on the machine;
wherein the dataset aggregated at the receiving location is written to a storage location, and wherein the dataset is stored redundantly on a distributed file system at the storage location.
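The event data model recited in the claim can be illustrated with a minimal Python sketch. The claim does not say how a priority is derived from a timestamp or what the routing map looks like, so everything specific here is an assumption made for illustration: the names `Event`, `extract_timestamp`, `priority_from_timestamp`, `record_event`, and `annotate`, the ISO-style timestamp pattern, the 60-second freshness rule, and the `next_hop` routing key are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Assumed timestamp layout; real log sources vary widely.
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


@dataclass
class Event:
    """Hypothetical data model: body, timestamp, priority field, metadata table."""
    body: str
    timestamp: datetime
    priority: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def extract_timestamp(raw: str) -> Optional[datetime]:
    """Return the first timestamp found in the raw dataset, or None."""
    match = TS_PATTERN.search(raw)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(0), "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)  # assumption: logs are in UTC


def priority_from_timestamp(ts: datetime, now: datetime,
                            fresh_seconds: float = 60.0) -> str:
    """Assumed rule: recent events are HIGH priority, older ones LOW."""
    age = (now - ts).total_seconds()
    return "HIGH" if age <= fresh_seconds else "LOW"


def record_event(raw: str, now: datetime) -> Event:
    """Record a dataset as an event and attach a priority-based routing map."""
    ts = extract_timestamp(raw) or now
    priority = priority_from_timestamp(ts, now)
    # The routing map is the attribute that directs how the event is streamed.
    next_hop = "collector-fast" if priority == "HIGH" else "collector-batch"
    return Event(body=raw, timestamp=ts, priority=priority,
                 metadata={"routing": {"next_hop": next_hop}})


def annotate(event: Event, key: str, value: Any) -> Event:
    """A subsequent machine extends the metadata table with a new attribute."""
    event.metadata[key] = value
    return event
```

A downstream machine would call `annotate(event, "collector_host", "c1")` to add its own attribute without changing the fields set at the source, which is one way to read the "extensible" metadata table.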
5 Assignments
0 Petitions
Abstract
Systems and methods for collecting and aggregating machine- or user-generated datasets for analysis are disclosed. One embodiment includes collecting a dataset on the machine on which the dataset is received or generated, wherein the dataset is collected from a data source on the machine; aggregating the dataset collected from the data source at a receiving location; performing analytics on the dataset upon collection or aggregation; and/or writing the dataset aggregated at the receiving location to a storage location.
56 Claims
1. A method of facilitating collection and aggregation of machine or user generated dataset for analysis, the method comprising:
collecting, by a compute node, the dataset from a data source on a machine, wherein the dataset is received or generated on the machine;
transmitting the dataset from the data source toward a receiving location by steps including;
recording the dataset as an event using a data model;
extracting a timestamp from the dataset;
specifying, based on the timestamp, a priority of the event in a priority field included in the data model;
specifying, based on the priority, the event in the data model with an attribute in a metadata table included in the data model, wherein the attribute includes a map that directs how the event is to be streamed to a subsequent machine, wherein the metadata table included in the data model is extensible to add additional attributes to the event by the subsequent machine which is configured to further process the dataset as the event is streamed from the data source to the receiving location;
aggregating the dataset collected from the data source at the receiving location, wherein the receiving location is dynamically updated by the compute node responsive to receiving the configuration information from a master node; and
performing analytics on the dataset responsive to collecting or aggregating the dataset on the machine;
wherein the dataset aggregated at the receiving location is written to a storage location, and wherein the dataset is stored redundantly on a distributed file system at the storage location.
Dependent Claims (2-41)
42. A system for collecting and aggregating datasets that are machine or user generated, the system comprising:
multiple machines that are simultaneously specifiable by a master, wherein the multiple machines generate log data and are each associated with an agent node that is dynamically reconfigurable by the master to collect the log data responsive to receiving configuration instructions from the master;
wherein each agent node forwards the log data to a collector tier having a collector node that aggregates the log data from the multiple machines;
wherein, when the log data is forwarded, the collector node processes the log data by steps including;
recording the log data as an event using a data model;
extracting a timestamp from the log data;
specifying, based on the timestamp, a priority of the event in a priority field included in the data model;
annotating, based on the priority, the event in the data model with an attribute in a metadata table included in the data model, wherein the attribute includes a map that directs how the event is to be streamed to a subsequent node, wherein the metadata table included in the data model is extensible to add additional attributes to the event by the subsequent node which is configured to further process the log data as the event is streamed from the data source to a storage system;
wherein the agent node or the collector node performs analytics on the log data responsive to aggregation of the log data from the multiple machines; and
the storage system coupled to the multiple machines;
wherein the storage system includes a distributed file system to which the collector node in the collector tier stores the log data aggregated from the multiple machines.
Dependent Claims (43-56)
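The master, agent, and collector roles in this claim can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the claimed system: the class names `Master`, `Agent`, and `Collector` and their methods are hypothetical, the "analytics" is reduced to a per-machine line count, and a plain Python list stands in for the distributed file system.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class Collector:
    """Hypothetical collector node: aggregates log lines from many machines."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: List[Tuple[str, str]] = []

    def receive(self, machine: str, line: str) -> None:
        self.received.append((machine, line))

    def count_per_machine(self) -> Dict[str, int]:
        """Toy analytics step: number of lines aggregated per machine."""
        counts: Dict[str, int] = defaultdict(int)
        for machine, _ in self.received:
            counts[machine] += 1
        return dict(counts)

    def flush(self, storage: List[str]) -> None:
        """Write aggregated lines to storage (a list stands in for a DFS)."""
        for machine, line in self.received:
            storage.append(f"{machine}\t{line}")
        self.received = []


class Agent:
    """Hypothetical agent node attached to one machine."""

    def __init__(self, machine: str, collector: Collector) -> None:
        self.machine = machine
        self.collector = collector

    def reconfigure(self, collector: Collector) -> None:
        """Apply a configuration instruction pushed by the master."""
        self.collector = collector

    def forward(self, line: str) -> None:
        self.collector.receive(self.machine, line)


class Master:
    """Hypothetical master: reconfigures every registered agent at once."""

    def __init__(self) -> None:
        self.agents: List[Agent] = []

    def register(self, agent: Agent) -> None:
        self.agents.append(agent)

    def configure_all(self, collector: Collector) -> None:
        for agent in self.agents:
            agent.reconfigure(collector)
```

After `master.configure_all(new_collector)`, later calls to `agent.forward(...)` land on the new collector without any change on the agent's machine, which is the behavior the "dynamically reconfigurable" language appears to describe.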
Specification