Distributed analytics platform
First Claim
1. A method comprising:
receiving data from a first data source at a first one of a plurality of distributed processing nodes of an analytics platform;
performing one or more analytics operations on the data at the first processing node;
updating the data at the first processing node based on results of the one or more analytics operations;
transmitting the updated data to another one of the processing nodes along a first data path between the first data source and a first data destination; and
repeating the performing, updating and transmitting for the other processing node and for one or more additional distributed processing nodes of the analytics platform along the first data path;
the analytics platform thereby performing distributed analytics processing on the data over multiple ones of the distributed processing nodes as the data moves through the first data path from the first data source to the first data destination;
wherein the method is implemented by at least one processing device comprising a processor coupled to a memory;
wherein each of at least a subset of the distributed processing nodes is a part of multiple distinct data paths and performs distinct analytics operations for different ones of the data paths such that a role of a given one of the processing nodes in the analytics platform varies based at least in part on the particular data path over which data is received for processing;
wherein the first processing node is part of the subset of the distributed processing nodes and is a part of at least two distinct data paths including the first data path between the first data source and the first data destination and at least a second data path between a second data source and a second data destination;
wherein the first processing node performs a first type of analytics operation for a first role when receiving data on the first data path;
wherein the first processing node performs a second type of analytics operation for a second role when receiving data on the second data path;
wherein the first and second roles comprise respective distinct sets of one or more analytics-related processing tasks;
wherein the first role and the second role of the first processing node are defined by a distributed service of a software-defined function of the analytics platform; and
wherein updating the data based on results of the one or more analytics operations comprises two or more of:
adding metadata to the data;
modifying existing metadata of the data; and
transforming the data into a different format.
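The path-dependent roles recited in the claim can be pictured with a small sketch. This is only an illustration under stated assumptions, not the patented implementation and not a construction of the claim: `RoleRegistry` is a hypothetical stand-in for the "distributed service of a software-defined function", and the names `ProcessingNode`, `summarize_role`, `export_role`, `path-1` and `path-2` are invented for the example. Each role applies two of the three recited update types.

```python
import json
from typing import Callable, Dict, Tuple

# A message is (data, metadata); a role maps one message to an updated message.
Message = Tuple[object, Dict[str, object]]
Role = Callable[[object, Dict[str, object]], Message]


def summarize_role(data, metadata) -> Message:
    """Hypothetical first role: adds metadata and modifies existing metadata."""
    values = data["values"]
    new_meta = dict(metadata)
    new_meta["mean"] = sum(values) / len(values)    # add metadata
    new_meta["hops"] = metadata.get("hops", 0) + 1  # modify existing metadata
    return data, new_meta


def export_role(data, metadata) -> Message:
    """Hypothetical second role: adds metadata and transforms the data format."""
    new_meta = dict(metadata)
    new_meta["format"] = "json"                     # add metadata
    return json.dumps(data, sort_keys=True), new_meta  # dict -> JSON string


class RoleRegistry:
    """Assumed stand-in for the service that defines a node's role per data path."""

    def __init__(self) -> None:
        self._roles: Dict[Tuple[str, str], Role] = {}

    def define(self, node: str, path_id: str, role: Role) -> None:
        self._roles[(node, path_id)] = role

    def lookup(self, node: str, path_id: str) -> Role:
        return self._roles[(node, path_id)]


class ProcessingNode:
    def __init__(self, name: str, registry: RoleRegistry) -> None:
        self.name = name
        self.registry = registry

    def receive(self, path_id: str, data, metadata) -> Message:
        # The role depends on the data path over which the data arrived.
        role = self.registry.lookup(self.name, path_id)
        return role(data, metadata)


registry = RoleRegistry()
registry.define("node-1", "path-1", summarize_role)
registry.define("node-1", "path-2", export_role)
node = ProcessingNode("node-1", registry)

out1 = node.receive("path-1", {"values": [1, 2, 3]}, {"hops": 0})
out2 = node.receive("path-2", {"values": [1, 2, 3]}, {})
```

Here the same node returns the original data with enriched metadata on `path-1`, and a JSON string with a `format` tag on `path-2`. Keeping the role table outside the node is one way to let roles be redefined without changing node code.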
Abstract
An apparatus comprises an analytics platform having a plurality of distributed processing nodes. Data from a data source is received at a first one of the processing nodes. One or more analytics operations are performed on the data at the first processing node, and the data is updated at the first processing node based on results of the one or more analytics operations. The updated data is transmitted to another one of the processing nodes along a data path between the data source and a data destination. The performing, updating and transmitting are repeated for the other processing node and for one or more additional distributed processing nodes of the analytics platform along the data path. The analytics platform thereby performs distributed analytics processing on the data over multiple distributed processing nodes as the data moves through the data path from the data source to the data destination.
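The flow described in the abstract (perform analytics, update the data, transmit to the next node, repeat along the data path) can be sketched as follows. This is a minimal in-process illustration under assumptions: the names `Record`, `Node`, `run_path`, and the node labels `edge`, `core`, `cloud` are hypothetical, and a simple loop stands in for network transmission between nodes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    payload: str
    metadata: Dict[str, object] = field(default_factory=dict)


@dataclass
class Node:
    name: str
    # Hypothetical analytics operation: returns results to merge into metadata.
    analyze: Callable[[Record], Dict[str, object]]

    def process(self, record: Record) -> Record:
        results = self.analyze(record)          # perform analytics
        updated = dict(record.metadata)
        updated.update(results)                 # update data based on results
        visited = list(record.metadata.get("visited", []))
        updated["visited"] = visited + [self.name]
        return Record(record.payload, updated)  # handed to the next node


def run_path(path: List[Node], record: Record) -> Record:
    """Move a record from source to destination, processing at every node."""
    for node in path:
        record = node.process(record)           # stands in for transmission
    return record


path = [
    Node("edge", lambda r: {"length": len(r.payload)}),
    Node("core", lambda r: {"is_long": r.metadata["length"] > 5}),
    Node("cloud", lambda r: {"word_count": len(r.payload.split())}),
]
result = run_path(path, Record("sensor reading 42"))
```

In this sketch the `core` node relies on metadata added by `edge`, which shows how analytics results can accumulate as the data moves toward its destination.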
19 Claims
1. A method as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device:
to receive data from a first data source at a first one of a plurality of distributed processing nodes of an analytics platform;
to perform one or more analytics operations on the data at the first processing node;
to update the data at the first processing node based on results of the one or more analytics operations;
to transmit the updated data to another one of the processing nodes along a first data path between the first data source and a first data destination; and
to repeat the performing, updating and transmitting for the other processing node and for one or more additional distributed processing nodes of the analytics platform along the first data path;
the analytics platform thereby performing distributed analytics processing on the data over multiple ones of the distributed processing nodes as the data moves through the first data path from the first data source to the first data destination;
wherein each of at least a subset of the distributed processing nodes is a part of multiple distinct data paths and performs distinct analytics operations for different ones of the data paths such that a role of a given one of the processing nodes in the analytics platform varies based at least in part on the particular data path over which data is received for processing;
wherein the first processing node is part of the subset of the distributed processing nodes and is a part of at least two distinct data paths including the first data path between the first data source and the first data destination and at least a second data path between a second data source and a second data destination;
wherein the first processing node performs a first type of analytics operation for a first role when receiving data on the first data path;
wherein the first processing node performs a second type of analytics operation for a second role when receiving data on the second data path;
wherein the first and second roles comprise respective distinct sets of one or more analytics-related processing tasks;
wherein the first role and the second role of the first processing node are defined by a distributed service of a software-defined function of the analytics platform; and
wherein updating the data based on results of the one or more analytics operations comprises two or more of:
adding metadata to the data;
modifying existing metadata of the data; and
transforming the data into a different format.
Dependent claims: 14, 15
16. An apparatus comprising:
a first processing node of an analytics platform;
the first processing node being one of a plurality of distributed processing nodes of the analytics platform and being configured for communication with other ones of the distributed processing nodes over one or more networks;
the first processing node being further configured:
to receive data from a first data source;
to perform one or more analytics operations on the data;
to update the data based on results of the one or more analytics operations; and
to transmit the updated data to another one of the processing nodes along a first data path between the first data source and a first data destination;
wherein the performing, updating and transmitting are repeated for the other processing node and for one or more additional distributed processing nodes of the analytics platform along the first data path;
the analytics platform thereby performing distributed analytics processing on the data over multiple ones of the distributed processing nodes as the data moves through the first data path from the first data source to the first data destination;
wherein each of at least a subset of the distributed processing nodes is a part of multiple distinct data paths and performs distinct analytics operations for different ones of the data paths such that a role of a given one of the processing nodes in the analytics platform varies based at least in part on the particular data path over which data is received for processing;
wherein the first processing node is part of the subset of the distributed processing nodes and is a part of at least two distinct data paths including the first data path between the first data source and the first data destination and at least a second data path between a second data source and a second data destination;
wherein the first processing node performs a first type of analytics operation for a first role when receiving data on the first data path;
wherein the first processing node performs a second type of analytics operation for a second role when receiving data on the second data path;
wherein the first and second roles comprise respective distinct sets of one or more analytics-related processing tasks;
wherein the first role and the second role of the first processing node are defined by a distributed service of a software-defined function of the analytics platform; and
wherein updating the data based on results of the one or more analytics operations comprises two or more of:
adding metadata to the data;
modifying existing metadata of the data; and
transforming the data into a different format.
Dependent claims: 17, 18, 19
Specification