DATA ANALYTICS PLATFORM OVER PARALLEL DATABASES AND DISTRIBUTED FILE SYSTEMS
First Claim
1. A method, comprising:
receiving, by a master node, a data analysis request;
creating, by the master node, a plan to generate a response to the request;
selecting one or more of a plurality of distributed processing segments to process a corresponding portion of the plan;
obtaining, by the master node, metadata associated with one or more portions of the plan to be performed by the one or more corresponding processing segments of the plurality of distributed processing segments;
sending, by the master node, to each of the plurality of distributed processing segments for which a portion of the plan is to be processed, the corresponding portion of the plan to be processed by that segment and the metadata, wherein the metadata is used to locate or access a subset of data on which the segment is to perform an indicated processing;
receiving, from each of the plurality of distributed processing segments for which a portion of the plan is assigned, a corresponding result of processing the portion of the plan; and
generating a master response to the data analysis request based at least in part on the corresponding result of processing the portion of the plan received from each of the plurality of distributed processing segments for which a portion of the plan is assigned.
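The steps of the claim can be pictured as a scatter-gather loop. The following is a minimal sketch, not the patented implementation: the names `PlanSlice`, `STORAGE`, `CATALOG`, `select_segments`, `obtain_metadata`, `run_on_segment`, and `handle_request` are all hypothetical, the storage layer is an in-memory dictionary, and threads stand in for remote processing segments.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class PlanSlice:
    """Portion of the plan sent to one segment (hypothetical structure)."""
    segment_id: int
    operation: str   # processing the segment is asked to perform
    metadata: dict   # tells the segment where its subset of data lives


# Assumption: an in-memory stand-in for the distributed storage layer.
STORAGE = {"block-0": [3, 5, 8], "block-1": [13, 21], "block-2": [34]}

# Assumption: a catalog on the master mapping each segment to its data block.
CATALOG = {0: "block-0", 1: "block-1", 2: "block-2"}


def select_segments() -> list[int]:
    # Select the segments that will each process a portion of the plan.
    return sorted(CATALOG)


def obtain_metadata(segment_id: int) -> dict:
    # The master looks up the metadata for that segment's portion.
    return {"block": CATALOG[segment_id]}


def run_on_segment(portion: PlanSlice) -> int:
    # The segment uses the metadata to locate its subset of the data,
    # then performs the indicated processing on that subset only.
    rows = STORAGE[portion.metadata["block"]]
    if portion.operation == "sum":
        return sum(rows)
    if portion.operation == "count":
        return len(rows)
    raise ValueError(f"unsupported operation: {portion.operation}")


def handle_request(request: str) -> int:
    # Create the plan: one slice, with its metadata, per selected segment.
    plan = [PlanSlice(seg, request, obtain_metadata(seg))
            for seg in select_segments()]
    # Send each slice out and receive the corresponding partial result.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(run_on_segment, plan))
    # Generate the master response from the partial results.
    return sum(partials)
```

Calling `handle_request("sum")` in this sketch combines the three partial sums into a single response. Both supported operations happen to merge by addition; other operations would need their own merge step on the master.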
Abstract
Performing data analytics processing in the context of a large scale distributed system that includes a massively parallel processing (MPP) database and a distributed storage layer is disclosed. In various embodiments, a data analytics request is received. A plan is created to generate a response to the request. A corresponding portion of the plan is assigned to each of a plurality of distributed processing segments, including by invoking, as indicated in the assignment, one or more data analytical functions embedded in the processing segment.
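One way to read "data analytical functions embedded in the processing segment" is as a function registry that lives on the segment and is invoked by name from the assignment. The sketch below illustrates that reading only; `EMBEDDED_FUNCTIONS`, `execute_assignment`, and the `"functions"` key are hypothetical names, not terms from the patent.

```python
from statistics import mean, median

# Assumption: analytic functions built into the segment, keyed by name.
EMBEDDED_FUNCTIONS = {"mean": mean, "median": median, "max": max}


def execute_assignment(assignment: dict, local_rows: list[float]) -> dict:
    """Invoke each embedded function named in the assignment on local data."""
    results = {}
    for name in assignment["functions"]:
        func = EMBEDDED_FUNCTIONS[name]  # KeyError if the function is not embedded
        results[name] = func(local_rows)
    return results
```

Because the functions run where the data resides, only the small result dictionary has to travel back to the master in this sketch.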
20 Claims
1. A method, comprising:
receiving, by a master node, a data analysis request;
creating, by the master node, a plan to generate a response to the request;
selecting one or more of a plurality of distributed processing segments to process a corresponding portion of the plan;
obtaining, by the master node, metadata associated with one or more portions of the plan to be performed by the one or more corresponding processing segments of the plurality of distributed processing segments;
sending, by the master node, to each of the plurality of distributed processing segments for which a portion of the plan is to be processed, the corresponding portion of the plan to be processed by that segment and the metadata, wherein the metadata is used to locate or access a subset of data on which the segment is to perform an indicated processing;
receiving, from each of the plurality of distributed processing segments for which a portion of the plan is assigned, a corresponding result of processing the portion of the plan; and
generating a master response to the data analysis request based at least in part on the corresponding result of processing the portion of the plan received from each of the plurality of distributed processing segments for which a portion of the plan is assigned.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
18. The method of claim 27, wherein the metadata sent to each of the plurality of distributed processing segments is sent as part of the corresponding portion of the plan to be performed by that segment.
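Sending the metadata "as part of the corresponding portion of the plan" can be pictured as a single message that carries both the operation and the metadata, rather than two separate transmissions. A minimal sketch, assuming a JSON wire format and the hypothetical helpers `build_message` and `read_message`:

```python
import json


def build_message(segment_id: int, operation: str, metadata: dict) -> str:
    # The metadata is nested inside the plan portion, so one message carries both.
    portion = {"segment_id": segment_id, "operation": operation, "metadata": metadata}
    return json.dumps(portion, sort_keys=True)


def read_message(raw: str) -> tuple[str, dict]:
    # The segment recovers the operation and the metadata from the same message.
    portion = json.loads(raw)
    return portion["operation"], portion["metadata"]
```

The path used in any such metadata (for example `{"path": "/data/part-2"}`) is illustrative only; the claim does not specify what form the metadata takes.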
19. A system, comprising:
a communication interface; and
one or more processors coupled to the communication interface and configured to:
receive a data analysis request;
create a plan to generate a response to the request;
select one or more of a plurality of distributed processing segments to process a corresponding portion of the plan;
obtain metadata associated with one or more portions of the plan to be performed by the one or more corresponding processing segments of the plurality of distributed processing segments;
send, to each of the plurality of distributed processing segments for which a portion of the plan is to be processed, the corresponding portion of the plan to be processed by that segment and the metadata, wherein the metadata is used to locate or access a subset of data on which the segment is to perform an indicated processing;
receive, from each of the plurality of distributed processing segments for which a portion of the plan is assigned, a corresponding result of processing the portion of the plan; and
generate a master response to the data analysis request based at least in part on the corresponding result of processing the portion of the plan received from each of the plurality of distributed processing segments for which a portion of the plan is assigned.
20. A computer program product embodied in a tangible, non-transitory computer readable storage medium, comprising computer instructions for:
receiving a data analysis request;
creating a plan to generate a response to the request;
selecting one or more of a plurality of distributed processing segments to process a corresponding portion of the plan;
obtaining metadata associated with one or more portions of the plan to be performed by the one or more corresponding processing segments;
sending, to each of the plurality of distributed processing segments for which a portion of the plan is to be processed, the corresponding portion of the plan to be processed by that segment and metadata used to locate or access a subset of data on which the segment is to perform an indicated processing;
receiving, from each of the plurality of distributed processing segments for which a portion of the plan is assigned, a corresponding result of processing the portion of the plan; and
generating a master response to the data analysis request based at least in part on the corresponding result of processing the portion of the plan received from each of the plurality of distributed processing segments for which a portion of the plan is assigned.
Specification