Self-described query execution in a massively parallel SQL execution engine
Abstract
A query is executed in a massively parallel processing data storage system, comprising a master node that communicates with a cluster of multiple segments that access data in distributed storage, by producing a self-described query plan at the master node. The plan incorporates the changeable metadata and information needed to execute it on the segments, together with references that the segments use to obtain static metadata and information for the plan's functions and operators from metadata stores on the segments. The distributed storage may be the Hadoop Distributed File System (HDFS), and the query plan may be a full-function SQL query plan.
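The split described in the abstract, between changeable metadata carried inside the plan and static metadata that segments look up locally, can be illustrated with a minimal Python sketch. Everything here is hypothetical and not taken from the patent: `SelfDescribedPlan`, `SEGMENT_STATIC_CATALOG`, `resolve_static`, `execute_on_segment`, the operator name `int4_gt`, and the `hdfs://` paths are illustrative names, and an in-memory dictionary stands in for HDFS. It assumes the table's column list and data file locations count as changeable metadata, while the comparison operator is static metadata referenced by name.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical segment-local store of static metadata: operator definitions
# that rarely change, so each segment can keep its own copy.
SEGMENT_STATIC_CATALOG = {
    "int4_gt": lambda a, b: a > b,
    "int4_add": lambda a, b: a + b,
}


@dataclass
class SelfDescribedPlan:
    # Changeable metadata, embedded in the plan by the master node.
    table_columns: list[str]
    file_locations: list[str]
    # Reference to static metadata, resolved on the segment by name.
    predicate_op: str
    predicate_column: str
    predicate_constant: int


def resolve_static(ref, local_catalog):
    """Resolve a function/operator reference against the segment's local store."""
    if ref not in local_catalog:
        raise LookupError(f"static metadata {ref!r} not found on segment")
    return local_catalog[ref]


def execute_on_segment(plan, read_rows, local_catalog=SEGMENT_STATIC_CATALOG):
    """Filter rows using only the plan contents plus the segment-local catalog."""
    op = resolve_static(plan.predicate_op, local_catalog)
    col = plan.table_columns.index(plan.predicate_column)
    result = []
    for path in plan.file_locations:
        for row in read_rows(path):
            if op(row[col], plan.predicate_constant):
                result.append(row)
    return result


# Usage: an in-memory dict stands in for files in distributed storage.
fake_storage = {
    "hdfs://namenode/warehouse/t/part-0": [(1, 10), (2, 25)],
    "hdfs://namenode/warehouse/t/part-1": [(3, 40)],
}
plan = SelfDescribedPlan(
    table_columns=["id", "amount"],
    file_locations=list(fake_storage),
    predicate_op="int4_gt",
    predicate_column="amount",
    predicate_constant=20,
)
print(execute_on_segment(plan, fake_storage.__getitem__))  # [(2, 25), (3, 40)]
```

In this sketch, keeping static definitions out of the plan keeps the plan small; the assumed trade-off is that every segment must hold a consistent copy of the static catalog.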
17 Claims
1. A method of query execution in a massively parallel processing (MPP) data storage system comprising a master node and a cluster of multiple distributed segments that access data in distributed storage, comprising:
generating a self-described query plan at the master node that is responsive to a query, wherein the self-described query plan comprises metadata and other information needed by one or more segments to execute the self-described query plan, wherein the metadata and other information comprise information relating to locations of data in the distributed storage in connection with execution of the self-described query plan;
communicating the self-described query plan to the one or more segments for execution; and
receiving a result associated with execution of the self-described query plan from the one or more segments, wherein the self-described query plan is executed such that the one or more segments do not access a central metadata store in connection with executing the self-described query plan.
16. A system, comprising:
one or more processors configured to:
generate a self-described query plan at a master node that is responsive to a query, wherein the self-described query plan comprises metadata and other information needed by one or more segments to execute the self-described query plan, wherein the metadata and other information comprise information relating to locations of data in a distributed storage in connection with execution of the self-described query plan;
communicate the self-described query plan to the one or more segments for execution; and
receive a result associated with execution of the self-described query plan from the one or more segments, wherein the self-described query plan is executed such that the one or more segments do not access a central metadata store in connection with executing the self-described query plan; and
a memory coupled to the processor and configured to provide instructions to the one or more processors.
17. A non-transitory computer readable storage media for storing executable instructions for controlling an operation of one or more computers in a massively parallel processing (MPP) data storage system comprising a master node and a cluster of multiple distributed segments that access data in distributed storage to perform a method of query execution comprising:
generating a self-described query plan at the master node that is responsive to a query, wherein the self-described query plan comprises metadata and other information needed by one or more segments to execute the self-described query plan, wherein the metadata and other information comprise information relating to locations of data in the distributed storage in connection with execution of the self-described query plan;
communicating the self-described query plan to the one or more segments for execution; and
receiving a result associated with execution of the self-described query plan from the one or more segments, wherein the self-described query plan is executed such that the one or more segments do not access a central metadata store in connection with executing the self-described query plan.
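The method recited in claims 1, 16, and 17 can be sketched end to end in Python under several assumptions that do not come from the patent: the central catalog is a dictionary (`CENTRAL_METADATA_STORE`) read only on the master, the plan is serialized as JSON, data blocks are pre-assigned to segments, threads from `concurrent.futures` stand in for remote segments, and the query is a single `SUM` over one column. The function names `generate_plans`, `execute_segment`, and `run_query` and the `hdfs://` paths are hypothetical. What it illustrates is that `execute_segment` receives only the serialized plan and a storage reader, so it has no path to the central catalog.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical central catalog, consulted only on the master node.
CENTRAL_METADATA_STORE = {
    "sales": {
        "columns": ["region", "amount"],
        # Data block location -> segment assigned to read it.
        "blocks": {
            "hdfs://namenode/sales/blk-0": "seg-0",
            "hdfs://namenode/sales/blk-1": "seg-1",
            "hdfs://namenode/sales/blk-2": "seg-0",
        },
    }
}


def generate_plans(table, agg_column):
    """Master: build one serialized, self-described plan per segment."""
    meta = CENTRAL_METADATA_STORE[table]
    paths_by_segment = {}
    for path, segment in meta["blocks"].items():
        paths_by_segment.setdefault(segment, []).append(path)
    plans = {}
    for segment, paths in paths_by_segment.items():
        # The plan carries what the segment needs: schema and data locations.
        plan = {"columns": meta["columns"], "locations": paths, "agg_column": agg_column}
        plans[segment] = json.dumps(plan).encode("utf-8")
    return plans


def execute_segment(plan_bytes, read_block):
    """Segment: compute a partial SUM from the plan bytes and storage alone."""
    plan = json.loads(plan_bytes)
    col = plan["columns"].index(plan["agg_column"])
    return sum(row[col] for path in plan["locations"] for row in read_block(path))


def run_query(table, agg_column, read_block):
    """Master: dispatch plans to segments in parallel and combine partial results."""
    plans = generate_plans(table, agg_column)
    with ThreadPoolExecutor(max_workers=len(plans)) as pool:
        partials = list(pool.map(lambda b: execute_segment(b, read_block), plans.values()))
    return sum(partials)


# Usage: an in-memory dict stands in for HDFS blocks.
blocks = {
    "hdfs://namenode/sales/blk-0": [("east", 5), ("west", 7)],
    "hdfs://namenode/sales/blk-1": [("east", 3)],
    "hdfs://namenode/sales/blk-2": [("west", 10)],
}
print(run_query("sales", "amount", blocks.__getitem__))  # 25
```

Threads are used here only to keep the sketch self-contained; the property being shown, that segment execution depends on the plan bytes rather than on the catalog, does not depend on that choice.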
Specification