Apparatus and method for operating a distributed database with foreign tables
First Claim
1. A system, comprising:
a coordinator node;
a plurality of worker nodes in communication with the coordinator node, wherein each worker node stores a plurality of data blocks, wherein each data block has data in a same semi-structured format, each data block is a partition of a same distributed database, each data block has a same foreign table declaration, and the foreign table declaration includes command(s) for converting the data in the semi-structured format into converted data in a tabular format; and
a query processor executed by the coordinator node to produce a distributed query plan in response to a query in a structured query language, wherein the distributed query plan is partitioned into a plurality of sub-queries, each of which corresponds to a particular data block;
wherein for each sub-query corresponding to the particular data block:
a worker node is selected, the selected worker node contains the corresponding particular data block;
the corresponding sub-query is executed by the selected corresponding worker node of the plurality of worker nodes;
the selected worker node uses a local version of the same foreign table declaration to convert data of the particular data block in the same semi-structured format into converted data in the tabular format; and
the selected worker node executes the sub-query on the converted data to generate a sub-query result;
wherein each sub-query result is merged to produce a query result.
Abstract
A system includes a coordinator node and worker nodes in communication with the coordinator node. Each worker node stores data blocks. Each data block has data in a semi-structured format and each data block has an associated foreign table declaration specifying conversion of the data in the semi-structured format into a tabular format interpretable by a query language. A query processor executed by the coordinator node produces a distributed query plan in response to a query language query. The distributed query plan includes sub-queries. The sub-queries are executed by selected worker nodes of the worker nodes. The selected worker nodes use foreign table declarations to convert data in semi-structured formats into tabular formats of a distributed database to provide tabular data in response to the query language query.
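The flow described in the abstract can be illustrated with a small in-memory sketch. This is not the patented implementation: the class names (`ForeignTableDeclaration`, `WorkerNode`, `Coordinator`), the use of newline-delimited JSON as the semi-structured format, and the representation of a sub-query as a Python callable are all assumptions made for illustration. It only shows the shape of the idea: one sub-query per data block, routed to a worker that holds the block, converted to rows through a local copy of the shared declaration, then merged.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignTableDeclaration:
    """Hypothetical stand-in for a foreign table declaration:
    ordered (column name, converter) pairs applied to each record."""
    columns: tuple

    def to_rows(self, block_text):
        # Convert newline-delimited JSON (assumed format) into tuples.
        rows = []
        for line in block_text.splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            rows.append(tuple(conv(record[name]) for name, conv in self.columns))
        return rows


class WorkerNode:
    def __init__(self, name, declaration, blocks):
        self.name = name
        self.declaration = declaration  # local copy of the shared declaration
        self.blocks = blocks            # block id -> raw semi-structured text

    def run_subquery(self, block_id, subquery):
        # Convert the block to tabular rows, then run the sub-query on them.
        rows = self.declaration.to_rows(self.blocks[block_id])
        return subquery(rows)


class Coordinator:
    def __init__(self, workers):
        self.workers = workers

    def plan(self, subquery):
        # One sub-query per data block; a replicated block is assigned
        # to the first worker found that contains it.
        chosen = {}
        for worker in self.workers:
            for block_id in worker.blocks:
                chosen.setdefault(block_id, worker)
        return [(worker, block_id, subquery) for block_id, worker in chosen.items()]

    def execute(self, subquery, merge):
        partials = [w.run_subquery(b, q) for w, b, q in self.plan(subquery)]
        return merge(partials)


DECL = ForeignTableDeclaration(columns=(("user", str), ("amount", float)))


def demo():
    block_a = '{"user": "ann", "amount": 5}\n{"user": "bob", "amount": 7.5}'
    block_b = '{"user": "cy", "amount": 2.5}'
    w1 = WorkerNode("worker-1", DECL, {"block-a": block_a})
    w2 = WorkerNode("worker-2", DECL, {"block-a": block_a, "block-b": block_b})
    coordinator = Coordinator([w1, w2])
    # Roughly equivalent to: SELECT sum(amount) FROM events
    return coordinator.execute(
        subquery=lambda rows: sum(amount for _user, amount in rows),
        merge=sum,
    )
```

Calling `demo()` sums the `amount` column across both blocks; the replica of `block-a` on the second worker is not scanned twice because the plan assigns each block to exactly one worker.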
18 Citations
33 Claims
1. A system, comprising:
a coordinator node;
a plurality of worker nodes in communication with the coordinator node, wherein each worker node stores a plurality of data blocks, wherein each data block has data in a same semi-structured format, each data block is a partition of a same distributed database, each data block has a same foreign table declaration, and the foreign table declaration includes command(s) for converting the data in the semi-structured format into converted data in a tabular format; and
a query processor executed by the coordinator node to produce a distributed query plan in response to a query in a structured query language, wherein the distributed query plan is partitioned into a plurality of sub-queries, each of which corresponds to a particular data block;
wherein for each sub-query corresponding to the particular data block:
a worker node is selected, the selected worker node contains the corresponding particular data block;
the corresponding sub-query is executed by the selected corresponding worker node of the plurality of worker nodes;
the selected worker node uses a local version of the same foreign table declaration to convert data of the particular data block in the same semi-structured format into converted data in the tabular format; and
the selected worker node executes the sub-query on the converted data to generate a sub-query result;
wherein each sub-query result is merged to produce a query result.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
18. A computer implemented method, comprising:
storing a plurality of data blocks on each of a plurality of networked worker nodes, wherein the data blocks have a same semi-structured format and each data block is a partition of a same distributed database;
associating each data block with the same foreign table declaration specifying conversion of the same semi-structured format into converted data in a tabular format;
producing a distributed query plan in response to a query in a structured query language, wherein the distributed query plan is partitioned into a plurality of sub-queries, each of which corresponds to a particular data block;
wherein for each of the sub-queries corresponding to the particular data block:
selecting a worker node from the plurality of networked worker nodes, the selected worker node containing the corresponding particular data block;
executing the corresponding sub-query at the selected worker node, wherein executing includes using instances of the foreign table declaration to convert data of the particular data block in the semi-structured format into converted data in the tabular format; and
the selected worker node executes the sub-query on the converted data to generate a sub-query result; and
merging each sub-query result to produce a query result.
Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33)
Specification