SYSTEM AND METHOD FOR DISTRIBUTED DATABASE QUERY ENGINES
First Claim
1. A system comprising:
- a gateway server configured to generate a plurality of partial queries from a database query for a database containing data stored in a distributed storage cluster that has a plurality of data nodes, and to construct a query result based on a plurality of intermediate results; and
a plurality of worker nodes, wherein each worker node of the plurality of worker nodes is configured to process a respective partial query of the plurality of partial queries by scanning data related to the respective partial query and stored on at least one data node of the distributed storage cluster, and wherein each worker node of the plurality of worker nodes is further configured to generate an intermediate result of the plurality of intermediate results that is stored in a memory of that worker node.
2 Assignments
0 Petitions
Accused Products
Abstract
Techniques for a system capable of performing low-latency database query processing are disclosed herein. The system includes a gateway server and a plurality of worker nodes. The gateway server is configured to divide a database query, for a database containing data stored in a distributed storage cluster having a plurality of data nodes, into a plurality of partial queries and construct a query result based on a plurality of intermediate results. Each worker node of the plurality of worker nodes is configured to process a respective partial query of the plurality of partial queries by scanning data related to the respective partial query that stored on at least one data node of the distributed storage cluster and generate an intermediate result of the plurality of intermediate results that is stored in a memory of that worker node.
-
Citations
20 Claims
-
1. A system comprising:
-
a gateway server configured to generate a plurality of partial queries from a database query for a database containing data stored in a distributed storage cluster that has a plurality of data nodes, and to construct a query result based on a plurality of intermediate results; and a plurality of worker nodes, wherein each worker node of the plurality of worker nodes is configured to process a respective partial query of the plurality of partial queries by scanning data related to the respective partial query and stored on at least one data node of the distributed storage cluster, and wherein each worker node of the plurality of worker nodes is further configured to generate an intermediate result of the plurality of intermediate results that is stored in a memory of that worker node. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
-
-
15. A method comprising:
-
receiving from a client device a database query for a database containing data stored in a distributed storage cluster that has a plurality of data nodes; dividing the database query into a plurality of partial queries; sending each of the partial queries to a respective worker node of a plurality of worker nodes, wherein each worker node is a service running on a data node of the distributed storage cluster; retrieving a plurality of intermediate results for the partial queries from the worker nodes, wherein each intermediate result is processed by a respective worker node of the worker nodes by scanning related data stored in a data node on which the perspective worker node runs; and generating a query result based on the plurality of intermediate results. - View Dependent Claims (16, 17, 18)
-
-
19. A method comprising:
-
receiving a database query from a client device, for a database containing data stored in a distributed storage cluster having a plurality of data nodes; dividing the database query into a plurality of partial queries; sending each of the partial queries to a respective worker node of a plurality of worker nodes, wherein each worker node is a service running on a data node of the distributed storage cluster; identifying a straggling worker node, dividing a partial query that is assigned to the straggling worker node into a plurality of subordinate partial queries, and assigning the plurality of subordinate partial queries to some of the plurality of worker nodes; retrieving a plurality of intermediate results for the partial queries from the worker nodes, wherein each intermediate result is processed by a respective worker node of the worker nodes by scanning related data stored in a data node on which the perspective worker node runs; and generating a query result based on the plurality of intermediate results. - View Dependent Claims (20)
-
Specification