System and method for distributed database query engines
First Claim
1. A system, comprising:
- a gateway server configured to generate a plurality of partial queries from a database query for a database containing data stored in a distributed storage cluster that has a plurality of data nodes, and to construct a query result based on a plurality of intermediate results; and
a plurality of worker nodes, the worker nodes being separate from the data nodes, wherein each worker node of the plurality of worker nodes is configured to process a respective partial query of the plurality of partial queries by scanning data related to the respective partial query and stored on at least one data node of the distributed storage cluster, and wherein each worker node of the plurality of worker nodes is further configured to generate an intermediate result of the plurality of intermediate results that is stored in a memory of that worker node, wherein at least one of the worker nodes is further configured to divide the respective partial query into subordinate partial queries based on quantity and location information of input file blocks of the query.
2 Assignments
0 Petitions
Accused Products
Abstract
Techniques for a system capable of performing low-latency database query processing are disclosed herein. The system includes a gateway server and a plurality of worker nodes. The gateway server is configured to divide a database query, for a database containing data stored in a distributed storage cluster having a plurality of data nodes, into a plurality of partial queries and construct a query result based on a plurality of intermediate results. Each worker node of the plurality of worker nodes is configured to process a respective partial query of the plurality of partial queries by scanning data related to the respective partial query that stored on at least one data node of the distributed storage cluster and generate an intermediate result of the plurality of intermediate results that is stored in a memory of that worker node.
-
Citations
18 Claims
-
1. A system, comprising:
-
a gateway server configured to generate a plurality of partial queries from a database query for a database containing data stored in a distributed storage cluster that has a plurality of data nodes, and to construct a query result based on a plurality of intermediate results; and a plurality of worker nodes, the worker nodes being separate from the data nodes, wherein each worker node of the plurality of worker nodes is configured to process a respective partial query of the plurality of partial queries by scanning data related to the respective partial query and stored on at least one data node of the distributed storage cluster, and wherein each worker node of the plurality of worker nodes is further configured to generate an intermediate result of the plurality of intermediate results that is stored in a memory of that worker node, wherein at least one of the worker nodes is further configured to divide the respective partial query into subordinate partial queries based on quantity and location information of input file blocks of the query. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
-
-
15. A method, comprising:
-
receiving a database query from a client device, for a database containing data stored in a distributed storage cluster having a plurality of cluster nodes;
dividing the database query into a plurality of partial queries;
sending each of the partial queries to a respective worker node of a plurality of worker nodes, wherein each worker node is a service running on a memory of a cluster node of the distributed storage cluster;identifying a straggling worker node, dividing a partial query that is assigned to the straggling worker node into a plurality of subordinate partial queries based on quantity and location information of input file blocks of the query, and assigning the plurality of subordinate partial queries to some of the plurality of worker nodes; retrieving a plurality of intermediate results for the partial queries from the worker nodes, wherein each intermediate result is processed by a respective worker node of the worker nodes by scanning related data stored in a cluster node on which the perspective worker node runs; and generating a query result based on the plurality of intermediate results. - View Dependent Claims (16, 17)
-
-
18. A method, comprising:
-
receiving a database query from a client device, for a database containing data stored in a distributed storage cluster having a plurality of cluster nodes;
dividing the database query into a plurality of partial queries;
sending each of the partial queries to a respective worker node of a plurality of worker nodes, wherein each worker node is a service running on a memory of a cluster node of the distributed storage cluster;identifying a straggling worker node, dividing a partial query that is assigned to the straggling worker node into a plurality of subordinate partial queries, and assigning the plurality of subordinate partial queries to some of the plurality of worker nodes; retrieving a plurality of intermediate results for the partial queries from the worker nodes, wherein each intermediate result is processed by a respective worker node of the worker nodes by scanning related data stored in a cluster node on which the perspective worker node runs; generating a query result based on the plurality of intermediate results; caching data associated with previous database queries for the database in a cache; retrieving a real-time feed of audit logs of the database to invalidate entries in the cached data stored in the cache that have been changed by the previous database queries; and purging entries in the cached data from the cache that have not been queried for a specified time period.
-
Specification