LOW LATENCY QUERY ENGINE FOR APACHE HADOOP
Abstract
A low latency query engine for Apache Hadoop that provides real-time or near real-time, ad hoc query capability while complementing the batch processing of MapReduce. In one embodiment, the low latency query engine comprises a daemon that is installed on data nodes in a Hadoop cluster to handle query requests and all internal requests related to query execution. In a further embodiment, the low latency query engine comprises a daemon that provides name service and metadata distribution. The low latency query engine receives a query request from a client, turns the request into collections of plan fragments, and coordinates parallel, optimized execution of the plan fragments on remote daemons to generate results much faster than existing batch-oriented processing frameworks.
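The flow described in the abstract (split a query into plan fragments, execute them in parallel on the nodes, merge the partial results) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the in-memory `NODE_DATA` dictionary stands in for data stored on the data nodes, threads stand in for remote daemons, and the function names `plan_fragments`, `execute_fragment`, and `run_query` are hypothetical rather than taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the data held by three data nodes.
NODE_DATA = {
    "node-a": [3, 8, 15],
    "node-b": [22, 4],
    "node-c": [9, 31, 7, 12],
}


def plan_fragments(threshold):
    """Planner role: split one logical query into one plan fragment per node."""
    return [{"node": node, "threshold": threshold} for node in NODE_DATA]


def execute_fragment(fragment):
    """Execution-engine role: run a fragment on local data, return a partial result."""
    rows = NODE_DATA[fragment["node"]]
    matching = [value for value in rows if value > fragment["threshold"]]
    return {"count": len(matching), "total": sum(matching)}


def run_query(threshold):
    """Coordinator role: dispatch fragments in parallel, then merge the partials."""
    fragments = plan_fragments(threshold)
    # Threads are only a stand-in for daemons running on remote nodes.
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = list(pool.map(execute_fragment, fragments))
    return {
        "count": sum(p["count"] for p in partials),
        "total": sum(p["total"] for p in partials),
    }


result = run_query(8)  # count and sum of all values greater than 8
```

The sketch keeps the three roles as separate functions so that the merge step only ever sees small partial results, never the raw rows held by each node.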
35 Claims
1. A system for performing queries on stored data in a distributed computing cluster, comprising:
a plurality of data nodes, each data node having:
a query planner that parses queries from clients to create query fragments;
a query coordinator that distributes the query fragments among the plurality of data nodes; and
a query execution engine that executes query fragments to obtain intermediate results that are aggregated and returned to clients.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A method of executing a query in a distributed computing cluster having multiple data nodes, comprising:
receiving, by a coordinating data node in the distributed computing cluster, a query; and
distributing, by the coordinating data node, fragments of the query to data nodes in the distributed computing cluster that have data relevant to the query.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
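One way a coordinating node might decide which data nodes "have data relevant to the query" is to consult a map of block locations and send work only to nodes that hold a replica of a needed block. The sketch below assumes such a map exists; `BLOCK_LOCATIONS`, the block and node names, and the least-loaded tie-breaking rule are all hypothetical and are not drawn from the claims.

```python
from collections import defaultdict

# Hypothetical block map: for each table, the nodes holding a replica of each block.
BLOCK_LOCATIONS = {
    "sales": {
        "blk-1": ["node-a", "node-b"],
        "blk-2": ["node-b", "node-c"],
        "blk-3": ["node-c"],
    },
    "users": {"blk-9": ["node-d"]},
}


def assign_fragments(table):
    """Assign each block of `table` to a node that stores it.

    Returns {node: [block ids]}. Nodes without relevant data get no work.
    """
    assignment = defaultdict(list)
    for block, replicas in sorted(BLOCK_LOCATIONS[table].items()):
        # Prefer the replica holder with the fewest blocks assigned so far;
        # ties are broken by node name so the result is deterministic.
        target = min(replicas, key=lambda node: (len(assignment[node]), node))
        assignment[target].append(block)
    # Looking up a defaultdict creates empty entries, so drop nodes with no work.
    return {node: blocks for node, blocks in assignment.items() if blocks}


plan = assign_fragments("sales")  # node-d holds no "sales" blocks, so it is absent
```

Choosing among replica holders by current load is only one possible policy; the claim language requires only that fragments go to nodes holding relevant data.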
29. A system of executing a query in a distributed computing cluster, comprising:
means for receiving a query; and
means for parsing and analyzing the query;
means for creating plan fragments of the query; and
means for distributing the plan fragments of the query to data nodes in the distributed computing cluster that have data relevant to the query.
Dependent claims: 30, 31, 32, 33, 34, 35
Specification