Low latency query engine for Apache Hadoop

  • US 9,342,557 B2
  • Filed: 03/13/2013
  • Issued: 05/17/2016
  • Est. Priority Date: 03/13/2013
  • Status: Active Grant
First Claim

1. A system for performing queries on stored data in a HADOOP™ distributed computing cluster having a plurality of data nodes, each data node being a computing device having processing circuitry and memory circuitry, the system comprising:

    a state store that tracks a status of each data node, wherein the state store is separate from the data nodes and is further coupled to a name node that tracks where file data are stored across the cluster; and

    a plurality of data nodes forming a peer-to-peer network for the queries, each data node functioning as a peer in the peer-to-peer network and being capable of interacting with components of the HADOOP™ cluster, each peer having an instance of a query engine running in memory, each instance of the query engine having:

    a query planner configured to:

    receive queries from clients;

    obtain, from the state store and the name node, (1) membership information regarding all query engine instances that are running in the cluster, and (2) location information regarding where data blocks relevant to the queries are distributed among the plurality of data nodes;

    parse queries from clients to create query fragments based on data obtained from the state store and the name node; and

    construct a query plan based on the data obtained from the state store;

    a query coordinator configured to distribute the query fragments among the plurality of data nodes according to the query plan; and

    a query execution engine configured to execute the query fragments, to obtain intermediate results from other data nodes that receive the query fragments, and to aggregate the intermediate results for the clients.
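The division of labor recited in the claim (planner, coordinator, execution engine) can be illustrated with a minimal single-process sketch. It is not the patented implementation: the class and function names (`StateStore`, `NameNode`, `Fragment`, `plan`, `execute_fragment`, `run_query`), the "first live replica holder" placement rule, and the use of a simple sum as the query are all assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fragment:
    node: str     # data node chosen to run this fragment
    blocks: list  # block IDs the fragment scans on that node


class StateStore:
    """Hypothetical stand-in: tracks which nodes run a live query engine."""

    def __init__(self):
        self._live = set()

    def register(self, node):
        self._live.add(node)

    def members(self):
        return set(self._live)


class NameNode:
    """Hypothetical stand-in: maps block IDs to nodes holding a replica."""

    def __init__(self, block_locations):
        self._locations = block_locations

    def locate(self, blocks):
        return {b: self._locations[b] for b in blocks}


def plan(blocks, state_store, name_node):
    """Planner: build one fragment per live node from membership and block locations."""
    members = state_store.members()
    locations = name_node.locate(blocks)
    assignment = defaultdict(list)
    for block in blocks:
        # Assumed placement rule: first replica holder with a live engine instance.
        candidates = [n for n in locations[block] if n in members]
        if not candidates:
            raise RuntimeError(f"no live query engine holds block {block}")
        assignment[candidates[0]].append(block)
    return [Fragment(node, blks) for node, blks in sorted(assignment.items())]


def execute_fragment(fragment, data):
    """Execution engine on one peer: partial sum over the fragment's blocks."""
    return sum(sum(data[b]) for b in fragment.blocks)


def run_query(blocks, state_store, name_node, data):
    """Coordinator: distribute fragments, then aggregate the intermediate results."""
    fragments = plan(blocks, state_store, name_node)
    partials = [execute_fragment(f, data) for f in fragments]
    return sum(partials)
```

In a real cluster the fragments would be sent to remote peers and run in parallel; here they run in a loop so that the flow from plan to distribution to aggregation stays visible.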

  • 5 Assignments