BACKGROUND FORMAT OPTIMIZATION FOR ENHANCED SQL-LIKE QUERIES IN HADOOP
First Claim
1. A system for performing queries on stored data in a distributed computing cluster of a plurality of data nodes, comprising:
- a query engine for each data node, having:
a query planner that parses a query from a client to create query fragments based on a schema specifying one or more formats in which data is stored on the data nodes, wherein, when data in a target format is stored, the query fragments are created for the target format, and when data in the target format is not stored, the query fragments are created for another format;
a query coordinator that distributes the query fragments among the plurality of data nodes; and
a query execution engine comprising:
a transformation module that transforms the data in the format for which the query fragments are created based on the schema; and
an execution module that executes the query fragments on the transformed data to obtain intermediate results that are aggregated and returned to the client.
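The flow recited in this claim (choose a format from the schema, create fragments, distribute them, transform, execute, aggregate) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the names `SCHEMA`, `TARGET_FORMAT`, `QueryFragment`, `choose_format`, `plan`, `transform`, `execute`, and `run_query` are all hypothetical, the "query" is reduced to a sum over integers, and the fallback picks an arbitrary (alphabetically first) stored format.

```python
from dataclasses import dataclass

# Hypothetical schema: table name -> formats currently stored on the data nodes.
SCHEMA = {
    "events": {"text", "columnar"},
    "logs": {"text"},
}
TARGET_FORMAT = "columnar"  # assumed name for the target format


@dataclass(frozen=True)
class QueryFragment:
    table: str
    data_format: str
    node: int


def choose_format(table, schema=SCHEMA, target=TARGET_FORMAT):
    """Return the target format when it is stored, otherwise another stored format."""
    stored = schema[table]
    if target in stored:
        return target
    return sorted(stored)[0]  # assumption: any stored format is an acceptable fallback


def plan(table, num_nodes, schema=SCHEMA):
    """Planner + coordinator: create one fragment per data node for the chosen format."""
    fmt = choose_format(table, schema)
    return [QueryFragment(table, fmt, node) for node in range(num_nodes)]


def transform(raw, data_format):
    """Transformation module: turn a node's stored data into a list of ints."""
    if data_format == "columnar":
        return list(raw)  # already a column of values
    return [int(line) for line in raw.splitlines()]  # text: one value per line


def execute(fragment, node_data):
    """Execution module: run the fragment (here, a sum) on one node's data."""
    raw = node_data[fragment.node][fragment.data_format]
    return sum(transform(raw, fragment.data_format))


def run_query(table, node_data):
    """Aggregate the intermediate results from every node into the final answer."""
    fragments = plan(table, num_nodes=len(node_data))
    return sum(execute(f, node_data) for f in fragments)
```

With `node_data = [{"text": "1\n2"}, {"text": "3"}]`, calling `run_query("logs", node_data)` plans text-format fragments because the hypothetical schema lists no columnar copy of `logs`; for `events`, the fragments would target the columnar copy instead.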
Abstract
A format conversion engine for Apache Hadoop that converts data from its original format to a database-like format at certain time points for use by a low latency (LL) query engine. The format conversion engine comprises a daemon that is installed on each data node in a Hadoop cluster. The daemon comprises a scheduler and a converter. The scheduler determines when to perform the format conversion and notifies the converter when the time comes. The converter converts data on the data node from its original format to a database-like format for use by the low latency (LL) query engine.
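The scheduler/converter split described in the abstract might look roughly like the sketch below. It is a minimal illustration, assuming the scheduler uses a fixed daily time window and is polled by a hypothetical `tick` method; the class names, the window, the file path, and the placeholder conversion function are all assumptions, since the abstract does not say how "certain time points" are chosen or what the database-like format is.

```python
from datetime import datetime, time


class Converter:
    """Runs a conversion function when notified and keeps the converted result."""

    def __init__(self, convert_fn):
        self.convert_fn = convert_fn  # hypothetical: original text -> converted form
        self.done = {}                # path -> converted data

    def on_notify(self, path, original):
        self.done[path] = self.convert_fn(original)


class Scheduler:
    """Decides when conversion should run; here, an assumed nightly window."""

    def __init__(self, converter, start=time(1, 0), end=time(5, 0)):
        self.converter = converter
        self.start = start
        self.end = end

    def is_due(self, now):
        return self.start <= now.time() < self.end

    def tick(self, now, pending):
        """Notify the converter about each pending file if the window is open.

        Returns the number of files handed to the converter.
        """
        if not self.is_due(now):
            return 0
        for path, original in pending.items():
            self.converter.on_notify(path, original)
        return len(pending)


def split_fields(text):
    """Placeholder conversion: comma-separated lines -> lists of fields."""
    return [line.split(",") for line in text.splitlines()]
```

A daemon on each data node would then call something like `Scheduler(Converter(split_fields)).tick(datetime.now(), pending_files)` periodically; outside the window the call is a no-op, so the original data is left untouched until the scheduled time.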
44 Citations
19 Claims
1. A system for performing queries on stored data in a distributed computing cluster of a plurality of data nodes, comprising:
- a query engine for each data node, having:
a query planner that parses a query from a client to create query fragments based on a schema specifying one or more formats in which data is stored on the data nodes, wherein, when data in a target format is stored, the query fragments are created for the target format, and when data in the target format is not stored, the query fragments are created for another format;
a query coordinator that distributes the query fragments among the plurality of data nodes; and
a query execution engine comprising:
a transformation module that transforms the data in the format for which the query fragments are created based on the schema; and
an execution module that executes the query fragments on the transformed data to obtain intermediate results that are aggregated and returned to the client.
Dependent claims: 2, 3, 4
5. A method of data processing for query execution, comprising the steps of:
storing initial data in an original format;
converting the initial data to be in a target format that is optimized for relational database processing according to a predetermined schedule; and
storing the converted data together with the initial data.
Dependent claims: 6, 7, 8, 9, 10, 11, 12
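The three steps of this method can be illustrated with a small sketch in which the converted copy is kept alongside the untouched original. The original format is assumed to be header-first CSV and the target format a simple column-oriented dictionary; both choices, and the names `to_columnar` and `DataNodeStore`, are hypothetical. The scheduling aspect of the claim is left out here.

```python
import csv
import io


def to_columnar(csv_text):
    """Convert header-first CSV text (assumed original format) to lists per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in columns:
            columns[name].append(row[name])
    return columns


class DataNodeStore:
    """Keeps the converted copy next to the original instead of replacing it."""

    def __init__(self):
        self._files = {}  # name -> {"original": str, "columnar": dict}

    def store_initial(self, name, csv_text):
        self._files[name] = {"original": csv_text}

    def convert(self, name):
        entry = self._files[name]
        entry["columnar"] = to_columnar(entry["original"])

    def formats(self, name):
        """List the formats currently stored for a dataset."""
        return sorted(self._files[name])

    def read(self, name, data_format):
        return self._files[name][data_format]
```

Because both copies remain available, a planner that consults `formats(name)` can target the columnar copy once it exists and fall back to the original until then.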
13. A system for data processing for query execution, comprising:
a first storing unit which stores initial data in an original format;
a converting unit which converts the initial data to be in a target format that is optimized for relational database processing according to a predetermined schedule; and
a second storing unit which stores the converted data.
Dependent claims: 14, 15, 16
17. A machine-readable storage medium having stored thereon instructions which when executed by one or more processors perform a method, the method comprising the steps of:
storing initial data in an original format;
converting the initial data to be in a target format that is optimized for relational database processing according to a predetermined schedule; and
storing the converted data.
Dependent claims: 18, 19
Specification