Query execution systems and methods
First Claim
1. A method for processing data in a database system containing a plurality of nodes, the method comprising the steps of:
- receiving a query for processing of data, wherein the data is stored in a first table in a plurality of tables, wherein the first table is stored on at least one node within the database system;
determining an attribute of the first table and second table in the plurality of tables, the attribute including a join key, the first table having a smaller size than the second table, wherein the second table is partitioned into a plurality partitions, and at least one partition of the second table is stored on at least one node within the database system;
providing the first table to each node in the database system storing at least one partition of the second table; and
joining, on each node storing at least one partition of the second table, the first table and at least one partition of the second table using the determined attribute;
whereina map phase in a plurality of map phases of a MapReduce process includes the determining, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, anda reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data;
each partition of the second table is joined with the first table during a different map processing task in the plurality of map processing tasks of a MapReduce process.
1 Assignment
0 Petitions
Accused Products
Abstract
System, method, and computer program product for processing data are disclosed. The method includes receiving a query for processing of data, wherein the data is stored in a table in a plurality of tables, wherein the table is stored on at least one node within the database system, determining an attribute of the table and another table in the plurality of tables, partitioning one of the table and the another table in the plurality of tables using the determined attribute into a plurality of partitions, and performing a join of at least two partitions of the table and the another table using the determined attribute. The join is performed on a single node in the database system.
72 Citations
34 Claims
-
1. A method for processing data in a database system containing a plurality of nodes, the method comprising the steps of:
-
receiving a query for processing of data, wherein the data is stored in a first table in a plurality of tables, wherein the first table is stored on at least one node within the database system; determining an attribute of the first table and second table in the plurality of tables, the attribute including a join key, the first table having a smaller size than the second table, wherein the second table is partitioned into a plurality partitions, and at least one partition of the second table is stored on at least one node within the database system; providing the first table to each node in the database system storing at least one partition of the second table; and joining, on each node storing at least one partition of the second table, the first table and at least one partition of the second table using the determined attribute; wherein a map phase in a plurality of map phases of a MapReduce process includes the determining, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, and a reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data; each partition of the second table is joined with the first table during a different map processing task in the plurality of map processing tasks of a MapReduce process. - View Dependent Claims (2, 3, 4)
-
-
5. A system for processing data in a database system containing a plurality of nodes, comprising:
-
at least one programmable processor; and at least one memory including code that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising; receiving a query for processing of data, wherein the data is stored in a first table in a plurality of tables, wherein the first table is stored on at least one node within the database system; determining an attribute of the first table and second table in the plurality of tables, the attribute including a join key, the first table having a smaller size than the second table, wherein the second table is partitioned into a plurality partitions, and at least one partition of the second table is stored on at least one node within the database system; providing the first table to each node in the database system storing at least one partition of the second table; and joining, on each node storing at least one partition of the second table, the first table and at least one partition of the second table using the determined attribute; wherein a map phase in a plurality of map phases of a MapReduce process includes the determining, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, and a reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data; each partition of the second table is joined with the first table during a different map processing task in the plurality of map processing tasks of a MapReduce process. - View Dependent Claims (6, 7, 8)
-
-
9. A non-transitory computer-readable medium including code that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising:
-
receiving a query for processing of data, wherein the data is stored in a first table in a plurality of tables, wherein the first table is stored on at least one node within the database system; determining an attribute of the first table and second table in the plurality of tables, the attribute including a join key, the first table having a smaller size than the second table, wherein the second table is partitioned into a plurality partitions, and at least one partition of the second table is stored on at least one node within the database system; providing the first table to each node in the database system storing at least one partition of the second table; and joining, on each node storing at least one partition of the second table, the first table and at least one partition of the second table using the determined attribute; wherein a map phase in a plurality of map phases of a MapReduce process includes the determining, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, and a reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data; each partition of the second table is joined with the first table during a different map processing task in the plurality of map processing tasks of a MapReduce process. - View Dependent Claims (10, 11, 12)
-
-
13. A method for processing data in a database system containing a plurality of nodes, the method comprising the steps of:
-
receiving a query for processing of data, wherein the data is stored in a table in a plurality of tables, wherein the table is stored on at least one node within the database system; determining whether a first table in the plurality of tables is partitioned by a join attribute, the join attribute including a join key, wherein a second table in the plurality of tables is not partitioned by the join attribute; partitioning the second table using the join attribute into a plurality of partitions, wherein the partitioning of the second table is performed during a first processing task in the plurality of processing tasks; providing a partition of the second table to each node in the database system storing a partition of the first table; and joining, on each node, one or more partitions of the first table and one or more corresponding partitions of the second table based on the join attribute during a second processing task in the plurality of processing tasks; wherein the first and second processing tasks are map phases of a MapReduce process, a map phase in a plurality of map phases of a MapReduce process includes the determining, the partitioning, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, and a reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data.
-
-
14. A system for processing data in a database system containing a plurality of nodes, comprising:
-
at least one programmable processor; and at least one memory including code that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising; receiving a query for processing of data, wherein the data is stored in a table in a plurality of tables, wherein the table is stored on at least one node within the database system; determining whether a first table in the plurality of tables is partitioned by a join attribute, the join attribute including a join key, wherein a second table in the plurality of tables is not partitioned by the join attribute; partitioning the second table using the join attribute into a plurality of partitions, wherein the partitioning of the second table is performed during a first processing task in the plurality of processing tasks; providing a partition of the second table to each node in the database system storing a partition of the first table; and joining, on each node, one or more partitions of the first table and one or more corresponding partitions of the second table based on the join attribute during a second processing task in the plurality of processing tasks; wherein the first and second processing tasks are map phases of a MapReduce process, a map phase in a plurality of map phases of a MapReduce process includes the determining, the partitioning, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, and a reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data.
-
-
15. A non-transitory computer-readable medium including code that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising:
-
receiving a query for processing of data, wherein the data is stored in a table in a plurality of tables, wherein the table is stored on at least one node within the database system; determining whether a first table in the plurality of tables is partitioned by a join attribute, the join attribute including a join key, wherein a second table in the plurality of tables is not partitioned by the join attribute; partitioning the second table using the join attribute into a plurality of partitions, wherein the partitioning of the second table is performed during a first processing task in the plurality of processing tasks; providing a partition of the second table to each node in the database system storing a partition of the first table; and joining, on each node, one or more partitions of the first table and one or more corresponding partitions of the second table based on the join attribute during a second processing task in the plurality of processing tasks; wherein the first and second processing tasks are map phases of a MapReduce process, a map phase in a plurality of map phases of a MapReduce process includes the determining, the partitioning, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, and a reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data.
-
-
16. A method for processing data in a database system containing a plurality of nodes, the method comprising the steps of:
-
receiving a query for processing of data, wherein the data is stored in a plurality of tables, wherein the plurality of tables is stored on a plurality of nodes within the database system; extracting a column join attribute from a first table in the plurality of tables for joining of the first table and a second table in the plurality of tables, the column join attribute including a join key, wherein the column join attribute is contained within at least one column selected from the first table; providing the selected column of the first table to each node in the plurality of nodes storing the second table; and joining the selected column of the first table with the second table using the extracted column join attribute; wherein a map phase in a plurality of map phases of a MapReduce process includes the extracting, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, and a reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data. - View Dependent Claims (17, 18, 19, 20, 21)
-
-
22. A system for processing data in a database system containing a plurality of nodes, comprising:
-
at least one programmable processor; and at least one memory including code that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising; receiving a query for processing of data, wherein the data is stored in a plurality of tables, wherein the plurality of tables is stored on a plurality of nodes within the database system; extracting a column join attribute from a first table in the plurality of tables for joining of the first table and a second table in the plurality of tables, the column join attribute including a join key, wherein the column join attribute is contained within at least one column selected from the first table; providing the selected column of the first table to each node in the plurality of nodes storing the second table; and joining the selected column of the first table with the second table using the extracted column join attribute; wherein a map phase in a plurality of map phases of a MapReduce process includes the extracting, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, and a reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data. - View Dependent Claims (23, 24, 25, 26, 27)
-
-
28. A non-transitory computer-readable medium including code that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising:
-
receiving a query for processing of data, wherein the data is stored in a plurality of tables, wherein the plurality of tables is stored on a plurality of nodes within the database system; extracting a column join attribute from a first table in the plurality of tables for joining of the first table and a second table in the plurality of tables, the column join attribute including a join key, wherein the column join attribute is contained within at least one column selected from the first table; providing the selected column of the first table to each node in the plurality of nodes storing the second table; and joining the selected column of the first table with the second table using the extracted column join attribute; wherein a map phase in a plurality of map phases of a MapReduce process includes the extracting, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, and a reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data. - View Dependent Claims (29, 30, 31, 32, 33)
-
-
34. A method for processing data in a database system containing a plurality of nodes, the method comprising the steps of:
-
receiving a query for processing of data, wherein the data is stored in at least one table on at least one node in the plurality of nodes, the table is being partitioned into a plurality of partitions; generating, based on the received query, at least one processing task containing a first phase and a second phase; selecting and aggregating data, from the plurality of partitions, during the first phase of the at least one processing task; outputting, based on the selecting and the aggregating, data in response to the received query during at least one of the following;
the second phase of the at least one processing task and at least another processing task, wherein the at least one processing task and the at least another processing task are MapReduce processing tasks, each having a map phase and a reduce phase, whereby the first phase is a map phase and the second phase a reduce phase;wherein the selecting and aggregation is performed during a map phase of a MapReduce process at a node in a plurality of nodes in the database system.
-
Specification