Query execution systems and methods

US 8,935,232 B2
Filed: 02/22/2011
Issued: 01/13/2015
Est. Priority Date: 06/04/2010
Status: Active Grant

First Claim

Patent Images

1. A method for processing data in a database system containing a plurality of nodes, the method comprising the steps of:

receiving a query for processing of data, wherein the data is stored in a first table in a plurality of tables, wherein the first table is stored on at least one node within the database system;

determining an attribute of the first table and second table in the plurality of tables, the attribute including a join key, the first table having a smaller size than the second table, wherein the second table is partitioned into a plurality partitions, and at least one partition of the second table is stored on at least one node within the database system;

providing the first table to each node in the database system storing at least one partition of the second table; and

joining, on each node storing at least one partition of the second table, the first table and at least one partition of the second table using the determined attribute;

whereina map phase in a plurality of map phases of a MapReduce process includes the determining, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, anda reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data;

each partition of the second table is joined with the first table during a different map processing task in the plurality of map processing tasks of a MapReduce process.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

System, method, and computer program product for processing data are disclosed. The method includes receiving a query for processing of data, wherein the data is stored in a table in a plurality of tables, wherein the table is stored on at least one node within the database system, determining an attribute of the table and another table in the plurality of tables, partitioning one of the table and the another table in the plurality of tables using the determined attribute into a plurality of partitions, and performing a join of at least two partitions of the table and the another table using the determined attribute. The join is performed on a single node in the database system.

72 Citations

View as Search Results

34 Claims

1. A method for processing data in a database system containing a plurality of nodes, the method comprising the steps of:
- receiving a query for processing of data, wherein the data is stored in a first table in a plurality of tables, wherein the first table is stored on at least one node within the database system;
  
  determining an attribute of the first table and second table in the plurality of tables, the attribute including a join key, the first table having a smaller size than the second table, wherein the second table is partitioned into a plurality partitions, and at least one partition of the second table is stored on at least one node within the database system;
  
  providing the first table to each node in the database system storing at least one partition of the second table; and
  
  joining, on each node storing at least one partition of the second table, the first table and at least one partition of the second table using the determined attribute;
  
  whereina map phase in a plurality of map phases of a MapReduce process includes the determining, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, anda reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data;
  
  each partition of the second table is joined with the first table during a different map processing task in the plurality of map processing tasks of a MapReduce process.
- View Dependent Claims (2, 3, 4)
- - 2. The method according to claim 1, wherein the providing further comprises storing the first table in a temporary memory location.
  - 3. The method according to claim 2, wherein the joining further comprisesobtaining at least one partition of the second table;
    - accessing the temporary memory location to obtain the first table; and
      
      joining the accessed first table and the obtained partition of the second table using the determined attribute.
  - 4. The method according to claim 1, wherein the first table is a dictionary table.

5. A system for processing data in a database system containing a plurality of nodes, comprising:
- at least one programmable processor; and
  
  at least one memory including code that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising;
  
  receiving a query for processing of data, wherein the data is stored in a first table in a plurality of tables, wherein the first table is stored on at least one node within the database system;
  
  determining an attribute of the first table and second table in the plurality of tables, the attribute including a join key, the first table having a smaller size than the second table, wherein the second table is partitioned into a plurality partitions, and at least one partition of the second table is stored on at least one node within the database system;
  
  providing the first table to each node in the database system storing at least one partition of the second table; and
  
  joining, on each node storing at least one partition of the second table, the first table and at least one partition of the second table using the determined attribute;
  
  whereina map phase in a plurality of map phases of a MapReduce process includes the determining, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, anda reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data;
  
  each partition of the second table is joined with the first table during a different map processing task in the plurality of map processing tasks of a MapReduce process.
- View Dependent Claims (6, 7, 8)
- - 6. The system according to claim 5, wherein the providing further comprises storing the first table in a temporary memory location.
  - 7. The system according to claim 6, wherein the joining further comprisesobtaining at least one partition of the second table;
    - accessing the temporary memory location to obtain the first table; and
      
      joining the accessed first table and the obtained partition of the second table using the determined attribute.
  - 8. The system according to claim 5, wherein the first table is a dictionary table.

9. A non-transitory computer-readable medium including code that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising:
- receiving a query for processing of data, wherein the data is stored in a first table in a plurality of tables, wherein the first table is stored on at least one node within the database system;
  
  determining an attribute of the first table and second table in the plurality of tables, the attribute including a join key, the first table having a smaller size than the second table, wherein the second table is partitioned into a plurality partitions, and at least one partition of the second table is stored on at least one node within the database system;
  
  providing the first table to each node in the database system storing at least one partition of the second table; and
  
  joining, on each node storing at least one partition of the second table, the first table and at least one partition of the second table using the determined attribute;
  
  whereina map phase in a plurality of map phases of a MapReduce process includes the determining, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, anda reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data;
  
  each partition of the second table is joined with the first table during a different map processing task in the plurality of map processing tasks of a MapReduce process.
- View Dependent Claims (10, 11, 12)
- - 10. The non-transitory computer-readable medium according to claim 9, wherein the providing further comprises storing the first table in a temporary memory location.
  - 11. The non-transitory computer-readable medium according to claim 10, wherein the joining further comprisesobtaining at least one partition of the second table;
    - accessing the temporary memory location to obtain the first table; and
      
      joining the accessed first table and the obtained partition of the second table using the determined attribute.
  - 12. The non-transitory computer-readable medium according to claim 9, wherein the first table is a dictionary table.

13. A method for processing data in a database system containing a plurality of nodes, the method comprising the steps of:
- receiving a query for processing of data, wherein the data is stored in a table in a plurality of tables, wherein the table is stored on at least one node within the database system;
  
  determining whether a first table in the plurality of tables is partitioned by a join attribute, the join attribute including a join key, wherein a second table in the plurality of tables is not partitioned by the join attribute;
  
  partitioning the second table using the join attribute into a plurality of partitions, wherein the partitioning of the second table is performed during a first processing task in the plurality of processing tasks;
  
  providing a partition of the second table to each node in the database system storing a partition of the first table; and
  
  joining, on each node, one or more partitions of the first table and one or more corresponding partitions of the second table based on the join attribute during a second processing task in the plurality of processing tasks;
  
  whereinthe first and second processing tasks are map phases of a MapReduce process,a map phase in a plurality of map phases of a MapReduce process includes the determining, the partitioning, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, anda reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data.

14. A system for processing data in a database system containing a plurality of nodes, comprising:
- at least one programmable processor; and
  
  at least one memory including code that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising;
  
  receiving a query for processing of data, wherein the data is stored in a table in a plurality of tables, wherein the table is stored on at least one node within the database system;
  
  determining whether a first table in the plurality of tables is partitioned by a join attribute, the join attribute including a join key, wherein a second table in the plurality of tables is not partitioned by the join attribute;
  
  partitioning the second table using the join attribute into a plurality of partitions, wherein the partitioning of the second table is performed during a first processing task in the plurality of processing tasks;
  
  providing a partition of the second table to each node in the database system storing a partition of the first table; and
  
  joining, on each node, one or more partitions of the first table and one or more corresponding partitions of the second table based on the join attribute during a second processing task in the plurality of processing tasks;
  
  whereinthe first and second processing tasks are map phases of a MapReduce process,a map phase in a plurality of map phases of a MapReduce process includes the determining, the partitioning, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, anda reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data.

15. A non-transitory computer-readable medium including code that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising:
- receiving a query for processing of data, wherein the data is stored in a table in a plurality of tables, wherein the table is stored on at least one node within the database system;
  
  determining whether a first table in the plurality of tables is partitioned by a join attribute, the join attribute including a join key, wherein a second table in the plurality of tables is not partitioned by the join attribute;
  
  partitioning the second table using the join attribute into a plurality of partitions, wherein the partitioning of the second table is performed during a first processing task in the plurality of processing tasks;
  
  providing a partition of the second table to each node in the database system storing a partition of the first table; and
  
  joining, on each node, one or more partitions of the first table and one or more corresponding partitions of the second table based on the join attribute during a second processing task in the plurality of processing tasks;
  
  whereinthe first and second processing tasks are map phases of a MapReduce process,a map phase in a plurality of map phases of a MapReduce process includes the determining, the partitioning, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, anda reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data.

16. A method for processing data in a database system containing a plurality of nodes, the method comprising the steps of:
- receiving a query for processing of data, wherein the data is stored in a plurality of tables, wherein the plurality of tables is stored on a plurality of nodes within the database system;
  
  extracting a column join attribute from a first table in the plurality of tables for joining of the first table and a second table in the plurality of tables, the column join attribute including a join key, wherein the column join attribute is contained within at least one column selected from the first table;
  
  providing the selected column of the first table to each node in the plurality of nodes storing the second table; and
  
  joining the selected column of the first table with the second table using the extracted column join attribute;
  
  whereina map phase in a plurality of map phases of a MapReduce process includes the extracting, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, anda reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data.
- View Dependent Claims (17, 18, 19, 20, 21)
- - 17. The method according to claim 16, wherein the second table is partitioned into a plurality of partitions using a partitioning attribute, each partition of the second table is stored on at least one node in the plurality of nodes;
    - wherein the joining further comprises joining, using the extracted column join attribute, the selected column of the first table and at least one partition of the second table at each node storing the partition of the second table in the database system.
  - 18. The method according to claim 16, wherein the first table is smaller than at least another table in the plurality of tables.
  - 19. The method according to claim 16, wherein the joining further comprisesgenerating a selection predicate based on the extracted column join attribute, wherein the selection predicate is generated based on a listing of values in the first table;
    - andusing the generated selection predicate, performing the joining.
  - 20. The method according to claim 19, wherein the generated selection predicate is provided to each node in the plurality of nodes storing the second table and the joining is performed during a map phase of a MapReduce processing task using the selection predicate.
  - 21. The method according to claim 16, wherein the plurality of processing tasks are MapReduce processing tasks;
    - wherein the selecting and the providing are performed during a first MapReduce processing task and the joining is performed during a second MapReduce processing task.

22. A system for processing data in a database system containing a plurality of nodes, comprising:
- at least one programmable processor; and
  
  at least one memory including code that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising;
  
  receiving a query for processing of data, wherein the data is stored in a plurality of tables, wherein the plurality of tables is stored on a plurality of nodes within the database system;
  
  extracting a column join attribute from a first table in the plurality of tables for joining of the first table and a second table in the plurality of tables, the column join attribute including a join key, wherein the column join attribute is contained within at least one column selected from the first table;
  
  providing the selected column of the first table to each node in the plurality of nodes storing the second table; and
  
  joining the selected column of the first table with the second table using the extracted column join attribute;
  
  whereina map phase in a plurality of map phases of a MapReduce process includes the extracting, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, anda reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data.
- View Dependent Claims (23, 24, 25, 26, 27)
- - 23. The system according to claim 22, wherein the second table is partitioned into a plurality of partitions using a partitioning attribute, each partition of the second table is stored on at least one node in the plurality of nodes;
    - wherein the joining further comprises joining, using the extracted column join attribute, the selected column of the first table and at least one partition of the second table at each node storing the partition of the second table in the database system.
  - 24. The system according to claim 22, wherein the first table is smaller than at least another table in the plurality of tables.
  - 25. The system according to claim 22, wherein the joining further comprisesgenerating a selection predicate based on the extracted column join attribute, wherein the selection predicate is generated based on a listing of values in the first table;
    - andusing the generated selection predicate, performing the joining.
  - 26. The system according to claim 25, wherein the generated selection predicate is provided to each node in the plurality of nodes storing the second table and the joining is performed during a map phase of a MapReduce processing task using the selection predicate.
  - 27. The system according to claim 22, wherein the plurality of processing tasks are MapReduce processing tasks;
    - wherein the selecting and the providing are performed during a first MapReduce processing task and the joining is performed during a second MapReduce processing task.

28. A non-transitory computer-readable medium including code that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising:
- receiving a query for processing of data, wherein the data is stored in a plurality of tables, wherein the plurality of tables is stored on a plurality of nodes within the database system;
  
  extracting a column join attribute from a first table in the plurality of tables for joining of the first table and a second table in the plurality of tables, the column join attribute including a join key, wherein the column join attribute is contained within at least one column selected from the first table;
  
  providing the selected column of the first table to each node in the plurality of nodes storing the second table; and
  
  joining the selected column of the first table with the second table using the extracted column join attribute;
  
  whereina map phase in a plurality of map phases of a MapReduce process includes the extracting, the providing, and the joining, the MapReduce process including a plurality of map phases and reduce phases, anda reduce phase in the plurality of reduce phases includes aggregating data generated as a result of the joining and, optionally, performing at least one operation on the generated data.
- View Dependent Claims (29, 30, 31, 32, 33)
- - 29. The non-transitory computer-readable medium according to claim 28, wherein the second table is partitioned into a plurality of partitions using a partitioning attribute, each partition of the second table is stored on at least one node in the plurality of nodes;
    - wherein the joining further comprises joining, using the extracted column join attribute, the selected column of the first table and at least one partition of the second table at each node storing the partition of the second table in the database system.
  - 30. The non-transitory computer-readable medium according to claim 28, wherein the first table is smaller than at least another table in the plurality of tables.
  - 31. The non-transitory computer-readable medium according to claim 28, wherein the joining further comprisesgenerating a selection predicate based on the extracted column join attribute, wherein the selection predicate is generated based on a listing of values in the first table;
    - andusing the generated selection predicate, performing the joining.
  - 32. The non-transitory computer-readable medium according to claim 31, wherein the generated selection predicate is provided to each node in the plurality of nodes storing the second table and the joining is performed during a map phase of a MapReduce processing task using the selection predicate.
  - 33. The non-transitory computer-readable medium according to claim 28, wherein the plurality of processing tasks are MapReduce processing tasks;
    - wherein the selecting and the providing are performed during a first MapReduce processing task and the joining is performed during a second MapReduce processing task.

34. A method for processing data in a database system containing a plurality of nodes, the method comprising the steps of:
- receiving a query for processing of data, wherein the data is stored in at least one table on at least one node in the plurality of nodes, the table is being partitioned into a plurality of partitions;
  
  generating, based on the received query, at least one processing task containing a first phase and a second phase;
  
  selecting and aggregating data, from the plurality of partitions, during the first phase of the at least one processing task;
  
  outputting, based on the selecting and the aggregating, data in response to the received query during at least one of the following;
  
  the second phase of the at least one processing task and at least another processing task, wherein the at least one processing task and the at least another processing task are MapReduce processing tasks, each having a map phase and a reduce phase, whereby the first phase is a map phase and the second phase a reduce phase;
  
  wherein the selecting and aggregation is performed during a map phase of a MapReduce process at a node in a plurality of nodes in the database system.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Yale University
Original Assignee
Yale University
Inventors
Abadi, Daniel, Bajda-Pawlikowski, Kamil
Primary Examiner(s)
Perveen, Rehana
Assistant Examiner(s)
BUI, THUY T

Application Number

US13/032,551
Publication Number

US 20110302151A1
Time in Patent Office

1,421 Days
Field of Search

707/714
US Class Current

707/714
CPC Class Codes

G06F 16/24532   of parallel queries

G06F 16/24542   Plan optimisation

G06F 16/24547   Optimisations to support sp...

G06F 16/2456   Join operations

G06F 16/2471   Distributed queries

Query execution systems and methods

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

72 Citations

34 Claims

Specification

Solutions

Use Cases

Quick Links

Query execution systems and methods

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

72 Citations

34 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links