Assigning blocks of a file of a distributed file system to processing units of a parallel database management system
Abstract
In general, a technique or mechanism is provided to efficiently transfer data of a distributed file system to a parallel database management system using an algorithm that avoids or reduces sending of blocks of files across computer nodes on which the parallel database management system is implemented.
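As a rough illustration of what "avoids or reduces sending of blocks of files across computer nodes" can mean in practice, the sketch below compares a locality-aware assignment with a naive round-robin one and counts how many blocks would have to cross the network. Everything here is an assumption made for illustration: the `replicas` dictionary (block name to the nodes holding a copy), the node names, and the greedy least-loaded rule are hypothetical and are not taken from the patent.

```python
def assign_blocks_locally(replicas, nodes):
    """Greedy sketch: give each block to the least-loaded node that holds a copy."""
    load = {n: 0 for n in nodes}
    mapping = {}
    for block, holders in replicas.items():
        local = [n for n in holders if n in load]
        # If no listed node holds the block, any node may take it (a remote transfer).
        candidates = local or list(nodes)
        target = min(candidates, key=lambda n: load[n])
        mapping[block] = target
        load[target] += 1
    return mapping


def assign_blocks_round_robin(replicas, nodes):
    """Naive baseline that ignores where the blocks are stored."""
    return {block: nodes[i % len(nodes)] for i, block in enumerate(replicas)}


def remote_transfers(mapping, replicas):
    """Count blocks assigned to a node that does not already store them."""
    return sum(1 for block, node in mapping.items() if node not in replicas[block])


# Hypothetical layout: four blocks of one file spread over three nodes.
replicas = {"b0": ["n1", "n2"], "b1": ["n1"], "b2": ["n2", "n3"], "b3": ["n3"]}
nodes = ["n1", "n2", "n3"]

local_plan = assign_blocks_locally(replicas, nodes)
naive_plan = assign_blocks_round_robin(replicas, nodes)
print(remote_transfers(local_plan, replicas), remote_transfers(naive_plan, replicas))
```

With this layout the locality-aware plan needs no cross-node transfers, while the round-robin plan sends two blocks to nodes that do not store them.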
43 Citations
21 Claims
1. A method for use in a storage system, comprising:
providing a distributed file system across a plurality of computer nodes of the storage system, wherein the distributed file system has at least one file;
providing a parallel database management system on the plurality of computer nodes of the storage system, the parallel database management system including at least one relational table that is separate from the at least one file;
in response to a query that causes access of the file and access of the relational table, determining, by one or more processors, a mapping of blocks of the file to the computer nodes using an algorithm that avoids or reduces sending of blocks of the file across the computer nodes;
using, by the one or more processors, the mapping to assign the blocks of the file to corresponding processing units of the parallel database management system;
loading the blocks to the processing units according to the assigning; and
using, by the processing units, the loaded blocks and data accessed from the relational table to produce a result for the query.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 18, 19, 20
13. A storage system comprising:
a distributed file system;
a parallel database management system;
a plurality of computer nodes configured to implement the distributed file system and the parallel database management system, wherein the distributed file system is configured to store files, and wherein the parallel database management system is configured to store relational tables that are separate from the files,
wherein the parallel database management system includes processing units distributed across the computer nodes such that each of the computer nodes includes at least one of the processing units,
wherein in response to a query received by the parallel database management system that causes access of at least one file of the distributed file system and at least one relational table of the parallel database management system, the parallel database management system is configured to:
determine a mapping between blocks of the at least one file and corresponding computer nodes using an algorithm that avoids or reduces sending of blocks of the at least one file across the computer nodes;
use the mapping to assign the blocks of the at least one file to the corresponding processing units;
according to the assigning, load the blocks of the at least one file to the processing units; and
use, by the processing units, the loaded blocks and data accessed from the at least one relational table to produce a result for the query.
Dependent claims: 14, 15, 16, 21
17. An article comprising at least one storage medium storing instructions that upon execution by one or more processors cause a storage system to:
receive a query at a parallel database management system implemented across a plurality of computer nodes of the storage system, wherein the query causes access of a file of a distributed file system that is implemented across the plurality of computer nodes, the parallel database management system including at least one relational table that is separate from the file;
in response to the query, determine a mapping of blocks of the file to the computer nodes using an algorithm that avoids or reduces sending of blocks of the file across the computer nodes, wherein determining the mapping is based on solving a maximum flow network problem that identifies a maximum flow or an approximate maximum flow in a flow network having graph nodes representing the blocks of the file and the computer nodes;
use the mapping to assign the blocks of the file to corresponding processing units of the parallel database management system;
load the blocks to the processing units according to the assigning; and
use, by the processing units, the loaded blocks and data accessed from the relational table to produce a result for the query.
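Claim 17 bases the mapping on a maximum flow problem over a graph whose nodes represent file blocks and computer nodes. The sketch below shows one way such a flow network could be set up and solved; it is not the patent's own construction. The edge capacities (1 per block, a balanced `ceil(blocks / nodes)` quota per computer node), the Edmonds-Karp solver, the fallback for unmatched blocks, and all names such as `max_flow_mapping` and `replicas` are assumptions introduced for illustration.

```python
import math
from collections import defaultdict, deque


def max_flow_mapping(replicas, nodes):
    """Map blocks to nodes via max flow (Edmonds-Karp); capacities are assumed."""
    blocks = list(replicas)
    quota = math.ceil(len(blocks) / len(nodes))  # assumed per-node balance limit
    SRC, SNK = ("src",), ("snk",)
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities
    for b in blocks:
        cap[SRC][("b", b)] = 1                   # each block is assigned once
        for n in replicas[b]:
            if n in nodes:
                cap[("b", b)][("n", n)] = 1      # edge only where a copy is local
    for n in nodes:
        cap[("n", n)][SNK] = quota

    def find_augmenting_path():
        parent = {SRC: None}
        queue = deque([SRC])
        while queue:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    if v == SNK:
                        return parent
                    queue.append(v)
        return None

    while (parent := find_augmenting_path()) is not None:
        # Every path starts on a capacity-1 source edge, so its bottleneck is 1.
        v = SNK
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u

    mapping, load = {}, {n: 0 for n in nodes}
    for b in blocks:
        for n in replicas[b]:
            if cap[("n", n)][("b", b)] > 0:      # reverse residual means flow b -> n
                mapping[b] = n
                load[n] += 1
                break
    for b in blocks:                             # unmatched blocks: remote transfer
        if b not in mapping:
            n = min(nodes, key=lambda x: load[x])
            mapping[b] = n
            load[n] += 1
    return mapping


# Hypothetical layout: "b" and "c" exist only on n1, so "a" must be rerouted to n2.
plan = max_flow_mapping({"a": ["n1", "n2"], "b": ["n1"], "c": ["n1"]}, ["n1", "n2"])
print(plan)
```

In this example a first-come assignment would place "a" on n1 and later force "b" or "c" across the network; the augmenting-path step moves "a" to n2 so that all three blocks stay local. A block that receives no flow is treated here as the case where a transfer is reduced rather than avoided.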
Specification