Hash join using collaborative parallel filtering in intelligent storage with offloaded bloom filters
First Claim
1. A method comprising:
- in a database system comprising at least a database server and a data storage system that is separate from the database server, performing a join operation between at least a first table and a second table by;
at the database server, generating a bloom filter based upon one or more attributes of the second table;
identifying, at the database server, one or more locations of one or more data blocks in which the data storage system stores data for the first table;
wherein the data storage system comprises one or more storage devices upon which data for tables, including the first table, is stored in a plurality of block structures, including the one or more data blocks;
the database server sending to the data storage system;
a) an input/output (I/O) request identifying the one or more locations of the one or more data blocks stored in the data storage system; and
b) the bloom filter, associated with the I/O request;
wherein the I/O request, is a communication that, when interpreted by the data storage system, causes;
the data storage system reading said one or more data blocks from the one or more storage devices;
the data storage system generating a filtered response to the I/O request from the one or more data blocks by removing data block rows that do not apply to the bloom filter; and
sending to the data storage system metadata indicating to the data storage system one or more data block row fields, of the one or more data blocks, that correspond to one or more particular columns of the first table, wherein removing data block rows that do not apply to the bloom filter comprises comparing results of one or more functions of the values of the one or more data block row fields to the bloom filter;
in response to the I/O request, the database server receiving the filtered response from the data storage system;
performing the join operation based on the filtered response;
wherein the method is performed by one or more computing devices.
1 Assignment
0 Petitions
Accused Products
Abstract
Processing resources at a storage system for a database server are utilized to perform aspects of a join operation that would conventionally be performed by the database server. When requesting a range of data units from a storage system, the database server includes join metadata describing aspects of the join operation for which the data is being requested. The join metadata may be, for instance, a bloom filter. The storage system reads the requested data from disk as normal. However, prior to sending the requested data back to the storage system, the storage system analyzes the raw data based on the join metadata, removing a certain amount of data that is guaranteed to be irrelevant to the join operation. The storage system then returns filtered data to the database server. The database system thereby avoids the unnecessary transfer of certain data between the storage system and the database server.
-
Citations
54 Claims
-
1. A method comprising:
-
in a database system comprising at least a database server and a data storage system that is separate from the database server, performing a join operation between at least a first table and a second table by; at the database server, generating a bloom filter based upon one or more attributes of the second table; identifying, at the database server, one or more locations of one or more data blocks in which the data storage system stores data for the first table; wherein the data storage system comprises one or more storage devices upon which data for tables, including the first table, is stored in a plurality of block structures, including the one or more data blocks; the database server sending to the data storage system; a) an input/output (I/O) request identifying the one or more locations of the one or more data blocks stored in the data storage system; and b) the bloom filter, associated with the I/O request; wherein the I/O request, is a communication that, when interpreted by the data storage system, causes; the data storage system reading said one or more data blocks from the one or more storage devices; the data storage system generating a filtered response to the I/O request from the one or more data blocks by removing data block rows that do not apply to the bloom filter; and sending to the data storage system metadata indicating to the data storage system one or more data block row fields, of the one or more data blocks, that correspond to one or more particular columns of the first table, wherein removing data block rows that do not apply to the bloom filter comprises comparing results of one or more functions of the values of the one or more data block row fields to the bloom filter; in response to the I/O request, the database server receiving the filtered response from the data storage system; performing the join operation based on the filtered response; wherein the method is performed by one or more computing devices. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
-
-
18. A method comprising:
-
receiving, at a data storage system, an input/output (I/O) request for data from a database server, the I/O request specifying locations of a plurality of data blocks at which the requested data is stored; wherein the data storage system comprises the one or more storage devices, upon which the database server stores data for table structures in a plurality of block structures, including the plurality of data blocks; receiving, at the data storage system, in association with the I/O request, a bloom filter; receiving, at the data storage system, metadata indicating to the data storage system one or more data block row fields, of the plurality of data blocks, that correspond to one or more particular columns against which the bloom filter is to be tested; responsive to the I/O request; the data storage system reading the plurality of data blocks from the one or more storage devices; the data storage system generating filtered response data from the plurality of data blocks by removing data block rows that do not apply to the bloom filter, wherein removing data block rows that do not apply to the bloom filter comprises comparing results of one or more functions of the values of the one or more data block row fields to the bloom filter; the data storage system sending the filtered response data to the database server; wherein the method is performed by one or more computing devices. - View Dependent Claims (19, 20, 21, 22, 23, 24, 25)
-
-
26. One or more non-transitory computer-readable media storing instructions which, when executed by the one or more computing devices, cause:
-
in a database system comprising at least a database server and a data storage system that is separate from the database server, performing a join operation between at least a first table and a second table by; at the database server, generating a bloom filter based upon one or more attributes of the second table; identifying, at the database server, one or more locations of one or more data blocks in which the data storage system stores data for the first table; wherein the data storage system comprises one or more storage devices upon which data for tables, including the first table, is stored in a plurality of block structures, including the one or more data blocks; the database server sending to the data storage system; a) an input/output (I/O) request identifying the one or more locations of the one or more data blocks stored in the data storage system; and b) the bloom filter, associated with the I/O request; wherein the I/O request, is a communication that, when interpreted by the data storage system, causes; the data storage system reading said one or more data blocks from the one or more storage devices; the data storage system generating a filtered response to the I/O request from the one or more data blocks by removing data block rows that do not apply to the bloom filter; and sending to the data storage system metadata indicating to the data storage system one or more data block row fields, of the one or more data blocks, that correspond to one or more particular columns of the first table, wherein removing data block rows that do not apply to the bloom filter comprises comparing results of one or more functions of the values of the one or more data block row fields to the bloom filter; in response to the I/O request, the database server receiving the filtered response from the data storage system; performing the join operation based on the filtered response. - View Dependent Claims (27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43)
-
-
44. One or more non-transitory storage-computer-readable media storing instructions which, when executed by one or more computing devices, cause performance of:
-
receiving, at a data storage system, an input/output (I/O) request for data from a database server, the I/O request specifying locations of a plurality of data blocks at which the requested data is stored; wherein the data storage system comprises the one or more storage devices, upon which the database server stores data for table structures in a plurality of block structures, including the data blocks; receiving, at the data storage system, in association with the I/O request, a bloom filter; receiving, at the data storage system, metadata indicating to the data storage system one or more data block row fields, of the plurality of data blocks, that correspond to one or more particular columns against which the bloom filter is to be tested; responsive to the I/O request; the data storage system reading the plurality of data blocks from the one or more storage devices; the data storage system generating filtered response data from the plurality of data blocks by removing data block rows that do not apply to the bloom filter, wherein removing data block rows that do not apply to the bloom filter comprises comparing results of one or more functions of the values of the one or more data block row fields to the bloom filter; the data storage system sending the filtered response data to the database server. - View Dependent Claims (45, 46, 47, 48, 49, 50, 51)
-
-
52. A system comprising:
-
one or more blocked storage devices storing data in a plurality of data block structures; a storage server that responds to I/O requests for data block structures by reading data blocks from or writing data blocks to the one or more blocked storage devices; and a database server, connected via a network to the storage server, that receives and responds to a given SQL query from a client by at least;
1) requesting a first set of one or more data blocks from the storage server;
2) interpreting the first set of one or more data blocks as one or more columns of at least one or more database tables; and
3) executing a database operation indicated by the given SQL query against the at least one or more columns to produce a result set;wherein, for a particular SQL query indicating a join operation between a first table and a second table, the database server is configured to generate a bloom filter based upon the second table and send to the storage server;
a) an I/O request that specifies one or more addresses from which to retrieve a second set of one or more data blocks storing data for the first table;
b) the bloom filter; and
c) metadata indicating to the data storage system one or more data block row fields of the one or more data blocks that correspond to one or more particular columns of the first table;wherein the storage server is configured to, in response to the I/O request, read the second set of one or more data blocks from the one or more storage devices at the specified one or more addresses; wherein the storage server is further configured to, upon reading the second set of one or more data blocks in response to the I/O request, produce filtered data by removing data block rows that do not apply to the bloom filter, wherein removing data block rows that do not apply to the bloom filter comprises comparing results of one or more functions of the values of the one or more data block row fields to the bloom filter; wherein the storage server is further configured to, in response to the I/O request, send the filtered data to the database server; and wherein the database server is further configured to perform the join operation based on the second table and the filtered data returned by the storage server.
-
-
53. An apparatus comprising:
-
one or more computer processors; one or more network interfaces configured to send I/O requests to at least a data storage system, and receive responses to the I/O requests; a database server component, implemented at least partially by the one or more computer processors, configured to perform database operations, including join operations between tables, wherein the database server component is configured to perform a particular join operation between at least a first table and a second table by; generating a bloom filter based upon one or more attributes of the second table; identifying one or more locations of one or more data blocks in which the data storage system stores data for the first table; wherein the data storage system comprises one or more storage devices upon which data for tables, including the first table, is stored in a plurality of block structures, including the one or more data blocks; sending to the data storage system, over the one or more network interfaces; a) an input/output (I/O) request identifying the one or more locations of the one or more data blocks stored in the data storage system; and b) the bloom filter, associated with the I/O request; wherein the I/O request, is a communication that, when interpreted by the data storage system, causes; the data storage system reading said one or more data blocks from the one or more storage devices; the data storage system generating a filtered response to the I/O request from the one or more data blocks by removing data block rows that do not apply to the bloom filter; and sending to the data storage system, over the one or more network interfaces, metadata indicating to the data storage system one or more data block row fields, of the one or more data blocks, that correspond to one or more particular columns of the first table, wherein removing data block rows that do not apply to the bloom filter comprises comparing results of one or more functions of the values of the one or more data block row fields to the bloom filter; in response to the I/O request, receiving the filtered response from the data storage system over the one or more network interfaces; and performing the particular join operation based on the filtered response.
-
-
54. An apparatus comprising:
-
one or more storage devices, upon which is stored a plurality of block structures; one or more network interfaces configured to receive and send responses to I/O requests from one or more database servers that store data for table structures within the plurality of block structures; a database storage server component, implemented at least partially by hardware, configured to process the I/O requests, wherein the data storage server component is configured to process a particular I/O request by; receiving the particular I/O request from a particular database server, the particular I/O request specifying particular locations of a plurality of data blocks; receiving, in association with the particular I/O request, a bloom filter; receiving metadata indicating one or more data block row fields, of the plurality of data blocks, that correspond to one or more particular columns against which the bloom filter is to be tested; reading the plurality of data blocks from the one or more storage devices; generating filtered response data from the plurality of data blocks by removing data block rows that do not apply to the bloom filter, wherein removing data block rows that do not apply to the bloom filter comprises comparing results of one or more functions of the values of the one or more data block row fields to the bloom filter; and sending the filtered response data to the particular database server.
-
Specification