Hash join using collaborative parallel filtering in intelligent storage with offloaded bloom filters
Abstract
Processing resources at a storage system for a database server are utilized to perform aspects of a join operation that would conventionally be performed by the database server. When requesting a range of data units from a storage system, the database server includes join metadata describing aspects of the join operation for which the data is being requested. The join metadata may be, for instance, a Bloom filter. The storage system reads the requested data from disk as normal. However, prior to sending the requested data back to the database server, the storage system analyzes the raw data based on the join metadata, removing a certain amount of data that is guaranteed to be irrelevant to the join operation. The storage system then returns filtered data to the database server. The database system thereby avoids the unnecessary transfer of certain data between the storage system and the database server.
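The abstract names a Bloom filter as one possible form of join metadata. The Python sketch below is an illustration, not the patented implementation: it shows why such a filter lets the storage side discard rows that are guaranteed not to join, since a Bloom filter can report false positives but never false negatives. The names `BloomFilter`, `build_join_metadata` and `filter_rows`, the bit-array size, and the SHA-256-based hashing are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over join-key values (illustrative sketch)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests of the key.
        data = repr(key).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def build_join_metadata(second_table_keys, num_bits=1024, num_hashes=3):
    # Database-server side (hypothetical helper): summarize the second table's join keys.
    bloom = BloomFilter(num_bits, num_hashes)
    for key in second_table_keys:
        bloom.add(key)
    return bloom


def filter_rows(rows, key_column, bloom):
    # Storage side (hypothetical helper): drop rows whose key cannot match.
    return [row for row in rows if bloom.might_contain(row[key_column])]
```

Because a Bloom filter can return false positives, some rows that survive `filter_rows` may still have no partner in the second table, so the join on the database server must still compare keys exactly. A larger `num_bits` lowers the false-positive rate at the cost of shipping more metadata with each I/O request.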
24 Claims
1. A system comprising:
one or more storage devices storing raw data in data block structures;
a storage server configured to respond to input/output (I/O) requests directly from one or more database servers, but not to respond to database commands received directly from database clients;
wherein the I/O requests are requests for raw data from specified data block structures;
wherein the storage server is configured to respond to the I/O requests from the one or more database servers by reading data blocks from or writing data blocks to the one or more storage devices; and
a database server, connected via a network to the storage server;
wherein the storage server acts as an intermediary for all data transfers between the database server and the one or more storage devices;
wherein the database server is configured to respond to database commands received directly from one or more database clients, but the database server is not capable of directly responding to I/O requests for raw data;
wherein the database server is configured to perform:
identifying one or more data blocks, of the data block structures stored on the one or more storage devices, that store raw data that represents a first table;
generating join metadata based upon one or more attributes of a second table;
sending to the data storage system:
a particular I/O request for the one or more data blocks, and data indicating the join metadata;
wherein the storage server is further configured to perform:
receiving the particular I/O request for the one or more data blocks;
reading the one or more data blocks from the one or more storage devices;
identifying a portion of the raw data in the one or more data blocks that represents table rows that are guaranteed, based on the join metadata, to not be needed for a join operation with the second table; and
returning, to the database server, a filtered version of the one or more data blocks without at least the identified portion;
wherein the database server is further configured to perform:
receiving the filtered version of the one or more data blocks from the storage server; and
performing a join operation between the first table and the second table based on the filtered version of the one or more data blocks.
(Dependent claims: 2, 3, 4, 5, 6, 7)
8. A method comprising:
identifying, at a database server, one or more data blocks in which a data storage system stores raw data that represents a first table;
wherein the database server is configured to respond to database commands from one or more database clients, but the database server is not capable of directly responding to input/output (I/O) requests that are requests for raw data;
wherein the data storage system comprises a storage server that is configured to respond to I/O requests directly from one or more database servers, but not to respond to database commands received directly from database clients;
wherein the storage server acts as an intermediary for all data transfers between the database server and one or more storage devices of the data storage system;
generating join metadata based upon one or more attributes of a second table;
the database server sending to the data storage system:
an input/output (I/O) request for the one or more data blocks; and
data indicating the join metadata;
wherein the I/O request is a communication that, when interpreted by the data storage system, causes:
the data storage system reading the one or more data blocks from the one or more storage devices;
identifying a portion of the raw data in the one or more data blocks that represents table rows that are guaranteed, based on the join metadata, to not be needed for a join operation with the second table; and
returning a filtered version of the one or more data blocks without at least the identified portion;
in response to the I/O request, the database server receiving the filtered version of the one or more data blocks from the data storage system;
the database server performing a join operation between the first table and the second table based on the filtered version of the one or more data blocks;
wherein the method is performed by one or more computing devices.
(Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16)
17. One or more non-transitory computer-readable storage media storing instructions that, when executed by one or more computing devices, cause:
identifying, at a database server, one or more data blocks in which a data storage system stores raw data that represents a first table;
wherein the database server is configured to respond to database commands from one or more database clients, but the database server is not capable of directly responding to input/output (I/O) requests that are requests for raw data;
wherein the data storage system comprises a storage server that is configured to respond to I/O requests directly from one or more database servers, but not to respond to database commands received directly from database clients;
wherein the storage server acts as an intermediary for all data transfers between the database server and one or more storage devices of the data storage system;
generating join metadata based upon one or more attributes of a second table;
the database server sending to the data storage system:
an input/output (I/O) request for the one or more data blocks; and
data indicating the join metadata;
wherein the I/O request is a communication that, when interpreted by the data storage system, causes:
the data storage system reading the one or more data blocks from the one or more storage devices;
identifying a portion of the raw data in the one or more data blocks that represents table rows that are guaranteed, based on the join metadata, to not be needed for a join operation with the second table; and
returning a filtered version of the one or more data blocks without at least the identified portion;
in response to the I/O request, the database server receiving the filtered version of the one or more data blocks from the data storage system;
the database server performing a join operation between the first table and the second table based on the filtered version of the one or more data blocks.
(Dependent claims: 18, 19, 20, 21, 22, 23, 24)
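Read together, the independent claims describe a three-step exchange. The database server sends a block-level I/O request together with join metadata. The storage server reads the blocks and strips rows that cannot join. The database server then joins the filtered rows against the second table. The Python sketch below models that exchange in memory under several assumptions. Rows are dicts rather than raw block bytes, and the `IORequest`, `JoinMetadata`, `StorageServer` and `DatabaseServer` names are hypothetical. The join metadata is a min/max range of the second table's join keys, a simpler stand-in than the Bloom filter given as an example in the abstract; the independent claims only require metadata generated from attributes of the second table.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, int]


@dataclass
class JoinMetadata:
    """Hypothetical join metadata (assumption): the key range of the second table."""
    key_column: str  # join column in the first table's rows
    min_key: int
    max_key: int

    def may_match(self, row: Row) -> bool:
        # A key outside [min_key, max_key] cannot satisfy the equi-join.
        return self.min_key <= row[self.key_column] <= self.max_key


@dataclass
class IORequest:
    """Hypothetical block-level request that carries the join metadata."""
    block_ids: List[int]
    join_metadata: JoinMetadata


class StorageServer:
    """Serves block I/O requests only; it never parses database commands."""

    def __init__(self, blocks: Dict[int, List[Row]]) -> None:
        self.blocks = blocks  # in-memory stand-in for the storage devices

    def handle(self, request: IORequest) -> Dict[int, List[Row]]:
        filtered: Dict[int, List[Row]] = {}
        for block_id in request.block_ids:
            rows = self.blocks[block_id]  # read the block as usual
            # Drop rows the metadata guarantees are not needed for the join.
            filtered[block_id] = [r for r in rows if request.join_metadata.may_match(r)]
        return filtered


class DatabaseServer:
    def __init__(self, storage: StorageServer) -> None:
        self.storage = storage

    def join(self, block_ids: List[int], second_table: List[Row],
             probe_col: str, build_col: str) -> List[Tuple[Row, Row]]:
        if not second_table:
            return []  # nothing can match an empty second table
        keys = [r[build_col] for r in second_table]
        meta = JoinMetadata(probe_col, min(keys), max(keys))
        filtered = self.storage.handle(IORequest(block_ids, meta))

        # Hash join: build on the second table, probe with the filtered rows.
        buckets: Dict[int, List[Row]] = {}
        for r in second_table:
            buckets.setdefault(r[build_col], []).append(r)
        result: List[Tuple[Row, Row]] = []
        for rows in filtered.values():
            for r in rows:
                for match in buckets.get(r[probe_col], []):
                    result.append((r, match))
        return result
```

Suppose the first table's `cust` values are 1, 9, 4 and 20, and the second table's `id` values are 3, 4 and 9. The range [3, 9] then lets the storage server drop the rows for 1 and 20 before transfer. A row that passes the range check but has no matching key (say, 5) is still eliminated by the hash join on the database server, just as a Bloom-filter false positive would be.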
Specification