Efficient query processing in columnar databases using bloom filters
First Claim
Patent Images
1. A distributed data warehouse system, comprising:
- a plurality of nodes, wherein at least some nodes of the plurality of nodes each comprise;
storage for a columnar database table, wherein said storage comprises a plurality of data blocks;
a bloom filter generator, configured to;
generate a bloom filter for each of one or more data blocks storing data for a column of the columnar database table, wherein each bloom filter is represented as a bitmap, wherein different patterns of set bits in the bitmap indicate data values not stored in the data block;
a read module;
a query engine, configured to;
receive an indication of a query directed to the column of the columnar database table for select data;
evaluate the indication of the query to determine one or more predicate data values that identify the select data;
in response to receiving and evaluating the indication of the query;
analyze the bitmap representing the bloom filter for the one or more predicate data values for each of the one or more data blocks to determine particular ones of the one or more data blocks which do not need to be read in order to service the query for the select data; and
direct the read module to read the one or more data blocks storing data for the column excepting the particular ones of the one or more data blocks which do not need to be read.
1 Assignment
0 Petitions
Accused Products
Abstract
A bloom filter is generated for efficient query processing for unsorted data in a column of a columnar database. Bloom filters represented as bitmaps are generated for data blocks storing data for a column of a columnar database table. An indication of a query directed toward the column is received and the bloom filter for each data block is examined to determine which ones of the data blocks do not need to be read in order to service the query for the select data. Data is then read from the data blocks storing data for the column excepting the ones which do not need to be read.
-
Citations
20 Claims
-
1. A distributed data warehouse system, comprising:
a plurality of nodes, wherein at least some nodes of the plurality of nodes each comprise; storage for a columnar database table, wherein said storage comprises a plurality of data blocks; a bloom filter generator, configured to; generate a bloom filter for each of one or more data blocks storing data for a column of the columnar database table, wherein each bloom filter is represented as a bitmap, wherein different patterns of set bits in the bitmap indicate data values not stored in the data block; a read module; a query engine, configured to; receive an indication of a query directed to the column of the columnar database table for select data; evaluate the indication of the query to determine one or more predicate data values that identify the select data; in response to receiving and evaluating the indication of the query; analyze the bitmap representing the bloom filter for the one or more predicate data values for each of the one or more data blocks to determine particular ones of the one or more data blocks which do not need to be read in order to service the query for the select data; and direct the read module to read the one or more data blocks storing data for the column excepting the particular ones of the one or more data blocks which do not need to be read. - View Dependent Claims (2, 3, 4)
-
5. A method, comprising:
performing, by one or more computing devices; generating a bloom filter for each of one or more data blocks storing data for a column of a columnar database table, wherein each bloom filter is represented as a bitmap, wherein different patterns of set bits in the bitmap indicate data values not stored in the data block; receiving an indication of a query directed to the column of the columnar database table for select data; and in response to receiving the indication of the query, examining the bloom filter for each of the one or more data blocks storing data for the column to determine particular ones of the one or more data blocks which do not need to be read in order to service the query for the select data. - View Dependent Claims (6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
-
16. A non-transitory, computer-readable storage medium, storing program instructions that when executed by one or more computing devices implement:
-
generating a bloom filter for each of one or more data blocks storing data for a column of a columnar database table, wherein each bloom filter is represented as a bitmap, wherein different patterns of set bits in the bitmap indicate data values not stored in the data block; for each of the one or more data blocks, storing the bitmap representing the bloom filter in a respective entry for the data block in a block metadata data structure that stores information about the one or more data blocks; receiving an indication of a query directed to the column of the columnar database table for select data; and in response to receiving the indication of the query; analyzing the bloom filter for each of the one or more data blocks storing data for the column to determine particular ones of the one or more data blocks which do not need to be read in order to service the query for the select data; and reading the one or more data blocks storing data for the column in order to service the query for the select data excepting the particular ones of the one or more data blocks which do not need to be read. - View Dependent Claims (17, 18, 19, 20)
-
Specification