Efficient query processing in columnar databases using bloom filters
First Claim
Patent Images
1. A distributed data warehouse system, comprising:
- a plurality of nodes, wherein at least some nodes of the plurality of nodes each comprise;
storage for a columnar database table, wherein said storage comprises a plurality of data blocks; and
a data access module;
the data access module, configured to;
generate a probabilistic data structure for each of one or more data blocks storing data for a column of the columnar database table, wherein each probabilistic data structure indicates data values not stored in the data block;
receive an indication of a query directed to the column of the columnar database table for select data; and
in response to the receipt of the indication of the query, examine the probabilistic data structure for each of the one or more data blocks storing data for the column to determine particular ones of the one or more data blocks which do not need to be read in order to service the query for the select data.
0 Assignments
0 Petitions
Accused Products
Abstract
A bloom filter is generated for efficient query processing for unsorted data in a column of a columnar database. Bloom filters represented as bitmaps are generated for data blocks storing data for a column of a columnar database table. An indication of a query directed toward the column is received and the bloom filter for each data block is examined to determine which ones of the data blocks do not need to be read in order to service the query for the select data. Data is then read from the data blocks storing data for the column excepting the ones which do not need to be read.
-
Citations
20 Claims
-
1. A distributed data warehouse system, comprising:
-
a plurality of nodes, wherein at least some nodes of the plurality of nodes each comprise; storage for a columnar database table, wherein said storage comprises a plurality of data blocks; and a data access module; the data access module, configured to; generate a probabilistic data structure for each of one or more data blocks storing data for a column of the columnar database table, wherein each probabilistic data structure indicates data values not stored in the data block; receive an indication of a query directed to the column of the columnar database table for select data; and in response to the receipt of the indication of the query, examine the probabilistic data structure for each of the one or more data blocks storing data for the column to determine particular ones of the one or more data blocks which do not need to be read in order to service the query for the select data. - View Dependent Claims (2, 3, 4, 6, 7)
-
-
5. A method, comprising:
performing, by one or more computing devices; generating a probabilistic data structure for each of one or more data blocks storing data for a column of a columnar database table, wherein each probabilistic data structure indicates data values not stored in the data block; receiving an indication of a query directed to the column of the columnar database table for select data; and in response to receiving the indication of the query, examining the probabilistic data structure for each of the one or more data blocks storing data for the column to determine particular ones of the one or more data blocks which do not need to be read in order to service the query for the select data. - View Dependent Claims (8, 9, 10, 11, 12, 13, 14, 15)
-
16. A non-transitory, computer-readable storage medium, storing program instructions that when executed by one or more computing devices implement:
-
generating a probabilistic data structure for each of one or more data blocks storing data for a column of a columnar database table, wherein each probabilistic data structure indicates data values not stored in the data block; for each of the one or more data blocks, storing the probabilistic data structure in a respective entry for the data block in a block metadata data structure that stores information about the one or more data blocks; receiving an indication of a query directed to the column of the columnar database table for select data; and in response to receiving the indication of the query; analyzing the probabilistic data structure for each of the one or more data blocks storing data for the column to determine particular ones of the one or more data blocks which do not need to be read in order to service the query for the select data; and reading the one or more data blocks storing data for the column in order to service the query for the select data excepting the particular ones of the one or more data blocks which do not need to be read. - View Dependent Claims (17, 18, 19, 20)
-
Specification