Efficient query processing using histograms in a columnar database
First Claim
1. A system, comprising:
- one or more hardware processors and memory with program instructions to;
determine a bucket range size for each of a plurality of buckets for a histogram of a column of a columnar database table, wherein each bucket of the plurality of buckets represents an existence of one or more data values of the data in the column within a range of values according to the determined bucket range size;
generate a probabilistic data structure for each of one or more data blocks storing data for the column of the columnar database table, wherein the probabilistic data structure indicates for which particular buckets of the plurality of buckets in the histogram there is a data value stored in the data block; and
examine the probabilistic data structure, responsive to a query, for each of the one or more data blocks storing data for the column to determine ones of the one or more data blocks which do not need to be read in order to service the query.
0 Assignments
0 Petitions
Accused Products
Abstract
A probabilistic data structure is generated for efficient query processing using a histogram for unsorted data in a column of a columnar database. A bucket range size is determined for multiples buckets of a histogram of a column in a columnar database table. In at least some embodiments, the histogram may be a height-balanced histogram. A probabilistic data structure is generated to indicate for which particular buckets in the histogram there is a data value stored in the data block. When an indication of a query directed to the column for select data is received, the probabilistic data structure for each of the data blocks storing data for the column may be examined to determine particular ones of the data blocks which do not need to be read in order to service the query for the select data.
29 Citations
20 Claims
-
1. A system, comprising:
one or more hardware processors and memory with program instructions to; determine a bucket range size for each of a plurality of buckets for a histogram of a column of a columnar database table, wherein each bucket of the plurality of buckets represents an existence of one or more data values of the data in the column within a range of values according to the determined bucket range size; generate a probabilistic data structure for each of one or more data blocks storing data for the column of the columnar database table, wherein the probabilistic data structure indicates for which particular buckets of the plurality of buckets in the histogram there is a data value stored in the data block; and examine the probabilistic data structure, responsive to a query, for each of the one or more data blocks storing data for the column to determine ones of the one or more data blocks which do not need to be read in order to service the query. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
8. A method, comprising:
performing, by one or more computing devices comprising one or more hardware processors and memory; determining a bucket range size for each of a plurality of buckets for a histogram of a column of a columnar database table, wherein each bucket of the plurality of buckets represents an existence of one or more data values of the data in the column within a range of values according to the determined bucket range size; generating a probabilistic data structure for each of one or more data blocks storing data for the column of the columnar database table, wherein the probabilistic data structure indicates for which particular buckets of the plurality of buckets in the histogram there is a data value stored in the data block; and examining the probabilistic data structure, responsive to a query, for each of the one or more data blocks storing data for the column to determine ones of the one or more data blocks which do not need to be read in order to service the query. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
15. A non-transitory, computer-readable storage medium, storing program instructions that when executed by one or more computing devices implement:
-
determining a bucket range size for each of a plurality of buckets for a histogram of a column of a columnar database table, wherein each bucket of the plurality of buckets represents an existence of one or more data values of the data in the column within a range of values according to the determined bucket range size; generating a probabilistic data structure for each of one or more data blocks storing data for the column of the columnar database table, wherein the probabilistic data structure indicates for which particular buckets of the plurality of buckets in the histogram there is a data value stored in the data block; and examining the probabilistic data structure, responsive to a query, for each of the one or more data blocks storing data for the column to determine ones of the one or more data blocks which do not need to be read in order to service the query. - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification