Histogram generation on multiple dimensions
First Claim
1. A data store system comprising:
- an array of persistent storage devices configured to store a plurality of data tables;
a processor in communication with the persistent storage devices, the processor configured to;
receive a request to generate a multi-dimensional histogram on at least one of the plurality of data tables;
identify a multi-dimensional dataset stored in the at least one of the plurality of data tables;
identify each dimension of the multi-dimensional dataset;
for each respective identified dimension;
sort the identified multi-dimensional dataset on values of the respective identified dimension;
for each respective identified dimension;
partition the sorted dataset into a predetermined number of intervals associated with the respective identified dimension;
determine a number of rows for each interval; and
select a lower boundary value and an upper boundary value for each interval, wherein the upper boundary value is the highest value in each interval, wherein the lower boundary value for an interval having lowest sorted values is the lowest value in the interval, and the lower boundary value for each other interval is the upper boundary value of an interval having immediately preceding partitioned values;
store the lower boundary values, upper boundary values, and number of rows for each interval of each identified dimension as the histogram;
receive a query on the at least one of the plurality of data tables; and
generate a query response based on the histogram.
1 Assignment
0 Petitions
Accused Products
Abstract
Based on a request, a processor may identify a multi-dimensional dataset stored in the at least one of a plurality of data tables and identify each dimension of the multi-dimensional dataset. For each respective identified dimension, the processor may sort the identified dataset on values of the respective identified dimension, partition the sorted dataset into a predetermined number of intervals associated with the respective identified dimension, determine a number of rows for each interval and select a lower boundary value and an upper boundary value for each interval. The upper boundary value may be the highest value in each interval. The lower boundary value for an interval having lowest sorted values may be the lowest value in the interval or the upper boundary value of an interval having immediately preceding partitioned values. The processor may further store the boundary values and rows for each interval of each identified dimension as the histogram. A method and computer-readable medium are also disclosed.
42 Citations
17 Claims
-
1. A data store system comprising:
-
an array of persistent storage devices configured to store a plurality of data tables; a processor in communication with the persistent storage devices, the processor configured to; receive a request to generate a multi-dimensional histogram on at least one of the plurality of data tables; identify a multi-dimensional dataset stored in the at least one of the plurality of data tables; identify each dimension of the multi-dimensional dataset; for each respective identified dimension; sort the identified multi-dimensional dataset on values of the respective identified dimension; for each respective identified dimension; partition the sorted dataset into a predetermined number of intervals associated with the respective identified dimension; determine a number of rows for each interval; and select a lower boundary value and an upper boundary value for each interval, wherein the upper boundary value is the highest value in each interval, wherein the lower boundary value for an interval having lowest sorted values is the lowest value in the interval, and the lower boundary value for each other interval is the upper boundary value of an interval having immediately preceding partitioned values; store the lower boundary values, upper boundary values, and number of rows for each interval of each identified dimension as the histogram; receive a query on the at least one of the plurality of data tables; and generate a query response based on the histogram. - View Dependent Claims (2, 3, 4, 5, 6, 12)
-
-
7. A computer-implemented method comprising:
-
receiving a request to generate a multi-dimensional histogram on at least one of a plurality of data tables stored in an array of persistent storage devices; retrieving the at least one of the plurality of data tables; identifying a multi-dimensional dataset stored in the at least one of the plurality of data tables; identifying each dimension of the multi-dimensional dataset; for each respective identified dimension; sorting the identified multi-dimensional dataset on values of the respective identified dimension; partitioning the sorted dataset into a predetermined number of intervals associated with the respective identified dimension; determining a number of rows for each interval; and selecting a lower boundary value and an upper boundary value for each interval, wherein the upper boundary value is the highest value in each interval, wherein the lower boundary value for an interval having lowest sorted values is the lowest value in the interval, and the lower boundary value for each other interval is the upper boundary value of an interval having immediately preceding partitioned values; storing the lower boundary values, upper boundary values, and number of rows for each interval of each identified dimension as the histogram; receiving a query on the at least one of the plurality of data tables; and generating a query response based on the histogram. - View Dependent Claims (8, 9, 10, 11)
-
-
13. A non-transitory computer-readable medium encoded with a plurality of instruction executable by a processor, the plurality of instructions comprising:
-
instructions to receive a request to generate a multi-dimensional histogram on at least one of a plurality of data tables stored in an array of persistent storage devices; instructions to retrieve the at least of the plurality of data tables; instructions to identify a multi-dimensional dataset stored in the at least one of the plurality of data tables; instructions to identify each dimension of the multi-dimensional dataset; for each respective identified dimension; instructions to sort the identified multi-dimensional dataset on values of the respective identified dimension; instructions to partition the sorted dataset into a predetermined number of intervals associated with the respective identified dimension; instructions to determine a number of rows for each interval; and instructions to select a lower boundary value and an upper boundary value for each interval, wherein the upper boundary value is the highest value in each interval, wherein the lower boundary value for an interval having lowest sorted values is the lowest value in the interval, and the lower boundary value for each other interval is the upper boundary value of an interval having immediately preceding partitioned values; instructions to store the lower boundary values, upper boundary values, and number of rows for each interval of each identified dimension as the histogram; instructions to receive a query on the at least one of the plurality of data tables; and instructions to generate a query response based on the histogram. - View Dependent Claims (14, 15, 16, 17)
-
Specification