EFFICIENT LARGE-SCALE PROCESSING OF COLUMN BASED DATA ENCODED STRUCTURES
Abstract
The subject disclosure relates to efficient query processing over large-scale data storage. An exemplary process includes retrieving a subset of columns implicated by a query as integer encoded and compressed sequences of values corresponding to different columns of data; defining query processing buckets that span the subset of columns based on changes of compression type occurring in those sequences; and processing the query in memory bucket by bucket, where the operations applied to the integer encoded and compressed sequences of values depend on the type of the current bucket. The column-based organization of the data, together with a hybrid run-length encoding and bit-packing technique, enables highly efficient, fast query responses in real time.
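The abstract names two encoding steps: integer (dictionary) encoding of column values and a hybrid of run-length encoding and bit packing. The following is a minimal Python sketch of both, under assumptions that are not taken from the disclosure: ids are assigned in first-seen order, a run of at least a hypothetical `MIN_RUN` equal ids becomes a run-length segment, and all other values are bit packed at the smallest width that holds the largest id in the group. `RleSegment`, `PackedSegment`, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical policy: a run of at least MIN_RUN equal ids is run-length
# encoded; shorter stretches are accumulated and bit packed together.
MIN_RUN = 4


@dataclass
class RleSegment:
    value: int  # dictionary id repeated in the run
    count: int  # number of rows covered by the run


@dataclass
class PackedSegment:
    bit_width: int  # bits used per value
    count: int      # number of values in the segment
    data: int       # values packed into one integer, first value in the lowest bits


Segment = Union[RleSegment, PackedSegment]


def dictionary_encode(column: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with a small integer id, assigned in first-seen order."""
    ids: Dict[str, int] = {}
    dictionary: List[str] = []
    encoded: List[int] = []
    for value in column:
        if value not in ids:
            ids[value] = len(dictionary)
            dictionary.append(value)
        encoded.append(ids[value])
    return dictionary, encoded


def pack(values: List[int]) -> PackedSegment:
    """Bit pack values at the smallest width that holds the largest one."""
    width = max(max(values).bit_length(), 1)
    data = 0
    for i, v in enumerate(values):
        data |= v << (i * width)
    return PackedSegment(width, len(values), data)


def hybrid_encode(ids: List[int], min_run: int = MIN_RUN) -> List[Segment]:
    """Emit RLE segments for long runs and bit-packed segments for everything else."""
    segments: List[Segment] = []
    pending: List[int] = []  # short-run values waiting to be bit packed
    i = 0
    while i < len(ids):
        j = i
        while j < len(ids) and ids[j] == ids[i]:
            j += 1
        if j - i >= min_run:
            if pending:
                segments.append(pack(pending))
                pending = []
            segments.append(RleSegment(ids[i], j - i))
        else:
            pending.extend(ids[i:j])
        i = j
    if pending:
        segments.append(pack(pending))
    return segments


def decode(segments: List[Segment]) -> List[int]:
    """Expand segments back into one id per row (used here to check round trips)."""
    out: List[int] = []
    for seg in segments:
        if isinstance(seg, RleSegment):
            out.extend([seg.value] * seg.count)
        else:
            mask = (1 << seg.bit_width) - 1
            out.extend((seg.data >> (i * seg.bit_width)) & mask for i in range(seg.count))
    return out


states = ["WA"] * 5 + ["OR", "CA", "OR"] + ["WA"] * 4
dictionary, ids = dictionary_encode(states)
segments = hybrid_encode(ids)
assert decode(segments) == ids  # the encoding is lossless
print(dictionary, segments)
```

Under these assumptions the sample column becomes a run-length segment, a bit-packed segment of three 2-bit ids, and a second run-length segment; those switches between compression types are what the claims use to delimit buckets.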
21 Claims
1. A method for processing data, comprising:
in response to a query, receiving a subset of the data as integer encoded and compressed sequences of values corresponding to different columns of the data;
defining processing buckets that span over the subset of the data received as integer encoded and compressed sequences of values based on changes of compression type occurring in any of the integer encoded and compressed sequences of values of the subset of data; and
performing operations defined by the query based on type of current bucket processed when processing the integer encoded and compressed sequences of values to return query results.
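Claim 1 places bucket boundaries wherever the compression type changes in any of the column sequences. A minimal Python sketch of that boundary computation, assuming each column is described by a hypothetical list of `(compression_type, row_count)` segments with types `"rle"` and `"packed"`: consecutive segments of the same type create no boundary, and the union of every column's change points yields the buckets. `Layout`, `type_change_points`, and `define_buckets` are illustrative names, not the disclosure's.

```python
from typing import List, Tuple

# Hypothetical column layout: segments in row order as (compression_type, row_count),
# where compression_type is "rle" or "packed".
Layout = List[Tuple[str, int]]


def type_change_points(layout: Layout) -> List[int]:
    """Row offsets where this column switches compression type."""
    points: List[int] = []
    row, prev = 0, None
    for kind, count in layout:
        if prev is not None and kind != prev:
            points.append(row)
        prev = kind
        row += count
    return points


def define_buckets(layouts: List[Layout]) -> List[Tuple[int, int]]:
    """Half-open [start, end) row ranges cut at every type change in any column."""
    total = sum(count for _, count in layouts[0])
    if any(sum(count for _, count in layout) != total for layout in layouts):
        raise ValueError("all columns must cover the same number of rows")
    cuts = {0, total}
    for layout in layouts:
        cuts.update(type_change_points(layout))
    points = sorted(cuts)
    return list(zip(points, points[1:]))


region = [("rle", 5), ("packed", 3), ("rle", 4)]
sales = [("packed", 6), ("rle", 6)]
print(define_buckets([region, sales]))  # [(0, 5), (5, 6), (6, 8), (8, 12)]
```

Inside each resulting bucket, every column uses a single compression type, which is what lets the query processor pick one code path per bucket.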
13. A device for processing data, comprising:
high speed in memory storage for storing a subset of data received as integer encoded and compressed sequences of values corresponding to different fields of the data; and
at least one query processor that processes the query over the subset of the data according to a bucket walking process that defines buckets across the sequences of values based on compression algorithm transitions from run length encoding to bit packing, or vice versa, and then processes the query over the subset of data bucket by bucket according to a type of bucket determined for a current bucket being processed based on the types of compression applied across the sequences for the current bucket.
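Claim 13 adds that each bucket is processed according to a type derived from the compression applied to every sequence inside it. A minimal sketch for a grouped sum, assuming a simplified segment model in which a plain list stands in for bit-packed storage. The three bucket types used here (all run-length encoded, run-length-encoded grouping column with a bit-packed measure, and bit-packed grouping column) and all function names are illustrative assumptions, not the claimed implementation.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple, Union

# Simplified, hypothetical segment model: ("rle", value, count) is a run, and
# ("packed", [v0, v1, ...]) uses a plain list to stand in for bit-packed storage.
Segment = Union[Tuple[str, int, int], Tuple[str, List[int]]]
Column = List[Segment]


def seg_len(seg: Segment) -> int:
    return seg[2] if seg[0] == "rle" else len(seg[1])


def buckets(columns: List[Column]) -> List[Tuple[int, int]]:
    """Cut rows wherever any column switches between "rle" and "packed"."""
    total = sum(seg_len(s) for s in columns[0])
    cuts = {0, total}
    for col in columns:
        row, prev = 0, None
        for seg in col:
            if prev is not None and seg[0] != prev:
                cuts.add(row)
            prev = seg[0]
            row += seg_len(seg)
    points = sorted(cuts)
    return list(zip(points, points[1:]))


def pieces(col: Column, start: int, end: int) -> Iterator[tuple]:
    """Yield (kind, payload, lo, hi) for the parts of col inside rows [start, end)."""
    row = 0
    for seg in col:
        n = seg_len(seg)
        lo, hi = max(row, start), min(row + n, end)
        if lo < hi:
            payload = seg[1] if seg[0] == "rle" else seg[1][lo - row:hi - row]
            yield seg[0], payload, lo, hi
        row += n
        if row >= end:
            break


def values(col: Column, start: int, end: int) -> List[int]:
    """Materialize one value per row in [start, end)."""
    out: List[int] = []
    for kind, payload, lo, hi in pieces(col, start, end):
        out.extend([payload] * (hi - lo) if kind == "rle" else payload)
    return out


def grouped_sum(group: Column, measure: Column) -> Dict[int, int]:
    """SUM(measure) GROUP BY group, choosing a code path per bucket type."""
    totals: Dict[int, int] = defaultdict(int)
    for start, end in buckets([group, measure]):
        g_kind = next(pieces(group, start, end))[0]
        m_kind = next(pieces(measure, start, end))[0]
        if g_kind == "rle" and m_kind == "rle":
            # All-RLE bucket: intersect runs and add value * overlap, no per-row work.
            for _, g, glo, ghi in pieces(group, start, end):
                for _, m, mlo, mhi in pieces(measure, glo, ghi):
                    totals[g] += m * (mhi - mlo)
        elif g_kind == "rle":
            # RLE group, packed measure: each run is one group, so sum its slice.
            for _, g, glo, ghi in pieces(group, start, end):
                totals[g] += sum(values(measure, glo, ghi))
        else:
            # Packed group column: fall back to row-by-row aggregation.
            for g, m in zip(values(group, start, end), values(measure, start, end)):
                totals[g] += m
    return dict(totals)


group = [("rle", 0, 4), ("packed", [1, 2, 1]), ("rle", 2, 3)]
measure = [("rle", 5, 6), ("packed", [1, 2, 3, 4])]
print(grouped_sum(group, measure))  # {0: 20, 1: 6, 2: 14}
```

In an all-RLE bucket the sum comes from run lengths alone, so the work scales with the number of runs rather than rows; only buckets with a bit-packed grouping column fall back to per-row work.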
21. Data querying apparatus, comprising:
means for retrieving a subset of columns implicated by a query as integer encoded and compressed sequences of values corresponding to different columns of data;
means for defining query processing buckets that span over the subset of columns based on changes of compression type occurring in the integer encoded and compressed sequences of values of the subset of data; and
means for processing the query in memory on a bucket by bucket basis and processing the query based on type of current bucket when processing the integer encoded and compressed sequences of values.
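Claim 21, like the abstract, starts by retrieving only the columns a query implicates, so unreferenced columns are never read into memory. A minimal sketch, assuming a hypothetical in-memory store that maps column names to encoded segments and a hypothetical `Query` shape with one measure, an optional grouping column, and equality filters on integer ids; none of these names come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Query:
    """Hypothetical query shape: SUM(measure) ... WHERE filters GROUP BY group_by."""
    measure: str
    group_by: Optional[str] = None
    filters: Dict[str, int] = field(default_factory=dict)  # column -> required id


def implicated_columns(q: Query) -> Set[str]:
    """Every column the query reads: the measure, the grouping column, and filters."""
    cols = {q.measure}
    if q.group_by is not None:
        cols.add(q.group_by)
    cols.update(q.filters)
    return cols


def retrieve(store: Dict[str, List[tuple]], q: Query) -> Dict[str, List[tuple]]:
    """Return only the encoded columns the query touches; the rest are never read."""
    wanted = implicated_columns(q)
    missing = wanted - set(store)
    if missing:
        raise KeyError(f"unknown columns: {sorted(missing)}")
    return {name: store[name] for name in sorted(wanted)}


# Placeholder encoded columns in the ("rle", value, count) / ("packed", [...]) form.
store = {
    "region": [("rle", 0, 4), ("packed", [1, 2, 1])],
    "product": [("packed", [3, 3, 1, 0, 2, 1, 0])],
    "sales": [("rle", 5, 7)],
    "notes": [("packed", [0, 1, 2, 3, 4, 5, 6])],
}
subset = retrieve(store, Query(measure="sales", group_by="region"))
print(sorted(subset))  # ['region', 'sales']
```

The retrieved subset is what the bucket definition and bucket-by-bucket processing steps of the claim then operate on.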
Specification