EFFICIENT LARGE-SCALE JOINING FOR QUERYING OF COLUMN BASED DATA ENCODED STRUCTURES
Abstract
The subject disclosure relates to querying column-based, encoded data structures to enable efficient query processing over large-scale data storage, and more specifically to join operations. Initially, a compact structure is received that represents the data according to a column-based organization, using various compression and data packing techniques that already enable highly efficient, fast query responses in real time. On top of the fast querying enabled by the compact column-oriented structure, a scalable, fast algorithm is provided for in-memory query processing. The algorithm constructs an auxiliary data structure, also column-oriented, for use in join operations; this structure further leverages the characteristics of in-memory data processing and access, as well as the column-oriented characteristics of the compact data structure.
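As an illustration of what "integer encoded and compressed sequences of values" for a column might look like, the following is a minimal sketch, not the disclosed implementation. It assumes dictionary encoding followed by run-length compression; the function names `dictionary_encode`, `run_length_encode`, and `decode` are hypothetical, and the actual compression and data packing techniques of the disclosure may differ.

```python
from itertools import groupby


def dictionary_encode(values):
    """Map each distinct column value to a small integer ID (first-seen order)."""
    dictionary = {}
    ids = []
    for value in values:
        # setdefault assigns the next free ID only when the value is new.
        ids.append(dictionary.setdefault(value, len(dictionary)))
    return dictionary, ids


def run_length_encode(ids):
    """Collapse consecutive repeats of an ID into (id, run_length) pairs."""
    return [(key, len(list(group))) for key, group in groupby(ids)]


def decode(dictionary, runs):
    """Expand the runs and translate IDs back to the original values."""
    reverse = {i: value for value, i in dictionary.items()}
    return [reverse[i] for i, count in runs for _ in range(count)]


# A single column of a hypothetical table.
city = ["Oslo", "Oslo", "Oslo", "Rome", "Rome", "Oslo"]
dictionary, ids = dictionary_encode(city)
runs = run_length_encode(ids)
```

Here the six strings shrink to three `(id, run_length)` pairs plus a two-entry dictionary, and `decode(dictionary, runs)` recovers the original column.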
20 Claims
1. A method for processing data, comprising:
- in response to a query implicating at least one join operation over data in at least one data store, receiving a subset of data as integer encoded and compressed sequences of values corresponding to different columns of the data in the at least one data store;
- determining at least one result set for the at least one join operation including determining if a local cache includes any non-default values corresponding to columns implicated by the at least one join operation; and
- where the local cache includes any non-default values corresponding to columns implicated by the at least one join operation, substituting the non-default values when determining the at least one result set.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
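The cache check recited in claim 1 can be sketched as follows. This is one possible reading under stated assumptions, not the claimed implementation: `DEFAULT` is an assumed sentinel meaning "nothing cached yet", `join_column` is a hypothetical helper, and a plain dictionary lookup stands in for the real join work against the data store.

```python
DEFAULT = None  # assumed sentinel: no value has been cached for this key


def join_column(fact_keys, dimension, cache, stats):
    """Resolve each integer foreign key to a dimension value, using the cache."""
    result = []
    for key in fact_keys:
        cached = cache.get(key, DEFAULT)
        if cached is not DEFAULT:
            # The cache holds a non-default value: substitute it directly.
            stats["cache_hits"] += 1
            result.append(cached)
        else:
            # Default value: do the join work once and remember the outcome.
            stats["lookups"] += 1
            value = dimension[key]  # stand-in for the actual join operation
            cache[key] = value
            result.append(value)
    return result


# Hypothetical dimension table and an integer-encoded foreign-key column.
dimension = {0: "Books", 1: "Games", 2: "Music"}
fact_keys = [0, 1, 0, 0, 2, 1]
cache = {}
stats = {"cache_hits": 0, "lookups": 0}
rows = join_column(fact_keys, dimension, cache, stats)
```

With six rows and three distinct keys, only three lookups reach the dimension table; the other three rows are answered by substitution from the cache.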
11. A method for query processing, including:
- generating a lazy cache shared by segments of compacted data retrieved in response to a query as integer encoded and compressed sequences of values corresponding to different columns of the data in at least one data store representing a set of tables; and
- in response to a query implicating at least one join operation over data in at least one data store, processing the query with reference to the lazy cache implicating at least one join operation over the at least one data store;
- wherein the processing includes populating the lazy cache with at least one data value from at least one table of the set of tables according to a predetermined algorithm for potential re-use of the at least one data value over the lifetime of the query processing.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
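A lazy cache shared across segments, as in claim 11, might look roughly like the sketch below. The `LazyCache` class and `process_segment` function are hypothetical names, and the policy used here (cache every value on its first miss) is only an assumed stand-in for the claim's "predetermined algorithm", which the claim does not spell out.

```python
class LazyCache:
    """Cache that is empty at first and fills only as keys are requested."""

    def __init__(self, loader):
        self._loader = loader  # callable that fetches a value from a table
        self._values = {}
        self.loads = 0         # how many times the table was actually read

    def get(self, key):
        if key not in self._values:
            # First request for this key: populate the cache for later re-use.
            self.loads += 1
            self._values[key] = self._loader(key)
        return self._values[key]


def process_segment(segment_keys, cache):
    """Resolve one segment's keys; every segment shares the same cache."""
    return [cache.get(key) for key in segment_keys]


# Hypothetical table and three segments of integer-encoded keys.
products = {10: "pen", 11: "ink", 12: "pad"}
cache = LazyCache(products.__getitem__)
segments = [[10, 11, 10], [11, 12], [10, 12, 12]]
results = [process_segment(segment, cache) for segment in segments]
```

Because the cache lives for the whole query rather than for a single segment, a value loaded while processing the first segment is re-used by the later ones, so the table is read three times for eight rows.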
20. A device for processing data, comprising:
- high speed in memory storage for storing a subset of data received as integer encoded and compressed sequences of values corresponding to different columns of the data and for storing a vector of values corresponding to the different columns; and
- at least one query processor that processes the query over the subset of the data and that skips at least one join operation implicated by the query over the subset of data where a default value is found in the vector for a given column and substitutes a value of the vector for the at least one join operation instead.
Specification