Adaptive dictionary compression/decompression for column-store databases
First Claim
1. One or more non-transitory computer-readable media storing computer-executable instructions for causing a computing system, when programmed thereby, to perform operations comprising:
- evaluating at least some of multiple available compression variants to apply to a string dictionary for a column of a table in a column-store database, wherein the string dictionary maps distinct values among values of the column to value identifiers, each of the distinct values being a string of one or more characters, and wherein the evaluating uses compression models for the respective at least some of the multiple available compression variants, a given compression model of the compression models estimating compressed dictionary size of the string dictionary for a given compression variant of the multiple available compression variants without applying the given compression variant to the string dictionary;
selecting, based at least in part on results of the evaluating, one of the multiple available compression variants to apply to the string dictionary; and
applying the selected compression variant to the string dictionary, thereby reducing the compressed dictionary size of the string dictionary, including, for each of at least one of the distinct values of the string dictionary, replacing at least one character of the string for the distinct value with one or more codes that represent the replaced at least one character, the one or more codes being shorter than the replaced at least one character.
2 Assignments
0 Petitions
Accused Products
Abstract
Innovations for adaptive compression and decompression for dictionaries of a column-store database can reduce the amount of memory used for columns of the database, allowing a system to keep column data in memory for more columns, while delays for access operations remain acceptable. For example, dictionary compression variants use different compression techniques and implementation options. Some dictionary compression variants provide more aggressive compression (reduced memory consumption) but result in slower run-time performance. Other dictionary compression variants provide less aggressive compression (higher memory consumption) but support faster run-time performance. As another example, a compression manager can automatically select a dictionary compression variant for a given column in a column-store database. For different dictionary compression variants, the compression manager predicts run-time performance and compressed dictionary size, given the values of the column, and selects one of the dictionary compression variants.
32 Citations
34 Claims
-
1. One or more non-transitory computer-readable media storing computer-executable instructions for causing a computing system, when programmed thereby, to perform operations comprising:
-
evaluating at least some of multiple available compression variants to apply to a string dictionary for a column of a table in a column-store database, wherein the string dictionary maps distinct values among values of the column to value identifiers, each of the distinct values being a string of one or more characters, and wherein the evaluating uses compression models for the respective at least some of the multiple available compression variants, a given compression model of the compression models estimating compressed dictionary size of the string dictionary for a given compression variant of the multiple available compression variants without applying the given compression variant to the string dictionary; selecting, based at least in part on results of the evaluating, one of the multiple available compression variants to apply to the string dictionary; and applying the selected compression variant to the string dictionary, thereby reducing the compressed dictionary size of the string dictionary, including, for each of at least one of the distinct values of the string dictionary, replacing at least one character of the string for the distinct value with one or more codes that represent the replaced at least one character, the one or more codes being shorter than the replaced at least one character. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
-
-
21. In a computing system that implements a compression manager, a method comprising:
-
with the computing system, evaluating at least some of multiple available compression variants to apply to a string dictionary for a column of a table in a column-store database, wherein the evaluating includes estimating compressed dictionary size of the string dictionary according to a compression model for a given compression variant of the multiple available compression variants without applying the given compression variant to the string dictionary, wherein the string dictionary maps distinct values among values of the column to value identifiers, each of the distinct values being a string of one or more characters; with the computing system, selecting, based at least in part on results of the evaluating, one of the multiple available compression variants to apply to the string dictionary; and with the computing system, applying the selected compression variant to the string dictionary, thereby reducing the compressed dictionary size of the string dictionary, including, for each of at least one of the distinct values of the string dictionary, replacing at least one character of the string for the distinct value with one or more codes that represent the replaced at least one character, the one or more codes being shorter than the replaced at least one character. - View Dependent Claims (22, 23, 24, 25, 26)
-
-
27. A computing system comprising:
-
memory configured to store one or more tables for an in-memory column-store database; and one or more processors configured to execute instructions for a compression manager, wherein at least one of the one or more processors is selected from the group consisting of central processing unit, graphics processing unit, and application-specific integrated circuit, the compression manager being adapted to compress at least some columns of the one or more tables using domain encoding, wherein the compression manager is further adapted to; evaluate, in terms of compressed dictionary size and run-time performance, multiple available compression variants for a string dictionary for a given column of the at least some columns, wherein the run-time performance accounts for frequency of access of the given column, the compression manager being adapted to use compression models for the respective compression variants, a given compression model of the compression models estimating the compressed dictionary size of the string dictionary for a given compression variant of the multiple available compression variants without applying the given compression variant to the string dictionary, and wherein the string dictionary maps distinct values among values of the given column to value identifiers, each of the distinct values being a string of one or more characters; select one of the multiple available compression variants to apply to the string dictionary for the given column; and apply the selected compression variant to the string dictionary, thereby reducing the compressed dictionary size of the string dictionary, including, for each of at least one of the distinct values of the string dictionary, replacing at least one character of the string for the distinct value with one or more codes that represent the replaced at least one character, the one or more codes being shorter than the replaced at least one character. - View Dependent Claims (28, 29, 30, 31, 32, 33, 34)
-
Specification