Compression analyzer
Abstract
Techniques are described herein for automatically selecting the compression techniques to be used on tabular data. A compression analyzer gives users high-level control over the selection process without requiring the user to know details about the specific compression techniques that are available to the compression analyzer. Users are able to specify, for a given set of data, a “balance point” along the spectrum between “maximum performance” and “maximum compression”. The point thus selected is used by the compression analyzer in a variety of ways. For example, in one embodiment, the compression analyzer uses the user-specified balance point to determine which of the available compression techniques qualify as “candidate techniques” for the given set of data. The compression analyzer selects the compression technique to use on a set of data by actually testing the candidate compression techniques against samples from the set of data. After testing the candidate compression techniques against the samples, the resulting compression ratios are compared. The compression technique to use on the set of data is then selected based, in part, on the compression ratios achieved during the compression tests performed on the sample data.
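The selection flow summarized in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the pool names, the choice of zlib, bz2 and lzma, the compression levels, and the helpers `compression_ratio` and `choose_technique` are all assumptions made for the example. The balance point narrows the candidate pool, each candidate is tested against a sample, and the candidate with the highest measured ratio is chosen.

```python
import bz2
import lzma
import zlib

# Hypothetical candidate pools, one per user-selectable balance point.
# The names, levels and groupings are assumptions for illustration only.
CANDIDATE_POOLS = {
    "max_performance": {
        "zlib-1": lambda b: zlib.compress(b, 1),
    },
    "balanced": {
        "zlib-1": lambda b: zlib.compress(b, 1),
        "zlib-6": lambda b: zlib.compress(b, 6),
    },
    "max_compression": {
        "zlib-9": lambda b: zlib.compress(b, 9),
        "bz2-9": lambda b: bz2.compress(b, 9),
        "lzma": lambda b: lzma.compress(b),
    },
}


def compression_ratio(raw, compress):
    """Return uncompressed size divided by compressed size."""
    return len(raw) / len(compress(raw))


def choose_technique(rows, balance_point, sample_size=1000):
    """Test only the candidate pool for the balance point on a sample of rows.

    The sample here is simply the first `sample_size` rows, which is an
    assumption; the source does not say how samples are drawn.
    """
    pool = CANDIDATE_POOLS[balance_point]
    sample = "\n".join(rows[:sample_size]).encode("utf-8")
    ratios = {name: compression_ratio(sample, fn) for name, fn in pool.items()}
    best = max(ratios, key=ratios.get)  # highest ratio within the pool wins
    return best, ratios
```

A call such as `choose_technique(rows, "balanced")` returns the chosen technique name together with the measured ratios, so the caller can then compress the full set of data with that technique before storing it.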
18 Claims
1. A method comprising:
prior to storing a set of data in a table, performing the steps of:
obtaining data that indicates a selected balance point along a spectrum between maximum performance and maximum compression;
wherein the selected balance point is one of a plurality of user-selectable balance points along the spectrum between maximum performance and maximum compression;
wherein the plurality of user-selectable balance points include at least one balance point that corresponds to neither maximum performance nor maximum compression;
performing a plurality of tests on a subset of data from the set of data to determine compression ratios produced by applying each of a plurality of compression techniques to the subset of data;
wherein the table has a plurality of columns;
wherein the set of data includes a plurality of rows;
wherein, in a first test of the plurality of tests, the plurality of rows are not sorted;
wherein, in a second test of the plurality of tests, the plurality of rows are sorted based on a particular column of the plurality of columns;
selecting a particular compression technique to apply to the set of data based, at least in part, on:
the compression ratios produced by each of the plurality of compression techniques; and
the selected balance point;
compressing the set of data using the particular compression technique to produce compressed data; and
storing the compressed data in the table;
wherein the method is performed by one or more computing devices.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method comprising:
prior to storing a set of data in a table, performing the steps of:
obtaining data that indicates a selected balance point along a spectrum between maximum performance and maximum compression;
wherein the selected balance point is one of a plurality of user-selectable balance points along the spectrum between maximum performance and maximum compression;
wherein the plurality of user-selectable balance points include at least one balance point that corresponds to neither maximum performance nor maximum compression;
wherein each of the plurality of user-selectable balance points corresponds to a candidate pool of compression techniques;
performing a plurality of tests on a subset of data from the set of data to determine compression ratios produced by applying each of a plurality of compression techniques to the subset of data;
wherein only those compression techniques that belong to the candidate pool of the selected balance point are tested during the plurality of tests;
selecting a particular compression technique to apply to the set of data based, at least in part, on:
the compression ratios produced by each of the plurality of compression techniques; and
the selected balance point;
compressing the set of data using the particular compression technique to produce compressed data; and
storing the compressed data in the table;
wherein the method is performed by one or more computing devices.
9. A method comprising:
prior to storing a set of data in a table, performing the steps of:
obtaining data that indicates a selected balance point along a spectrum between maximum performance and maximum compression;
wherein the selected balance point is one of a plurality of user-selectable balance points along the spectrum between maximum performance and maximum compression;
wherein the plurality of user-selectable balance points include at least one balance point that corresponds to neither maximum performance nor maximum compression;
performing a plurality of tests on a subset of data from the set of data to determine compression ratios produced by applying each of a plurality of compression techniques to the subset of data;
selecting a particular compression technique to apply to the set of data based, at least in part, on:
the compression ratios produced by each of the plurality of compression techniques; and
the selected balance point;
compressing the set of data using the particular compression technique to produce compressed data; and
storing the compressed data in the table;
wherein the method is performed by one or more computing devices;
wherein the table has a plurality of columns;
wherein the set of data includes a plurality of rows;
wherein, in a first test of the plurality of tests, the plurality of rows are sorted based on a first column of the plurality of columns, wherein the first test applies a particular sequence of one or more compression techniques to data for the first column of the plurality of columns; and
wherein, in a second test of the plurality of tests, the plurality of rows are sorted based on a second column of the plurality of columns, wherein the second test applies said particular sequence of one or more compression techniques to the data for the first column of the plurality of columns.
10. A non-transitory computer-readable storage storing instructions which, when executed by one or more processors, cause:
prior to storing a set of data in a table, performing the steps of:
obtaining data that indicates a selected balance point along a spectrum between maximum performance and maximum compression;
wherein the selected balance point is one of a plurality of user-selectable balance points along the spectrum between maximum performance and maximum compression;
wherein the plurality of user-selectable balance points include at least one balance point that corresponds to neither maximum performance nor maximum compression;
performing a plurality of tests on a subset of data from the set of data to determine compression ratios produced by applying each of a plurality of compression techniques to the subset of data;
wherein the table has a plurality of columns;
wherein the set of data includes a plurality of rows;
wherein, in a first test of the plurality of tests, the plurality of rows are not sorted;
wherein, in a second test of the plurality of tests, the plurality of rows are sorted based on a particular column of the plurality of columns;
selecting a particular compression technique to apply to the set of data based, at least in part, on:
the compression ratios produced by each of the plurality of compression techniques; and
the selected balance point;
compressing the set of data using the particular compression technique to produce compressed data; and
storing the compressed data in the table.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
18. A non-transitory computer-readable storage storing instructions which, when executed by one or more processors, cause:
prior to storing a set of data in a table, performing the steps of:
obtaining data that indicates a selected balance point along a spectrum between maximum performance and maximum compression;
wherein the selected balance point is one of a plurality of user-selectable balance points along the spectrum between maximum performance and maximum compression;
wherein the plurality of user-selectable balance points include at least one balance point that corresponds to neither maximum performance nor maximum compression;
performing a plurality of tests on a subset of data from the set of data to determine compression ratios produced by applying each of a plurality of compression techniques to the subset of data;
selecting a particular compression technique to apply to the set of data based, at least in part, on:
the compression ratios produced by each of the plurality of compression techniques; and
the selected balance point;
compressing the set of data using the particular compression technique to produce compressed data; and
storing the compressed data in the table;
wherein the table has a plurality of columns;
wherein the set of data includes a plurality of rows;
wherein, in a first test of the plurality of tests, the plurality of rows are sorted based on a first column of the plurality of columns, wherein the first test applies a particular sequence of one or more compression techniques to data for the first column of the plurality of columns; and
wherein, in a second test of the plurality of tests, the plurality of rows are sorted based on a second column of the plurality of columns, wherein the second test applies said particular sequence of one or more compression techniques to the data for the first column of the plurality of columns.
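Several of the claims above recite tests that differ only in row order: one test on unsorted rows, others on rows sorted by a particular column. A minimal sketch of such tests follows, under stated assumptions: zlib stands in for whatever compression technique is under test, rows are modeled as tuples of strings, and `ratio_for_order` and `run_sort_tests` are hypothetical helper names. For simplicity the sketch compresses whole rows, whereas claims 9 and 18 apply the technique to the data for a single column.

```python
import zlib


def ratio_for_order(rows, sort_column=None):
    """Measure the compression ratio of rows in one particular order.

    sort_column=None leaves the rows in their original order; otherwise a
    sorted copy is made on that column index. zlib is an assumed stand-in
    for the compression technique being tested.
    """
    if sort_column is None:
        ordered = rows
    else:
        ordered = sorted(rows, key=lambda r: r[sort_column])
    raw = "\n".join(",".join(r) for r in ordered).encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def run_sort_tests(rows, columns):
    """Run one unsorted test plus one sorted test per candidate column."""
    results = {None: ratio_for_order(rows)}
    for col in columns:
        results[col] = ratio_for_order(rows, col)
    return results
```

The returned mapping is keyed by sort column (with `None` for the unsorted test), so the measured ratios can be compared before a row order and technique are chosen for the full set of data.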
Specification