Run-length encoding decompression
First Claim
1. A method comprising:
- receiving an instruction to decompress a run-length-encoded (RLE) value;
wherein the RLE value comprises a plurality of bits;
wherein each bit, of the RLE value, corresponds to a corresponding run length of a plurality of run lengths; and
in response to receiving the instruction to decompress the RLE value, performing a set of actions during both of a first pipelined execution stage and a second pipelined execution stage;
wherein the first pipelined execution stage comprises;
replicating each bit, of the RLE value, a number of times indicated by the corresponding run length that corresponds to the bit, to produce a respective decompressed sub-value of a plurality of decompressed sub-values that are based on the RLE value; and
wherein at least one plurality of bits, from the RLE value, are replicated in parallel;
concatenating the decompressed sub-values, of the plurality of decompressed sub-values, in parallel based on the order of the bits within the RLE value to which the decompressed sub-values correspond, to produce two or more decompressed intermediate values;
wherein the second pipelined execution stage comprises concatenating the two or more decompressed intermediate values to produce a decompressed value;
storing the decompressed value as a result for the instruction to decompress the RLE value;
wherein the method is performed by one or more computing devices.
1 Assignment
0 Petitions
Accused Products
Abstract
Approaches are described to improve database performance by implementing a RLE decompression function at a low level within a general-purpose processor or an external block. Specifically, embodiments of a hardware implementation of an instruction for RLE decompression are disclosed. The described approaches improve performance by supporting the RLE decompression function within a processor and/or external block. Specifically, a RLE decompression hardware implementation is disclosed that produces a 64-bit RLE decompression result, with an example embodiment performing the task in two pipelined execution stages with a throughput of one per cycle. According to embodiments, hardware organization of narrow-width shifters operating in parallel, controlled by computed shift counts, is used to perform the decompression. Because of the decreased time required to perform RLE decompression according to embodiments, the performance of tasks that use embodiments described herein for decompression of run-length encoded data is made more efficient.
-
Citations
23 Claims
-
1. A method comprising:
-
receiving an instruction to decompress a run-length-encoded (RLE) value; wherein the RLE value comprises a plurality of bits; wherein each bit, of the RLE value, corresponds to a corresponding run length of a plurality of run lengths; and in response to receiving the instruction to decompress the RLE value, performing a set of actions during both of a first pipelined execution stage and a second pipelined execution stage; wherein the first pipelined execution stage comprises; replicating each bit, of the RLE value, a number of times indicated by the corresponding run length that corresponds to the bit, to produce a respective decompressed sub-value of a plurality of decompressed sub-values that are based on the RLE value; and wherein at least one plurality of bits, from the RLE value, are replicated in parallel; concatenating the decompressed sub-values, of the plurality of decompressed sub-values, in parallel based on the order of the bits within the RLE value to which the decompressed sub-values correspond, to produce two or more decompressed intermediate values; wherein the second pipelined execution stage comprises concatenating the two or more decompressed intermediate values to produce a decompressed value; storing the decompressed value as a result for the instruction to decompress the RLE value; wherein the method is performed by one or more computing devices. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
-
-
14. One or more non-transitory computer-readable media storing one or more sequences of instructions, which, when executed by one or more processors, cause:
-
receiving an instruction to decompress a run-length-encoded (RLE) value; wherein the RLE value comprises a plurality of bits; wherein each bit, of the RLE value, corresponds to a corresponding run length of a plurality of run lengths; and in response to receiving the instruction to decompress the RLE value, performing a set of actions during both of a first pipelined execution stage and a second pipelined execution stage; wherein the first pipelined execution stage comprises; replicating each bit, of the RLE value, a number of times indicated by the corresponding run length that corresponds to the bit, to produce a respective decompressed sub-value of a plurality of decompressed sub-values that are based on the RLE value; and wherein at least one plurality of bits, from the RLE value, are replicated in parallel; concatenating the decompressed sub-values, of the plurality of decompressed sub-values, in parallel based on the order of the bits within the RLE value to which the decompressed sub-values correspond, to produce two or more decompressed intermediate values; wherein the second pipelined execution stage comprises concatenating the two or more decompressed intermediate values to produce a decompressed value; storing the decompressed value as a result for the instruction to decompress the RLE value. - View Dependent Claims (15, 16, 17, 18, 19, 20, 21, 22)
-
-
23. A computing device comprising:
-
one or more processors; one or more first hardware components configured to perform a first set of actions comprising a first pipelined execution stage for decompressing run-length encoded (RLE) values; one or more second hardware components configured to perform a second set of actions comprising a second pipelined execution stage for decompressing RLE values; and one or more computer-readable media configured to store decompressed values based on RLE values; wherein the first set of actions comprises; replicating each bit, of an RLE value that comprises a plurality of bits, a number of times indicated by a corresponding run length that corresponds to the bit, to produce a respective decompressed sub-value of a plurality of decompressed sub-values that are based on the RLE value, and wherein at least one plurality of bits, from the RLE value, are replicated in parallel; concatenating the decompressed sub-values, of the plurality of decompressed sub-values, in parallel based on the order of the bits within the RLE value to which the decompressed sub-values correspond, to produce two or more decompressed intermediate values; wherein the second set of actions comprises; concatenating the two or more decompressed intermediate values to produce a decompressed value; and storing the decompressed value in the one or more computer-readable media.
-
Specification