Efficient hardware instructions for single instruction multiple data processors
Abstract
A method and apparatus for efficiently processing data in various formats in a single instruction multiple data (“SIMD”) architecture are presented. Specifically, a method to unpack fixed-width bit values in a bit stream to a fixed-width byte stream in a SIMD architecture is presented. A method to unpack variable-length byte-packed values in a byte stream in a SIMD architecture is presented. A method to decompress a run-length-encoded compressed bit vector in a SIMD architecture is presented. A method to return the offset of each bit set to one in a bit vector in a SIMD architecture is presented. A method to fetch bits from a bit vector at specified offsets relative to a base in a SIMD architecture is presented. A method to compare values stored in two SIMD registers is presented.
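
As a rough illustration of three of the operations the abstract lists (unpacking fixed-width bit values, returning the offsets of set bits, and fetching bits at offsets relative to a base), the following Python sketch models them with scalar code. It is not the patented instruction set. The function names, the least-significant-bit-first ordering, and the use of Python integers as bit vectors are all assumptions made for this example; a SIMD implementation would do the per-element work in parallel across subregisters.

```python
def unpack_fixed_width(packed: bytes, width: int, count: int) -> list[int]:
    """Unpack `count` values of `width` bits each into one value per output byte.

    Assumption: values are packed LSB-first, starting at bit 0 of packed[0].
    """
    if not 1 <= width <= 8:
        raise ValueError("width must fit in one output byte")
    stream = int.from_bytes(packed, "little")  # whole bit stream as one integer
    mask = (1 << width) - 1
    return [(stream >> (i * width)) & mask for i in range(count)]


def set_bit_offsets(bits: int, nbits: int) -> list[int]:
    """Return the offset of every bit set to one among the low `nbits` bits."""
    return [i for i in range(nbits) if (bits >> i) & 1]


def fetch_bits(bits: int, base: int, offsets: list[int]) -> list[int]:
    """Fetch the bit at `base + offset` for each requested offset."""
    return [(bits >> (base + off)) & 1 for off in offsets]
```

For example, three 3-bit values 1, 2, 7 packed LSB-first occupy the bytes `0xD1 0x01`, and `unpack_fixed_width(bytes([0xD1, 0x01]), 3, 3)` returns `[1, 2, 7]`.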
20 Claims
1. A processor configured to:
load a bit vector into a first register that resides in the processor;
load each run-length value, in a vector of run-length values, into a corresponding subregister of a series of subregisters in a SIMD register that resides in the processor;
respond to one or more instructions by decompressing bits in the bit vector into a second register in the processor;
wherein bits in the bit vector are contiguous;
wherein run-length values within the vector of run-length values are contiguous;
wherein decompressing bits in the bit vector includes:
for one or more run-length values in the vector of run-length values in the series of subregisters, copying the corresponding bit in the bit vector in the first register into the second register according to the run-length value, such that copies of the corresponding bit are contiguously stored in the second register as indicated by the run-length value.
Dependent claims: 2, 3, 4, 5
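
The decompression recited in claim 1 can be modeled in software. The sketch below is a scalar reference model only, not the claimed hardware: the name `rle_decompress`, the LSB-first bit order, and the unbounded Python integer standing in for the second register are assumptions made for illustration. Each list element plays the role of one subregister holding a run-length value.

```python
def rle_decompress(bit_vector: int, run_lengths: list[int]) -> tuple[int, int]:
    """Expand bit i of `bit_vector` into run_lengths[i] contiguous copies.

    Returns (result, total_bits). Assumption: bit 0 is the first bit, and the
    output is filled from bit 0 upward.
    """
    result = 0      # models the second register
    position = 0    # next free bit position in the result
    for i, run in enumerate(run_lengths):
        bit = (bit_vector >> i) & 1           # corresponding bit in the first register
        if bit:
            result |= ((1 << run) - 1) << position  # write `run` copies of 1
        position += run                       # copies of 0 need no write
    return result, position
```

With `bit_vector = 0b101` and run lengths `[3, 2, 4]`, the call returns `(0b111100111, 9)`: three ones, two zeros, then four ones, read from bit 0 upward.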
6. A processor:
wherein each run-length value in a vector of run-length values represents how many bits each bit in a bit vector should be decompressed to, respectively;
wherein bits in the bit vector are contiguously stored in memory;
wherein values within the vector of run-length values are contiguously stored in memory;
wherein results are stored in memory;
wherein the processor is configured to, in response to one or more instructions:
copy a set of bits in the bit vector from memory into a first register; and
copy a particular number of run-length values from the vector of run-length values from memory into a series of subregisters in a SIMD register, wherein each run-length value is copied into a different subregister in the series of subregisters;
wherein each bit in the set of bits has a corresponding run-length value in a subregister in the series of subregisters;
decompress each bit, such that copies of each bit are contiguously-stored in a second register as is indicated by the corresponding run-length value in the subregister of the series of subregisters.
Dependent claims: 7, 8, 9, 10
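
Claim 6 adds the movement of data between memory and registers. One possible way to picture it is a loop that loads a fixed number of run-length values at a time, together with the matching bits, and writes the expanded bits back to memory. In the sketch below, `LANES`, `decompress_from_memory`, the LSB-first bit order, and the little-endian output layout are hypothetical choices made for this example; the claim does not specify them.

```python
LANES = 4  # hypothetical number of subregisters in the SIMD register


def decompress_from_memory(bit_bytes: bytes, run_lengths: list[int]) -> tuple[bytes, int]:
    """Decompress in batches of LANES run lengths; return (output bytes, bit count)."""
    out = 0       # accumulated result, written to memory at the end
    out_len = 0   # number of valid bits in `out`
    for start in range(0, len(run_lengths), LANES):
        # Load up to LANES run lengths, one per subregister.
        lanes = run_lengths[start:start + LANES]
        # Load the bits that correspond to those run lengths ("first register").
        first_reg = [(bit_bytes[(start + k) // 8] >> ((start + k) % 8)) & 1
                     for k in range(len(lanes))]
        # Expand each bit into contiguous copies ("second register").
        second_reg = 0
        filled = 0
        for bit, run in zip(first_reg, lanes):
            if bit:
                second_reg |= ((1 << run) - 1) << filled
            filled += run
        # Append this batch to the result.
        out |= second_reg << out_len
        out_len += filled
    return out.to_bytes((out_len + 7) // 8, "little"), out_len
```

For instance, `decompress_from_memory(bytes([0b00101101]), [2, 1, 3, 1, 2, 4])` processes two batches (four run lengths, then two) and returns `(bytes([0x7B, 0x1E]), 13)`. Real hardware would also have to handle a second register of fixed width, which this model sidesteps by using an unbounded integer.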
11. A method comprising:
loading a bit vector into a first register;
loading each run-length value, in a vector of run-length values, into a corresponding subregister of a series of subregisters in a SIMD register;
responding to one or more instructions by decompressing bits in the bit vector into a second register;
wherein bits in the bit vector are contiguous;
wherein run-length values within the vector of run-length values are contiguous;
wherein decompressing bits in the bit vector includes:
for one or more run-length values in the vector of run-length values in the series of subregisters, copying the corresponding bit in the bit vector in the first register into the second register according to the run-length value, such that copies of the corresponding bit are contiguously stored in the second register as indicated by the run-length value.
Dependent claims: 12, 13, 14, 15
16. A method:
wherein each run-length value in a vector of run-length values represents how many bits each bit in a bit vector should be decompressed to, respectively;
wherein bits in the bit vector are contiguously stored in memory;
wherein values within the vector of run-length values are contiguously stored in memory;
wherein results are stored in memory;
wherein the method comprises, in response to one or more instructions:
copying a set of bits in the bit vector from memory into a first register; and
copying a particular number of run-length values from the vector of run-length values from memory into a series of subregisters in a SIMD register, wherein each run-length value is copied into a different subregister in the series of subregisters;
wherein each bit in the set of bits has a corresponding run-length value in a subregister in the series of subregisters;
decompressing each bit, such that copies of each bit are contiguously-stored in a second register as is indicated by the corresponding run-length value in the subregister of the series of subregisters.
Dependent claims: 17, 18, 19, 20
Specification