Instructions for efficiently accessing unaligned vectors
First Claim
1. A method for executing a load-swapped instruction, comprising:
- receiving the load-swapped instruction to be executed, wherein the load-swapped instruction specifies a source address in memory, which is arbitrarily aligned; and
executing the load-swapped instruction, which involves loading a vector from a naturally-aligned memory region encompassing the source address into a register, and in doing so, if the source address is unaligned, rotating the bytes of the vector by swapping a set of bytes residing at addresses lower than the source address with a set of bytes residing at addresses greater than or equal to the source address;
wherein rotating the bytes of the vector involves rotating the bytes N positions, where N is equivalent to either the source address specified by the instruction modulo the vector length in bytes or the source address specified by the instruction modulo the vector length in bytes subtracted from the vector length in bytes;
wherein rotating the bytes of the vector occurs before the vector reaches the register; and
wherein rotating the bytes of the vector involves using an alignment circuit which is located along a load-store path between the memory and the register to cause the byte at the specified source address to reside at the least-significant byte position within the vector for a little-endian memory transaction, or causing said byte to be positioned at the most-significant byte position within the vector for a big-endian memory transaction.
Abstract
One embodiment of the present invention provides a processor configured to execute load-swapped instructions, which may be directed to unaligned source addresses. The processor executes a load-swapped instruction by loading a vector from a naturally-aligned memory region encompassing the source address and, in doing so, rotating the bytes of the vector so that the byte at the specified source address resides at the least-significant byte position within the vector for a little-endian memory transaction, or at the most-significant byte position for a big-endian memory transaction. In a variation on this embodiment, the processor is also configured to execute a store-swapped instruction directed to a destination address by storing a vector into a naturally-aligned memory region encompassing the destination address and, in doing so, rotating the bytes of the vector so that the least-significant byte of the vector is stored at the specified destination address on a little-endian processor, the most-significant byte is stored at the destination address on a big-endian processor, or the specified byte is stored at the destination address in the case of an endian-specific store-swapped variant.
18 Claims
1. A method for executing a load-swapped instruction, comprising:
receiving the load-swapped instruction to be executed, wherein the load-swapped instruction specifies a source address in memory, which is arbitrarily aligned; and
executing the load-swapped instruction, which involves loading a vector from a naturally-aligned memory region encompassing the source address into a register, and in doing so, if the source address is unaligned, rotating the bytes of the vector by swapping a set of bytes residing at addresses lower than the source address with a set of bytes residing at addresses greater than or equal to the source address;
wherein rotating the bytes of the vector involves rotating the bytes N positions, where N is equivalent to either the source address specified by the instruction modulo the vector length in bytes or the source address specified by the instruction modulo the vector length in bytes subtracted from the vector length in bytes;
wherein rotating the bytes of the vector occurs before the vector reaches the register; and
wherein rotating the bytes of the vector involves using an alignment circuit which is located along a load-store path between the memory and the register to cause the byte at the specified source address to reside at the least-significant byte position within the vector for a little-endian memory transaction, or causing said byte to be positioned at the most-significant byte position within the vector for a big-endian memory transaction.
Dependent claims: 2, 3
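The load-swapped behavior recited above can be modeled in software. The following is only a minimal sketch, not the claimed alignment circuit: the function name `load_swapped`, the 16-byte vector length, and the use of a Python list to stand in for a register are all assumptions made for illustration. Element 0 of the returned list is the byte at the source address, which would sit at the least-significant position for a little-endian transaction or the most-significant position for a big-endian one.

```python
VECTOR_BYTES = 16  # assumed vector length; the claim does not fix a value


def load_swapped(memory, source_addr, vl=VECTOR_BYTES):
    """Model of a load-swapped access (hypothetical helper, not a real API).

    Reads the naturally-aligned vl-byte block that contains source_addr and
    rotates it so the byte at source_addr comes first.
    """
    n = source_addr % vl              # N = source address modulo vector length
    base = source_addr - n            # start of the naturally-aligned block
    block = list(memory[base:base + vl])
    # Swap the bytes below source_addr with the bytes at or above it.
    return block[n:] + block[:n]


memory = bytes(range(64))             # byte value equals its address
vector = load_swapped(memory, 19)     # block 16..31, N = 3
print(vector)
```

With this memory image, the result begins with bytes 19 through 31 and ends with bytes 16, 17 and 18, the three bytes that lay below the source address. An aligned address gives N = 0 and the block is returned unrotated.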
4. A method for executing a store-swapped instruction, comprising:
receiving the store-swapped instruction to be executed, wherein the store-swapped instruction specifies a destination address in memory, which is arbitrarily aligned; and
executing the store-swapped instruction, which involves storing a vector from a register into a naturally-aligned memory region encompassing the destination address, and in doing so, if the destination address is unaligned, rotating the bytes of the vector by swapping a set of bytes residing at addresses lower than the destination address with a set of bytes residing at addresses greater than or equal to the destination address;
wherein rotating the bytes of the vector involves rotating the bytes N positions, where N is equivalent to either the destination address specified by the instruction modulo the vector length in bytes or the destination address specified by the instruction modulo the vector length in bytes subtracted from the vector length in bytes;
wherein rotating the bytes of the vector occurs after the vector moves out of the register and before the vector is stored in the memory; and
wherein rotating the bytes of the vector involves using an alignment circuit which is located along a load-store path between the memory and the register to cause the least significant byte of the vector to be stored to at the specified destination address on a little-endian processor, or causing the most significant byte of the vector to be stored to the destination address said on a big-endian processor, or causing the specified byte to be stored to the destination address in the case of an endian-specific store-swapped variant.
Dependent claims: 5, 6, 7, 8
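The store-swapped direction can be sketched in the same way. This is an illustrative software model under stated assumptions, not the patented hardware: `store_swapped` is a hypothetical name, the vector length is assumed to be 16 bytes, and element 0 of `vector` stands for the byte that must land at the destination address (the least-significant byte on a little-endian processor).

```python
VECTOR_BYTES = 16  # assumed vector length; the claim does not fix a value


def store_swapped(memory, dest_addr, vector, vl=VECTOR_BYTES):
    """Model of a store-swapped access (hypothetical helper, not a real API).

    Rotates `vector` and writes it over the naturally-aligned vl-byte block
    that contains dest_addr, so that vector[0] lands exactly at dest_addr.
    """
    n = dest_addr % vl                # N = destination address modulo vector length
    base = dest_addr - n              # start of the naturally-aligned block
    # Rotate so that vector[i] ends up at block offset (n + i) % vl.
    block = list(vector[vl - n:]) + list(vector[:vl - n])
    memory[base:base + vl] = bytes(block)


memory = bytearray(32)                # zero-filled memory image
data = list(range(100, 116))          # sixteen distinct byte values
store_swapped(memory, 19, data)       # block 16..31, N = 3
print(list(memory[16:32]))
```

Here the first thirteen elements of `data` occupy addresses 19 through 31, and the last three wrap around to addresses 16 through 18. Only the aligned block containing the destination address is written; the neighbouring block at addresses 0 through 15 is left untouched.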
9. A method for executing a load-swapped-control-vector instruction, comprising:
receiving a load-swapped-control-vector instruction to be executed, wherein the load-swapped-control-vector instruction specifies a target address in memory, which is arbitrarily aligned; and
executing the load-swapped-control-vector instruction to construct a control vector comprising predicate elements, wherein executing the load-swapped-control-vector instruction involves determining a value N, wherein N is the specified target address modulo the vector length in bytes, wherein the predicate elements comprise a true polarity and a false polarity, and wherein the control vector is constructed based on N and an endian-ness of a memory transaction;
wherein for a big-endian memory transaction the N most-significant elements in the control vector are set to the true polarity and the remaining elements of the vector are set to the false polarity;
wherein for a little-endian memory transaction the N least-significant elements in the control vector are set to the true polarity and the remaining elements of the vector are set to the false polarity; and
wherein the control vector is used by a vector select instruction to determine which individual bytes from multiple vectors are selected to merge into a single output vector.
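The control-vector construction and its use by a select operation can be illustrated with a short sketch. Several details here are assumptions rather than anything the claim states: the names `load_swapped_control_vector` and `vector_select`, the choice to index lanes by significance (lane 0 is least significant), and the convention that a true predicate picks the byte from the first operand. The example uses an 8-lane vector only to keep the output short.

```python
def load_swapped_control_vector(target_addr, vl=16, big_endian=False):
    """Build a predicate vector from N = target_addr % vl (hypothetical helper).

    Lanes are indexed by significance: lane 0 is the least-significant lane.
    Little-endian: the N least-significant lanes are True.
    Big-endian: the N most-significant lanes are True.
    """
    n = target_addr % vl
    if big_endian:
        return [lane >= vl - n for lane in range(vl)]
    return [lane < n for lane in range(vl)]


def vector_select(control, a, b):
    """Per-lane merge; assumes a True predicate takes the byte from `a`."""
    return [x if take_a else y for take_a, x, y in zip(control, a, b)]


ctrl_le = load_swapped_control_vector(19, vl=8)                   # N = 3
ctrl_be = load_swapped_control_vector(19, vl=8, big_endian=True)  # N = 3
merged = vector_select(ctrl_le, list("AAAAAAAA"), list("bbbbbbbb"))
print(ctrl_le)
print(ctrl_be)
print("".join(merged))
```

For target address 19 and an 8-byte vector, N is 3, so the little-endian control vector has its three lowest lanes true and the big-endian one has its three highest lanes true. The select then takes three bytes from the first operand and five from the second, which is the kind of byte-wise merge of multiple vectors the claim describes.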
10. A computer system configured to execute a load-swapped instruction, comprising:
a processor;
a memory;
an instruction fetch unit within the processor configured to fetch the load-swapped instruction to be executed, wherein the load-swapped instruction specifies a source address in memory, which is arbitrarily aligned; and
an execution unit within the processor configured to execute the load-swapped instruction by loading a vector from a naturally-aligned memory region encompassing the source address into a register, and in doing so, if the source address is unaligned, rotating the bytes of the vector by swapping a set of bytes residing at addresses lower than the source address with a set of bytes residing at addresses greater than or equal to the source address;
wherein rotating the bytes of the vector involves rotating the bytes N positions, where N is equivalent to either the source address specified by the instruction modulo the vector length in bytes or the source address specified by the instruction modulo the vector length in bytes subtracted from the vector length in bytes;
wherein rotating the bytes of the vector occurs before the vector reaches the register; and
wherein rotating the bytes of the vector involves using an alignment circuit which is located along a load-store path between the memory and the register to cause the byte at the specified source address to reside at the least-significant byte position within the vector for a little-endian memory transaction, or causing said byte to be positioned at the most-significant byte position within the vector for a big-endian memory transaction.
Dependent claims: 11, 12
13. A computer system configured to execute a store-swapped instruction, comprising:
a processor;
a memory;
an instruction fetch unit within the processor configured to fetch the store-swapped instruction to be executed, wherein the store-swapped instruction specifies a destination address in memory, which is arbitrarily aligned; and
an execution unit within the processor configured to execute the store-swapped instruction by storing a vector from a register into a naturally-aligned memory region encompassing the destination address, and in doing so, if the source address is unaligned, rotating the bytes of the vector by swapping a set of bytes residing at addresses lower than the destination address with a set of bytes residing at addresses greater than or equal to the destination address;
wherein rotating the bytes of the vector involves rotating the bytes N positions, where N is equivalent to either the source address specified by the instruction modulo the vector length in bytes or the source address specified by the instruction modulo the vector length in bytes subtracted from the vector length in bytes;
wherein rotating the bytes of the vector occurs after the vector moves out of the register and before the vector is stored in the memory; and
wherein rotating the bytes of the vector involves using an alignment circuit which is located along a load-store path between the memory and the register to cause the least significant byte of the vector to be stored to at the specified destination address on a little-endian processor, or causing the most significant byte of the vector to be stored to the destination address said on a big-endian processor, or causing the specified byte to be stored to the destination address in the case of an endian-specific store-swapped variant.
Dependent claims: 14, 15, 16, 17
18. A computer system configured to execute a load-swapped-control-vector instruction, comprising:
a processor;
a memory;
an instruction fetch unit within the processor configured to fetch the load-swapped-control-vector instruction to be executed, wherein the load-swapped-control-vector instruction specifies a target address in memory, which is arbitrarily aligned; and
an execution unit within the processor configured to execute the load-swapped-control-vector instruction to construct a control vector comprising predicate elements, wherein executing the load-swapped-control-vector instruction involves determining a value N, wherein N is the specified target address modulo the vector length in bytes, wherein the predicate elements comprise a true polarity and a false polarity, and wherein the control vector is constructed based on N and an endian-ness of a memory transaction;
wherein for a big-endian memory transaction the N most-significant elements in the control vector are set to the true polarity and the remaining elements of the vector are set to the false polarity;
wherein for a little-endian memory transaction the N least-significant elements in the control vector are set to the true polarity and the remaining elements of the vector are set to the false polarity; and
wherein the control vector is used by a vector select instruction to determine which individual bytes from multiple vectors are selected to merge into a single output vector.
Specification