Instructions for efficiently accessing unaligned vectors

US 7,620,797 B2
Filed: 11/01/2006
Issued: 11/17/2009
Est. Priority Date: 11/01/2006
Status: Active Grant

Abstract

One embodiment of the present invention provides a processor which is configured to execute load-swapped instructions, which may be directed to an unaligned source address. The processor is configured to execute the load-swapped instruction by loading a vector from a naturally-aligned memory region encompassing the source address, and in doing so rotating the bytes of the vector to cause the byte at the specified source address to reside at the least-significant byte position within the vector for a little-endian memory transaction, or at the most-significant byte position within the vector for a big-endian memory transaction. In a variation on this embodiment, the processor is also configured to execute a store-swapped instruction directed to a destination address by storing a vector into a naturally-aligned memory region encompassing the destination address, and in doing so rotating the bytes of the vector to cause the least-significant byte of the vector to be stored at the specified destination address on a little-endian processor, or the most-significant byte of the vector to be stored at the destination address on a big-endian processor, or the specified byte to be stored at the destination address in the case of an endian-specific store-swapped variant.

18 Claims

1. A method for executing a load-swapped instruction, comprising:
- receiving the load-swapped instruction to be executed, wherein the load-swapped instruction specifies a source address in memory, which is arbitrarily aligned; and
  
  executing the load-swapped instruction, which involves loading a vector from a naturally-aligned memory region encompassing the source address into a register, and in doing so, if the source address is unaligned, rotating the bytes of the vector by swapping a set of bytes residing at addresses lower than the source address with a set of bytes residing at addresses greater than or equal to the source address;
  
  wherein rotating the bytes of the vector involves rotating the bytes N positions, where N is equivalent to either the source address specified by the instruction modulo the vector length in bytes or the source address specified by the instruction modulo the vector length in bytes subtracted from the vector length in bytes;
  
  wherein rotating the bytes of the vector occurs before the vector reaches the register; and
  
  wherein rotating the bytes of the vector involves using an alignment circuit which is located along a load-store path between the memory and the register to cause the byte at the specified source address to reside at the least-significant byte position within the vector for a little-endian memory transaction, or causing said byte to be positioned at the most-significant byte position within the vector for a big-endian memory transaction.
- - 2. The method of claim 1, wherein in response to N being equivalent to the source address specified by the instruction modulo the vector length in bytes, the load-swapped instruction rotates bytes in the direction determined by the endian-ness of the memory transaction, wherein the vector is rotated to the left N byte positions for a big-endian memory transaction, or the vector is rotated to the right N byte positions in the case of little-endian memory transactions.
  - 3. The method of claim 1, wherein in response to N being equivalent to the source address specified by the instruction modulo the vector length in bytes subtracted from the vector length in bytes, the load-swapped instruction rotates bytes in the direction determined by the endian-ness of the memory transaction, and wherein the vector is rotated to the right N byte positions for a big-endian memory transaction, or the vector is rotated to the left N byte positions in the case of little-endian memory transactions.
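
The byte movement recited in claims 1 to 3 can be modeled in software. The following is a minimal Python sketch, not the claimed alignment circuit: the 16-byte vector length `VL`, the helper names `rotl`, `rotr`, and `load_swapped`, and the use of a Python integer to stand in for the register are all assumptions made for illustration. It computes N as the source address modulo the vector length, reads the naturally-aligned block containing the address, and rotates so that the addressed byte lands at the least-significant position (little-endian) or the most-significant position (big-endian).

```python
VL = 16  # assumed vector length in bytes; the claims do not fix a size


def rotl(value: int, nbytes: int, vl: int = VL) -> int:
    """Rotate a vl-byte integer left (toward the most-significant end)."""
    bits = 8 * vl
    shift = (8 * nbytes) % bits
    mask = (1 << bits) - 1
    return ((value << shift) | (value >> (bits - shift))) & mask


def rotr(value: int, nbytes: int, vl: int = VL) -> int:
    """Rotate a vl-byte integer right (toward the least-significant end)."""
    bits = 8 * vl
    shift = (8 * nbytes) % bits
    mask = (1 << bits) - 1
    return ((value >> shift) | (value << (bits - shift))) & mask


def load_swapped(mem: bytes, addr: int, endian: str, vl: int = VL) -> int:
    """Hypothetical software model of a load-swapped instruction."""
    if endian not in ("little", "big"):
        raise ValueError("endian must be 'little' or 'big'")
    n = addr % vl            # N: source address modulo the vector length
    base = addr - n          # start of the naturally-aligned region
    reg = int.from_bytes(mem[base:base + vl], endian)
    # Claim 2: rotate right by N for little-endian, left by N for big-endian.
    return rotr(reg, n, vl) if endian == "little" else rotl(reg, n, vl)
```

In this model, rotating left by N produces the same register contents as rotating right by VL - N, which is the relationship between the two choices of N in claims 2 and 3. When the source address is already aligned, N is zero and the block is loaded unchanged.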

4. A method for executing a store-swapped instruction, comprising:
- receiving the store-swapped instruction to be executed, wherein the store-swapped instruction specifies a destination address in memory, which is arbitrarily aligned; and
  
  executing the store-swapped instruction, which involves storing a vector from a register into a naturally-aligned memory region encompassing the destination address, and in doing so, if the destination address is unaligned, rotating the bytes of the vector by swapping a set of bytes residing at addresses lower than the destination address with a set of bytes residing at addresses greater than or equal to the destination address;
  
  wherein rotating the bytes of the vector involves rotating the bytes N positions, where N is equivalent to either the destination address specified by the instruction modulo the vector length in bytes or the destination address specified by the instruction modulo the vector length in bytes subtracted from the vector length in bytes;
  
  wherein rotating the bytes of the vector occurs after the vector moves out of the register and before the vector is stored in the memory; and
  
  wherein rotating the bytes of the vector involves using an alignment circuit which is located along a load-store path between the memory and the register to cause the least significant byte of the vector to be stored at the specified destination address on a little-endian processor, or causing the most significant byte of the vector to be stored to said destination address on a big-endian processor, or causing the specified byte to be stored to the destination address in the case of an endian-specific store-swapped variant.
- - 5. The method of claim 4, wherein in response to N being equivalent to the destination address specified by the instruction modulo the vector length in bytes, the store-swapped instruction rotates bytes in the direction determined by the endian-ness of the memory transaction, wherein the vector is rotated to the right N byte positions for a big-endian memory transaction, or the vector is rotated to the left N byte positions in the case of little-endian memory transactions.
  - 6. The method of claim 4, wherein in response to N being equivalent to the destination address specified by the instruction modulo the vector length in bytes subtracted from the vector length in bytes, the store-swapped instruction rotates bytes in the direction determined by the endian-ness of the memory transaction, and wherein the vector is rotated to the left N byte positions for a big-endian memory transaction, or the vector is rotated to the right N byte positions in the case of little-endian memory transactions.
  - 7. The method of claim 4, where if the store-swapped instruction is a store-swapped-leading instruction, storing the vector to the destination address involves:
    - storing a whole vector to the destination address if the destination address is naturally aligned; and
      
      storing a partial vector to the destination address if the destination address is unaligned.
  - 8. The method of claim 6, where if the store-swapped instruction is a store-swapped-trailing instruction, storing the vector to the destination address involves:
    - storing nothing to the destination address if the destination address is aligned with memory;
      
      or storing a partial vector to the modified destination address if the destination address is unaligned.
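
The store side in claims 4 to 8 can be modeled in a similar way. The sketch below is a hypothetical Python model, not the claimed alignment circuit. The function names, the 16-byte `VL`, and the representation of the register as a `bytes` object whose index 0 is the byte that claim 4 places at the destination address (least-significant on a little-endian processor, most-significant on a big-endian one) are assumptions. The claims do not state which part of the vector the leading and trailing variants write, nor how the "modified destination address" is derived; this model assumes the leading variant writes the bytes at or above its address within the aligned region, and the trailing variant writes the bytes below its address within the aligned region.

```python
VL = 16  # assumed vector length in bytes


def store_swapped(mem: bytearray, dest: int, reg: bytes, vl: int = VL) -> None:
    """Hypothetical model: rotate and store a whole vector to the aligned region."""
    n = dest % vl            # N: destination address modulo the vector length
    base = dest - n          # start of the naturally-aligned region
    # Rotate so reg[0] lands at dest; the last n register bytes wrap around
    # to the start of the aligned region.
    mem[base:base + vl] = reg[vl - n:] + reg[:vl - n]


def store_swapped_leading(mem: bytearray, dest: int, reg: bytes, vl: int = VL) -> None:
    """Assumed behavior: whole vector if aligned, else only bytes at or above dest."""
    n = dest % vl
    mem[dest:dest + (vl - n)] = reg[:vl - n]


def store_swapped_trailing(mem: bytearray, dest: int, reg: bytes, vl: int = VL) -> None:
    """Assumed behavior: nothing if aligned, else only bytes below dest."""
    n = dest % vl
    base = dest - n
    mem[base:dest] = reg[vl - n:]   # empty slice on both sides when n == 0
```

Under these assumptions, calling the leading variant at address A and the trailing variant at A + VL writes one full vector to the unaligned range starting at A without touching neighboring bytes. That pairing illustrates how such variants could be combined; it is not a sequence taken from the claims.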

9. A method for executing a load-swapped-control-vector instruction, comprising:
- receiving a load-swapped-control-vector instruction to be executed, wherein the load-swapped-control-vector instruction specifies a target address in memory, which is arbitrarily aligned; and
  
  executing the load-swapped-control-vector instruction to construct a control vector comprising predicate elements, wherein executing the load-swapped-control-vector instruction involves determining a value N, wherein N is the specified target address modulo the vector length in bytes, wherein the predicate elements comprise a true polarity and a false polarity, and wherein the control vector is constructed based on N and an endian-ness of a memory transaction;
  
  wherein for a big-endian memory transaction the N most-significant elements in the control vector are set to the true polarity and the remaining elements of the vector are set to the false polarity;
  
  wherein for a little-endian memory transaction the N least-significant elements in the control vector are set to the true polarity and the remaining elements of the vector are set to the false polarity; and
  
  wherein the control vector is used by a vector select instruction to determine which individual bytes from multiple vectors are selected to merge into a single output vector.
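
The control vector of claim 9 can be sketched in a few lines of Python. The names `load_swapped_control_vector` and `vector_select`, the 16-element `VL`, the list-of-booleans representation (index 0 is the least-significant element), and the convention that a true element selects from the first operand are assumptions; the claim specifies only which elements are set to the true polarity and that a vector select instruction consumes the result.

```python
VL = 16  # assumed number of elements (one predicate per byte)


def load_swapped_control_vector(target: int, endian: str, vl: int = VL) -> list:
    """Hypothetical model: build predicate elements from N = target mod vl.

    Index 0 of the returned list is the least-significant element.
    """
    n = target % vl
    if endian == "little":
        # N least-significant elements are true.
        return [i < n for i in range(vl)]
    if endian == "big":
        # N most-significant elements are true.
        return [i >= vl - n for i in range(vl)]
    raise ValueError("endian must be 'little' or 'big'")


def vector_select(control: list, if_true: list, if_false: list) -> list:
    """Assumed select semantics: take if_true[i] where control[i] is true."""
    return [t if c else f for c, t, f in zip(control, if_true, if_false)]
```

With a target address of 5 and little-endian ordering, the first five elements are true, so `vector_select` takes elements 0 to 4 from one vector and elements 5 to 15 from the other. Under the usual byte-order conventions for an aligned, unrotated vector, the true elements in both cases correspond to the bytes at addresses below the target within the aligned region.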

10. A computer system configured to execute a load-swapped instruction, comprising:
- a processor;
  
  a memory;
  
  an instruction fetch unit within the processor configured to fetch the load-swapped instruction to be executed, wherein the load-swapped instruction specifies a source address in memory, which is arbitrarily aligned; and
  
  an execution unit within the processor configured to execute the load-swapped instruction by loading a vector from a naturally-aligned memory region encompassing the source address into a register, and in doing so, if the source address is unaligned, rotating the bytes of the vector by swapping a set of bytes residing at addresses lower than the source address with a set of bytes residing at addresses greater than or equal to the source address;
  
  wherein rotating the bytes of the vector involves rotating the bytes N positions, where N is equivalent to either the source address specified by the instruction modulo the vector length in bytes or the source address specified by the instruction modulo the vector length in bytes subtracted from the vector length in bytes;
  
  wherein rotating the bytes of the vector occurs before the vector reaches the register; and
  
  wherein rotating the bytes of the vector involves using an alignment circuit which is located along a load-store path between the memory and the register to cause the byte at the specified source address to reside at the least-significant byte position within the vector for a little-endian memory transaction, or causing said byte to be positioned at the most-significant byte position within the vector for a big-endian memory transaction.
- - 11. The computer system of claim 10, wherein in response to N being equivalent to the source address specified by the instruction modulo the vector length in bytes, the load-swapped instruction rotates bytes in the direction determined by the endian-ness of the memory transaction, wherein the vector is rotated to the left N byte positions for a big-endian memory transaction, or the vector is rotated to the right N byte positions in the case of little-endian memory transactions.
  - 12. The computer system of claim 10, wherein in response to N being equivalent to the destination address specified by the instruction modulo the vector length in bytes subtracted from the vector length in bytes, the load-swapped instruction rotates bytes in the direction determined by the endian-ness of the memory transaction, and wherein the vector is rotated to the right N byte positions for a big-endian memory transaction, or the vector is rotated to the left N byte positions in the case of little-endian memory transactions.

13. A computer system configured to execute a store-swapped instruction, comprising:
- a processor;
  
  a memory;
  
  an instruction fetch unit within the processor configured to fetch the store-swapped instruction to be executed, wherein the store-swapped instruction specifies a destination address in memory, which is arbitrarily aligned; and
  
  an execution unit within the processor configured to execute the store-swapped instruction by storing a vector from a register into a naturally-aligned memory region encompassing the destination address, and in doing so, if the source address is unaligned, rotating the bytes of the vector by swapping a set of bytes residing at addresses lower than the destination address with a set of bytes residing at addresses greater than or equal to the destination address;
  
  wherein rotating the bytes of the vector involves rotating the bytes N positions, where N is equivalent to either the source address specified by the instruction modulo the vector length in bytes or the source address specified by the instruction modulo the vector length in bytes subtracted from the vector length in bytes;
  
  wherein rotating the bytes of the vector occurs after the vector moves out of the register and before the vector is stored in the memory; and
  
  wherein rotating the bytes of the vector involves using an alignment circuit which is located along a load-store path between the memory and the register to cause the least significant byte of the vector to be stored at the specified destination address on a little-endian processor, or causing the most significant byte of the vector to be stored to said destination address on a big-endian processor, or causing the specified byte to be stored to the destination address in the case of an endian-specific store-swapped variant.
- - 14. The computer system of claim 13, wherein in response to N being equivalent to the destination address specified by the instruction modulo the vector length in bytes, the store-swapped instruction rotates bytes in the direction determined by the endian-ness of the memory transaction, wherein the vector is rotated to the right N byte positions for a big-endian memory transaction, or the vector is rotated to the left N byte positions in the case of little-endian memory transactions.
  - 15. The computer system of claim 13, wherein in response to N being equivalent to the destination address specified by the instruction modulo the vector length in bytes subtracted from the vector length in bytes, the store-swapped instruction rotates bytes in the direction determined by the endian-ness of the memory transaction, and wherein the vector is rotated to the left N byte positions for a big-endian memory transaction, or the vector is rotated to the right N byte positions in the case of little-endian memory transactions.
  - 16. The computer system of claim 13, wherein if the store-swapped instruction is a store-swapped-leading instruction, storing the vector to the destination address involves:
    - storing a whole vector to the destination address if the destination address is naturally aligned; and
      
      storing a partial vector to the destination address if the destination address is unaligned.
  - 17. The computer system of claim 13, wherein if the store-swapped instruction is a store-swapped-trailing instruction, storing the vector to the destination address involves:
    - storing nothing to the destination address if the destination address is aligned with memory;
      
      or storing a partial vector to the modified destination address if the destination address is unaligned.

18. A computer system configured to execute a load-swapped-control-vector instruction, comprising:
- a processor;
  
  a memory;
  
  an instruction fetch unit within the processor configured to fetch the load-swapped-control-vector instruction to be executed, wherein the load-swapped-control-vector instruction specifies a target address in memory, which is arbitrarily aligned; and
  
  an execution unit within the processor configured to execute the load-swapped-control-vector instruction to construct a control vector comprising predicate elements, wherein executing the load-swapped-control-vector instruction involves determining a value N, wherein N is the specified target address modulo the vector length in bytes, wherein the predicate elements comprise a true polarity and a false polarity, and wherein the control vector is constructed based on N and an endian-ness of a memory transaction;
  
  wherein for a big-endian memory transaction the N most-significant elements in the control vector are set to the true polarity and the remaining elements of the vector are set to the false polarity;
  
  wherein for a little-endian memory transaction the N least-significant elements in the control vector are set to the true polarity and the remaining elements of the vector are set to the false polarity; and
  
  wherein the control vector is used by a vector select instruction to determine which individual bytes from multiple vectors are selected to merge into a single output vector.

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Diefendorff, Keith E.; Gonion, Jeffry E.
Primary Examiner(s): Chan; Eddie P
Assistant Examiner(s): Alrobaye; Idriss N
Application Number: US11/591,804
Publication Number: US 20080114968A1
Time in Patent Office: 1,112 Days
Field of Search: 712/4, 712/220, 712/204, 712/225, 712/226, 712/300, 712/5, 712/223
US Class Current: 712/204
CPC Class Codes:
G06F 9/30032   Movement instructions, e.g....
G06F 9/30036   Instructions to perform ope...
G06F 9/30043   LOAD or STORE instructions;...
