Alignment and ordering of vector elements for single instruction multiple data processing
First Claim
1. In a computer system including a processor having a plurality of registers, a method for generating an aligned vector of first width from two second width vectors for single instruction multiple data (SIMD) processing, comprising the steps of:
loading a first vector from a memory unit into a first register, wherein the first vector contains a first byte of an aligned vector to be generated;
loading a second vector from the memory unit into a second register;
determining a starting byte in the first register wherein the starting byte specifies the first byte of an aligned vector and wherein the starting byte is specified as a constant in an alignment instruction;
extracting a first width vector from the first register and the second register beginning from the first bit in the first byte of the first register continuing through the bits in the second register; and
replicating the extracted first width vector into a third register such that the third register contains a plurality of elements aligned for SIMD processing.
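The steps of claim 1 can be sketched in C as follows. This is only an illustrative model under stated assumptions, not the patented hardware implementation: both the source registers and the result are assumed to be 64 bits (8 bytes) wide, byte 0 is taken to be the first byte in memory order, and the names `vec64` and `align_extract` are hypothetical. The `start` argument plays the role of the constant starting byte carried in the alignment instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 8  /* assumed register width: 64 bits */

/* Hypothetical model of a 64-bit register as 8 bytes in memory order. */
typedef struct { uint8_t b[VEC_BYTES]; } vec64;

/*
 * Extract VEC_BYTES bytes from the concatenation r1 || r2, beginning at
 * byte `start` of r1 and continuing into r2, and return them as the
 * contents of a third register.
 * Precondition: start < VEC_BYTES (the starting byte lies in r1).
 */
vec64 align_extract(vec64 r1, vec64 r2, unsigned start)
{
    vec64 r3;
    assert(start < VEC_BYTES);
    for (unsigned i = 0; i < VEC_BYTES; i++) {
        unsigned src = start + i;
        /* Bytes 0..7 come from r1, bytes 8..15 from r2. */
        r3.b[i] = (src < VEC_BYTES) ? r1.b[src] : r2.b[src - VEC_BYTES];
    }
    return r3;
}

/* Load one register-width vector from a byte buffer (models the load step). */
vec64 load_vec64(const uint8_t *mem)
{
    vec64 v;
    memcpy(v.b, mem, VEC_BYTES);
    return v;
}
```

For example, if memory holds the bytes 0 through 15, loading bytes 0 to 7 into the first register and bytes 8 to 15 into the second, then calling `align_extract` with `start` equal to 3, yields a third register holding bytes 3 through 10.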
Abstract
The present invention provides alignment and ordering of vector elements for SIMD processing. To align vector elements, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register. The first vector contains the first byte of the aligned vector to be generated. A starting byte specifying the first byte of the aligned vector is then determined. Next, a vector is extracted from the first register and the second register, beginning from the first bit of the starting byte in the first register and continuing through the bits in the second register. Finally, the extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for SIMD processing.
To order vector elements, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register. A subset of elements is then selected from the first register and the second register. The elements of the subset are then replicated into the elements of a third register in a particular order suitable for subsequent SIMD vector processing.
322 Citations
44 Claims
1. In a computer system including a processor having a plurality of registers, a method for generating an aligned vector of first width from two second width vectors for single instruction multiple data (SIMD) processing, comprising the steps of:
loading a first vector from a memory unit into a first register, wherein the first vector contains a first byte of an aligned vector to be generated;
loading a second vector from the memory unit into a second register;
determining a starting byte in the first register wherein the starting byte specifies the first byte of an aligned vector and wherein the starting byte is specified as a constant in an alignment instruction;
extracting a first width vector from the first register and the second register beginning from the first bit in the first byte of the first register continuing through the bits in the second register; and
replicating the extracted first width vector into a third register such that the third register contains a plurality of elements aligned for SIMD processing.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. In a computer system including a processor having a plurality of registers, a method for generating an ordered set of elements in an N-bit vector from two sets of elements in two N-bit vectors for single instruction multiple data (SIMD) vector processing, said method comprising the steps of:
loading a first vector from a memory unit into a first register;
loading a second vector from the memory unit into a second register;
wherein the first vector and the second vector are each comprised of four 16-bit elements indexed from 0 to 3;
selecting a subset of elements from the first register and the second register wherein the subset is comprised of the elements 2 and 3 from the first register and the elements 2 and 3 from the second register; and
replicating the elements from the subset into the elements in the third register in a particular order suitable for subsequent SIMD vector processing, wherein the particular order of the elements in the third register comprises:
the element 0 replicated from the element 2 of the second register;
the element 1 replicated from the element 2 of the first register;
the element 2 replicated from the element 3 of the second register; and
the element 3 replicated from the element 3 of the first register.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44
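The element ordering recited in claim 10 can be modeled in C as shown below. This is a minimal sketch for illustration, not the patented implementation: the type `vec4x16` and the function `order_upper_pairs` are hypothetical names, and each 64-bit register is assumed to be represented as an array of four 16-bit elements indexed 0 to 3.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit register holding four 16-bit elements. */
typedef struct { uint16_t e[4]; } vec4x16;

/*
 * Select elements 2 and 3 of each source register and replicate them
 * into a third register in the order recited in claim 10:
 *   r3[0] = r2[2], r3[1] = r1[2], r3[2] = r2[3], r3[3] = r1[3]
 */
vec4x16 order_upper_pairs(vec4x16 r1, vec4x16 r2)
{
    vec4x16 r3;
    r3.e[0] = r2.e[2];
    r3.e[1] = r1.e[2];
    r3.e[2] = r2.e[3];
    r3.e[3] = r1.e[3];
    return r3;
}
```

With a first register holding {10, 11, 12, 13} and a second register holding {20, 21, 22, 23}, the third register would hold {22, 12, 23, 13}. Elements 0 and 1 of both source registers are not used by this particular ordering.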
Specification