Method and apparatus for indirectly addressed vector load-add-store across multi-processors
Abstract
A method and apparatus to correctly compute a vector-gather, vector-operate (e.g., vector add), and vector-scatter sequence, particularly when elements of the vector may be redundantly presented, as with indirectly addressed vector operations. For an add operation, one vector register is loaded with the “add-in” values, and another vector register is loaded with address values of the “add-to” elements to be gathered from memory into a third vector register. If the vector of address values has a plurality of elements that point to the same memory address, the algorithm should add all the “add-in” values from the elements having the duplicated addresses. An indirectly addressed load performs the “gather” operation to load the “add-to” values. A vector add operation then adds corresponding elements from the “add-in” vector to the “add-to” vector. An indirectly addressed store then performs the “scatter” operation to store the results.
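
The hazard described in the abstract can be seen in a small scalar model. The following is a minimal sketch, assuming plain Python lists stand in for memory and for the vector registers; the function names `naive_gather_add_scatter` and `reference_add` are hypothetical and do not come from the patent.

```python
def naive_gather_add_scatter(memory, addresses, add_in):
    """Gather, add, scatter with no duplicate handling (illustrative only)."""
    gathered = [memory[a] for a in addresses]            # indirectly addressed load
    result = [g + v for g, v in zip(gathered, add_in)]   # element-wise vector add
    for a, r in zip(addresses, result):                  # indirectly addressed store
        memory[a] = r                                    # a later duplicate overwrites an earlier one
    return memory


def reference_add(memory, addresses, add_in):
    """Scalar reference: every add-in value reaches its target address."""
    for a, v in zip(addresses, add_in):
        memory[a] += v
    return memory


addresses = [1, 3, 1]          # element 0 and element 2 point at the same location
add_in = [1.0, 2.0, 4.0]
lost = naive_gather_add_scatter([10.0, 20.0, 30.0, 40.0], addresses, add_in)
kept = reference_add([10.0, 20.0, 30.0, 40.0], addresses, add_in)
```

In this model the naive sequence drops the contribution of element 0 at address 1, while the scalar reference accumulates both values. That gap is what the duplicate detection in the claims is meant to close.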
23 Claims
1. A computerized method comprising, in each of a plurality of processors including a first processor and a second processor:
(a) loading a first vector register with addressing values;
(b) loading a second vector register with operand values;
(c) determining which, if any, element addresses of the first vector register have a value that duplicates a value in another element address;
(d) selectively adding certain elements of the second vector of operand values based on the element addresses of the duplicated values;
(e) loading, using indirect addressing from the first vector register, elements from memory into a third vector register;
(f) adding values from the third vector register and the second vector of operand values to generate a result vector; and
(g) storing the result vector to memory using indirect addressing;
wherein the determining of duplicates includes:
  generating each respective address value for a sequence of addressed locations within a constrained area of memory containing 2^N consecutive addresses using an N-bit value derived from each respective addressing value of the first vector register,
  generating each respective data value of a first sequence of values by combining at least a portion of each respective addressing value of the first vector register to a respective one of a sequence of integer numbers,
  storing the first sequence of values to the constrained memory area using the generated sequence of respective address values,
  loading a second sequence of values from the constrained memory area using the generated sequence of respective address values, and
  comparing the first sequence of values to the second sequence of values;
wherein the loading of the third vector register includes loading elements from locations specified by addressing values corresponding to indications of positive compares from the comparing;
wherein addresses of the elements from memory are calculated by adding each respective addressing value to a base address;
wherein the adding includes a floating-point addition operation that produces at least one element of the result vector as an ordered-operation floating point summation of an element of the loaded third vector register and a plurality of respective elements of the original second vector of operand values corresponding to elements of the first vector of addressing values having identical values, and
wherein for the storing of the result vector of elements to memory, elements are stored to locations specified by addressing values corresponding to indications of positive compares.
Dependent claims: 2, 3, 4, 5, 6.
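
The store, load-back, and compare test recited in claim 1 can be modeled in software. The following is a minimal sketch under stated assumptions, not the patented implementation: the name `load_add_store`, the tag layout (upper address bits above the element index), the last-store-wins scratch behavior, and the retry loop for elements whose addresses merely share low N bits are all illustrative choices. Addressing values are assumed to be non-negative integers.

```python
def load_add_store(memory, base, addresses, add_in, n_bits=3):
    """Indirect load-add-store that tolerates duplicate addressing values."""
    size = 1 << n_bits                                   # constrained area: 2**N slots
    idx_bits = max(1, (len(addresses) - 1).bit_length())
    idx_mask = (1 << idx_bits) - 1
    scratch = [0] * size
    pending = list(range(len(addresses)))                # element indices not yet processed

    while pending:
        # N-bit slot address derived from each addressing value.
        slots = {i: addresses[i] & (size - 1) for i in pending}
        # Data value: upper address bits combined with the element index.
        tags = {i: ((addresses[i] >> n_bits) << idx_bits) | i for i in pending}
        for i in pending:                                # store the first sequence
            scratch[slots[i]] = tags[i]
        loaded = {i: scratch[slots[i]] for i in pending} # load the second sequence

        groups = {}      # winning element index -> operands, in element order
        deferred = []    # same slot but different address: retry in a later pass
        for i in pending:
            winner = loaded[i] & idx_mask
            if loaded[i] >> idx_bits == tags[i] >> idx_bits:
                groups.setdefault(winner, []).append(add_in[i])
            else:
                deferred.append(i)

        for w, values in groups.items():                 # only positive compares touch memory
            total = memory[base + addresses[w]]          # gather (base + addressing value)
            for v in values:                             # ordered floating-point summation
                total += v
            memory[base + addresses[w]] = total          # scatter
        pending = deferred
    return memory


mem = load_add_store([0.0] * 32, 4, [1, 9, 1, 2, 9, 17],
                     [1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
```

With `n_bits=3`, addressing values 1, 9, and 17 all map to the same scratch slot, so the example exercises both true duplicates and accidental collisions. Each pass processes at least one element, so the loop terminates.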
7. A computerized method comprising:
(a) within a first vector processor:
  loading a first vector register in the first vector processor with addressing values;
  loading a second vector register in the first vector processor with operand values;
  determining which, if any, element addresses of the first vector register in the first vector processor have a value that duplicates a value in another element address;
  selectively adding certain elements of the second vector of operand values in the first vector processor based on the element addresses of the duplicated values;
(b) within a second vector processor:
  loading a first vector register in the second vector processor with addressing values;
  loading a second vector register in the second vector processor with operand values;
  determining which, if any, element addresses of the first vector register in the second vector processor have a value that duplicates a value in another element address;
  selectively operating on certain elements of the second vector of operand values in the second vector processor based on the element addresses of the duplicated values;
(c) performing a synchronization operation that ensures that prior store operations effectively complete in at least the second vector processor before the following (d) operations;
(d) within the first vector processor:
  loading, using indirect addressing from the first vector register, elements from memory into a third vector register in the first vector processor;
  operating on values from the third vector register and the second vector of operand values in the first vector processor to generate a first result vector; and
  storing the first result vector to memory using indirect addressing;
(e) performing a synchronization operation that ensures that the storing of the first result vector effectively completes before the following (f) operations; and
(f) within the second vector processor:
  loading, using indirect addressing from the first vector register, elements from memory into a third vector register in the second vector processor;
  operating on values from the third vector register and the second vector of operand values in the second vector processor to generate a second result vector; and
  storing the second result vector to memory using indirect addressing;
wherein the determining of duplicates includes:
  generating each respective address value for a sequence of addressed locations within a constrained area of memory containing 2^N consecutive addresses using an N-bit value derived from each respective addressing value of the first vector register,
  generating each respective data value of a first sequence of values by combining at least a portion of each respective addressing value of the first vector register to a respective one of a sequence of integer numbers,
  storing the first sequence of values to the constrained memory area using the generated sequence of respective address values,
  loading a second sequence of values from the constrained memory area using the generated sequence of respective address values, and
  comparing the first sequence of values to the second sequence of values.
Dependent claims: 8, 9, 10, 11.
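
The ordering imposed by steps (c) and (e) of claim 7 can be illustrated with two threads and a barrier. This is a minimal sketch, assuming Python threads stand in for the two vector processors and `threading.Barrier` stands in for the synchronization operation; `combine_duplicates`, `gather_add_scatter`, and `run_two_processors` are hypothetical names, and a dictionary replaces the constrained-memory duplicate test for brevity.

```python
import threading


def combine_duplicates(addresses, operands):
    """Local phase: fold operands that share an address, in element order."""
    combined = {}
    for a, v in zip(addresses, operands):
        if a in combined:
            combined[a] += v
        else:
            combined[a] = v
    return list(combined.keys()), list(combined.values())


def gather_add_scatter(memory, addresses, operands):
    """Indirect load, vector add, indirect store on duplicate-free addresses."""
    gathered = [memory[a] for a in addresses]
    result = [g + v for g, v in zip(gathered, operands)]
    for a, r in zip(addresses, result):
        memory[a] = r


def run_two_processors(memory, work):
    """work holds one (addresses, operands) pair per modeled processor."""
    barrier = threading.Barrier(2)

    def processor(rank):
        addrs, vals = combine_duplicates(*work[rank])   # steps (a) and (b)
        barrier.wait()                                  # step (c)
        if rank == 0:
            gather_add_scatter(memory, addrs, vals)     # step (d)
        barrier.wait()                                  # step (e)
        if rank == 1:
            gather_add_scatter(memory, addrs, vals)     # step (f)

    threads = [threading.Thread(target=processor, args=(r,)) for r in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return memory


shared = run_two_processors(
    [0.0] * 8,
    [([1, 2, 1], [1.0, 2.0, 3.0]), ([1, 5], [10.0, 20.0])],
)
```

Because the second barrier separates the two read-modify-write phases, both processors can update address 1 without losing either contribution.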
12. A system comprising:
a first vector register having addressing values;
a second vector register having operand values;
circuitry programmed to determine which, if any, element addresses of the first vector register have a value that duplicates a value in another element address;
circuitry programmed to selectively add certain elements of the second vector of operand values based on the element addresses of the duplicated values;
circuitry programmed to load, using indirect addressing from the first vector register, elements from memory into a third vector register;
circuitry programmed to add values from the third vector register and the second vector of operand values to generate a result vector; and
circuitry programmed to store the result vector to memory using indirect addressing;
wherein the circuitry programmed to determine duplicates includes:
  circuitry programmed to generate each respective address value for a sequence of addressed locations within a constrained area of memory containing 2^N consecutive addresses using an N-bit value derived from each respective addressing value of the first vector register,
  circuitry programmed to generate each respective data value of a first sequence of values by combining at least a portion of each respective addressing value of the first vector register to a respective one of a sequence of integer numbers,
  circuitry programmed to store the first sequence of values to the constrained memory area using the generated sequence of respective address values,
  circuitry programmed to load a second sequence of values from the constrained memory area using the generated sequence of respective address values, and
  circuitry programmed to compare the first sequence of values to the second sequence of values;
wherein the circuitry programmed to load the third vector register loads elements from locations specified by addressing values corresponding to indications of positive compares;
wherein addresses of the elements from memory are calculated by adding each respective addressing value to a base address; and
wherein the circuitry programmed to add includes a floating-point adder that produces at least one element of the result vector as an ordered-operation floating point summation of an element of the loaded third vector register and a plurality of respective elements of the original second vector of operand values corresponding to elements of the first vector of addressing values having identical values.
Dependent claims: 13, 14, 15, 16, 17.
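
The reason a claim would specify an ordered-operation summation is that floating-point addition is not associative, so the grouping of operands can change the result. The following is a minimal sketch, assuming IEEE 754 double precision (Python's `float`); the helper name `ordered_sum` is hypothetical.

```python
def ordered_sum(initial, values):
    """Add values to initial strictly left to right, one element at a time."""
    total = initial
    for v in values:
        total += v
    return total


# Memory value first, then each duplicate's operand in element order.
in_order = ordered_sum(1e16, [1.0, 1.0])

# Pre-adding the operands before touching the memory value groups differently.
pre_added = 1e16 + (1.0 + 1.0)
```

Here each `1.0` is rounded away when added individually to `1e16`, while the pre-added `2.0` survives. A fixed summation order makes the vectorized result reproducible against a scalar loop that uses the same order.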
18. A system comprising:
(a) a first vector processor that includes:
  means for loading a first vector register in the first vector processor with addressing values;
  means for loading a second vector register in the first vector processor with operand values;
  means for determining which, if any, element addresses of the first vector register in the first vector processor have a value that duplicates a value in another element address;
  means for selectively adding certain elements of the second vector of operand values in the first vector processor based on the element addresses of the duplicated values;
(b) a second vector processor that includes:
  means for loading a first vector register in the second vector processor with addressing values;
  means for loading a second vector register in the second vector processor with operand values;
  means for determining which, if any, element addresses of the first vector register in the second vector processor have a value that duplicates a value in another element address;
  means for selectively operating on certain elements of the second vector of operand values in the second vector processor based on the element addresses of the duplicated values;
(c) means for performing a synchronization operation that ensures that prior store operations effectively complete in at least the second vector processor before the operations of the following (d) means;
(d) within the first vector processor:
  means for loading, using indirect addressing from the first vector register, elements from memory into a third vector register in the first vector processor;
  means for operating on values from the third vector register and the second vector of operand values in the first vector processor to generate a first result vector; and
  means for storing the first result vector to memory using indirect addressing;
(e) performing a synchronization operation that ensures that the storing of the first result vector effectively completes before the operations of the following (f) means; and
(f) within the second vector processor:
  means for loading, using indirect addressing from the first vector register, elements from memory into a third vector register in the second vector processor;
  means for operating on values from the third vector register and the second vector of operand values in the second vector processor to generate a second result vector; and
  means for storing the second result vector to memory using indirect addressing;
wherein the means for determining duplicates further includes:
  means for generating each respective address value for a sequence of addressed locations within a constrained area of memory containing 2^N consecutive addresses using an N-bit value derived from each respective addressing value of the first vector register,
  means for generating each respective data value of a first sequence of values by combining at least a portion of each respective addressing value of the first vector register to a respective one of a sequence of integer numbers,
  means for storing the first sequence of values to the constrained memory area using the generated sequence of respective address values,
  means for loading a second sequence of values from the constrained memory area using the generated sequence of respective address values, and
  means for comparing the first sequence of values to the second sequence of values.
Dependent claims: 19, 20, 21, 22.
23. A computer-readable medium having instructions stored thereon for causing a suitably programmed information-processing system to execute a method comprising:
loading a first vector register with addressing values;
loading a second vector register with operand values;
determining which, if any, element addresses of the first vector register have a value that duplicates a value in another element address;
selectively adding certain elements of the second vector of operand values based on the element addresses of the duplicated values;
loading, using indirect addressing from the first vector register, elements from memory into a third vector register;
adding values from the third vector register and the second vector of operand values to generate a result vector; and
storing the result vector to memory using indirect addressing;
wherein the determining of duplicates includes:
  generating each respective address value for a sequence of addressed locations within a constrained area of memory containing 2^N consecutive addresses using an N-bit value derived from each respective addressing value of the first vector register,
  generating each respective data value of a first sequence of values by combining at least a portion of each respective addressing value of the first vector register to a respective one of a sequence of integer numbers,
  storing the first sequence of values to the constrained memory area using the generated sequence of respective address values,
  loading a second sequence of values from the constrained memory area using the generated sequence of respective address values, and
  comparing the first sequence of values to the second sequence of values;
wherein the loading of the third vector register includes loading elements from locations specified by addressing values corresponding to indications of positive compares from the comparing;
wherein addresses of the elements from memory are calculated by adding each respective addressing value to a base address;
wherein the adding includes a floating-point addition operation that produces at least one element of the result vector as an ordered-operation floating point summation of an element of the loaded third vector register and a plurality of respective elements of the original second vector of operand values corresponding to elements of the first vector of addressing values having identical values, and
wherein for the storing of the result vector of elements to memory, elements are stored to locations specified by addressing values corresponding to indications of positive compares.
Specification