Vector math instruction execution by DSP processor approximating division and complex number magnitude
First Claim
1. A method for performing an approximate division on a digital signal processor having a plurality of registers, each register storing data of N bits, and a plurality of work units, each work unit performing data processing operations under instruction control, the steps comprising:
storing a first numerator operand of N/2 bits in a set of N/2 most significant bits of a first register of the plurality of registers;
storing a second numerator operand of N/2 bits in a set of N/2 least significant bits of the first register;
storing a first denominator operand of N/2 bits in a set of N/2 most significant bits of a second register of the plurality of registers;
storing a second denominator operand of N/2 bits in a set of N/2 least significant bits of the second register;
employing one of the work units to separately form a first absolute value of the most significant bits of the second register and store the first absolute value in the most significant bits of a third register of the plurality of registers, and form a second absolute value of the least significant bits of the second register and store the second absolute value in the least significant bits of the third register;
employing one of the work units to extract the first absolute value from the third register;
employing one of the work units to determine a number of unused bits in the first absolute value;
employing one of the work units to generate a headroom hn+1 of the first denominator operand by extracting a predetermined number of bits of the number of unused bits in the first absolute value;
employing one of the work units to compute a first shift factor sn+1 for the first denominator operand by adding the first headroom hn+1 to a first constant and subtracting a number of fractional bits in the first denominator operand;
employing one of the work units to extract the second absolute value from the third register;
employing one of the work units to determine a number of unused bits in the second absolute value;
employing one of the work units to generate a second headroom hn of the second denominator operand by extracting a predetermined number of bits of the number of unused bits in the second absolute value;
employing one of the work units to compute a second shift factor sn for the second denominator operand by adding the second headroom hn to the first constant and subtracting a number of fractional bits in the second denominator operand;
employing one of the work units to generate a first intermediate result by left shifting the data in the third register by an amount of the first headroom hn+1 bits;
employing one of the work units to generate a first division LUT index in+1 by right shifting the first intermediate by a second constant;
employing one of the work units to generate a second intermediate result by left shifting the data in the third register by an amount of the second headroom hn bits;
employing one of the work units to generate a second division LUT index in by right shifting the first intermediate by a third constant;
employing one of the work units to generate a third intermediate result by performing an exclusive OR of data in the first register and data in the second register;
employing one of the work units to extract a most significant bit of the most significant bits of the third intermediate result;
employing one of the work units to generate a sign of a first division mn+1 by subtracting twice the most significant bit of the most significant bits of the third intermediate result from 1;
employing one of the work units to extract a most significant bit of the least significant bits of the third intermediate result;
employing one of the work units to generate a sign of a second division mn+1 by subtracting twice the most significant bit of the least significant bits of the third intermediate result from 1;
employing one of the work units to generate a first look-up table value by indexing a look-up table with the first division LUT index in+1;
employing one of the work units to generate a second look-up table value by indexing a look-up table with the second division LUT index in;
employing one of the work units to store the first look-up table value in most significant bits of a fourth register of the plurality of registers and store the second look-up table value in least significant bits of the fourth register;
employing one of the work units to generate a first product by multiplying the most significant bits of the fourth register by the most significant bits of the third register;
employing one of the work units to generate a first absolute division value by performing a saturated left shift of the first product by the headroom hn+1;
employing one of the work units to generate a first division value by multiplying the first absolute division value by the sign of a first division mn+1;
employing one of the work units to generate a second product by multiplying the least significant bits of the fourth register by the least significant bits of the third register;
employing one of the work units to generate a second absolute division value by performing a saturated left shift of the second product by the headroom hn;
employing one of the work units to generate a second division value by multiplying the second absolute division value by the sign of a second division mn; and
employing one of the work units to generate a packed division by storing the first division value in most significant bits of a fifth register of the plurality of registers and storing the second division value in least significant bits of the fifth register.
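The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: it assumes N = 32 (two signed 16-bit lanes), a 64-entry reciprocal table scaled by 2^29, a Q8 fixed-point output, and saturation on a zero denominator, none of which are stated in the claim. It multiplies the table value by the numerator magnitude and folds the headroom into a single net shift, in the spirit of the shift factor. All names (`RECIP_LUT`, `approx_div_lane`, `approx_div_packed`, and so on) are hypothetical.

```python
LANE_BITS = 16                      # N/2 with N = 32 (assumption)
LUT_BITS = 6                        # assumed table size: 64 entries
OUT_FRAC_BITS = 8                   # assumed output format: Q8 fixed point
RECIP_SCALE_BITS = 29               # assumed scaling of the table entries
STEP_SHIFT = (LANE_BITS - 2) - LUT_BITS   # bits dropped when forming the index

# Reciprocal table for normalized denominators in [0x4000, 0x8000).
# Entry i approximates 2**29 / d at the midpoint of bin i.
RECIP_LUT = [
    round(2**RECIP_SCALE_BITS
          / (0x4000 + (i << STEP_SHIFT) + (1 << (STEP_SHIFT - 1))))
    for i in range(1 << LUT_BITS)
]


def to_signed16(v):
    """Interpret the low 16 bits of v as a two's-complement value."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def unpack(reg):
    """Split a 32-bit register into (high lane, low lane)."""
    return to_signed16(reg >> 16), to_signed16(reg)


def pack(hi, lo):
    """Pack two 16-bit lanes into one 32-bit register."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def saturate16(v):
    return max(-0x8000, min(0x7FFF, v))


def approx_div_lane(num, den):
    """Approximate num / den for one lane, returned in Q8."""
    if den == 0:
        # Assumption: saturate on a zero denominator.
        return 0x7FFF if num >= 0 else -0x8000
    # Sign = 1 - 2 * (MSB of numerator XOR denominator).
    sign = 1 - 2 * (((num ^ den) >> (LANE_BITS - 1)) & 1)
    a = min(abs(den), 0x7FFF)                      # absolute value of the denominator
    headroom = (LANE_BITS - 1) - a.bit_length()    # unused bits below the sign bit
    normalized = a << headroom                     # now in [0x4000, 0x7FFF]
    index = (normalized >> STEP_SHIFT) & ((1 << LUT_BITS) - 1)
    product = abs(num) * RECIP_LUT[index]
    # Net shift: undo the table scaling, keep Q8, compensate the headroom.
    # With the constants above this is always a right shift of at least 7.
    shift = (RECIP_SCALE_BITS - OUT_FRAC_BITS) - headroom
    return saturate16(sign * (product >> shift))


def approx_div_packed(num_reg, den_reg):
    """Divide both lanes of two packed registers and pack the results."""
    n_hi, n_lo = unpack(num_reg)
    d_hi, d_lo = unpack(den_reg)
    return pack(approx_div_lane(n_hi, d_hi), approx_div_lane(n_lo, d_lo))
```

For example, `approx_div_packed(pack(100, -100), pack(3, 3))` yields two Q8 lanes close to 33.3 and -33.3. With a 64-entry table the relative error of each lane stays under about 1%, plus truncation of the last output bit.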
Abstract
A digital signal processor (DSP) includes an instruction fetch unit, an instruction decode unit, a register set and a plurality of work units in communication with the instruction decode unit. A first embodiment calculates two divisions on packed numerators and packed denominators. The DSP work units calculate indexes into a 1/d look-up table and make a final sign correction. A second embodiment calculates an approximation of a vector magnitude of a complex number x+jy. The approximation is based upon √(x² + y²) ≈ α·max(|x|, |y|) + β·min(|x|, |y|). The DSP work units calculate the absolute values, find the maxima and minima, and form the packed results of two vector magnitude calculations.
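The magnitude approximation in the abstract can be sketched in floating point as follows. The abstract does not give values for α and β; the coefficients below are one commonly used choice that equalizes the peak relative error at roughly 4%, and they are an assumption here, as are the names `ALPHA`, `BETA`, and `approx_magnitude`.

```python
import math

# Assumed coefficients (not taken from the patent): an equal-ripple choice
# whose relative error stays within about +/- 4% for all angles.
ALPHA = 2 * math.cos(math.pi / 8) / (1 + math.cos(math.pi / 8))  # about 0.9604
BETA = 2 * math.sin(math.pi / 8) / (1 + math.cos(math.pi / 8))   # about 0.3978


def approx_magnitude(x, y, alpha=ALPHA, beta=BETA):
    """Approximate sqrt(x**2 + y**2) as alpha*max(|x|,|y|) + beta*min(|x|,|y|)."""
    ax, ay = abs(x), abs(y)
    return alpha * max(ax, ay) + beta * min(ax, ay)


def worst_relative_error(samples=3600):
    """Sweep the unit circle and report the largest relative error."""
    worst = 0.0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        x, y = math.cos(theta), math.sin(theta)
        worst = max(worst, abs(approx_magnitude(x, y) - 1.0))
    return worst
```

For x = 3, y = 4 the sketch returns roughly 5.04 against the exact value 5. Other (α, β) pairs trade accuracy for cheaper arithmetic, for example α = 1 and β = 1/2, which needs only a shift and an add.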
9 Claims
6. A method of calculating an approximation of a vector magnitude of complex numbers on a digital signal processor having a plurality of registers, each register storing data of N bits, and a plurality of work units, each work unit performing data processing operations under instruction control, the steps comprising:
storing a first imaginary part operand of N/2 bits in a set of N/2 most significant bits of a first register of the plurality of registers;
storing a first real part operand of N/2 bits in a set of N/2 least significant bits of the first register;
storing a second imaginary part operand of N/2 bits in a set of N/2 most significant bits of a second register of the plurality of registers;
storing a second real part operand of N/2 bits in a set of N/2 least significant bits of the second register;
employing one of the work units to separately generate an absolute value of the first imaginary part and an absolute value of the first real part, storing the absolute value of the first imaginary part in the N/2 most significant bits of a third register of the plurality of registers and storing the absolute value of the first real part in the N/2 least significant bits of the third register;
employing one of the work units to separately generate an absolute value of the second imaginary part and an absolute value of the second real part, storing the absolute value of the second imaginary part in the N/2 most significant bits of a fourth register of the plurality of registers and storing the absolute value of the second real part in the N/2 least significant bits of the fourth register;
employing one of the work units to store the absolute value of the first imaginary part in the N/2 most significant bits of a fifth register of a first register pair of the plurality of registers, store the absolute value of the second imaginary part in the N/2 least significant bits of the fifth register, store the absolute value of the first real part in the N/2 most significant bits of a sixth register of the first register pair of the plurality of registers and store the absolute value of the second real part in the N/2 least significant bits of the sixth register;
employing one of the work units to store a first maximum value of the absolute value of the first imaginary value and the absolute value of the first real value in N/2 most significant bits of a seventh register and to store a second maximum value of the second imaginary value and the absolute value of the second real value in N/2 least significant bits of the seventh register;
employing one of the work units to store a first minimum value of the absolute value of the first imaginary value and the absolute value of the first real value in N/2 most significant bits of an eighth register and to store a second minimum value of the second imaginary value and the absolute value of the second real value in N/2 least significant bits of the eighth register;
employing one of the work units to store the first maximum value in N/2 most significant bits of a ninth register of a second register pair, to store the first minimum value in N/2 least significant bits of the ninth register, to store the second maximum value in N/2 most significant bits of a tenth register of the second register pair and to store the second minimum value in N/2 least significant bits of the tenth register;
storing a first constant in N/2 most significant bits of a ninth register;
storing a second constant in N/2 least significant bits of the ninth register;
employing one of the work units to add a first product of the first maximum value and the first constant to a second product of the first minimum value and the second constant and store a sum in an eleventh register;
employing one of the work units to add a third product of the second maximum value and the second constant to a fourth product of the second minimum value and the second constant and store a sum in a twelfth register;
employing one of the work units to store N/2 most significant bits of the eleventh register into N/2 most significant bits of a thirteenth register and to store N/2 most significant bits of the twelfth register into N/2 least significant bits of the thirteenth register.
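The data flow of claim 6 can be sketched in Python for N = 32, with each complex number packed as imaginary part in the upper 16 bits and real part in the lower 16 bits. This is an illustration under stated assumptions rather than the claimed instruction sequence: the two constants are taken to be α and β in Q15 with assumed values, the intermediate register shuffles are collapsed into ordinary variables, and each maximum is multiplied by α and each minimum by β. Because the constants are Q15 and only the upper 16 bits of each 32-bit sum are kept, each result lane in this sketch holds about half the magnitude. All function and constant names are hypothetical.

```python
ALPHA_Q15 = round(0.96043387 * (1 << 15))   # assumed coefficient, Q15
BETA_Q15 = round(0.39782473 * (1 << 15))    # assumed coefficient, Q15


def to_signed16(v):
    """Interpret the low 16 bits of v as a two's-complement value."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def unpack(reg):
    """Split a 32-bit register into (high lane, low lane)."""
    return to_signed16(reg >> 16), to_signed16(reg)


def pack(hi, lo):
    """Pack two 16-bit lanes into one 32-bit register."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def packed_abs(reg):
    """Lane-wise absolute value, saturating -32768 to 32767."""
    hi, lo = unpack(reg)
    return pack(min(abs(hi), 0x7FFF), min(abs(lo), 0x7FFF))


def packed_magnitude(c0, c1):
    """Approximate |c0| and |c1|; returns pack(|c0| / 2, |c1| / 2)."""
    im0, re0 = unpack(packed_abs(c0))
    im1, re1 = unpack(packed_abs(c1))
    # Lane-wise maxima and minima of the absolute values.
    max0, min0 = max(im0, re0), min(im0, re0)
    max1, min1 = max(im1, re1), min(im1, re1)
    # Two-term dot products; each sum fits in a signed 32-bit accumulator.
    acc0 = max0 * ALPHA_Q15 + min0 * BETA_Q15
    acc1 = max1 * ALPHA_Q15 + min1 * BETA_Q15
    # Keep the upper 16 bits of each accumulator and pack them.
    return pack(acc0 >> 16, acc1 >> 16)
```

For example, `packed_magnitude(pack(4000, 3000), pack(-500, 1200))` returns lanes near 2500 and 650, that is, about half of the exact magnitudes 5000 and 1300.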
9. The method of claim 6, wherein:
N is 32.
Specification