Vector math instruction execution by DSP processor approximating division and complex number magnitude

US 9,015,452 B2
Filed: 02/18/2010
Issued: 04/21/2015
Est. Priority Date: 02/18/2009
Status: Active Grant

Abstract

A digital signal processor (DSP) includes an instruction fetch unit, an instruction decode unit, a register set and a plurality of work units in communication with the instruction decode unit. A first embodiment calculates two divisions on packed numerators and packed denominators. The DSP work units calculate indexes into a 1/d look-up table and make a final sign correction. A second embodiment calculates an approximation of a vector magnitude of a complex number x+jy. The approximation is based upon √(x²+y²)≈α*max(|x|, |y|)+β*min(|x|, |y|). The DSP work units calculate the absolute values, find the maxima and minima, and form the packed results of two vector magnitude calculations.
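Both embodiments work on two N/2-bit operands packed into one N-bit register. The minimal Python sketch below illustrates that layout and the final sign correction from claim 1. It assumes N = 32, as claims 4 and 9 recite, so each register holds two signed 16-bit two's-complement lanes with the high lane in bits 31..16. One exclusive OR of the packed numerator and denominator words sets each lane's sign bit exactly when the two operand signs differ, and each lane's sign is then m = 1 - 2*msb. The helper names `pack`, `unpack`, and `packed_signs` are hypothetical.

```python
MASK16 = 0xFFFF


def pack(hi: int, lo: int) -> int:
    """Pack two signed 16-bit values into one 32-bit word (hi in bits 31..16)."""
    for v in (hi, lo):
        if not -32768 <= v <= 32767:
            raise ValueError(f"{v} does not fit in a signed 16-bit lane")
    return ((hi & MASK16) << 16) | (lo & MASK16)


def unpack(word: int) -> tuple[int, int]:
    """Split a 32-bit word back into its signed (hi, lo) 16-bit lanes."""
    def to_signed(h: int) -> int:
        return h - 0x10000 if h & 0x8000 else h
    return to_signed((word >> 16) & MASK16), to_signed(word & MASK16)


def packed_signs(num_word: int, den_word: int) -> tuple[int, int]:
    """Per-lane quotient signs m = 1 - 2*msb(n XOR d), from a single XOR."""
    x = num_word ^ den_word      # a lane's sign bit is 1 iff n and d differ in sign
    msb_hi = (x >> 31) & 1       # sign bit of the high lane
    msb_lo = (x >> 15) & 1       # sign bit of the low lane
    return 1 - 2 * msb_hi, 1 - 2 * msb_lo


num = pack(-6, 10)
den = pack(3, 5)
print(unpack(num), packed_signs(num, den))   # (-6, 10) (-1, 1)
```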

9 Claims

1. A method for performing an approximate division on a digital signal processor having a plurality of registers, each register storing data of N bits, and a plurality of work units, each work unit performing data processing operations under instruction control, the steps comprising:
- storing a first numerator operand of N/2 bits in a set of N/2 most significant bits of a first register of the plurality of registers;
  
  storing a second numerator operand of N/2 bits in a set of N/2 least significant bits of the first register;
  
  storing a first denominator operand of N/2 bits in a set of N/2 most significant bits of a second register of the plurality of registers;
  
  storing a second denominator operand of N/2 bits in a set of N/2 least significant bits of the second register;
  
  employing one of the work units to separately form a first absolute value of the most significant bits of the second register and store the first absolute value in the most significant bits of a third register of the plurality of registers, and form a second absolute value of the least significant bits of the second register and store the second absolute value in the least significant bits of the third register;
  
  employing one of the work units to extract the first absolute value from the third register;
  
  employing one of the work units to determine a number of unused bits in the first absolute value;
  
  employing one of the work units to generate a first headroom h_n+1 of the first denominator operand by extracting a predetermined number of bits of the number of unused bits in the first absolute value;
  
  employing one of the work units to compute a first shift factor s_n+1 for the first denominator operand by adding the first headroom h_n+1 to a first constant and subtracting a number of fractional bits in the first denominator operand;
  
  employing one of the work units to extract the second absolute value from the third register;
  
  employing one of the work units to determine a number of unused bits in the second absolute value;
  
  employing one of the work units to generate a second headroom h_n of the second denominator operand by extracting a predetermined number of bits of the number of unused bits in the second absolute value;
  
  employing one of the work units to compute a second shift factor s_n for the second denominator operand by adding the second headroom h_n to the first constant and subtracting a number of fractional bits in the second denominator operand;
  
  employing one of the work units to generate a first intermediate result by left shifting the data in the third register by the first headroom h_n+1;
  
  employing one of the work units to generate a first division LUT index i_n+1 by right shifting the first intermediate result by a second constant;
  
  employing one of the work units to generate a second intermediate result by left shifting the data in the third register by the second headroom h_n;
  
  employing one of the work units to generate a second division LUT index i_n by right shifting the second intermediate result by a third constant;
  
  employing one of the work units to generate a third intermediate result by performing an exclusive OR of data in the first register and data in the second register;
  
  employing one of the work units to extract a most significant bit of the most significant bits of the third intermediate result;
  
  employing one of the work units to generate a sign of a first division m_n+1 by subtracting twice the most significant bit of the most significant bits of the third intermediate result from 1;
  
  employing one of the work units to extract a most significant bit of the least significant bits of the third intermediate result;
  
  employing one of the work units to generate a sign of a second division m_n by subtracting twice the most significant bit of the least significant bits of the third intermediate result from 1;
  
  employing one of the work units to generate a first look-up table value by indexing a look-up table with the first division LUT index i_n+1;
  
  employing one of the work units to generate a second look-up table value by indexing a look-up table with the second division LUT index i_n;
  
  employing one of the work units to store the first look-up table value in most significant bits of a fourth register of the plurality of registers and store the second look-up table value in least significant bits of the fourth register;
  
  employing one of the work units to generate a first product by multiplying the most significant bits of the fourth register by the most significant bits of the third register;
  
  employing one of the work units to generate a first absolute division value by performing a saturated left shift of the first product by the headroom h_n+1;
  
  employing one of the work units to generate a first division value by multiplying the first absolute division value by the sign of a first division m_n+1;
  
  employing one of the work units to generate a second product by multiplying the least significant bits of the fourth register by the least significant bits of the third register;
  
  employing one of the work units to generate a second absolute division value by performing a saturated left shift of the second product by the headroom h_n;
  
  employing one of the work units to generate a second division value by multiplying the second absolute division value by the sign of a second division m_n; and
  
  employing one of the work units to generate a packed division by storing the first division value in most significant bits of a fifth register of the plurality of registers and storing the second division value in least significant bits of the fifth register.

2. The method of claim 1, further comprising the steps of:
- employing one of the work units to round the first product by adding a first rounding value to the first product; and
  
  employing one of the work units to round the second product by adding a second rounding value to the second product.

3. The method of claim 2, wherein:
- the first rounding value is dependent upon the first shift factor s_n+1; and
  
  the second rounding value is dependent upon the second shift factor s_n.

4. The method of claim 1, wherein:
- N is 32;
  
  the first constant is 23;
  
  the second constant is 23; and
  
  the third constant is 7.

5. The method of claim 1, wherein:
- the look-up table includes a plurality of entries, each entry storing an index d and a corresponding value approximating 1/d.
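Claims 1 to 5 approximate each quotient from a 1/d look-up table instead of running an iterative divide. The Python sketch below illustrates that general technique for one lane under stated assumptions. It is not a transcription of the claimed register sequence, and the table size, Q formats, and names (`RECIP_LUT`, `approx_div_q16`) are hypothetical. Operands are signed 16-bit integers. The headroom h is taken as the number of redundant sign bits of |d| in a 32-bit word, so |d| << h has its leading one at bit 30. Shifting that right by 23 (the value claim 4 gives for the second constant) leaves an index in [128, 256). The table holds Q14 reciprocals of each index interval's midpoint, the sign follows the XOR rule of claim 1, and a rounding value is added before the final right shift, in the spirit of claim 2.

```python
INT32_MAX = (1 << 31) - 1

# Hypothetical 128-entry table: entry i - 128 approximates 1/m for a normalized
# denominator mantissa m in [i/256, (i+1)/256), using the interval midpoint, in Q14.
RECIP_LUT = [round((1 << 14) * 256 / (i + 0.5)) for i in range(128, 256)]


def approx_div_q16(n: int, d: int) -> int:
    """Approximate n/d for signed 16-bit n, d; result in Q16 (value * 2**16)."""
    if not (-32768 <= n <= 32767 and -32768 <= d <= 32767):
        raise ValueError("operands must fit in signed 16 bits")
    if d == 0:
        raise ZeroDivisionError("denominator is zero")

    sign = 1 - 2 * (((n ^ d) >> 15) & 1)    # m = 1 - 2*msb(n XOR d)
    abs_d = abs(d)
    h = 31 - abs_d.bit_length()             # headroom: redundant sign bits in 32 bits
    d_norm = abs_d << h                     # leading one now at bit 30
    index = d_norm >> 23                    # top 8 magnitude bits, in [128, 256)
    product = abs(n) * RECIP_LUT[index - 128]

    # |d| = m * 2**(31 - h) and the LUT holds ~2**14 / m, so
    # |n|/|d| * 2**16 ~ product * 2**(h - 31 - 14 + 16) = product * 2**(h - 29).
    shift = h - 29
    if shift >= 0:
        q = product << shift
    else:
        k = -shift
        q = (product + (1 << (k - 1))) >> k  # add rounding value, then shift right
    q = min(q, INT32_MAX)                    # clamp the magnitude to int32
    return sign * q


print(approx_div_q16(-6, 3) / 65536, approx_div_q16(1000, 17) / 65536)
```

With 8 index bits, the midpoint table limits the relative error to about 0.5/128.5, roughly 0.39%, before the final rounding; a larger table trades memory for accuracy.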

6. A method of calculating an approximation of a vector magnitude of complex numbers on a digital signal processor having a plurality of registers, each register storing data of N bits, and a plurality of work units, each work unit performing data processing operations under instruction control, the steps comprising:
- storing a first imaginary part operand of N/2 bits in a set of N/2 most significant bits of a first register of the plurality of registers;
  
  storing a first real part operand of N/2 bits in a set of N/2 least significant bits of the first register;
  
  storing a second imaginary part operand of N/2 bits in a set of N/2 most significant bits of a second register of the plurality of registers;
  
  storing a second real part operand of N/2 bits in a set of N/2 least significant bits of the second register;
  
  employing one of the work units to separately generate an absolute value of the first imaginary part and an absolute value of the first real part, storing the absolute value of the first imaginary part in the N/2 most significant bits of a third register of the plurality of registers and storing the absolute value of the first real part in the N/2 least significant bits of the third register;
  
  employing one of the work units to separately generate an absolute value of the second imaginary part and an absolute value of the second real part, storing the absolute value of the second imaginary part in the N/2 most significant bits of a fourth register of the plurality of registers and storing the absolute value of the second real part in the N/2 least significant bits of the fourth register;
  
  employing one of the work units to store the absolute value of the first imaginary part in the N/2 most significant bits of a fifth register of a first register pair of the plurality of registers, store the absolute value of the second imaginary part in the N/2 least significant bits of the fifth register, store the absolute value of the first real part in the N/2 most significant bits of a sixth register of the first register pair of the plurality of registers and store the absolute value of the second real part in the N/2 least significant bits of the sixth register;
  
  employing one of the work units to store a first maximum value of the absolute value of the first imaginary part and the absolute value of the first real part in N/2 most significant bits of a seventh register and to store a second maximum value of the absolute value of the second imaginary part and the absolute value of the second real part in N/2 least significant bits of the seventh register;
  
  employing one of the work units to store a first minimum value of the absolute value of the first imaginary part and the absolute value of the first real part in N/2 most significant bits of an eighth register and to store a second minimum value of the absolute value of the second imaginary part and the absolute value of the second real part in N/2 least significant bits of the eighth register;
  
  employing one of the work units to store the first maximum value in N/2 most significant bits of a ninth register of a second register pair, to store the first minimum value in N/2 least significant bits of the ninth register, to store the second maximum value in N/2 most significant bits of a tenth register of the second register pair and to store the second minimum value in N/2 least significant bits of the tenth register;
  
  storing a first constant in N/2 most significant bits of a ninth register;
  
  storing a second constant in N/2 least significant bits of the ninth register;
  
  employing one of the work units to add a first product of the first maximum value and the first constant to a second product of the first minimum value and the second constant and store a sum in an eleventh register;
  
  employing one of the work units to add a third product of the second maximum value and the first constant to a fourth product of the second minimum value and the second constant and store a sum in a twelfth register;
  
  employing one of the work units to store N/2 most significant bits of the eleventh register into N/2 most significant bits of a thirteenth register and to store N/2 most significant bits of the twelfth register into N/2 least significant bits of the thirteenth register.

7. The method of claim 6, wherein:
- the first constant is 0.947543636291; and
  
  the second constant is 0.39248542509.

8. The method of claim 6, wherein:
- the first constant is 0.960433870103; and
  
  the second constant is 0.39782473475.
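Claims 6 to 8 implement the alpha-max-plus-beta-min approximation stated in the abstract, |x + jy| ≈ α*max(|x|, |y|) + β*min(|x|, |y|). The Python sketch below evaluates it with the coefficient pairs recited in claims 7 and 8 and measures the worst-case relative error on the unit circle. It also checks that the claim 8 pair matches α = 2cos(π/8)/(1 + cos(π/8)) and β = 2sin(π/8)/(1 + cos(π/8)), the equal-ripple choice whose under- and over-estimates are both about 3.96%. The integer helper assumes Q15 coefficients and a rounding right shift by 15 rather than reproducing the claimed register moves. All function names are hypothetical.

```python
import math

CLAIM7 = (0.947543636291, 0.39248542509)   # (alpha, beta) from claim 7
CLAIM8 = (0.960433870103, 0.39782473475)   # (alpha, beta) from claim 8


def approx_magnitude(x: float, y: float, alpha: float, beta: float) -> float:
    """Alpha-max-plus-beta-min estimate of sqrt(x**2 + y**2)."""
    ax, ay = abs(x), abs(y)
    return alpha * max(ax, ay) + beta * min(ax, ay)


def max_rel_error(alpha: float, beta: float, steps: int = 10_000) -> float:
    """Largest |estimate - 1| over sampled points of the unit circle's first quadrant."""
    worst = 0.0
    for k in range(steps + 1):
        theta = (math.pi / 2) * k / steps
        est = approx_magnitude(math.cos(theta), math.sin(theta), alpha, beta)
        worst = max(worst, abs(est - 1.0))
    return worst


def approx_magnitude_q15(x: int, y: int, alpha: float, beta: float) -> int:
    """Integer variant: Q15 coefficients, wide accumulator, rounding shift by 15."""
    alpha_q15, beta_q15 = round(alpha * 32768), round(beta * 32768)
    ax, ay = abs(x), abs(y)
    acc = alpha_q15 * max(ax, ay) + beta_q15 * min(ax, ay)
    return (acc + (1 << 14)) >> 15


c, s = math.cos(math.pi / 8), math.sin(math.pi / 8)
print(2 * c / (1 + c), 2 * s / (1 + c))            # closed form of the claim 8 pair
print(max_rel_error(*CLAIM7), max_rel_error(*CLAIM8))
print(approx_magnitude_q15(3000, -4000, *CLAIM8))  # true magnitude is 5000
```

The claim 7 pair has a larger worst-case error (about 5.2%) than the claim 8 pair; both avoid the square root entirely, needing only absolute values, a max/min, and two multiply-accumulates per complex sample.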

9. The method of claim 6, wherein:
- N is 32.

Current Assignee
Texas Instruments, Inc.
Original Assignee
Texas Instruments, Inc.
Inventors
Dasgupta, Udayan
Primary Examiner(s)
KIM, KENNETH S

Application Number

US12/708,180
Publication Number

US 20100211761A1
Time in Patent Office

1,888 Days
Field of Search

712/E9.071, 712/222, 712/300, 712/4
US Class Current

712/222
CPC Class Codes

G06F 2207/3828   Multigauge devices, i.e. ca...

G06F 2207/5354   Using table lookup, e.g. fo...

G06F 2207/5525   Pythagorean sum, i.e. the s...

G06F 7/4806   Computations with complex n...

G06F 7/535   Dividing only

G06F 7/548   Trigonometric functions; Co...

G06F 7/552   Powers or roots , e.g. Pyth...

G06F 9/30014   with variable precision

G06F 9/30036   Instructions to perform ope...

G06F 9/30038   using a mask

G06F 9/30065   Loop control instructions; ...

G06F 9/3887   controlled by a single inst...

G06F 9/3891   organised in groups of unit...
