Digital signal processor having instruction set with an x^{K} function using reduced lookup table

Abstract
A digital signal processor is provided having an instruction set with an x^{K} function that uses a reduced lookup table. The disclosed digital signal processor evaluates an x^{K} function for an input value, x, by computing Log(x) in hardware; multiplying the Log(x) value by K; and determining the x^{K} function by applying an exponential function in hardware to a result of the multiplying step. One or more of the computation of Log(x) and the exponential function employ at least one lookup table having entries with a fewer number of bits than a number of bits in the input value, x.
29 Claims
1. A method performed by a vector-based digital signal processor for evaluating a nonlinear x^{K} function for an input vector, x, said method comprising:
obtaining one or more x^{K} software instructions that implement said nonlinear x^{K} function;
receiving said input vector comprising at least two scalar numbers and K;
in response to a predefined software instruction keyword for said at least one of said obtained x^{K} software instructions, invoking at least one hardware functional unit that implements said one or more x^{K} software instructions to perform the following steps for each component of said input vector, wherein said vector-based processor processes said at least two scalar numbers of said input vector substantially simultaneously;
computing Log(x) in hardware;
multiplying said Log(x) value by K; and
determining said x^{K} function by applying an exponential function in hardware to a result of said multiplying step, wherein one or more of said computation of Log(x) and said exponential function employ at least one lookup table having entries with a fewer number of bits than a number of bits in the input vector, x, wherein said one or more x^{K} software instructions that implement said nonlinear x^{K} function is part of an instruction set of said vector-based digital signal processor and wherein said nonlinear x^{K} function computes a K^{th} power of said input vector, x.
15. A vector-based digital signal processor for evaluating a nonlinear x^{K} function for an input vector, x, comprising:
a first input for receiving one or more x^{K} software instructions that implement said nonlinear x^{K} function;
a data input for receiving said input vector comprising at least two scalar numbers and K;
a set of hardware units responsive to the first input and the data input;
a memory coupled to the hardware units and storing at least one lookup table wherein the vector-based digital signal processor is operative to perform the following steps for each component of said input vector, wherein said vector-based processor processes said at least two scalar numbers of said input vector substantially simultaneously;
in response to a predefined software instruction keyword for said at least one of said received x^{K} software instructions, invoke at least one hardware unit that implements said one or more x^{K} software instructions operative to:
compute Log(x) in hardware;
multiply said Log(x) value by K; and
determine said x^{K} function by applying an exponential function in hardware to a result of said multiplying step, wherein one or more of said computation of Log(x) and said exponential function employ at least one lookup table having entries with a fewer number of bits than a number of bits in the input vector, x, wherein said one or more of said x^{K} software instructions that implement said nonlinear x^{K} function is part of an instruction set of said digital signal processor and wherein said nonlinear x^{K} function computes a K^{th} power of said input vector, x.
29. An integrated circuit, comprising:
a vector-based digital signal processor for evaluating a nonlinear x^{K} function for an input vector, x, comprising:
a first input for receiving one or more x^{K} software instructions that implement said nonlinear x^{K} function;
a data input for receiving said input vector comprising at least two scalar numbers and K;
a memory storing at least one lookup table; and
at least one processor, coupled to the memory, operative to:
in response to a predefined software instruction keyword for said at least one of said received x^{K} software instructions, invoke at least one hardware functional unit that implements said one or more nonlinear x^{K} software instructions operative to perform the following steps for each component of said input vector, wherein said vector-based processor processes said at least two scalar numbers of said input vector substantially simultaneously;
compute Log(x) in hardware;
multiply said Log(x) value by K; and
determine said x^{K} function by applying an exponential function in hardware to a result of said multiplying step, wherein one or more of said computation of Log(x) and said exponential function employ at least one lookup table having entries with a fewer number of bits in the input vector, x, wherein said one or more of said x^{K} software instructions that implement said nonlinear x^{K} function is part of an instruction set of said vector-based digital signal processor and wherein said nonlinear x^{K} function computes a K^{th} power of said input vector, x.
1 Specification
The present application is related to U.S. patent application Ser. No. 12/324,926, entitled “Digital Signal Processor Having Instruction Set with One or More Non-Linear Complex Functions;” U.S. patent application Ser. No. 12/324,927, entitled “Digital Signal Processor Having Instruction Set With One Or More Non-Linear Functions Using Reduced Look-Up Table;” U.S. patent application Ser. No. 12/324,931, entitled “Digital Signal Processor Having Instruction Set with One or More Non-Linear Functions Using Reduced Look-Up Table with Exponentially Varying Step-Size;” and U.S. patent application Ser. No. 12/324,934, entitled “Digital Signal Processor with One or More Non-Linear Functions Using Factorized Polynomial Interpolation;” each filed Nov. 28, 2008 and incorporated by reference herein.
The present invention is related to digital signal processing techniques and, more particularly, to techniques for digital processing of nonlinear functions.
Digital signal processors (DSPs) are special-purpose processors utilized for digital processing. Signals are often converted from analog form to digital form, manipulated digitally, and then converted back to analog form for further processing. Digital signal processing algorithms typically require a large number of mathematical operations to be performed quickly and efficiently on a set of data.
DSPs thus often incorporate specialized hardware to perform software operations that are often required for math-intensive processing applications, such as addition, multiplication, multiply-accumulate (MAC), and shift-accumulate. A Multiply-Accumulate architecture, for example, recognizes that many common data processing operations involve multiplying two numbers together, adding the resulting value to another value and then accumulating the result. Such basic operations can be efficiently carried out utilizing specialized high-speed multipliers and accumulators.
DSPs, however, generally do not provide specialized instructions to support nonlinear mathematical functions, such as exp, log, cos, 1/x and x^{K}. Increasingly, there is a need for nonlinear arithmetic operations in processors. A nonlinear function is any problem where the variable(s) to be solved for cannot be written as a linear sum of independent components. If supported at all, a DSP supports a nonlinear function by using a large lookup table (LUT). An exemplary LUT may store on the order of 2,000 16-bit values, and thus require 32 kilobits of random access memory (RAM). The LUT is typically implemented in a separate dedicated SRAM (so that data and the nonlinear LUT can be accessed at the same time to achieve improved performance).
In cases where the DSP is based on VLIW (Very Long Instruction Word) or SIMD (Single Instruction Multiple Data) architectures with N issue slots, the memory size becomes even larger. The LUT must be replicated N times because each issue slot must be able to read different values in the lookup table simultaneously, as the values of the data in each issue slot may be different. This replication of memory results in an even greater silicon area. For example, assuming a LUT in a 4-way vector coprocessor, a memory size of 128 Kb is required (32 Kb×4). In addition, if different nonlinear functions are required for different parts of a program being executed, the various LUTs must be loaded into memory, thereby significantly increasing latency and potentially reducing performance.
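The memory figures quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the helper name `lut_bits` is hypothetical and the constants are taken from the example in the text.

```python
# Illustrative arithmetic only; the figures come from the example in the text.
ENTRIES = 2000          # "on the order of 2,000" table entries
BITS_PER_ENTRY = 16     # 16-bit values
ISSUE_SLOTS = 4         # 4-way vector coprocessor


def lut_bits(entries: int, bits_per_entry: int, copies: int = 1) -> int:
    """Total storage in bits for `copies` replicas of one lookup table."""
    return entries * bits_per_entry * copies


single_lut = lut_bits(ENTRIES, BITS_PER_ENTRY)                   # one table
replicated_lut = lut_bits(ENTRIES, BITS_PER_ENTRY, ISSUE_SLOTS)  # one per issue slot
print(single_lut, replicated_lut)
```

The replicated figure grows linearly with the number of issue slots, which is the cost that a reduced lookup table is meant to lower.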
A need therefore exists for a digital signal processor having an instruction set that supports an x^{K} function using a lookup table of reduced size.
Generally, a digital signal processor is provided having an instruction set with an x^{K} function that uses a reduced lookup table. According to one aspect of the invention, the disclosed digital signal processor evaluates an x^{K} function for an input value, x, by computing Log(x) in hardware; multiplying the Log(x) value by K; and determining the x^{K} function by applying an exponential function in hardware to a result of the multiplying step. One or more of the computation of Log(x) and the exponential function employ at least one lookup table having entries with a fewer number of bits than a number of bits in the input value, x.
The Log(x) value can be obtained by decomposing the input value, x, to a first part, N, a second part, q, and a remaining part, r, wherein the first part, N, is identified by a position of a most significant bit of the input value, x, and the second part, q, is comprised of a number of bits following the most significant bit, wherein the number is small relative to a number of bits in the input value, x. The logarithm function can be determined for the input value, x, by summing values of N, Log_{2}(1+\frac{1}{2}q) and Log_{2}(1+ε), where said epsilon term, ε, is computed using the expression ε = 2^{-N}r/(1+\frac{1}{2}q), where Log_{2}(1+\frac{1}{2}q) is obtained from a lookup table.
The exponential function of the result can be obtained by decomposing the input value, x, to an integer part, N, a first fractional part, q_{1}, larger than a specified value, x_{0}, and a second fractional part, q_{2}, smaller than the specified value, x_{0}. The exponential function for the result is obtained by multiplying 2^{q_{2}}, 2^{q_{1}} and 2^{N} together.
A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.
The present invention provides a digital signal processor that supports an x^{K} function using one or more lookup tables of reduced size. The digital signal processor computes arbitrary powers of the input data, x, such as x^{2}, x^{3}, 1/x and sqrt(x), by companding the data (i.e., first taking log(x), then applying a linear operation (multiplying by K), and then taking an exponential of the result). Generally, one or more lookup tables store a subset of values for at least a portion of the computation of the logarithm or exponential functions. As used herein, the term “digital signal processor” shall be a processor that executes instructions in program code. Further, a hardwired logic implementation of digital signal processing functions is not considered herein. It is noted that the disclosed x^{K} function can be applied for values of x that are scalar or vector inputs.
In this manner, the present invention supports x^{K} functions by using a smaller lookup table than required by conventional techniques. As previously indicated, an exemplary lookup table may store on the order of 2,000 16-bit values, and thus require 32 kilobits of random access memory (RAM). With the present invention, a smaller lookup table can be employed to store a subset of the 2,000 values.
As discussed hereinafter, in various embodiments, the digital signal processor 100 may use hardware or a lookup table (or a combination thereof) to compute the x^{K} function. Generally, if the digital signal processor 100 is processing software code that includes a predefined instruction keyword corresponding to an x^{K} function and any appropriate operands for the function, the instruction decoder must trigger the appropriate x^{K} functional unit 110 that is required to process the instruction. It is noted that an x^{K} functional unit 110 can be shared by more than one instruction.
Generally, the present invention extends conventional digital signal processors to provide an enhanced instruction set that supports x^{K} functions using one or more lookup tables. The digital signal processor 100 in accordance with the present invention receives at least one number as an input, applies an x^{K} function to the input and generates an output value.
The disclosed digital signal processors may have a scalar architecture, as shown in
The present invention recognizes that an x^{K} function can be computed using the following expression:
x^{K}=e^{log(x^{K})} (1)
since exponential and logarithm functions are inverse functions. Further, since log(x^{K}) equals K·log(x), then
x^{K}=e^{K·Log(x)} (2)
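Equation (2) can be exercised directly in software. The following minimal sketch uses Python's floating-point `math.log` and `math.exp` as stand-ins for the hardware logarithm and exponential units; the function name `pow_via_log` is hypothetical, and the sketch assumes x > 0 so that the logarithm is defined.

```python
import math


def pow_via_log(x: float, k: float) -> float:
    """Evaluate x**k as exp(k * log(x)), following equation (2).

    Assumes x > 0; the logarithm is undefined otherwise.
    """
    if x <= 0:
        raise ValueError("log-domain evaluation assumes x > 0")
    log_x = math.log(x)      # step 1: take the logarithm
    scaled = k * log_x       # step 2: linear operation (multiply by K)
    return math.exp(scaled)  # step 3: take the exponential


# Different values of K select different functions of x.
square = pow_via_log(3.0, 2)        # x^2
reciprocal = pow_via_log(4.0, -1)   # 1/x
root = pow_via_log(9.0, 0.5)        # sqrt(x)
print(square, reciprocal, root)
```

A single functional unit therefore covers squares, cubes, reciprocals and roots, with only the operand K changing between them.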
The logarithm function performed during step 210 can employ, for example, the techniques described in U.S. patent application Ser. No. 12/362,899, filed contemporaneously herewith, entitled “Digital Signal Processor Having Instruction Set With A Logarithm Function Using Reduced Look-Up Table,” incorporated by reference herein. Generally, the logarithm of an input value, x, can be obtained by decomposing the input value, x, to a first part, N, a second part, q, and a remaining part, r, wherein the first part, N, is identified by a position of a most significant bit of the input value, x, and the second part, q, is comprised of a number of bits following the most significant bit, wherein the number is small relative to a number of bits in the input value, x. A value Log_{2}(1+\frac{1}{2}q) is obtained from a first lookup table based on the second part, q. An epsilon term, ε, is computed using the expression ε = 2^{-N}r/(1+\frac{1}{2}q), and an expression Log_{2}(1+ε) is evaluated using a polynomial approximation. The desired logarithm function is then determined for the input value, x, by summing the values of N, Log_{2}(1+\frac{1}{2}q) and Log_{2}(1+ε). An initial basis of the logarithm function can optionally be translated from a binary representation to an arbitrary basis, Y, by multiplying a result of the summing operation by Log_{Y}(2), where Log_{Y}(2) is obtained from a lookup table. In addition, the value 1/(1+\frac{1}{2}q) can be obtained from a lookup table. The epsilon term, ε, can be computed by shifting r by N and multiplying by 1/(1+\frac{1}{2}q).
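The decomposition described above can be sketched in software as follows. This is a hedged illustration, not the disclosed hardware: the input is assumed to be a positive integer, the number of bits taken after the most significant bit (`M = 4`) is an arbitrary choice, the fraction contributed by q is assumed to be q/2^M, a cubic series is assumed for the polynomial approximation of Log_{2}(1+ε), and the names `LOG_LUT`, `RECIP_LUT` and `log2_reduced_lut` are hypothetical.

```python
import math

M = 4  # assumed number of bits following the MSB that form q

# Two small tables indexed by q, each with only 2**M entries.
LOG_LUT = [math.log2(1 + q / 2**M) for q in range(2**M)]
RECIP_LUT = [1 / (1 + q / 2**M) for q in range(2**M)]


def log2_reduced_lut(x: int) -> float:
    """Approximate log2(x) for a positive integer x using small tables."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    n = x.bit_length() - 1               # N: position of the most significant bit
    if n >= M:
        shift = n - M
        q = (x >> shift) & (2**M - 1)    # q: the M bits following the MSB
        r = x & ((1 << shift) - 1)       # r: all remaining low-order bits
    else:
        q = (x << (M - n)) & (2**M - 1)  # fewer than M bits follow the MSB
        r = 0
    # x = 2**N * (1 + q/2**M + 2**-N * r), hence
    # log2(x) = N + log2(1 + q/2**M) + log2(1 + eps)
    eps = (r / 2**n) * RECIP_LUT[q]      # shift r by N, multiply by reciprocal
    # Cubic approximation of log2(1 + eps); eps < 2**-M keeps the error small.
    log2_1p_eps = (eps - eps**2 / 2 + eps**3 / 3) / math.log(2)
    return n + LOG_LUT[q] + log2_1p_eps


print(log2_reduced_lut(1000), math.log2(1000))
```

With M = 4, each table holds 16 entries rather than one entry per possible input value; the remaining precision comes from the polynomial term.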
The exponential function performed during step 230 can employ, for example, the techniques described in U.S. patent application Ser. No. 12/362,879, filed contemporaneously herewith, entitled “Digital Signal Processor Having Instruction Set With An Exponential Function Using Reduced Look-Up Table,” incorporated by reference herein. Generally, an exponential function of an input value, x, can be obtained by decomposing the input value, x, to an integer part, N, a first fractional part, q_{1}, larger than a specified value, x_{0}, and a second fractional part, q_{2}, smaller than the specified value, x_{0}. A value 2^{q_{2}} is computed using a polynomial approximation. A value 2^{q_{1}} can be obtained from a lookup table. Finally, the exponential function for the input value, x, is obtained by multiplying 2^{q_{2}}, 2^{q_{1}} and 2^{N} together.
An initial basis, Z, of the input value, x, can optionally be converted to a desired basis, Y, by multiplying the input value, x, by log_{Z}(Y), where log_{Z}(Y) is obtained from a second lookup table. The multiplication can be performed by first multiplying the values 2^{q_{2}} and 2^{q_{1}} together and the multiplication by 2^{N} is performed by shifting a result of the first multiplication by N bits. The 2^{N} value can be computed using a barrel shifter. The entries in the lookup table have a fewer number of bits than a number of bits in the input value, x.
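The exponential decomposition can be sketched in the same spirit. The following is an illustration under stated assumptions rather than the disclosed circuit: the specified value x_{0} is assumed to be 2^{-M} with `M = 4`, a cubic Taylor series is assumed for 2^{q_{2}}, `math.ldexp` stands in for the barrel shifter, and the names `EXP_LUT` and `exp2_reduced_lut` are hypothetical.

```python
import math

M = 4                # assumed number of fractional bits covered by the table
X0 = 2.0 ** -M       # assumed "specified value" x0 separating q1 from q2
LN2 = math.log(2)

# Small table of 2**q1 for q1 = 0, x0, 2*x0, ..., 1 - x0.
EXP_LUT = [2.0 ** (i * X0) for i in range(2**M)]


def exp2_reduced_lut(x: float) -> float:
    """Approximate 2**x as 2**N * 2**q1 * 2**q2."""
    n = math.floor(x)                         # N: integer part
    frac = x - n                              # fractional part in [0, 1)
    idx = min(int(frac * 2**M), 2**M - 1)     # table index for q1
    q1 = idx * X0                             # coarse fractional part
    q2 = frac - q1                            # fine fractional part, below x0
    t = q2 * LN2
    pow_q2 = 1 + t + t * t / 2 + t**3 / 6     # cubic approximation of 2**q2
    product = EXP_LUT[idx] * pow_q2           # first multiply 2**q1 by 2**q2
    return math.ldexp(product, n)             # then scale by 2**N (a shift)


print(exp2_reduced_lut(3.25), 2.0 ** 3.25)
```

Because q_{2} is confined to a narrow interval, a low-order polynomial suffices, and the table only needs one entry per value of q_{1}.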
As noted above, the input to the vector-based digital signal processor 300 is a vector, X, comprised of a plurality of scalar numbers, x_{n}, that are processed in parallel. For example, assume a vector-based digital signal processor 300 supports an x^{K} function for a vector, X, where X is comprised of scalar numbers x_{1} through x_{4}. The exemplary x^{K} function may be expressed as follows:
Pow_vec4(x_{1},x_{2},x_{3},x_{4},K).
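The semantics of such a four-lane instruction can be modeled in software as below. This is only a behavioral sketch: `pow_vec4` is a hypothetical Python stand-in for the Pow_vec4 instruction, the lanes are evaluated one after another rather than simultaneously, base-2 floating-point routines replace the hardware logarithm and exponential units, and all inputs are assumed positive.

```python
import math
from typing import Tuple


def pow_vec4(x1: float, x2: float, x3: float, x4: float,
             k: float) -> Tuple[float, ...]:
    """Behavioral model: raise each of four lanes to the power k."""
    lanes = (x1, x2, x3, x4)
    if any(x <= 0 for x in lanes):
        raise ValueError("all lanes are assumed to be positive")
    # Per lane: log2, multiply by K, then 2**(result).
    # Hardware would process the lanes in parallel; this loop is sequential.
    return tuple(2.0 ** (k * math.log2(x)) for x in lanes)


result = pow_vec4(1.0, 2.0, 3.0, 4.0, 2)
print(result)
```

The same operand K is shared by every lane, so only the scalar components of X differ between the parallel evaluations.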
While exemplary embodiments of the present invention have been described with respect to digital logic blocks and memory tables within a digital signal processor, as would be apparent to one skilled in the art, various functions may be implemented in the digital domain as processing steps in a software program, in hardware by circuit elements or state machines, or in combination of both software and hardware. Such software may be employed in, for example, a digital signal processor, application specific integrated circuit or microcontroller. Such hardware and software may be embodied within circuits implemented within an integrated circuit.
Thus, the functions of the present invention can be embodied in the form of methods and apparatuses for practicing those methods. One or more aspects of the present invention can be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, wherein, when the program code is loaded into and executed by a machine, such as a processor, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a device that operates analogously to specific logic circuits. The invention can also be implemented in one or more of an integrated circuit, a digital signal processor, a microprocessor, and a microcontroller.
It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.