Configurable processor with in-package lookup table

First Claim
1. A configurable processor including a plurality of configurable computing elements, each of said configurable computing elements comprising:
 at least a programmable memory array on a memory level for storing at least a portion of a lookup table (LUT) for a mathematical function;
at least an arithmetic logic circuit (ALC) on a logic level for performing at least an arithmetic operation on selected data from said LUT, wherein said logic level is a different physical level than said memory level; and
means for communicatively coupling said programmable memory array and said ALC;
wherein said mathematical function includes more operations than arithmetic operations performable by said ALC.
Abstract
A configurable processor comprises a memory die and a logic die. The memory die comprises a programmable memory array for storing a lookup table (LUT) for a mathematical function, while the logic die comprises an arithmetic logic circuit (ALC) for performing at least an arithmetic operation on selected data from the LUT, wherein said mathematical function includes more operations than the arithmetic operations performable by the ALC. Complex mathematical functions can be implemented and configured.
44 Citations
No References
Division with rectangular multiplier supporting multiple precisions and operand types  
Patent #
US 7,962,543 B2
Filed 06/01/2007

Current Assignee
Advanced Micro Devices Inc.

Sponsoring Entity
Advanced Micro Devices Inc.

METHOD FOR FABRICATION OF A SEMICONDUCTOR DEVICE AND STRUCTURE  
Patent #
US 20100289064A1
Filed 08/03/2010

Current Assignee
Monolithic 3D Inc.

Sponsoring Entity
Monolithic 3D Inc.

Condensed Galois field computing system  
Patent #
US 7,512,647 B2
Filed 11/22/2004

Current Assignee
Analog Devices Inc.

Sponsoring Entity
Analog Devices Inc.

High speed hardware implementation of modified Reed-Solomon decoder
Patent #
US 7,539,927 B2
Filed 04/14/2005

Current Assignee
Industrial Technology Research Institute

Sponsoring Entity
Industrial Technology Research Institute

Structures for LUT-based arithmetic in PLDs
Patent #
US 7,558,812 B1
Filed 11/26/2003

Current Assignee
Altera Corporation

Sponsoring Entity
Altera Corporation

Digital signal processor having inverse discrete cosine transform engine for video decoding and partitioned distributed arithmetic multiply/accumulate unit therefor  
Patent #
US 7,574,468 B1
Filed 03/18/2005

Current Assignee
VeriSilicon Holdings Co. Ltd.

Sponsoring Entity
VeriSilicon Holdings Co. Ltd.

Arithmetic method and function arithmetic circuit for a fast fourier transform  
Patent #
US 7,634,524 B2
Filed 04/14/2004

Current Assignee
Fujitsu Limited

Sponsoring Entity
Fujitsu Limited

Methods and apparatus for fast argument reduction in a computing system  
Patent #
US 7,366,748 B1
Filed 06/30/2000

Current Assignee
Micron Technology Inc.

Sponsoring Entity
Intel Corporation

Arithmetic unit for approximating function  
Patent #
US 7,472,149 B2
Filed 08/25/2004

Current Assignee
Toshiba Corporation

Sponsoring Entity
Toshiba Corporation

Circuit for the inner or scalar product computation in Galois fields  
Patent #
US 7,206,410 B2
Filed 10/10/2001

Current Assignee
Stmicroelectronics SRL

Sponsoring Entity
Stmicroelectronics SRL

Error correction code circuit with reduced hardware complexity  
Patent #
US 7,028,247 B2
Filed 12/25/2002

Current Assignee
Novatek Microelectronics Corporation

Sponsoring Entity
Faraday Technology Corp.

Method for reducing memory size in logarithmic number system arithmetic units  
Patent #
US 20060106905A1
Filed 11/17/2004

Current Assignee
STMicroelectronics Incorporated

Sponsoring Entity
STMicroelectronics Incorporated

Converting mathematical functions to power series  
Patent #
US 20040044710A1
Filed 08/28/2002

Current Assignee
Intel Corporation

Sponsoring Entity
Intel Corporation

Graphics processing with transcendental function generator  
Patent #
US 6,181,355 B1
Filed 07/15/1999

Current Assignee
RPX Corporation

Sponsoring Entity
3DLabs Incorporated Limited

Efficient lookup table methods for Reed-Solomon decoding
Patent #
US 6,263,470 B1
Filed 11/25/1998

Current Assignee
Texas Instruments Inc.

Sponsoring Entity
Texas Instruments Inc.

Method for enlargement/reduction of image data in digital image processing system and circuit adopting the same  
Patent #
US 5,901,274 A
Filed 12/05/1996

Current Assignee
Samsung Electronics Co. Ltd.

Sponsoring Entity
Samsung Electronics Co. Ltd.

Method of generating sine/cosine function and apparatus using the same for use in digital signal processor  
Patent #
US 5,954,787 A
Filed 12/01/1997

Current Assignee
Wi-LAN Inc.

Sponsoring Entity
Daewoo Electronics

Three-dimensional read-only memory
Patent #
US 5,835,396 A
Filed 10/17/1996

Current Assignee
Guobiao Zhang

Sponsoring Entity
Guobiao Zhang

Variable-length decoding apparatus
Patent #
US 5,604,499 A
Filed 12/14/1994

Current Assignee
Matsushita Electric Industrial Company Limited

Sponsoring Entity
Matsushita Electric Industrial Company Limited

Method and apparatus for performing division using a rectangular aspect ratio multiplier  
Patent #
US 5,046,038 A
Filed 08/02/1989

Current Assignee
VIA-Cyrix Inc.

Sponsoring Entity
Cyrix Corp

Method and apparatus for performing the square root function using a rectangular aspect ratio multiplier  
Patent #
US 5,060,182 A
Filed 09/05/1989

Current Assignee
Advanced Micro Devices Inc.

Sponsoring Entity
Cyrix Corp

Configurable electrical circuit having configurable logic elements and configurable interconnects  
Patent #
US 4,870,302 A
Filed 02/19/1988

Current Assignee
Xilinx Inc.

Sponsoring Entity
Xilinx Inc.

SYSTEM COMPRISING A SEMICONDUCTOR DEVICE AND STRUCTURE  
Patent #
US 20120129301A1
Filed 10/14/2011

Current Assignee
Monolithic 3D Inc.

Sponsoring Entity
Monolithic 3D Inc.

Method of constructing a semiconductor device and structure  
Patent #
US 8,273,610 B2
Filed 10/14/2011

Current Assignee
Monolithic 3D Inc.

Sponsoring Entity
Monolithic 3D Inc.

SYSTEM COMPRISING A SEMICONDUCTOR DEVICE AND STRUCTURE  
Patent #
US 20120248595A1
Filed 06/08/2012

Current Assignee
Monolithic 3D Inc.

Sponsoring Entity
Monolithic 3D Inc.

Apparatus and method for texture level of detail computation  
Patent #
US 8,487,948 B2
Filed 12/21/2011

Current Assignee
Giquila Corp.

Sponsoring Entity
Giquila Corp.

DATAPATH CIRCUIT FOR DIGITAL SIGNAL PROCESSORS  
Patent #
US 20140067889A1
Filed 08/27/2013

Current Assignee
Analog Devices Inc.

Sponsoring Entity
Analog Devices Inc.

Efficient 2D and 3D graphics processing  
Patent #
US 8,203,564 B2
Filed 02/16/2007

Current Assignee
Qualcomm Inc.

Sponsoring Entity
Qualcomm Inc.

Vector math instruction execution by DSP processor approximating division and complex number magnitude  
Patent #
US 9,015,452 B2
Filed 02/18/2010

Current Assignee
Texas Instruments Inc.

Sponsoring Entity
Texas Instruments Inc.

3D semiconductor device and structure with back-bias
Patent #
US 9,136,153 B2
Filed 06/08/2012

Current Assignee
Monolithic 3D Inc.

Sponsoring Entity
Monolithic 3D Inc.

Digital signal processor having instruction set with an x^K function using reduced lookup table
Patent #
US 9,207,910 B2
Filed 01/30/2009

Current Assignee
LSI Corporation

Sponsoring Entity
Intel Corporation

Nonlinear modeling of a physical system using lookup table with polynomial interpolation  
Patent #
US 9,225,501 B2
Filed 03/31/2014

Current Assignee
Intel Corporation

Sponsoring Entity
Intel Corporation

Math circuit for estimating a transcendental function  
Patent #
US 9,465,580 B2
Filed 12/21/2011

Current Assignee
Intel Corporation

Sponsoring Entity
Intel Corporation

Computer and methods for solving math functions  
Patent #
US 9,606,796 B2
Filed 10/30/2013

Current Assignee
Texas Instruments Inc.

Sponsoring Entity
Texas Instruments Inc.

Processor Comprising Three-Dimensional Memory (3D-M) Array
Patent #
US 20170237440A1
Filed 04/13/2017

Current Assignee
Hangzhou Haicun Information Technology Co. Ltd.

Sponsoring Entity
Hangzhou Haicun Information Technology Co. Ltd.

Processor with Backside Look-Up Table
Patent #
US 20170322770A1
Filed 05/04/2017

Current Assignee
ChengDu HaiCun IP Technology LLC

Sponsoring Entity
ChengDu HaiCun IP Technology LLC

Configurable Processor with Backside Look-Up Table
Patent #
US 20170322774A1
Filed 05/06/2017

Current Assignee
ChengDu HaiCun IP Technology LLC

Sponsoring Entity
ChengDu HaiCun IP Technology LLC

Configurable Processor with In-Package Look-Up Table
Patent #
US 20170322771A1
Filed 05/06/2017

Current Assignee
ChengDu HaiCun IP Technology LLC

Sponsoring Entity
ChengDu HaiCun IP Technology LLC

Processor with In-Package Look-Up Table
Patent #
US 20170322906A1
Filed 05/04/2017

Current Assignee
ChengDu HaiCun IP Technology LLC

Sponsoring Entity
ChengDu HaiCun IP Technology LLC

Simulation Processor with In-Package Look-Up Table
Patent #
US 20170323041A1
Filed 05/04/2017

Current Assignee
ChengDu HaiCun IP Technology LLC

Sponsoring Entity
ChengDu HaiCun IP Technology LLC

Simulation Processor with Backside Look-Up Table
Patent #
US 20170323042A1
Filed 05/04/2017

Current Assignee
ChengDu HaiCun IP Technology LLC

Sponsoring Entity
ChengDu HaiCun IP Technology LLC

Processor for Realizing at least Two Categories of Functions  
Patent #
US 20170329548A1
Filed 05/10/2017

Current Assignee
ChengDu HaiCun IP Technology LLC

Sponsoring Entity
ChengDu HaiCun IP Technology LLC

Configurable Processor with In-Package Look-Up Table
Patent #
US 20190114138A1
Filed 11/28/2018

Current Assignee
Hangzhou Haicun Information Technology Co. Ltd.

Sponsoring Entity
Hangzhou Haicun Information Technology Co. Ltd.

Processor Using Memory-Based Computation
Patent #
US 20190114170A1
Filed 11/12/2018

Current Assignee
Hangzhou Haicun Information Technology Co. Ltd.

Sponsoring Entity
Hangzhou Haicun Information Technology Co. Ltd.

20 Claims
1. A configurable processor including a plurality of configurable computing elements, each of said configurable computing elements comprising:
at least a programmable memory array on a memory level for storing at least a portion of a lookup table (LUT) for a mathematical function;
at least an arithmetic logic circuit (ALC) on a logic level for performing at least an arithmetic operation on selected data from said LUT, wherein said logic level is a different physical level than said memory level; and
means for communicatively coupling said programmable memory array and said ALC;
wherein said mathematical function includes more operations than arithmetic operations performable by said ALC.
Dependent claims: 2, 3, 4, 5
6. A configurable processor for implementing a mathematical function, comprising:
at least first and second programmable memory arrays on a memory level, wherein said first programmable memory array stores at least a first portion of a first lookup table (LUT) for a first mathematical function; and, said second programmable memory array stores at least a second portion of a second LUT for a second mathematical function;
at least an arithmetic logic circuit (ALC) on a logic level for performing at least an arithmetic operation on selected data from said first or second LUT, wherein said logic level is a different physical level than said memory level; and
means for communicatively coupling said first or second programmable memory array with said ALC;
wherein said mathematical function is a combination of at least said first and second mathematical functions; and, each of said first and second mathematical functions includes more operations than arithmetic operations performable by said ALC.
Dependent claims: 7, 8, 9, 10
11. A configurable computing array for implementing a mathematical function, comprising:
at least an array of configurable logic elements including a configurable logic element, wherein said configurable logic element selectively realizes a logic function in a logic library;
at least an array of configurable computing elements comprising at least a first programmable memory array, a second programmable memory array and an arithmetic logic circuit (ALC), wherein said first programmable memory array stores at least a first portion of a first lookup table (LUT) for a first mathematical function; said second programmable memory array stores at least a second portion of a second LUT for a second mathematical function; and, said ALC performs at least an arithmetic operation on selected data from said first or second LUT;
means for communicatively coupling said configurable logic elements and said configurable computing elements;
whereby said configurable computing array realizes said mathematical function by programming said configurable logic elements and said configurable computing elements, wherein said mathematical function is a combination of at least said first and second mathematical functions;
wherein each of said first and second mathematical functions includes more operations than arithmetic operations included in said logic library; and, each of said first and second mathematical functions includes more operations than arithmetic operations performable by said ALC.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
1 Specification
This application is a continuation-in-part of U.S. patent application Ser. No. 15/588,642, filed May 6, 2017, which claims priority from Chinese Patent Application 201610301645.8, filed May 6, 2016, and Chinese Patent Application 201710310865.1, filed May 5, 2017, in the State Intellectual Property Office of the People's Republic of China (CN), the disclosures of which are incorporated herein by reference in their entireties.
The present invention relates to the field of integrated circuits, and more particularly to processors.
Conventional processors use logic-based computation (LBC), which carries out computation primarily with logic circuits (e.g. XOR circuits). Logic circuits are suitable for arithmetic functions, whose operations consist of basic arithmetic operations only, i.e. addition, subtraction and multiplication. However, logic circuits are not suitable for non-arithmetic functions, which involve operations beyond the arithmetic operations performable by conventional logic circuits. Exemplary non-arithmetic functions include transcendental functions and special functions. Non-arithmetic functions are computationally hard, and their hardware implementation has been a major challenge.
A complex function is a non-arithmetic function with multiple independent variables (an independent variable is also known as an input variable or argument). It can be expressed as a combination of basic functions. A basic function is a non-arithmetic function with a single independent variable. Exemplary basic functions include basic transcendental functions, such as the exponential function (exp), the logarithmic function (log), trigonometric functions (sin, cos, tan, atan) and others.
In conventional processors, all complex functions and most basic functions are implemented in software; only a small number of basic functions (e.g. basic algebraic functions and basic transcendental functions) are implemented in hardware, and these are referred to as built-in functions. These built-in functions are realized by a combination of arithmetic operations and lookup tables (LUTs). For example, U.S. Pat. No. 5,954,787 issued to Eun on Sep. 21, 1999 taught a method for generating sine/cosine functions using lookup tables; U.S. Pat. No. 9,207,910 issued to Azadet et al. on Dec. 8, 2015 taught a method for calculating a power function using LUTs.
Realization of built-in functions is further illustrated in
The 2D integration puts stringent requirements on the manufacturing process. As is well known in the art, the memory transistors in the LUT 200X are vastly different from the logic transistors in the ALC 100X. The memory transistors have stringent requirements on leakage current, while the logic transistors have stringent requirements on drive current. Forming high-performance memory transistors and high-performance logic transistors at the same time is a challenge.
The 2D integration also limits computational density and computational complexity. Computation has been developing towards higher computational density and greater computational complexity. The computational density, i.e. the computational power (e.g. the number of floating-point operations per second) per die area, is a figure of merit for parallel computation. The computational complexity, i.e. the total number of built-in functions supported by a processor, is a figure of merit for scientific computation. For the 2D integration, inclusion of the LUT 200X increases the die size of the conventional processor 00X and lowers its computational density. This has an adverse effect on parallel computation. Moreover, because the ALU 100X, as the primary component of the conventional processor 00X, occupies a large die area, the LUT 200X is left with only a small die area and therefore supports few built-in functions.
The LBC-based processor 00X has a further drawback. Because different logic circuits are used to realize different built-in functions, the processor 00X is fully customized. In other words, once its design is complete, the processor 00X can only realize a fixed set of predefined built-in functions. Configurable computation, in which the same hardware can realize different mathematical functions under the control of a set of configuration signals, is more desirable.
In the past, configurable logic, i.e. the same hardware realizing different logic functions under the control of a set of configuration signals, was realized by a configurable gate array, which is also known as a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), or by other names. U.S. Pat. No. 4,870,302 issued to Freeman on Sep. 26, 1989 (hereinafter Freeman) discloses a configurable gate array. It comprises an array of configurable logic elements and a hierarchy of configurable interconnects that allow the configurable logic elements to be wired together. In prior-art configurable gate arrays, only logic functions are configurable; mathematical functions are not. A small number of mathematical functions (i.e. built-in functions) are realized in fixed computing elements, which are part of hard blocks. Namely, the circuits realizing these built-in functions are fixedly connected and are not subject to change by programming. Fixed computing elements therefore limit further applications of the configurable gate array. To overcome this difficulty, the present invention expands the original concept of the configurable gate array by making the fixed computing elements configurable.
It is a principal object of the present invention to realize configurable computation.
It is a further object of the present invention to realize field-configurable computation.
It is a further object of the present invention to realize reconfigurable computation.
It is a further object of the present invention to realize configurable computation for complex functions.
It is a further object of the present invention to provide a configurable processor with a greater computational complexity.
It is a further object of the present invention to provide a configurable processor with a higher computational density.
It is a further object of the present invention to provide a configurable gate array with a greater computational flexibility.
In accordance with these and other objects of the present invention, the present invention discloses a configurable processor.
The present invention discloses a configurable processor with an in-package lookup table (IPLUT), i.e. an IPLUT configurable processor. The preferred IPLUT configurable processor comprises a plurality of configurable computing elements. Each configurable computing element comprises at least a programmable memory array on a memory die and at least an arithmetic logic circuit (ALC) on a logic die. The programmable memory array stores at least a portion of a lookup table (LUT) for a mathematical function, which includes numerical values related to said mathematical function (e.g. functional values and/or derivative values thereof), while the ALC performs arithmetic operations on selected data from the LUT. In general, the logic die comprises the ALCs of a plurality of configurable computing elements, while the memory die comprises the programmable memory arrays of another plurality of configurable computing elements. The logic die and the memory die are located in a configurable computing-array package and communicatively coupled by a plurality of inter-die connections. Because it is located in the configurable computing-array package, the LUT is referred to as an in-package LUT (IPLUT).
The preferred IPLUT configurable processor uses memory-based computation (MBC), which realizes mathematical functions primarily with the LUT. Compared with the LUT used by the conventional processor, the IPLUT used by the preferred IPLUT configurable processor has a much larger capacity. Although arithmetic operations are still performed, the MBC only needs to calculate a polynomial to a much lower order because it uses a much larger IPLUT as a starting point for computation. For the MBC, the fraction of computation done by the IPLUT is larger than that done by the ALC.
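The trade-off between table size and polynomial order can be illustrated with a minimal sketch. It is not taken from the patent: the function names (`build_lut`, `mbc_eval`, `max_error`), the choice of exp on [0, 1), and the table sizes are assumptions made only for illustration. The table stores functional values and first-derivative values, and the "ALC" step is a single multiply-add (a first-order Taylor term).

```python
import math

def build_lut(f, df, index_bits, lo=0.0, hi=1.0):
    """Tabulate f and its first derivative at 2**index_bits evenly spaced points."""
    n = 1 << index_bits
    step = (hi - lo) / n
    values = [f(lo + i * step) for i in range(n)]
    derivs = [df(lo + i * step) for i in range(n)]
    return values, derivs, lo, step

def mbc_eval(lut, x):
    """One table read plus a first-order Taylor correction (one multiply, one add)."""
    values, derivs, lo, step = lut
    i = min(int((x - lo) / step), len(values) - 1)  # table address
    dx = x - (lo + i * step)                         # offset from the tabulated point
    return values[i] + derivs[i] * dx

def max_error(lut, f, samples=10_000):
    """Largest absolute error over evenly spaced sample points in the table range."""
    values, _, lo, step = lut
    hi = lo + step * len(values)
    worst = 0.0
    for k in range(samples):
        x = lo + (hi - lo) * k / samples
        worst = max(worst, abs(mbc_eval(lut, x) - f(x)))
    return worst

# Hypothetical sizes: a small table versus a much larger one, same first-order step.
small = build_lut(math.exp, math.exp, index_bits=4)
large = build_lut(math.exp, math.exp, index_bits=10)
err_small = max_error(small, math.exp)
err_large = max_error(large, math.exp)
print(f"16-entry LUT:   max error {err_small:.2e}")
print(f"1024-entry LUT: max error {err_large:.2e}")
```

With the same first-order correction, the larger table gives a much smaller error, which is the sense in which a larger LUT lets the arithmetic stop at a lower polynomial order.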
Each usage cycle of the IPLUT configurable processor comprises two stages: a configuration stage and a computation stage. In the configuration stage, the LUT for a desired mathematical function is written into the programmable memory array. In the computation stage, selected values of the mathematical function are read out from the programmable memory array. The IPLUT configurable processor can realize field-configurable computation and reconfigurable computation. For field-configurable computation, a mathematical function is realized by writing its LUT into the programmable memory array in the field of use. For reconfigurable computation, the programmable memory array is reprogrammable and different mathematical functions can be realized by writing different LUTs for different mathematical functions thereto during different usage cycles. For example, during a first usage cycle, a first LUT for a first mathematical function is written into the reprogrammable memory array; during a second usage cycle, a second LUT for a second mathematical function is written into the reprogrammable memory array.
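The two-stage usage cycle can be modeled in software as follows. This is a toy sketch under stated assumptions, not the patented circuit: the class name `ConfigurableComputingElement`, its methods, the 12-bit address width, and the input range [0, 1) are all hypothetical, and the computation stage here is a plain table read with no interpolation.

```python
import math

class ConfigurableComputingElement:
    """Toy model: a rewritable memory array that holds one function's LUT."""

    def __init__(self, index_bits=12, lo=0.0, hi=1.0):
        self.size = 1 << index_bits
        self.lo, self.hi = lo, hi
        self.memory = [0.0] * self.size  # stands in for the programmable memory array

    def configure(self, func):
        """Configuration stage: write the LUT for `func` into the array."""
        step = (self.hi - self.lo) / self.size
        for addr in range(self.size):
            self.memory[addr] = func(self.lo + addr * step)

    def compute(self, x):
        """Computation stage: derive an address from x and read the stored value."""
        if not self.lo <= x < self.hi:
            raise ValueError("input outside the tabulated range")
        addr = int((x - self.lo) / (self.hi - self.lo) * self.size)
        return self.memory[min(addr, self.size - 1)]

element = ConfigurableComputingElement()

element.configure(math.exp)    # first usage cycle: the array holds exp
y1 = element.compute(0.5)

element.configure(math.log1p)  # second usage cycle: same array, now log(1 + x)
y2 = element.compute(0.5)

print(y1, y2)
```

The same object serves two different functions in successive cycles; only the contents of `memory` change, which mirrors the reconfigurable computation described above.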
Because the logic die and the memory die are located in the same package, this type of vertical integration is referred to as 2.5D integration. The 2.5D integration has a profound effect on the computational density and computational complexity. For the conventional 2D integration, the footprint of a conventional processor 00X is roughly equal to the sum of those of the ALU 100X and the LUT 200X. On the other hand, because the 2.5D integration moves the LUT from beside the logic circuits to above them, the IPLUT configurable processor becomes smaller and computationally more powerful. In addition, the total LUT capacity of the conventional processor 00X is less than 100 Kb, whereas the total IPLUT capacity for the IPLUT configurable processor could reach 100 Gb. Consequently, a single IPLUT configurable processor could support as many as 10,000 built-in functions (including various types of complex functions), far more than the conventional processor 00X. Furthermore, because the logic die and the memory die are separate dice, the logic transistors in the logic die and the memory transistors in the memory die are formed on separate semiconductor substrates. Consequently, their manufacturing processes can be individually optimized.
To further improve configurability, the present invention further discloses a preferred IPLUT configurable computing array for implementing complex functions. It is a special type of the IPLUT configurable processor and comprises an array of configurable computing elements, an array of configurable logic elements and a plurality of configurable interconnects. Each configurable computing element comprises at least a programmable memory array for storing the LUT for a mathematical function and at least an ALC for performing arithmetic operations on selected data from the LUT. The configurable logic elements and configurable interconnects in the IPLUT configurable computing array are similar to those in the conventional configurable gate array. During computation, a complex function is first decomposed into a combination of basic functions. Each basic function is then realized by an associated configurable computing element. Finally, the complex function is realized by programming the corresponding configurable logic elements and configurable interconnects.
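The three steps above (decompose, realize each basic function, combine) can be sketched in software. The example function f(x, y) = exp(x) · sin(y), the helper `make_element`, and the table sizes are hypothetical choices for illustration; the final multiplication stands in for whatever the programmed configurable logic elements and interconnects would provide.

```python
import math

def make_element(func, dfunc, lo, hi, index_bits=10):
    """Hypothetical computing element: a LUT of values and first derivatives,
    followed by one multiply-add standing in for the ALC."""
    n = 1 << index_bits
    step = (hi - lo) / n
    values = [func(lo + i * step) for i in range(n)]
    derivs = [dfunc(lo + i * step) for i in range(n)]

    def evaluate(x):
        i = min(max(int((x - lo) / step), 0), n - 1)  # clamp the table address
        return values[i] + derivs[i] * (x - (lo + i * step))

    return evaluate

# Step 1: decompose the complex function f(x, y) = exp(x) * sin(y)
#         into the basic functions exp and sin.
# Step 2: realize each basic function in its own computing element.
exp_element = make_element(math.exp, math.exp, 0.0, 1.0)
sin_element = make_element(math.sin, math.cos, 0.0, math.pi / 2)

# Step 3: combine the element outputs; here a plain multiplication plays the
#         role of the programmed logic elements and interconnects.
def complex_function(x, y):
    return exp_element(x) * sin_element(y)

approx = complex_function(0.3, 1.1)
exact = math.exp(0.3) * math.sin(1.1)
print(f"approx={approx:.8f} exact={exact:.8f}")
```

Changing the target function only requires rebuilding the elements with different tables and rewiring the combination step, which is the configurability the paragraph describes.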
Accordingly, the present invention discloses a configurable processor including a plurality of configurable computing elements, each of said configurable computing elements comprising: at least a programmable memory array on a memory level for storing at least a portion of a lookup table (LUT) for a mathematical function; at least an arithmetic logic circuit (ALC) on a logic level for performing at least an arithmetic operation on selected data from said LUT, wherein said logic level is a different physical level than said memory level; and means for communicatively coupling said programmable memory array and said ALC; wherein said mathematical function includes more operations than arithmetic operations performable by said ALC.
The present invention further discloses another configurable processor for implementing a mathematical function, comprising: at least first and second programmable memory arrays on a memory level, wherein said first programmable memory array stores at least a first portion of a first lookup table (LUT) for a first mathematical function; and, said second programmable memory array stores at least a second portion of a second LUT for a second mathematical function; at least an arithmetic logic circuit (ALC) on a logic level for performing at least an arithmetic operation on selected data from said first or second LUT, wherein said logic level is a different physical level than said memory level; and means for communicatively coupling said first or second programmable memory array with said ALC; wherein said mathematical function is a combination of at least said first and second mathematical functions; and, each of said first and second mathematical functions includes more operations than arithmetic operations performable by said ALC.
The present invention further discloses a configurable computing array for implementing a mathematical function, comprising: at least an array of configurable logic elements including a configurable logic element, wherein said configurable logic element selectively realizes a logic function in a logic library; at least an array of configurable computing elements comprising at least a first programmable memory array, a second programmable memory array and an arithmetic logic circuit (ALC), wherein said first programmable memory array stores at least a first portion of a first lookup table (LUT) for a first mathematical function; said second programmable memory array stores at least a second portion of a second LUT for a second mathematical function; and, said ALC performs at least an arithmetic operation on selected data from said first or second LUT; means for communicatively coupling said configurable logic elements and said configurable computing elements; whereby said configurable computing array realizes said mathematical function by programming said configurable logic elements and said configurable computing elements, wherein said mathematical function is a combination of at least said first and second mathematical functions; wherein each of said first and second mathematical functions includes more operations than arithmetic operations included in said logic library; and, each of said first and second mathematical functions includes more operations than arithmetic operations performable by said ALC.
It should be noted that all the drawings are schematic and not drawn to scale. Relative dimensions and proportions of parts of the device structures in the figures have been shown exaggerated or reduced in size for the sake of clarity and convenience in the drawings. The same reference symbols are generally used to refer to corresponding or similar features in the different embodiments.
Throughout this specification, the phrase “mathematical functions” refers to non-arithmetic functions only; the phrase “memory” is used in its broadest sense to mean any semiconductor-based holding place for information, either permanent or temporary; the phrase “permanent” is used in its broadest sense to mean any long-term storage; the phrase “communicatively coupled” is used in its broadest sense to mean any coupling whereby information may be passed from one element to another element; the term “LUT” (or “IPLUT”) could refer to the logic lookup table (LUT) stored in the programmable memory array(s), or the physical LUT circuit in the form of the programmable memory array(s), depending on the context; the symbol “/” means a relationship of “and” or “or”.
Those of ordinary skill in the art will realize that the following description of the present invention is illustrative only and is not intended to be in any way limiting. Other embodiments of the invention will readily suggest themselves to such skilled persons from an examination of the within disclosure.
Referring now to
The configurable computing element 300i comprises at least a programmable memory array 170 and an arithmetic logic circuit (ALC) 180, which are communicatively coupled by connections 160 (
Each usage cycle of the IPLUT configurable processor 300 comprises two stages: a configuration stage and a computation stage. In the configuration stage, the LUT for a desired mathematical function is written into the programmable memory array 170. In the computation stage, selected values of the mathematical function are read out from the programmable memory array 170. The IPLUT configurable processor 300 can be used to realize field-configurable computation and reconfigurable computation. For field-configurable computation, a mathematical function is realized by writing its LUT into the programmable memory array 170 in the field of use. For reconfigurable computation, the programmable memory array 170 is reprogrammable and different mathematical functions can be realized by writing different LUTs for different mathematical functions into the reprogrammable memory array 170. For example, during a first usage cycle, a first LUT for a first mathematical function is written into the reprogrammable memory array 170; during a second usage cycle, a second LUT for a second mathematical function is written into the reprogrammable memory array 170.
In the preferred configurable computing element 300i, the ALC 180 is formed on a logic die 100, while the programmable memory array 170 is formed on the memory die 200 (
The IPLUT configurable processor 300 uses memory-based computation (MBC), which realizes mathematical functions primarily with the LUT. Compared with the LUT 200X used by the conventional processor 00X, the IPLUT 170 used by the IPLUT configurable processor 300 has a much larger capacity. Although arithmetic operations are still performed, the MBC only needs to calculate a polynomial to a much lower order because it uses a much larger IPLUT 170 as a starting point for computation. For the MBC, the fraction of computation done by the IPLUT 170 is larger than that done by the ALC 180.
Referring now to
The IPLUT configurable processor package 300 in
The IPLUT configurable processor package 300 in
Because the logic die 100 and the memory die 200 are located in the same package, this type of vertical integration is referred to as 2.5D integration. The 2.5D integration has a profound effect on the computational density and computational complexity. For the conventional 2D integration, the footprint of a conventional processor 00X is roughly equal to the sum of the footprints of the ALU 100X and the LUT 200X. On the other hand, because the 2.5D integration moves the LUT from the side of the logic circuitry to above it, the IPLUT configurable processor 300 becomes smaller and computationally more powerful. In addition, the total LUT capacity of the conventional processor 00X is less than 100 Kb, whereas the total IPLUT capacity of the IPLUT configurable processor 300 could reach 100 Gb. Consequently, a single IPLUT configurable processor 300 could support as many as 10,000 built-in functions (including various types of complex functions), far more than the conventional processor 00X. Moreover, the 2.5D integration can improve the communication throughput between the IPLUT 170 and the ALC 180. Because they are physically close and coupled by a large number of inter-die connections 160, the IPLUT 170 and the ALC 180 have a larger communication throughput than that between the LUT 200X and the ALU 100X in the conventional processor 00X. Lastly, the 2.5D integration benefits the manufacturing process. Because the logic die 100 and the memory die 200 are separate dice, the logic transistors in the logic die 100 and the memory transistors in the memory die 200 are formed on separate semiconductor substrates. Consequently, their manufacturing processes can be individually optimized.
Referring now to
When realizing a mathematical function, combining the LUT with polynomial interpolation can achieve a high precision without using an excessively large LUT. For example, if only a LUT (without any polynomial interpolation) is used to realize a single-precision function (32-bit input and 32-bit output), it would need a capacity of 2^{32}*32 bits = 128 Gb. By including polynomial interpolation, significantly smaller LUTs can be used. In the above embodiment, a single-precision function can be realized using a total of 4 Mb of LUT (2 Mb for the functional values and 2 Mb for the first-order derivative values) in conjunction with a first-order Taylor series. This is significantly less than the LUT-only approach (4 Mb vs. 128 Gb).
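The capacity figures and the first-order Taylor evaluation described above can be sketched in software as follows. This is an illustrative model under stated assumptions, not the circuit of the embodiment: the 16-bit table index is an assumption inferred from the stated 2 Mb per table at 32 bits per entry, the function names are hypothetical, and the arithmetic here runs in Python double precision rather than in single-precision hardware.

```python
import math

INDEX_BITS = 16   # assumption: 2**16 entries per table (2 Mb at 32 bits each)
VALUE_BITS = 32   # single-precision entries


def lut_bits(index_bits, value_bits):
    """Capacity in bits of a table with 2**index_bits entries."""
    return (1 << index_bits) * value_bits


def build_tables(f, df, lo, hi, index_bits):
    """Sample f and its first derivative df at 2**index_bits evenly spaced points."""
    n = 1 << index_bits
    step = (hi - lo) / n
    values = [f(lo + i * step) for i in range(n)]
    derivs = [df(lo + i * step) for i in range(n)]
    return values, derivs, step


def evaluate(x, lo, step, values, derivs):
    """First-order Taylor series: f(x) ~ f(x0) + f'(x0) * (x - x0)."""
    i = min(max(int((x - lo) / step), 0), len(values) - 1)
    x0 = lo + i * step
    # Two look-ups, then one multiplication and one addition.
    return values[i] + derivs[i] * (x - x0)


values, derivs, step = build_tables(math.sin, math.cos, 0.0, math.pi / 2, INDEX_BITS)
approx = evaluate(0.7, 0.0, step, values, derivs)

# Capacities in binary units, matching the figures quoted in the text.
full_table_gb = lut_bits(32, VALUE_BITS) / 2**30               # LUT-only approach
split_table_mb = 2 * lut_bits(INDEX_BITS, VALUE_BITS) / 2**20  # values + derivatives
```

Under these assumptions the two tables together occupy 4 Mb, against 128 Gb for a table indexed by the full 32-bit input, and the residual arithmetic is limited to one multiplication and one addition per evaluation.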
Besides transcendental functions, the preferred embodiment of
Referring now to
The configurable computing elements 300AA-300BD are similar to those in the IPLUT configurable processor 300 (
The first preferred IPLUT configurable computing array 700 can realize a complex function by programming the configurable logic elements 400AA-400BD and the configurable computing elements 300AA-300BD. The complex function is a combination of basic functions, each of which can be implemented by selected configurable computing elements. Each basic function includes more mathematical operations than the arithmetic operations included in the logic library of the configurable logic elements 400AA-400BD, and also more than the arithmetic operations performable by the ALC 180. In general, the arithmetic operations included in the logic library consist of addition and subtraction, and the arithmetic operations performable by the ALC 180 consist of addition, subtraction and multiplication.
In one preferred IPLUT configurable computing array 700, the programmable memory arrays 170 of the configurable computing elements 300AA-300BD are located on a different physical level than the configurable logic elements 400AA-400BD. For example, the programmable memory arrays 170 are located on a memory die 200, while the configurable logic elements 400 are located on a logic die. This logic die could be the same logic die 100 for the ALC 180, as in the case of
The first preferred IPLUT configurable computing array 700 is particularly suitable for realizing complex functions. If only a LUT is used to realize the above 4-variable function, i.e. e=a·sin(b)+c·cos(d), an enormous LUT is needed: 2^{16}*2^{16}*2^{16}*2^{16}*16 bits = 256 Eb even for half precision, which is impractical. Using the IPLUT configurable computing array 700, only 8 Mb of LUT (8 configurable computing elements, each with 1 Mb capacity) is needed to realize the 4-variable function. It will be apparent to those skilled in the art that the first preferred IPLUT configurable computing array 700 can also be used to realize other complex functions.
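The comparison above can be checked with a short software sketch that also shows the 4-variable function built from single-variable basic functions. It is illustrative only and rests on assumptions: the helper names are hypothetical, only two look-up tables (for sin and cos) are modeled rather than the eight computing elements of the text, whose individual roles are not reproduced here, and the multiplications and the addition are carried out directly in Python.

```python
import math


def lut_only_bits(num_inputs, input_bits, output_bits):
    """Bits needed to tabulate every combination of inputs directly."""
    return (1 << (num_inputs * input_bits)) * output_bits


def make_lut(func, lo, hi, entries):
    """Return a look-up function backed by a table of evenly spaced samples."""
    step = (hi - lo) / entries
    table = [func(lo + i * step) for i in range(entries)]

    def lookup(x):
        i = min(max(int((x - lo) / step), 0), entries - 1)
        return table[i]

    return lookup


# Two hypothetical computing elements, each configured with one basic function.
sin_lut = make_lut(math.sin, 0.0, 2 * math.pi, 1 << 16)
cos_lut = make_lut(math.cos, 0.0, 2 * math.pi, 1 << 16)


def complex_function(a, b, c, d):
    """e = a*sin(b) + c*cos(d): two look-ups, two multiplications, one addition."""
    return a * sin_lut(b) + c * cos_lut(d)


e = complex_function(2.0, 0.5, 3.0, 1.0)

# Capacities in binary units, matching the figures quoted in the text.
direct_eb = lut_only_bits(4, 16, 16) / 2**60   # LUT-only approach, in Eb
array_mb = 8 * (1 << 20) / 2**20               # eight elements of 1 Mb each, in Mb
```

The point of the sketch is that each table is indexed by a single 16-bit variable, so the storage grows with the number of basic functions rather than with the product of all input ranges.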
Referring now to
While illustrative embodiments have been shown and described, it would be apparent to those skilled in the art that many more modifications than those mentioned above are possible without departing from the inventive concepts set forth herein. For example, the IPLUT configurable processor of the present invention could be a microcontroller, a controller, a central processing unit (CPU), a digital signal processor (DSP), a graphics processing unit (GPU), a network-security processor, an encryption/decryption processor, an encoding/decoding processor, a neural-network processor, or an artificial intelligence (AI) processor. These IPLUT configurable processors can be found in consumer electronic devices (e.g. personal computers, video game machines, smart phones) as well as engineering and scientific workstations and server machines. The invention, therefore, is not to be limited except in the spirit of the appended claims.