Configurable computing array for implementing complex math functions

First Claim
1. A configurable computing array, comprising:
 at least an array of configurable logic elements including a configurable logic element, wherein said configurable logic element selectively realizes a logic function from a logic library; and
at least an array of configurable computing elements including first and second configurable computing elements, wherein said first configurable computing element comprises a first memory for storing a first lookup table (LUT) for a first math function; and, said second configurable computing element comprises a second memory for storing a second LUT for a second math function;
whereby said configurable computing array realizes a complex math function by programming said configurable logic elements and said configurable computing elements, wherein said complex math function is a combination of at least said first and second math functions.
Abstract
To implement a complex math function, a configurable computing array comprises at least an array of configurable interconnects, an array of configurable logic elements and an array of configurable computing elements. Each configurable computing element comprises at least a memory for storing a lookup table (LUT) for a math function.
10 Citations
Condensed Galois field computing system  
Patent #
US 7,512,647 B2
Filed 11/22/2004

Current Assignee
Analog Devices Inc.

Sponsoring Entity
Analog Devices Inc.

Arithmetic unit for approximating function  
Patent #
US 7,472,149 B2
Filed 08/25/2004

Current Assignee
Toshiba Corporation

Sponsoring Entity
Toshiba Corporation

Method of generating sine/cosine function and apparatus using the same for use in digital signal processor  
Patent #
US 5,954,787 A
Filed 12/01/1997

Current Assignee
WiLAN Inc.

Sponsoring Entity
Daewoo Electronics

Three-dimensional read-only memory
Patent #
US 5,835,396 A
Filed 10/17/1996

Current Assignee
Guobiao Zhang

Sponsoring Entity
Guobiao Zhang

Method and apparatus for performing division using a rectangular aspect ratio multiplier  
Patent #
US 5,046,038 A
Filed 08/02/1989

Current Assignee
VIACyrix Inc.

Sponsoring Entity
Cyrix Corp

Configurable electrical circuit having configurable logic elements and configurable interconnects  
Patent #
US 4,870,302 A
Filed 02/19/1988

Current Assignee
Xilinx Inc.

Sponsoring Entity
Xilinx Inc.

Large bit-per-cell three-dimensional mask-programmable read-only memory
Patent #
US 8,564,070 B2
Filed 05/24/2010

Current Assignee
Guobiao Zhang, ChengDu HaiCun IP Technology LLC

Sponsoring Entity
Guobiao Zhang, ChengDu HaiCun IP Technology LLC

Digital signal processor having instruction set with an x;function using reduced lookup table  
Patent #
US 9,207,910 B2
Filed 01/30/2009

Current Assignee
LSI Corporation

Sponsoring Entity
Intel Corporation

Nonlinear modeling of a physical system using lookup table with polynomial interpolation  
Patent #
US 9,225,501 B2
Filed 03/31/2014

Current Assignee
Intel Corporation

Sponsoring Entity
Intel Corporation

Configurable computing array using two-sided integration
Patent #
US 10,141,939 B2
Filed 10/25/2017

Current Assignee
ChengDu HaiCun IP Technology LLC

Sponsoring Entity
Guobiao Zhang, ChengDu HaiCun IP Technology LLC

20 Claims
 1. A configurable computing array, comprising:
at least an array of configurable logic elements including a configurable logic element, wherein said configurable logic element selectively realizes a logic function from a logic library; and
at least an array of configurable computing elements including first and second configurable computing elements, wherein said first configurable computing element comprises a first memory for storing a first lookup table (LUT) for a first math function; and, said second configurable computing element comprises a second memory for storing a second LUT for a second math function;
whereby said configurable computing array realizes a complex math function by programming said configurable logic elements and said configurable computing elements, wherein said complex math function is a combination of at least said first and second math functions.
 11. A configurable computing array, comprising:
at least an array of configurable interconnects including a configurable interconnect, wherein said configurable interconnect selectively realizes an interconnect from an interconnect library;
at least an array of configurable logic elements including a configurable logic element, wherein said configurable logic element selectively realizes a logic function from a logic library; and
at least an array of configurable computing elements including first and second configurable computing elements, wherein said first configurable computing element comprises a first memory for storing a first lookup table (LUT) for a first math function; and, said second configurable computing element comprises a second memory for storing a second LUT for a second math function;
whereby said configurable computing array realizes a complex math function by programming said configurable interconnects, said configurable logic elements and said configurable computing elements, wherein said complex math function is a combination of at least said first and second math functions.
1 Specification
This application is a continuation-in-part of U.S. patent application Ser. No. 15/793,912, filed Oct. 25, 2017, which is a continuation of U.S. patent application Ser. No. 15/450,049, filed Mar. 6, 2017, now U.S. Pat. No. 9,838,031, issued Dec. 5, 2017, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,017, filed Mar. 5, 2017, now U.S. Pat. No. 9,948,306, issued Apr. 17, 2018.
This application is also a continuation-in-part of U.S. patent application Ser. No. 15/793,968, filed Oct. 25, 2017, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,049, filed Mar. 6, 2017, now U.S. Pat. No. 9,838,031, issued Dec. 5, 2017, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,017, filed Mar. 5, 2017, now U.S. Pat. No. 9,948,306, issued Apr. 17, 2018.
This application is also a continuation-in-part of U.S. patent application Ser. No. 15/793,927, filed Oct. 25, 2017, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,049, filed Mar. 6, 2017, now U.S. Pat. No. 9,838,031, issued Dec. 5, 2017, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,017, filed Mar. 5, 2017, now U.S. Pat. No. 9,948,306, issued Apr. 17, 2018.
This application is also a continuation-in-part of U.S. patent application Ser. No. 15/793,933, filed Oct. 25, 2017, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,049, filed Mar. 6, 2017, now U.S. Pat. No. 9,838,031, issued Dec. 5, 2017, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,017, filed Mar. 5, 2017, now U.S. Pat. No. 9,948,306, issued Apr. 17, 2018.
These patent applications claim priorities from Chinese Patent Application No. 201610125227.8, filed Mar. 5, 2016; Chinese Patent Application No. 201610307102.7, filed May 10, 2016; Chinese Patent Application No. 201710996864.7, filed Oct. 19, 2017; Chinese Patent Application No. 201710998652.2, filed Oct. 20, 2017; Chinese Patent Application No. 201710980817.3, filed Oct. 20, 2017; Chinese Patent Application No. 201710980779.1, filed Oct. 20, 2016; Chinese Patent Application No. 201710980813.5, filed Oct. 20, 2016; Chinese Patent Application No. 201710980826.2, filed Oct. 20, 2016; Chinese Patent Application No. 201710980967.4, filed Oct. 20, 2016; Chinese Patent Application No. 201710981043.6, filed Oct. 20, 2016; Chinese Patent Application No. 201710980989.0, filed Oct. 20, 2016, in the State Intellectual Property Office of the People's Republic of China (CN), the disclosures of which are incorporated herein by reference in their entireties.
The present invention relates to the field of integrated circuits, and more particularly to configurable gate arrays.
Complex math functions are widely used in various applications. As used hereinafter, a complex math function is a math function with multiple independent variables (an independent variable is also known as an input variable or argument) that can be expressed as a combination of basic math functions. A basic math function, on the other hand, is a math function with a single (or a few) independent variable(s). Exemplary basic math functions include transcendental functions, such as the exponential function (exp), the logarithmic function (log), trigonometric functions (sin, cos, tan, atan) and others.
On a conventional processor, a small number of basic math functions are calculated by hardware (i.e. hardware computing). These basic math functions are referred to as built-in functions. The conventional hardware computing primarily uses logic-based computing, i.e. logic circuits (e.g. adders, multipliers) are primarily used to implement math functions. Because different math functions are implemented by different logic circuits, the hardware implementation of built-in functions is highly customized. Due to limited resources on a processor die, only a small number of built-in functions can be implemented by hardware. For example, only 7 built-in functions (i.e. CBRT, EXP, LN, SIN, COS, TAN, ATAN) are implemented by hardware on an Intel IA-64 processor (referring to Harrison et al. “The Computation of Transcendental Functions on the IA-64 Architecture”, Intel Technology Journal, Q4, 1999, page 6).
Because hardware implementation of even basic math functions (e.g. transcendental functions) is difficult, software computing has been a commonly accepted practice. On a conventional processor, all complex math functions, even most basic math functions, are calculated by software. As software computing is more complex than hardware computing, calculation of complex math functions is slow and inefficient. It is highly desired to realize hardware computing for complex math functions. It is even more desirable to realize configurable hardware computing, i.e. to use a same set of hardware to implement a large set of complex math functions.
A configurable gate array is a semi-custom integrated circuit designed to be configured by a customer after manufacturing. It is also referred to as a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), or other names. U.S. Pat. No. 4,870,302 issued to Freeman on Sep. 26, 1989 (hereinafter referred to as Freeman) discloses a configurable gate array. It contains an array of configurable logic elements (also known as configurable logic blocks) and a hierarchy of configurable interconnects (also known as programmable interconnects) that allow the configurable logic elements to be wired together. Each configurable logic element in the array is in itself capable of realizing any one of a plurality of logic functions (e.g. shift, logic NOT, logic AND, logic OR, logic NOR, logic NAND, logic XOR, arithmetic addition “+”, arithmetic subtraction “−”, etc.) depending upon a first configuration signal. Each configurable interconnect can selectively couple or decouple interconnect lines depending upon a second configuration signal.
In a conventional configurable gate array, fixed computing elements are used to implement basic math functions. These fixed computing elements are portions of hard blocks and are not configurable, i.e. the circuits implementing these math functions are fixedly connected and cannot be changed by programming. This limits further application of the configurable gate array. To overcome this limitation, the present invention expands the original concept of the configurable gate array by making the fixed computing elements configurable. In other words, besides configurable logic elements, the configurable gate array comprises configurable computing elements, each of which can realize any one of a plurality of math functions.
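The selection of a logic function by a configuration signal can be pictured with a minimal Python sketch. It is only an illustration, not Freeman's circuit: the class name `ConfigurableLogicElement`, the 8-bit word width, and the string-valued configuration signal are all assumptions made for this example; only the function names come from the list above.

```python
# Hypothetical software model of a configurable logic element.
# WIDTH, MASK and the string configuration codes are illustrative assumptions.
WIDTH = 8
MASK = (1 << WIDTH) - 1

# The "logic library": each entry maps a configuration code to a function.
LOGIC_LIBRARY = {
    "NOT": lambda a, b: ~a & MASK,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NOR": lambda a, b: ~(a | b) & MASK,
    "NAND": lambda a, b: ~(a & b) & MASK,
    "XOR": lambda a, b: a ^ b,
    "ADD": lambda a, b: (a + b) & MASK,   # addition wraps at WIDTH bits
    "SUB": lambda a, b: (a - b) & MASK,   # subtraction wraps at WIDTH bits
}


class ConfigurableLogicElement:
    """Realizes one function from LOGIC_LIBRARY, chosen by a configuration signal."""

    def __init__(self):
        self.function = None

    def configure(self, config_signal):
        # Select which library function this element realizes.
        self.function = LOGIC_LIBRARY[config_signal]

    def evaluate(self, a, b=0):
        if self.function is None:
            raise RuntimeError("element has not been configured")
        return self.function(a & MASK, b & MASK)


element = ConfigurableLogicElement()
element.configure("XOR")
xor_result = element.evaluate(0b1100, 0b1010)
element.configure("ADD")          # the same element, reconfigured
add_result = element.evaluate(200, 100)
```

The same element object yields different results after reconfiguration, which is the property the configurable computing elements described later extend from logic functions to math functions.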
It is a principal object of the present invention to extend the applications of a configurable gate array to the field of complex math computation.
It is a further object of the present invention to provide a configurable computing array to customize not only logic functions, but also math functions.
It is a further object of the present invention to provide a configurable computing array with a small physical size and a fast computational speed.
It is a further object of the present invention to provide a configurable computing array with a short time-to-market and good manufacturability.
In accordance with these and other objects of the present invention, the present invention discloses a configurable computing array for realizing complex math functions.
The configurable computing array comprises at least an array of configurable logic elements and at least an array of configurable computing elements. Each configurable computing element comprises at least a memory, which is preferably programmable and can be loaded with a lookup table (LUT) for a math function. Because the memory is programmable, the set of math functions that can be realized by the configurable computing element is essentially unlimited.
The usage cycle of the configurable computing element comprises two stages: a configuration stage and a computation stage. In the configuration stage, the LUT for a desired math function is loaded into the memory. In the computation stage, a selected portion of the LUT for the desired math function is read out from the memory. With a rewritable memory, a configurable computing element can be reconfigured to realize different math functions at different times.
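The two stages can be sketched in software as follows. This is a toy model under stated assumptions, not the disclosed circuit: the class name `ConfigurableComputingElement`, the uniform sampling of the input range, the nearest-entry readout, and the 10-bit address width are all choices made for illustration.

```python
import math


class ConfigurableComputingElement:
    """Toy model: a memory holding one LUT, addressed by a quantized input."""

    def __init__(self, address_bits=8):
        self.size = 1 << address_bits      # number of LUT entries
        self.memory = [0.0] * self.size    # the (rewritable) memory
        self.x_min = 0.0
        self.x_max = 1.0

    def configure(self, func, x_min, x_max):
        """Configuration stage: load the LUT for `func`, sampled on [x_min, x_max]."""
        self.x_min, self.x_max = x_min, x_max
        step = (x_max - x_min) / (self.size - 1)
        self.memory = [func(x_min + i * step) for i in range(self.size)]

    def compute(self, x):
        """Computation stage: read out the LUT entry nearest to x."""
        step = (self.x_max - self.x_min) / (self.size - 1)
        index = round((x - self.x_min) / step)
        index = max(0, min(self.size - 1, index))   # clamp to a valid address
        return self.memory[index]


element = ConfigurableComputingElement(address_bits=10)

element.configure(math.sin, 0.0, math.pi / 2)   # configuration stage
sin_value = element.compute(0.5)                # computation stage

element.configure(math.exp, 0.0, 1.0)           # reconfigure the same memory
exp_value = element.compute(0.5)
```

In this sketch the accuracy is bounded by the sample spacing; a hardware LUT would trade the same address width against memory size.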
Besides configurable computing elements, the preferred configurable computing array further comprises configurable logic elements and configurable interconnects. During operation, a complex math function is first decomposed into a combination of basic math functions. Each basic math function is realized by programming an associated configurable computing element. The complex math function is then realized by programming the appropriate configurable logic elements and configurable interconnects.
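As a hedged illustration of this decomposition, the sketch below realizes the two-variable product x·y as exp(ln x + ln y). The choice of function, the helper `make_lut`, the input range [1, 4], and the table size are assumptions made for this example and do not come from the disclosure. Two LUTs play the role of configurable computing elements loaded with ln, a third is loaded with exp, and a plain adder stands in for a configurable logic element.

```python
import math


def make_lut(func, x_min, x_max, entries=4096):
    """Return a lookup function backed by a table of `entries` samples of func."""
    step = (x_max - x_min) / (entries - 1)
    table = [func(x_min + i * step) for i in range(entries)]

    def lookup(x):
        index = round((x - x_min) / step)
        return table[max(0, min(entries - 1, index))]

    return lookup


# Configurable computing elements, each programmed with one basic math function.
ln_element_a = make_lut(math.log, 1.0, 4.0)
ln_element_b = make_lut(math.log, 1.0, 4.0)
exp_element = make_lut(math.exp, 0.0, 2.0 * math.log(4.0))


def adder(a, b):
    """Stand-in for a configurable logic element programmed as an adder."""
    return a + b


def product(x, y):
    """Complex function x*y, wired as exp(ln(x) + ln(y)); valid for x, y in [1, 4]."""
    return exp_element(adder(ln_element_a(x), ln_element_b(y)))


approx = product(2.0, 3.0)   # close to 6, limited by the table resolution
```

The function-call nesting in `product` plays the part of the configurable interconnects, which route the outputs of the ln elements into the adder and then into the exp element.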
By using arrays of configurable computing elements, configurable logic elements and configurable interconnects, the present invention implements hardware computing of complex math functions. Compared with software computing, hardware computing is much faster and more efficient. Moreover, the hardware computing disclosed in the present invention is a type of memory-based computing, i.e. the LUTs are used as a primary means to implement math functions. The main advantage of memory-based computing over logic-based computing is its configurability and generality. By loading the values of different math functions into an LUT at different times, a single LUT can be used to implement a large set of math functions, thus realizing configurable computing.
Accordingly, the present invention discloses a configurable computing array, comprising: at least an array of configurable logic elements including a configurable logic element, wherein said configurable logic element selectively realizes a logic function from a logic library; and at least an array of configurable computing elements including first and second configurable computing elements, wherein said first configurable computing element comprises a first memory for storing a first lookup table (LUT) for a first math function; and, said second configurable computing element comprises a second memory for storing a second LUT for a second math function; whereby said configurable computing array realizes a complex math function by programming said configurable logic elements and said configurable computing elements, wherein said complex math function is a combination of at least said first and second math functions.
The present invention further discloses another configurable computing array, comprising: at least an array of configurable interconnects including a configurable interconnect, wherein said configurable interconnect selectively realizes an interconnect from an interconnect library; at least an array of configurable logic elements including a configurable logic element, wherein said configurable logic element selectively realizes a logic function from a logic library; and at least an array of configurable computing elements including first and second configurable computing elements, wherein said first configurable computing element comprises a first memory for storing a first lookup table (LUT) for a first math function; and, said second configurable computing element comprises a second memory for storing a second LUT for a second math function; whereby said configurable computing array realizes a complex math function by programming said configurable interconnects, said configurable logic elements and said configurable computing elements, wherein said complex math function is a combination of at least said first and second math functions.
It should be noted that all the drawings are schematic and not drawn to scale. Relative dimensions and proportions of parts of the device structures in the figures have been shown exaggerated or reduced in size for the sake of clarity and convenience in the drawings. The same reference symbols are generally used to refer to corresponding or similar features in the different embodiments. In the present invention, the terms “write”, “program” and “configure” have similar meanings and are used interchangeably. The symbol “/” means a relationship of “and” or “or”.
Those of ordinary skill in the art will realize that the following description of the present invention is illustrative only and is not intended to be in any way limiting. Other embodiments of the invention will readily suggest themselves to such skilled persons from an examination of the within disclosure.
Referring now to
The LUT in the configurable computing element 100 stores numerical values related to a math function. This is different from the conventional configurable gate array, where the LUT in a configurable logic element stores logic values of a logic function. The implementation of math functions is much more complex than that of logic functions. Numerical values are denoted by a large number of bits. For example, a half-precision floating-point number comprises 16 bits; a single-precision floating-point number comprises 32 bits; a double-precision floating-point number comprises 64 bits. In comparison, the logic values can be denoted by a single bit and have only two values, i.e. “true” and “false”. Accordingly, the LUT size of the configurable computing element 100 is substantially larger than that of the configurable logic element.
The numerical values stored in the LUT of the configurable computing element 100 include at least the functional values of a math function. When the input variable of a math function comprises a large number of bits, the LUT size could become excessively large. For example, an LUT to store the functional values of a double-precision math function needs 2^{64}*64 ≈ 10^{21} bits. To reduce the LUT size, Taylor-series (or other polynomial expansion) calculation is preferably used. To be more specific, the LUT stores not only the functional values, but also the derivative values of a math function, e.g. the first-order derivative values and the second-order derivative values. To perform Taylor-series calculation, the configurable computing element 100 further comprises at least an adder and a multiplier. More details on Taylor-series implementation of math functions are disclosed in a co-pending U.S. patent application Ser. No. 15/487,366, filed Apr. 13, 2017.
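The size argument and the Taylor-series remedy can be made concrete with a short Python sketch. It is a sketch under assumptions, not the implementation of the co-pending application: the class name `TaylorLut`, the 64-segment table, the expansion about segment midpoints, and the second-order truncation are all illustrative choices.

```python
import math

# A direct table for a 64-bit input with 64-bit outputs: 2^64 entries x 64 bits.
full_lut_bits = (2 ** 64) * 64          # = 2^70, roughly 1.2e21 bits


class TaylorLut:
    """LUT of (f, f', f'') at segment midpoints, evaluated by a 2nd-order Taylor step."""

    def __init__(self, f, df, d2f, x_min, x_max, entries=64):
        self.x_min = x_min
        self.entries = entries
        self.step = (x_max - x_min) / entries
        self.centers = [x_min + (i + 0.5) * self.step for i in range(entries)]
        # Each entry stores the functional value and two derivative values.
        self.table = [(f(c), df(c), d2f(c)) for c in self.centers]

    def evaluate(self, x):
        i = int((x - self.x_min) / self.step)
        i = max(0, min(self.entries - 1, i))      # clamp to a valid address
        f0, f1, f2 = self.table[i]
        dx = x - self.centers[i]
        # f(x) ~ f0 + f1*dx + (f2/2)*dx^2, using only multiplies and adds.
        return f0 + dx * (f1 + dx * 0.5 * f2)


sine = TaylorLut(math.sin, math.cos, lambda x: -math.sin(x), 0.0, math.pi / 2)
value = sine.evaluate(0.7)

# Storage of this small table if every stored value took 64 bits.
taylor_lut_bits = sine.entries * 3 * 64
```

With 64 segments over [0, π/2], the truncation error of the second-order expansion is on the order of 10^-7 for sine, while the table in this sketch occupies 12,288 bits rather than about 10^21.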
Referring now to
Referring now to
Referring now to
Referring now to
The preferred configurable computing array 400 can be constructed in many ways. In one preferred embodiment, the preferred configurable computing array 400 is a single-level configurable computing array, wherein the configurable computing elements 100 and the configurable logic elements 200 are disposed on a same physical level. To be more specific, all active elements of the preferred configurable computing array 400 (including the memory cells of the memory array 110 in the configurable computing elements 100 and the transistors in the configurable logic elements 200) are formed on the front surface of a same semiconductor substrate and placed side-by-side. Because all active elements are disposed on a 2D plane, this type of integration is referred to as 2D integration; and, the single-level configurable computing array is also referred to as a 2D integrated configurable computing array.
In another preferred embodiment, the preferred configurable computing array 400 is a multi-level configurable computing array, wherein the configurable computing elements 100 and the configurable logic elements 200 are disposed on different physical levels. To be more specific, the memory cells of the configurable computing elements 100 are disposed on at least a memory level, the transistors of the configurable logic elements 200 are disposed on at least a logic level, wherein the memory level is disposed above (or, below) the logic level. In one preferred example, both the memory cells and the transistors are disposed on the same side of a same semiconductor substrate, but the memory cells are stacked above the transistors (
Compared with the single-level configurable computing array, the multi-level configurable computing array offers several advantages. First, because the memory cells are disposed on separate memory level(s), the memory level(s) can be dedicated to LUT storage. As a result, the memory level(s) have a large storage density and can therefore be used to store a larger LUT (for better precision) or more LUTs (for more math functions). Second, because they are formed on a separate logic level, the configurable logic elements have a small footprint. This leads to a smaller die size. Third, because the configurable computing elements are disposed above (or, below) the configurable logic elements, the connections between them are relatively short. This leads to a faster speed.
Referring now to
Based on the orientation of the memory cells, the 3DM can be categorized into horizontal 3DM (3DM_{H}) and vertical 3DM (3DM_{V}). In a 3DM_{H}, all address lines are horizontal and the memory cells form a plurality of horizontal memory levels which are vertically stacked above each other. A well-known 3DM_{H} is 3D-XPoint. In a 3DM_{V}, at least one set of the address lines are vertical and the memory cells form a plurality of vertical memory strings which are placed side-by-side on/above the substrate. A well-known 3DM_{V} is 3D-NAND. In general, the 3DM_{H} (e.g. 3D-XPoint) is faster, while the 3DM_{V} (e.g. 3D-NAND) is denser.
The preferred 3DM in
The 3DM cell 1aa comprises a programmable layer 12 and a diode layer 14. The programmable layer 12 could be an OTP layer (e.g. an antifuse layer, used for the 3DOTP) or an MTP layer (e.g. a phase-change layer, used for the 3DMTP). The diode layer 14 (also referred to as a selector layer, a quasi-conduction layer or other names) is broadly interpreted as any layer whose resistance at the read voltage is substantially lower than its resistance when the applied voltage has a magnitude smaller than, or a polarity opposite to, that of the read voltage. The diode could be a semiconductor diode (e.g. a p-i-n silicon diode), or a metal-oxide (e.g. TiO_{2}) diode. In some embodiments, the programmable layer 12 and the diode layer 14 are merged into a single layer.
The preferred 3DM_{V }array in
The preferred 3DM_{V }array in
In the preferred embodiments of
Referring now to
This type of integration, i.e. forming the configurable logic elements 100AA-100BB and the configurable computing elements 200AA-200BB on different sides of the substrate, is referred to as two-sided integration. The two-sided integration can improve computational density and computational complexity. With the conventional 2D integration, the die size of the configurable computing array is the sum of those of the configurable computing elements and the configurable logic elements. With the two-sided integration, the configurable computing elements are moved from beside the configurable logic elements to the other side of the substrate. This leads to a smaller die size and a higher computational density. In addition, because the memory transistors in the configurable computing elements and the logic transistors in the configurable logic elements are formed on different sides of the substrate, their manufacturing processes can be optimized separately.
Referring now to
The configurable computingarray package 400 in
The configurable computingarray package 400 in
Although their active elements are disposed in a 3D space, the configurable computing die 100W and the configurable logic die 200W are separate dice. Accordingly, this type of integration is generally referred to as 2.5D integration. The 2.5D integration outperforms the conventional 2D integration (i.e. the single-level configurable computing array) in several respects. First, the footprint of a conventional 2D integrated configurable computing array is roughly equal to the sum of those of the configurable computing elements, the configurable logic elements and the configurable interconnects. Because the 2.5D integration moves the configurable computing elements from beside the configurable logic elements to above them, the configurable computing-array package 400 becomes smaller and computationally more powerful. Second, because they are physically close and coupled by a large number of inter-die connections 160, the configurable computing die 100W and the configurable logic die 200W have a larger communication bandwidth than the conventional 2D integrated configurable computing array. Third, the 2.5D integration benefits the manufacturing process. Because the configurable computing die 100W and the configurable logic die 200W are separate dice, the memory transistors in the configurable computing die 100W and the logic transistors in the configurable logic die 200W are formed on separate semiconductor substrates. Consequently, their manufacturing processes can be individually optimized.
The preferred embodiments of the present invention are field-programmable computing-array (FPCA) packages. For an FPCA package, all manufacturing processes of the configurable computing die and the configurable logic die are finished in the factory. The function of the FPCA package can be electrically defined in the field of use. The concept of the FPCA package can be extended to a mask-programmed computing-array (MPCA) package. For an MPCA package, the wafers containing the configurable computing elements and/or the wafers containing the configurable logic elements are prefabricated and stockpiled. However, certain interconnects on these wafers are not fabricated until the function of the MPCA package is finally defined.
While illustrative embodiments have been shown and described, it would be apparent to those skilled in the art that many more modifications than those mentioned above are possible without departing from the inventive concepts set forth therein. The invention, therefore, is not to be limited except in the spirit of the appended claims.