Circuit for the inner or scalar product computation in Galois fields

Abstract
A circuit for computing the inner or scalar product of two vectors in a finite Galois field defined by a generator polynomial, wherein each vector includes at least two elements belonging to said finite field, comprises one or more lookup tables storing digital words indicative of possible combinations and possible reductions. The digital words in question are defined as a function of the second elements of said vectors and the generator polynomial of the field. The input register(s) and the lookup table(s) are configured to cooperate in a plurality of subsequent steps to generate at each step a partial product result identified by at least one digital word addressed in a corresponding lookup table as a function of the digital signals stored in the input register(s). The circuit also includes an accumulator unit for adding up the partial results generated at each step to give a final product result deriving from accumulation of said partial results.
21 Claims
 1. A circuit for use in a cryptosystem for computing a scalar product of a plurality of vectors in a finite Galois field identified by a generator polynomial, each vector including at least a first and a second element belonging to the finite Galois field, the circuit comprising:
at least one input register for storing a plurality of digital signals representative of the first element of each of the plurality of vectors; at least one lookup table for storing a plurality of digital words representing a plurality of combinations and reductions of the first and second elements of the plurality of vectors, each of the plurality of digital words being a function of the second elements of the plurality of vectors and the generator polynomial, the at least one lookup table cooperating with said at least one input register to generate partial product results each identified by at least one of the plurality of digital words stored in said at least one lookup table and based upon the plurality of digital signals stored in said at least one input register; and an accumulator unit for adding the partial product results to give the scalar product of the plurality of vectors in the finite Galois field identified by the generator polynomial of the cryptosystem based upon an accumulation of the partial product results.  View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
 13. A circuit for use in a cryptosystem for computing a scalar product of a plurality of vectors in a finite Galois field identified by a generator polynomial, each vector including at least a first and a second element belonging to the finite field, the circuit comprising:
at least one input register for storing digital signals representative of the first element of each of the plurality of vectors; at least one lookup table for storing a plurality of digital words, each of the plurality of digital words based on the second elements of the plurality of vectors and the generator polynomial, the at least one lookup table cooperating with said at least one input register to generate at least two partial product results identified by the plurality of digital words stored in said at least one lookup table and based upon at least two digital signals stored in said at least one input register; and an accumulator unit for adding the at least two partial product results to give the scalar product of the plurality of vectors in the finite Galois field identified by the generator polynomial of the cryptosystem.  View Dependent Claims (14, 15, 16)
 17. A method for computing an inner product of a plurality of vectors belonging to a set identified by a generator polynomial for use in a cryptosystem, each said vector having at least a first and a second element belonging to said set, the method comprising:
providing at least one input register for storing digital signals representative of the first element of each of the plurality of vectors; configuring at least one lookup table to store digital words based on the second elements of each of the plurality of vectors and the generator polynomial, and to cooperate with said at least one input register to generate partial product results each identified by at least one of said digital words addressed in said at least one lookup table; and summing the partial product results in an accumulator unit to give the inner product of the plurality of vectors in the finite Galois field identified by the generator polynomial of the cryptosystem deriving from accumulation of said partial results.  View Dependent Claims (18, 19, 20, 21)
1 Specification
The present invention relates to computing systems and was developed with specific attention being paid to cryptographic systems (cryptosystems) based on the use of elliptic curves.
Elliptic Curve Cryptosystems or, briefly, ECC, appear to be particularly promising for use in smart cards where intrinsic restrictions exist in terms of silicon area and power consumption, while processing time constraints are also to be taken into account.
ECCs make it possible to reach the same level of security as RSA systems using keys of about 200 bits. Operations on elliptic curves are based on the arithmetic of finite Galois fields. Essentially, two basic operations are necessary to implement such a cryptosystem: multiplication and addition in finite fields. While addition is a simple bitwise XOR operation, multiplication is inevitably more complex.
For a general review on ECC systems, reference may be made e.g. to M. Rosing, “Implementing Elliptic Curve Cryptography”, Manning Publications, 1999; A. Menezes, “Elliptic Curve Public Key Cryptosystems”, Kluwer Academic Publ., Boston, 6th Printing, 1998; R. Lidl, H. Niederreiter, “Introduction to Finite Fields and their Applications” Cambridge Univ. Press, 1986.
Previous research work concerning practical implementation of ECCs at hardware level is based on coprocessor design. A coprocessor is essentially a sort of additional arithmetic-logic unit (ALU) adapted to implement the two basic operations of addition and multiplication.
For a general review of previous activity in that area reference can be made e.g. to M. Hasan, “Look-up Table-Based Large Finite Field Multiplication in Memory Constrained Cryptosystems”, in IEEE Trans. on Comp., vol. 49, no. 7, July, 2000; G. Orlando, C. Paar, “A Super-Serial Galois Field Multiplier for FPGA's and its Application to Public-Key Algorithms”, 7th Annual IEEE Symp. on Field-Progr. Custom Computing Machines, 1999, Page(s): 232–239; C. Paar, “Implementation Options for Finite Fields Arithmetic for Elliptic Curve Cryptosystems”, Proc. 3rd Workshop on Elliptic Curve Cryptosystems, ECC '99, Waterloo, Ontario, Canada, November, 1999; L. Song, K. K. Parhi, I. Kuroda, T. Nishitani, “Low-Energy Programmable Finite Field Data Path Architectures”, Proc. ISCAS '98, Vol. 2, 1998, Page(s): 406–409; A. G. Wassal, M. A. Hasan, M. I. Elmasry, “Low-Power Design of Finite Field Multipliers for Wireless Applications”, Proc. 8th Great Lakes Symposium on VLSI, 1998, Page(s): 19–25; H. Wu, M. A. Hasan, “Low Complexity Bit-Parallel Multipliers for a Class of Finite Fields”, in IEEE Trans. on Comp., Vol. 47, no. 8, August, 1998, Page(s): 883–887; L. Song, K. K. Parhi, “Efficient Finite Field Serial/Parallel Multiplication”, Proc. ASAP '96, 1996, Page(s): 72–82; M. Furer, K. Mehlhorn, “AT^{2}-Optimal Galois Field Multiplier for VLSI”, in IEEE Trans. on Comp., Vol. 38, no. 9, September 1989, Page(s): 1333–1336.
While satisfactory from a general viewpoint, most prior art solutions still extensively suffer from inherent disadvantages in terms of circuit complexity, power consumption and computational speed. This last cited point is particularly significant as regards the so-called kP operation on elliptic curves, which in fact represents the kernel of any ECC cryptosystem.
The object of the invention is thus to provide a new improved solution which overcomes the intrinsic disadvantages of the prior art.
According to the invention, such an object and other additional objects are achieved by means of a process and system having the features set forth in the claims which follow.
Essentially, the basic idea underlying the invention is to perform in a single step two standard multiplication operations and the addition of the two results so obtained, instead of using a standard multiplier twice and then an adder. In fact, using elliptic curves in cryptography requires two operations to be carried out on the points of the curve: addition of two points and doubling of a point. Both require some basic operations in Galois fields, like addition, multiplication, and, possibly, division and squaring. In the design phase some basic choices must therefore be made, such as the choice of the basis of the elements in the fields (polynomial basis, normal basis, dual basis, triangular basis or “ghost bit” basis) and the choice of the coordinates for representing the elliptic curve (affine, homogeneous, Jacobian, etc.).
In fact, by using homogeneous (projective) coordinates to describe the curve, point addition and point doubling can be executed upon the curve without resorting to division in the underlying Galois field. This is important because division is the most complex operation. Using homogeneous coordinates and performing some grouping of basic operations it is thus possible to use the inner product operation to perform the kP operation on the whole elliptic curve.
The invention will now be described, by way of example only, in connection with the enclosed drawings, wherein:
Essentially, the present invention aims at providing a hardware device (such as a functional unit or a coprocessor) adapted to be integrated in an embedded system (for instance a smart card) in order to render public key cryptographic operations faster.
Specifically, the operation to be implemented is:
E(x)=((A(x)×B(x))+C(x)×D(x))mod φ(x)
where A(x), B(x), C(x), D(x), and E(x) are elements of the finite field GF(2^{n}), that is, polynomials of order n−1 having one-bit coefficients. Any of these can also be identified as a sequence of n bits. Usually 150≦n≦250 for cryptographic applications using elliptic curves (ECC).
The representation of the finite field GF(2^{n}) is given in a “polynomial basis” or a “standard basis”. Choosing the representation of the field corresponds to fixing the polynomial of order n≧1 which generates the field itself. Such generator polynomial is designated φ(x). The generator polynomial φ(x) is fixed, and is changed only if the representation of the field is changed, which happens only if the system is reconfigured, which is seldom the case.
The result E(x) is the inner product (scalar product) of two vectors each having a first and a second element belonging to the finite field GF(2^{n}). The inner (scalar) product operation can be easily generalized to vectors having three or more elements belonging to the finite field GF(2^{n}).
One could well write:
E(x)=([A(x),B(x)]⊗[C(x),D(x)])mod φ(x)
wherein ⊗ represents the inner (scalar) product of two vectors. This formal representation is thoroughly equivalent to the previous one.
The operators + and × denote, respectively: addition in GF(2^{n}), which is carried out by means of a simple array of n XOR gates having two inputs each, and multiplication in GF(2^{n}), which corresponds to computing the product of two polynomials, in the ordinary algebraic sense, followed by “reduction” with respect to the generator polynomial φ(x), that is, computing the remainder of the division by φ(x). Such reduction operation is indicated with the symbol “mod φ(x)”: for instance “F(x)=A(x)×B(x)mod φ(x)” designates calculation of the ordinary product of two polynomials A(x) and B(x), each of order n−1, with a result of order 2n−2, followed by computing the remainder of the division by the polynomial φ(x), of order n. The final result is a polynomial of order n−1.
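By way of illustration only, the operations + and × defined above may be modelled in software as follows. This is a minimal sketch and not the circuit of the invention: the function names clmul, poly_mod, gf_add, gf_mul and inner_product are hypothetical, polynomials are held as Python integers (bit i being the coefficient of x^i), and the small field GF(2^8) with φ(x)=x^8+x^4+x^3+x+1 is assumed purely to keep the numbers short.

```python
def clmul(a: int, b: int) -> int:
    """Ordinary (carry-less) product of two polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add the shifted copy of a (XOR is addition)
        a <<= 1
        b >>= 1
    return result


def poly_mod(p: int, phi: int) -> int:
    """Remainder of p(x) divided by phi(x), coefficients in GF(2)."""
    n = phi.bit_length() - 1             # order of the generator polynomial
    while p.bit_length() > n:            # while the order of p is >= n
        p ^= phi << (p.bit_length() - 1 - n)
    return p


def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^n): a bitwise XOR."""
    return a ^ b


def gf_mul(a: int, b: int, phi: int) -> int:
    """Multiplication in GF(2^n): ordinary product, then reduction mod phi."""
    return poly_mod(clmul(a, b), phi)


def inner_product(a: int, b: int, c: int, d: int, phi: int) -> int:
    """E = (A x B + C x D) mod phi, with a single final reduction."""
    return poly_mod(clmul(a, b) ^ clmul(c, d), phi)


# Assumed example field, not taken from the description.
PHI = 0x11B  # x^8 + x^4 + x^3 + x + 1
```

With this assumed field, gf_mul(0x57, 0x83, PHI) gives 0xC1, and inner_product returns the XOR of the two individually reduced products, since reduction is linear over GF(2).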
Essentially, the various exemplary embodiments of the invention shown in
In general, factors or operands A(x), B(x), C(x) and D(x) could be provided in serial or parallel format. In practice, each factor is a sequence of n bits. Serial operation involves providing one bit at a time, while parallel operation requires all the n bits to be provided simultaneously.
Purely serial architectures have a throughput too low for cryptographic applications. Fully parallel architectures give rise to circuits which are too complex for embedded systems (one has to keep in mind that, typically, n=200 bits).
It is therefore advisable to resort to digit-serial architectures, wherein some factors are provided in parallel, while others are provided serially in groups of k≧1 bits at a time (for k=1 one has the serial-parallel case, for k=n one has the fully parallel case). In that case a balance is struck between circuit complexity and throughput.
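As a rough illustration of digit-serial operand delivery, the following sketch splits an n-bit operand into groups of k bits, most significant group first. The function name digits_msb_first is hypothetical, and the sketch assumes for simplicity that k divides n, which the text does not require.

```python
def digits_msb_first(value: int, n: int, k: int) -> list[int]:
    """Split an n-bit operand into n/k digits of k bits, most significant first."""
    if n % k != 0:
        raise ValueError("this sketch assumes that k divides n")
    digit_mask = (1 << k) - 1
    return [(value >> shift) & digit_mask for shift in range(n - k, -1, -k)]
```

The number of digits, n/k, is the number of steps a digit-serial unit needs: n steps for k=1 and a single step for k=n.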
In the preferred embodiment of the present invention, operands B(x) and D(x) are provided in a parallel format, while operands A(x) and C(x) are provided in groups of k bits at a time. The result E(x) is finally produced in a parallel format.
The digit-serial approach is particularly suited for use in combination with lookup tables. Such tables are initialized by storing some partial computation results. Subsequently, the contents of these tables are read and reused with the purpose of making the whole computation faster. In certain cases the table contents may be fixed, or may vary infrequently. In these cases, table initialization can be dispensed with or carried out only from time to time, thus having low impact on circuit operation.
The digit-serial approach based on the use of lookup tables was originally proposed in the first work by M. Hasan cited in the foregoing. That work considers only multiplication, and not the calculation of the inner (scalar) product of vectors of two or more elements.
In the following, three different architectural embodiments are considered.
In a first embodiment shown in
Conversely, reference numerals 14 and 16 designate two lookup tables TAB B(x) and TAB D(x) storing a first set of digital words derived—as better explained in the following—from the factors B(x) and D(x), each representative of the second element of one of the two vectors to be multiplied, and the generator polynomial φ(x).
Reference numeral 26 designates a further lookup table TAB φ(x) storing a second set of digital words derived—as better explained in the following—from the generator polynomial and representative of the mod φ(x) reduction function.
Lookup tables 14, 16 and 26 preferably comprise solid state memories such as RAMs, ROMs or EPROMs, each including words of n bits each.
Reference numerals 18, 20 and 22 designate three arrays of n XOR gates with two inputs for each gate. Reference numerals 28 and 30 designate further n bit registers.
Each register is adapted to perform a k bit shift at a time. The shift unit of the result register E(x) is shown explicitly.
Finally, reference numeral 24 designates a feedback line from register 30 to one of the inputs of array 18.
Lookup table TAB φ(x) implemented by memory 26 is fixed once the generator polynomial of the field, which is itself fixed, has been chosen. Therefore, memory 26 is preferably a ROM or EPROM with 2^{k} words of n bits each.
The ith word, with 0≦i≦2^{k}−1, of table TAB φ(x) is obtained by considering only the n least significant bits of the polynomial, including n+k coefficients, obtained by the following calculation: φ(x)×P(i), this being a product without reduction. P(i) denotes the polynomial of order between 0 and k−1 (extremities included) having exactly k coefficients, wherein the series of coefficients represents the natural binary expansion of integer i.
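The construction of TAB φ(x) can be sketched as follows. The names build_tab_phi, clmul and poly_mod are hypothetical, and GF(2^8) with φ(x)=x^8+x^4+x^3+x+1 and k=4 is an assumed example. The sketch also relies on a condition that is an assumption of this example rather than a statement of the description: the terms of φ(x) other than x^n have order at most n−k, in which case the truncated product coincides with the reduction of P(i)×x^n modulo φ(x).

```python
def clmul(a: int, b: int) -> int:
    """Ordinary (carry-less) product of two polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(p: int, phi: int) -> int:
    """Remainder of p(x) divided by phi(x); used here only as a cross-check."""
    n = phi.bit_length() - 1
    while p.bit_length() > n:
        p ^= phi << (p.bit_length() - 1 - n)
    return p


def build_tab_phi(phi: int, n: int, k: int) -> list[int]:
    """Word i = n least significant bits of phi(x) x P(i), without reduction."""
    mask = (1 << n) - 1
    return [clmul(phi, i) & mask for i in range(1 << k)]


# Assumed example parameters, not taken from the description.
tab_phi = build_tab_phi(0x11B, 8, 4)
```

Under the stated condition, every word tab_phi[i] equals poly_mod(i << 8, 0x11B), which is the correction needed when the k bits i are shifted out of the top of an n-bit register.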
For operation, tables TAB B(x) and TAB D(x) in memories 14 and 16 are first initialized. Each lookup table is a RAM with 2^{k} words of n bits each.
The ith word, with 0≦i≦2^{k}−1, of TAB B(x) is obtained in the following way: B(x)×P(i)mod φ(x).
The ith word, with 0≦i≦2^{k}−1, of TAB D(x) is obtained in the following way: D(x)×P(i)mod φ(x).
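A possible software model of this initialization is given below. The name build_operand_table is hypothetical, and the operand value 0x83, the field GF(2^8) with φ(x)=x^8+x^4+x^3+x+1 and the digit size k=4 are assumed only for the example. The same routine serves for TAB B(x) and TAB D(x).

```python
def clmul(a: int, b: int) -> int:
    """Ordinary (carry-less) product of two polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(p: int, phi: int) -> int:
    """Remainder of p(x) divided by phi(x), coefficients in GF(2)."""
    n = phi.bit_length() - 1
    while p.bit_length() > n:
        p ^= phi << (p.bit_length() - 1 - n)
    return p


def build_operand_table(operand: int, phi: int, k: int) -> list[int]:
    """Word i = operand(x) x P(i) mod phi(x), for every k-bit digit i."""
    return [poly_mod(clmul(operand, i), phi) for i in range(1 << k)]


# Assumed example values, not taken from the description.
tab_b = build_operand_table(0x83, 0x11B, 4)
```

Because multiplication distributes over XOR, the table is linear in its index: tab_b[i ^ j] equals tab_b[i] ^ tab_b[j], so in principle only k words are independent.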
Shift registers 10 and 12 are loaded in parallel with operands A(x) and C(x). Register E(x) is initialized to 0.
Operands A(x) and C(x) are shifted by k positions. The k most significant bits of operands A(x) and C(x) are extracted and sent as addresses to tables 14 and 16, respectively. The two nbit words stored in these tables at those addresses are read out.
The contents of register 28 are shifted by k positions. The k most significant bits are extracted whereas k “0” bits are inserted in the k least significant positions of the register. The k most significant bits of partial result E(x) are sent as an address to table 26. The corresponding n bit word stored therein is read out.
The three n bit words read out from tables 14, 16 and 26, respectively, are added to the current contents of parallel register 28.
If operands or factors A(x) and C(x) have not been completely scanned a further shift operation is carried out as described in the foregoing. When such scanning is completed, register 28 contains the final result.
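The sequence of steps just described may be modelled in software as follows. This is a sketch under stated assumptions, not the circuit itself: the function name inner_product_three_tables is hypothetical, k is assumed to divide n, and the terms of φ(x) other than x^n are assumed to have order at most n−k so that the truncated TAB φ(x) words perform a complete reduction. The variable e plays the part of register 28.

```python
def clmul(a: int, b: int) -> int:
    """Ordinary (carry-less) product of two polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(p: int, phi: int) -> int:
    """Remainder of p(x) divided by phi(x), coefficients in GF(2)."""
    n = phi.bit_length() - 1
    while p.bit_length() > n:
        p ^= phi << (p.bit_length() - 1 - n)
    return p


def inner_product_three_tables(a, b, c, d, phi, n, k):
    """Digit-serial E = (A x B + C x D) mod phi using three lookup tables."""
    if n % k != 0:
        raise ValueError("this sketch assumes that k divides n")
    mask = (1 << n) - 1
    digit = (1 << k) - 1
    # Table initialization (tables 14, 16 and 26 in the description).
    tab_b = [poly_mod(clmul(b, i), phi) for i in range(1 << k)]
    tab_d = [poly_mod(clmul(d, i), phi) for i in range(1 << k)]
    tab_phi = [clmul(phi, i) & mask for i in range(1 << k)]
    e = 0                                    # result register, cleared
    for shift in range(n - k, -1, -k):       # scan A and C, top digit first
        addr_a = (a >> shift) & digit        # k bits of A address TAB B
        addr_c = (c >> shift) & digit        # k bits of C address TAB D
        addr_e = e >> (n - k)                # k bits shifted out of E
        e = (e << k) & mask                  # shift E, zeros enter at the bottom
        e ^= tab_b[addr_a] ^ tab_d[addr_c] ^ tab_phi[addr_e]
    return e
```

Each pass computes E×x^k + a×B + c×D modulo φ(x), where a and c are the current digits, so after n/k passes the register holds the complete inner product. For the assumed field GF(2^8) with φ(x)=x^8+x^4+x^3+x+1, the call inner_product_three_tables(0x57, 0x83, 0x57, 0x13, 0x11B, 8, 4) returns 0x3F.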
Consequently, the partial products A×B and C×D are not computed separately by the inner product functional unit of the invention. Instead, the inner product unit computes a mix of partial results and then accumulates them to form the final result. It is not possible to point out any internal component of the inner product unit where the two mentioned multiplications are carried out separately.
Instead of executing one partial addition with the factor B in the main loop of the multiplication, two partial additions—with the factors B(x) and D(x)—are executed in parallel and the partial result thus obtained is reduced.
Consequently, the architecture shown in
The arrangement shown in
In the arrangement of
In the block diagram of
Essentially, in the arrangement of
Table 32, designated TAB BD(x) contains all the sums twobytwo, in all possible ways, of the digital words of the lookup tables 14 and 16 of
Operation of the embodiment shown in
Also in this case, factors A(x) and C(x) are shifted by k positions. The k most significant bits of A(x) and C(x) are extracted and concatenated to obtain a 2k bit word. This word is sent as an address to table 32 and the corresponding n bit word stored therein is read out.
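A software model of the merged-table variant might look as follows. It is a sketch only: the name inner_product_merged_table is hypothetical, the address is assumed to carry the digit of A(x) in its upper k bits and the digit of C(x) in its lower k bits (the text does not fix this order), k is assumed to divide n, and the terms of φ(x) other than x^n are assumed to have order at most n−k.

```python
def clmul(a: int, b: int) -> int:
    """Ordinary (carry-less) product of two polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(p: int, phi: int) -> int:
    """Remainder of p(x) divided by phi(x), coefficients in GF(2)."""
    n = phi.bit_length() - 1
    while p.bit_length() > n:
        p ^= phi << (p.bit_length() - 1 - n)
    return p


def inner_product_merged_table(a, b, c, d, phi, n, k):
    """Digit-serial E = (A x B + C x D) mod phi with a single operand table."""
    if n % k != 0:
        raise ValueError("this sketch assumes that k divides n")
    mask = (1 << n) - 1
    digit = (1 << k) - 1
    tab_b = [poly_mod(clmul(b, i), phi) for i in range(1 << k)]
    tab_d = [poly_mod(clmul(d, i), phi) for i in range(1 << k)]
    # TAB BD(x): every pairwise sum, 2^(2k) words.
    # Assumed address layout: (digit of A) << k | (digit of C).
    tab_bd = [tab_b[i >> k] ^ tab_d[i & digit] for i in range(1 << (2 * k))]
    tab_phi = [clmul(phi, i) & mask for i in range(1 << k)]
    e = 0
    for shift in range(n - k, -1, -k):
        addr_bd = (((a >> shift) & digit) << k) | ((c >> shift) & digit)
        addr_e = e >> (n - k)                # k bits shifted out of E
        e = ((e << k) & mask) ^ tab_bd[addr_bd] ^ tab_phi[addr_e]
    return e
```

Compared with keeping TAB B(x) and TAB D(x) separate, one lookup and one XOR per step are saved, at the price of a table of 2^{2k} words instead of two tables of 2^{k} words.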
In this case the two n bit words read out from table 32 and table 26 (this latter word being identified as previously described in connection with the embodiment of
It will be appreciated that by resorting to the arrangement of
The arrangement of
Again, the same reference numerals already used in
The embodiment of
In operation, table TAB BDφ(x) in memory 36 is first initialized and the hth word, with 0≦h≦2^{3k}−1, of table 36 is obtained as a consolidated combined digital word in the following way: (word of index i of TAB BD(x))+(word of index j of TAB φ(x)), where integers i, j with 0≦i≦2^{2k}−1 and 0≦j≦2^{k}−1 are related to h in the following way: h=i+j×2^{2k}.
As in the previous embodiments, factors A(x) and C(x) loaded in registers 10 and 12 are shifted by k positions. As in the embodiment of
The contents of register 28 are shifted by k positions. The k most significant bits are extracted while k “0” bits are introduced in the k least significant positions of the register. These k most significant bits and the 2k bit word obtained by extracting and concatenating the most significant bits of factors A(x) and C(x) are concatenated, thus obtaining a 3k bit word. This 3k bit word is sent as an address to table 36 and the corresponding n bit digital word stored therein is read to be added to the current contents of register 28.
Again, once factors A(x) and C(x) have been finally scanned, register 28 contains the final result.
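The single-table variant can be sketched in the same style. The name inner_product_single_table is hypothetical; the index relation h=i+j×2^{2k} is taken from the description, whereas the placement of the digit of A(x) above the digit of C(x) inside i, the requirement that k divide n, and the condition that the terms of φ(x) other than x^n have order at most n−k are assumptions of this example.

```python
def clmul(a: int, b: int) -> int:
    """Ordinary (carry-less) product of two polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(p: int, phi: int) -> int:
    """Remainder of p(x) divided by phi(x), coefficients in GF(2)."""
    n = phi.bit_length() - 1
    while p.bit_length() > n:
        p ^= phi << (p.bit_length() - 1 - n)
    return p


def inner_product_single_table(a, b, c, d, phi, n, k):
    """Digit-serial E = (A x B + C x D) mod phi with one table of 2^(3k) words."""
    if n % k != 0:
        raise ValueError("this sketch assumes that k divides n")
    mask = (1 << n) - 1
    digit = (1 << k) - 1
    low_2k = (1 << (2 * k)) - 1
    tab_b = [poly_mod(clmul(b, i), phi) for i in range(1 << k)]
    tab_d = [poly_mod(clmul(d, i), phi) for i in range(1 << k)]
    tab_bd = [tab_b[i >> k] ^ tab_d[i & digit] for i in range(1 << (2 * k))]
    tab_phi = [clmul(phi, j) & mask for j in range(1 << k)]
    # TAB BDphi(x): word h = TAB BD[i] + TAB phi[j], with h = i + j * 2^(2k).
    tab_bdphi = [tab_bd[h & low_2k] ^ tab_phi[h >> (2 * k)]
                 for h in range(1 << (3 * k))]
    e = 0
    for shift in range(n - k, -1, -k):
        i = (((a >> shift) & digit) << k) | ((c >> shift) & digit)
        j = e >> (n - k)                     # k bits shifted out of E
        e = ((e << k) & mask) ^ tab_bdphi[i + (j << (2 * k))]
    return e
```

Each step now needs one lookup and one XOR, but the table grows to 2^{3k} words, and in this model it has to be rebuilt whenever B(x) or D(x) changes.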
In the arrangement of
All the arrangements shown in
In general terms, table 26 (
Tables 14 and 16 of
However, certain cases may occur where factors B(x) and D(x) are fixed, or change only quite rarely (this may be the case if the representation of the finite field is changed or if the system is subject to reconfiguration). Under these circumstances, all tables 14, 16, 32 and 36 can be implemented in the form of ROMs or EPROMs, which generally have a lower cost than RAMs.
It will be appreciated that factors B(x) and D(x) play a role in computing the inner (scalar) product when they are used to initialize the various tables; after this they no longer play any role in computation. Conversely, factors A(x) and C(x) play no role in table initialization, but are stored in the respective registers 10, 12 to be used during calculation.
In the embodiment of
The solution disclosed can be easily extended to calculating inner products of vectors including more than two elements. The arrangement of the invention is also adapted for use as a multiplier of scalar entities, and can thus also serve as an ordinary finite field multiplier.
In the embodiment shown in
The present invention has been described with reference to the preferred embodiments. However, the present invention is not limited to those embodiments. Various changes and modifications may be made within the spirit and scope of the appended claims.