APPARATUS AND METHOD FOR COMPRESSION CODING FOR ARTIFICIAL NEURAL NETWORK

Abstract
A compression coding apparatus for an artificial neural network, including a memory interface unit, an instruction cache, a controller unit, and a computing unit, wherein the computing unit is configured to perform corresponding operations on data from the memory interface unit according to instructions of the controller unit. The computing unit mainly performs a three-step operation: step one is to multiply the input neurons by the weight data; step two is to perform an adder tree computation and add the weighted output neurons obtained in step one level by level via the adder tree, or add a bias to the output neurons to obtain biased output neurons; step three is to perform an activation function operation to obtain the final output neurons. The present disclosure also provides a method for compression coding of a multilayer neural network.
53 Claims
 1-27. (canceled)
 28. A neural network processor, comprising:
a floating-point number converter configured to: receive one or more first weight values of a first bit length and first input neuron data, and convert the one or more first weight values to one or more second weight values of a second bit length, wherein the second bit length is less than the first bit length; and a computing unit configured to: receive the first input neuron data, and calculate first output neuron data based on the first input neuron data and the second weight values.  View Dependent Claims (29, 30, 31, 32, 33, 34, 35, 36, 37, 38)
 39. A method for processing neural network data, comprising:
receiving, by a floating-point number converter, one or more first weight values of a first bit length and first input neuron data; converting, by the floating-point number converter, the one or more first weight values to one or more second weight values of a second bit length, wherein the second bit length is less than the first bit length; receiving, by a computing unit, the first input neuron data; and calculating, by the computing unit, first output neuron data based on the first input neuron data and the second weight values.  View Dependent Claims (40, 41, 42, 43, 44, 45, 46, 47, 48, 49)
 50. An apparatus for neural network processing, comprising:
an input/output (I/O) interface configured to exchange data with peripheral devices; a central processing unit (CPU) configured to process the data received via the I/O interface; a neural network processor configured to process at least a portion of the received data, wherein the neural network processor includes: a floating-point number converter configured to: receive one or more first weight values of a first bit length and first input neuron data, and convert the one or more first weight values to one or more second weight values of a second bit length, wherein the second bit length is less than the first bit length, and a computing unit configured to: receive the first input neuron data, calculate the first output neuron data based on the first input neuron data and the second weight values, and calculate one or more weight gradients based on the second weight values; and a memory configured to store the first weight values and input data that includes the first neuron data.  View Dependent Claims (51, 52, 53)
Specification
The present disclosure relates to the field of artificial neural network processing technology, and specifically to an apparatus and method for compression coding for an artificial neural network. Particularly, it relates to execution units for performing the artificial neural network algorithm, or devices comprising these execution units, and to execution units and devices for multilayer artificial neural network operation, the reverse propagation training algorithm, and its compression coding.
A multilayer artificial neural network is widely used in fields such as pattern recognition, image processing, function approximation, and optimized computation. Particularly, as studies on the reverse propagation training algorithm and the pre-training algorithm have deepened in recent years, the multilayer artificial neural network has attracted more and more attention from academia and industry due to its higher recognition accuracy and better parallelizability.
With the surge in computation and memory access in artificial neural networks, the prior art generally utilizes a general processor to process multilayer artificial neural network operations, training algorithms, and their compression coding; the above algorithms are supported by utilizing a general register file and general functional components to execute general instructions. One of the disadvantages of using a general processor is that the low computing performance of a single general processor cannot meet the performance needs of multilayer artificial neural network operations. Meanwhile, if multiple general processors work concurrently, the intercommunication between them limits performance. In addition, a general processor needs to transcode a multilayer artificial neural network operation into a long sequence of operation and access instructions, and this front-end transcoding causes relatively high power consumption. Another known method for supporting multilayer artificial neural network operations, training algorithms, and their compression coding is to use a graphics processing unit (GPU). This method supports the above algorithms by using a general register file and a general stream processing unit to execute general SIMD instructions. Since the GPU is specifically designed for graphics and image computing and scientific calculation, it provides no specific support for multilayer artificial neural network operations; thus a lot of front-end transcoding is still required to perform such operations, and as a result large additional costs are incurred. Besides, the GPU has only a relatively small on-chip cache, so the model data (weights) of a multilayer artificial neural network need to be carried repeatedly from off-chip, and off-chip bandwidth becomes the main performance bottleneck while causing huge power consumption.
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
The present disclosure presents examples of techniques for compression coding. An example apparatus may include a floating-point number converter configured to receive one or more first weight values of a first bit length and first input neuron data, and convert the one or more first weight values to one or more second weight values of a second bit length, wherein the second bit length is less than the first bit length; and a computing unit configured to receive the first input neuron data from the neuron data cache, calculate first output neuron data based on the first input neuron data and the second weight values, and calculate one or more weight gradients to update the one or more first weight values.
An example method may include receiving one or more first weight values of a first bit length; receiving first input neuron data; converting the one or more first weight values to one or more second weight values of a second bit length, wherein the second bit length is less than the first bit length; calculating first output neuron data based on the first input neuron data and the second weight values; and calculating one or more weight gradients to update the one or more first weight values.
Another example apparatus may include a floating-point number converter configured to receive one or more first weight values of a first bit length and first input neuron data, and convert the one or more first weight values to one or more second weight values of a second bit length, wherein the second bit length is less than the first bit length; and a computing unit configured to receive the first input neuron data from the neuron data cache, calculate first output neuron data based on the first input neuron data and the second weight values, and calculate one or more weight gradients based on the second weight values.
An example system in which compression coding may be implemented may include an input/output (I/O) interface configured to exchange data with peripheral devices; a central processing unit (CPU) configured to process the data received via the I/O interface; a neural network processor configured to process at least a portion of the received data, wherein the neural network processor includes: a floating-point number converter configured to receive one or more first weight values of a first bit length and first input neuron data, and convert the one or more first weight values to one or more second weight values of a second bit length, wherein the second bit length is less than the first bit length, and a computing unit configured to receive the first input neuron data, calculate the first output neuron data based on the first input neuron data and the second weight values, and calculate one or more weight gradients to update the one or more first weight values; and a memory configured to store the first weight values and input data that includes the first neuron data.
Another example system in which compression coding may be implemented may include an input/output (I/O) interface configured to exchange data with peripheral devices; a central processing unit (CPU) configured to process the data received via the I/O interface; a neural network processor configured to process at least a portion of the received data, wherein the neural network processor includes: a floating-point number converter configured to: receive one or more first weight values of a first bit length and first input neuron data, and convert the one or more first weight values to one or more second weight values of a second bit length, wherein the second bit length is less than the first bit length, and a computing unit configured to: receive the first input neuron data, calculate the first output neuron data based on the first input neuron data and the second weight values, and calculate one or more weight gradients based on the second weight values; and a memory configured to store the first weight values and input data that includes the first neuron data.
To the accomplishment of the foregoing and related ends, the one or more aspects comprise the features herein after fully described and particularly pointed out in the claims. The following description and the annexed drawings set forth in detail certain illustrative features of the one or more aspects. These features are indicative, however, of but a few of the various ways in which the principles of various aspects may be employed, and this description is intended to include all such aspects and their equivalents.
Various aspects are now described with reference to the drawings. In the following description, for purpose of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more aspects. It may be evident, however, that such aspect(s) may be practiced without these specific details.
Neural networks are a family of models for a broad range of emerging machine learning and pattern recognition applications. Neural network techniques are conventionally executed on general-purpose processors such as the Central Processing Unit (CPU) and the General-purpose Graphics Processing Unit (GPGPU). However, general-purpose processors may be limited to computing floating-point numbers of a single format. The capability of processing only a single format of floating-point numbers may lead to unnecessary accuracy while increasing power consumption and memory usage.
The neural network processor 106 may further include an instruction cache 206 and a controller unit 208. The instruction cache 206 may refer to one or more storage devices configured to store instructions received from the CPU 104. The controller unit 208 may be configured to read the instructions from the instruction cache 206 and decode the instructions.
Upon receiving the decoded instructions from the controller unit 208, the input data from the neuron data cache 212, and the weight values from the weight cache 214, the computing unit 210 may be configured to calculate one or more groups of output data based on the weight values and the input data in a forward propagation process. In some other examples, the computing unit 210 may be configured to calculate weight gradients to update weight values and/or bias gradients to update one or more bias values in a backward propagation process.
The computing unit 210 may further include one or more multipliers configured to multiply the modified input data by the modified weight values to generate one or more weighted input data, one or more adders configured to add the one or more weighted input data to generate a total weighted value and add a bias value to the total weighted value to generate a biased value, and an activation processor configured to perform an activation function on the biased value to generate a group of output data.
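The multiplier, adder, and activation stages described above can be illustrated with a minimal Python sketch. This is a software stand-in for the hardware computing unit, not the disclosed circuit; the sigmoid activation is one of the activation functions named later in the disclosure, and the function name is our own:

```python
import math

def forward_layer(inputs, weights, bias):
    """Illustrative model of the computing unit's pipeline: multipliers,
    an adder tree, a bias adder, and an activation step."""
    weighted = [x * w for x, w in zip(inputs, weights)]  # multipliers
    total = sum(weighted)                                # adder tree
    biased = total + bias                                # add bias value
    return 1.0 / (1.0 + math.exp(-biased))               # activation (sigmoid)
```

For example, forward_layer([1.0, 2.0], [0.5, 0.5], -1.5) produces a biased value of 0 and therefore an activation of 0.5.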
The generated output data may be temporarily stored in an output data cache 216 and may be further transmitted to the memory 108 via a direct memory access (DMA) module 204.
In some examples, the neural network processor 106 may further include a floating-point number converter 218 for converting weight values of different bit lengths respectively for the forward propagation process and the backward propagation process. For example, the floating-point number converter 218 may be configured to convert first weight values that are stored in the weight cache 214 into second weight values. The bit length of the second weight values is less than the bit length of the first weight values.
As depicted, the floating-point number converter 218 may be configured to receive one or more first weight values of a first bit length from an input device 116 or from the weight cache 214. The first weight values may be formatted as floating-point numbers.
The one or more first weight values may be converted by the floating-point number converter 218 to one or more second weight values of a second bit length for further computing. In some examples, the second weight values may also be formatted as floating-point numbers.
In some examples, the first weight values may be received from the input device 116 via a bus 318. Data may be transmitted and/or received via the bus 318 to and from other components that are not shown in
In other words, each of the first weight values, when stored as a series of bits, may include a first sign bit, a first exponent field, and a first mantissa field. The first sign bit may refer to a bit that indicates the sign of the corresponding first weight value and may be assigned with a value of 0 or 1. The first exponent field may refer to a number of bits that store a value of the exponent of the corresponding first weight value. The bit length of the first exponent field may be referred to as K1 hereinafter. The first mantissa field may refer to a number of bits that store a value of the mantissa of the corresponding first weight value. The bit length of the first mantissa field may be referred to as K2 hereinafter. In an example of the IEEE 754 standard (single type), K1 and K2 may be respectively set to 8 and 23.
In some aspects, an exponent bit length and a base value may be received from the input device 116 via the bus 318 along with the first weight values. The exponent bit length (may be referred to as “N”) may be calculated based on the first exponent fields of the first weight values. For example, the exponent bit length N may be calculated based on a maximum value and a minimum value of the first exponent fields, e.g., in accordance with a formula:
N = log2((E_max − E_min)/2), in which E_max refers to the maximum value of the first exponent fields, and E_min refers to the minimum value of the first exponent fields.
The base value (may be referred to as "A") may be similarly calculated based on the maximum value and the minimum value of the first exponent fields, e.g., in accordance with another formula: A = (E_max − E_min)/2, in which E_max refers to the maximum value of the first exponent fields and E_min refers to the minimum value of the first exponent fields.
In these aspects, the exponent bit length N and the base value A may be calculated outside the floating-point number converter 218 prior to being transmitted from the input device 116. In some other aspects, the exponent bit length N and the base value A may be temporarily stored in the weight cache 214 after being received via the bus 318 and may be further retrieved by the floating-point number converter 218 at a later time for converting. In some other aspects, subsequent to receiving the first weight values, the floating-point number converter 218 may be configured to calculate the exponent bit length N and the base value A similarly based on the maximum value and the minimum value of the first exponent fields.
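Under the formulas above, the computation of N and A may be sketched as follows. Reading the exponent bit length as N = log2((E_max − E_min)/2) rounded up to a whole number of bits is our assumption, as are the integer division for A and the guard for degenerate ranges; the helper name is hypothetical:

```python
import math

def exponent_params(first_exponent_fields):
    """Hypothetical helper computing the exponent bit length N and the
    base value A from a group of first exponent fields."""
    e_max = max(first_exponent_fields)
    e_min = min(first_exponent_fields)
    a = (e_max - e_min) // 2                  # A = (E_max - E_min) / 2
    span = max(e_max - e_min, 2)              # guard log2 for tiny ranges
    n = math.ceil(math.log2(span / 2)) or 1   # N = log2((E_max - E_min) / 2)
    return n, a
```

For first exponent fields spanning 120 to 136, for example, this gives A = 8 and N = 3.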
According to the present aspects, the floating-point number converter 218 may further include a coder 308, a decoder 310, a configuration register (CFG) 312, and a base address register 314. The CFG 312 and the base address register 314 may respectively refer to portions of an onboard memory integrated in the floating-point number converter 218 and, thus, may provide direct access for the coder 308 and the decoder 310. The CFG 312 may be configured to store the exponent bit length N, and the base address register 314 may be configured to store the base value A.
After the floating-point number converter 218 receives the first weight values, the coder 308 may be configured to calculate one or more second weight values. Similar to the first weight values, each of the second weight values may be represented as (−1)^S2 × (1 + M2) × 2^E2, in which S2 denotes the sign of the corresponding second floating-point number, M2 denotes the mantissa of the corresponding second floating-point number, and E2 denotes the exponent of the corresponding second floating-point number. Each of the second weight values may respectively correspond to one of the first weight values. That is, each second floating-point number may be calculated based on a corresponding first weight value and other parameters of the group of first weight values. However, a bit length of the second weight values (e.g., a total number of bits of each second floating-point number) or a bit length of a mantissa of the second weight values may be different from that of the first weight values. In some aspects, the bit length of the second weight values may be preset to a fixed value, e.g., 16, 32, 64, 128, etc., which may be less than the bit length of the first weight values.
That is, each of the second weight values, as a series of bits, may include a second sign bit, a second exponent field, and a second mantissa field. Similar to the first sign bit, the second sign bit may refer to a bit that indicates the sign of the corresponding second weight value and may be assigned with a value of 0 or 1. The second exponent field may refer to one or more bits that store a value of the exponent of the corresponding second weight value. The second mantissa field may refer to one or more bits that store a value of the mantissa of the corresponding second weight value. In some other examples, the bit length of the second mantissa field of the second weight values may be less than that of the first mantissa field of the first weight values.
To calculate the second weight values, the coder 308 may be configured to determine the bit lengths of the second exponent field and the second mantissa field. For example, the coder 308 may be configured to determine the bit length of the second exponent field to be the same as the calculated exponent bit length N. The bit length of the second mantissa field may be determined by the coder 308 in accordance with a formula: L2 = C − N − 1, in which L2 denotes the bit length of the second mantissa field of the corresponding second floating-point number, N denotes the exponent bit length, and C denotes the preset bit length of the second floating-point numbers.
Further, the coder 308 may be configured to determine the respective values of the second sign bit, the second exponent field, and the second mantissa field. In some aspects, the coder 308 may be configured to assign the second sign bit the same value as the first sign bit. The value of the second exponent field may be calculated by the coder 308 based on a corresponding first exponent field, an exponent bias of the first weight values, and the base value A stored in the base address register 314. The exponent bias of the first weight values is determined by the format standard of the first floating-point numbers. For example, if the first weight values are in compliance with the IEEE 754 standard (single type), the exponent bias of the weight values may be set to 127 according to the IEEE 754 standard. The value of the second exponent field may be determined in accordance with the following example formula: E2 = E1 − B + A, in which E2 denotes the value of the second exponent field, E1 denotes the value of the first exponent field, B denotes the exponent bias of the first floating-point numbers, and A denotes the base value.
Further to the aspects, the coder 308 may be configured to determine the value of the second mantissa field. As the bit length of the second mantissa field may have been determined, the coder 308 may be configured to select one or more most significant bits (MSBs) of the corresponding first mantissa field to be the value of the second mantissa field. The number of MSBs may be the determined bit length of the second mantissa field, e.g., C − N − 1, in which N denotes the exponent bit length and C denotes the preset bit length of the second weight values. In an example in which the first weight values comply with IEEE 754 (single type) and the second weight values comply with IEEE 754-2008 (half type), the bit length of the first mantissa field may be set to 23 and the bit length of the second mantissa field may be set to 10. In this example, the coder 308 may be configured to select the 10 MSBs of the first mantissa field and assign the 10 MSBs to be the value of the second mantissa field.
When the respective values and bit lengths of different fields of the second weight values are determined, the second weight values are calculated; that is, the first weight values are converted to the second weight values.
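The field-by-field conversion just described (S2 = S1, E2 = E1 − B + A, and truncation of the mantissa to its L2 = C − N − 1 most significant bits) can be sketched as follows. The default parameter values (N = 5, A = 15, C = 16, bias 127, K2 = 23, matching the single-to-half example above) and the function name are illustrative assumptions:

```python
def convert_weight(s1, e1, m1, k2=23, bias=127, n=5, a=15, c=16):
    """Hypothetical coder step: convert the fields of a first weight
    value (sign s1, exponent field e1, mantissa field m1 of k2 bits)
    into the fields of a second weight value of preset bit length c."""
    s2 = s1                # second sign bit equals the first sign bit
    e2 = e1 - bias + a     # E2 = E1 - B + A
    l2 = c - n - 1         # L2 = C - N - 1 mantissa bits
    m2 = m1 >> (k2 - l2)   # keep the L2 most significant bits of m1
    return s2, e2, m2
```

With the defaults, a 23-bit single-precision mantissa field is truncated to 10 bits, as in the IEEE 754-2008 half-precision example above.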
In some aspects, the calculated second floating-point numbers may be further transmitted by the floating-point number converter 218 to the computing unit 210 for the forward propagation process or the backward propagation process. The processes may include operations such as multiplication, addition, etc., as further described in detail below.
The forward propagation computation of multilayer artificial neural networks according to embodiments of the present disclosure comprises operations in two or more layers. For each layer, a dot product operation may be performed on an input vector and a weight vector, and an output neuron may be obtained from the result through an activation function. The activation function may be a sigmoid function, tanh function, relu function, softmax function, etc.
As depicted, the example computing process may be performed from the i-th layer to the (i+1)-th layer. The term "layer" here may refer to a group of operations, rather than a logic or a physical layer. A triangular-shaped operator (A as shown in
The forward propagation process may start from input neuron data received at the i^{th }layer (e.g., input neuron data 452A). Hereinafter, input neuron data may refer to the input data at each layer of operations, rather than the input data of the entire neural network. Similarly, output neuron data may refer to the output data at each layer of operations, rather than the output data of the entire neural network.
The received input neuron data 452A may be multiplied or convolved by one or more weight values 452C (e.g., the second weight values generated by the floating-point number converter 218). The results of the multiplication or convolution may be transmitted as output neuron data 454A. The output neuron data 454A may be transmitted to the next layer (e.g., the (i+1)-th layer) as input neuron data 456A. The forward propagation process may be shown as the solid lines in
The backward propagation process may start from the last layer of the forward propagation process. For example, the backward propagation process may include the process from the (i+1)-th layer to the i-th layer. During the process, the input data gradients 456B may be transmitted to the i-th layer as output gradients 454B. The output gradients 454B may then be multiplied or convolved by the input neuron data 452A to generate weight gradients 452D.
In some examples, the computing unit 210 may be further configured to update the first weight values stored in the weight cache 214 based on the generated weight gradients 452D. The updated first weight values may be converted to updated second weight values by the floating-point number converter 218. The updated second weight values may be transmitted to the computing unit 210 for future processing at the i-th layer.
In some other examples, the computing unit 210 may continue to use the weight values 452C for the processing at the i-th layer without updating the first weight values in the weight cache 214.
Additionally, the output gradients 454B may be multiplied by the weight values 452C to generate input data gradients 452B. The backward propagation process may be shown as the dotted lines in
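The two backward-path products described above (output gradients times input neuron data yielding weight gradients, and output gradients times weight values yielding input data gradients) can be sketched for a single fully connected layer. The outer-product formulation and the function name are our assumptions for illustration:

```python
def backward_layer(output_grads, input_neurons, weights):
    """Hypothetical single-layer backward step: 'weights' is a matrix
    with one row per output neuron and one column per input neuron."""
    # weight gradient: each output gradient times each input neuron
    weight_grads = [[g * x for x in input_neurons] for g in output_grads]
    # input data gradient: output gradients times the weight values
    input_grads = [
        sum(g * row[i] for g, row in zip(output_grads, weights))
        for i in range(len(input_neurons))
    ]
    return weight_grads, input_grads
```

For a single output gradient of 1.0 with inputs [2.0, 3.0] and weights [[0.5, 0.25]], the weight gradients are [[2.0, 3.0]] and the input data gradients are [0.5, 0.25].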
At block 502, method 500 may include receiving, by the weight cache 214, one or more first weight values of a first bit length. The weight cache 214 may be configured to transmit the one or more first weight values to the floating-point number converter 218.
At block 504, method 500 may include receiving, by the neuron data cache 212, first input neuron data. In some examples, the input neuron data may include one or more vectors as the input at a layer of the neural network. The first input neuron data may be of the first bit length. The neuron data cache 212 may be configured to transmit the first input neuron data to the computing unit 210.
At block 506, method 500 may include converting the one or more first weight values to one or more second weight values of a second bit length. The second bit length may be less than the first bit length. As described in detail in accordance with
At block 508, method 500 may include calculating first output neuron data based on the first input neuron data and the second weight values. That is, the computing unit 210 may be configured to multiply or convolve the second weight values (e.g., weight values 452C) with the input neuron data 452A to generate output neuron data 454A.
At block 510, method 500 may include calculating one or more weight gradients to update the one or more first weight values. In some examples, the computing unit 210 may be further configured to calculate the weight gradients. For example, the backward propagation process may include the process from the (i+1)-th layer to the i-th layer. During the process, the input data gradients 456B may be transmitted to the i-th layer as output gradients 454B. The output gradients 454B may then be multiplied or convolved by the input neuron data 452A by the computing unit 210 to generate the weight gradients 452D.
Additionally, during the forward propagation process, the computing unit 210 or the components included therein (e.g., the multipliers, the adders, the activation processor, etc.) may be further configured to perform one or more operations including multiplying a first portion of the first neuron data by a second portion of the first neuron data, adding multiplication results output from the one or more multipliers, applying an activation function to addition results output from the one or more adders, and performing a pooling operation on activation results output from the activation processor. The pooling operation may refer to any of the pooling operations, e.g., max pooling (MAXPOOLING) or average pooling (AVGPOOLING).
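The pooling step named above can be sketched for one pooling window. This plain-Python illustration and its function name are our own, not part of the disclosure:

```python
def pool(window, mode="max"):
    """Apply max pooling or average pooling to one window of
    activation results."""
    if mode == "max":
        return max(window)            # MAXPOOLING
    return sum(window) / len(window)  # AVGPOOLING
```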
Further, the computing unit 210 may be configured to update the first weight values based on the calculated weight gradients (e.g., weight gradients 452D).
Further still, the floating-point number converter 218 may be configured to convert the first input neuron data (e.g., input neuron data 452A) to second input neuron data of the second bit length. That is, the input neuron data may be transmitted from the computing unit 210 to the floating-point number converter 218 and be converted to one or more floating-point numbers of a bit length equal to the bit length of the second weight values. The conversion of the input neuron data may be implemented in accordance with the process described in
Similarly, the floating-point number converter 218 may be configured to convert the first output neuron data to second output data of the second bit length. That is, the output neuron data (e.g., output neuron data 454A) may be transmitted from the computing unit 210 to the floating-point number converter 218 and be converted to one or more floating-point numbers of a bit length equal to the bit length of the second weight values. The conversion of the output neuron data may be implemented in accordance with the process described in
It is understood that the specific order or hierarchy of steps in the processes disclosed is an illustration of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged. Further, some steps may be combined or omitted. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean "one and only one" unless specifically so stated, but rather "one or more." Unless specifically stated otherwise, the term "some" refers to one or more. All structural and functional equivalents to the elements of the various aspects described herein that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims. No claim element is to be construed as a means plus function unless the element is expressly recited using the phrase "means for."
Moreover, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or.” That is, unless specified otherwise, or clear from the context, the phrase “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, the phrase “X employs A or B” is satisfied by any of the following instances: X employs A; X employs B; or X employs both A and B. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from the context to be directed to a singular form.