PROCESSING METHOD AND ACCELERATING DEVICE

US 20200134460A1
Filed: 11/28/2019
Published: 04/30/2020
Est. Priority Date: 05/23/2017
Status: Active Application

Abstract

The present disclosure provides a processing device including a coarse-grained pruning unit configured to perform coarse-grained pruning on the weights of a neural network to obtain pruned weights, and an operation unit configured to train the neural network according to the pruned weights. The coarse-grained pruning unit is specifically configured to select M weights from the weights of the neural network through a sliding window and, when the M weights meet a preset condition, to set all or part of the M weights to 0. The processing device can reduce memory access while reducing the amount of computation, thereby achieving a speedup and reducing energy consumption.
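As an illustration only, the sliding-window selection described in the abstract can be sketched in Python with NumPy for a two-dimensional weight matrix. The function name `coarse_grained_prune`, the 2x2 window, the stride, and the threshold value are assumptions made for this sketch; the preset condition used here (arithmetic mean of the absolute values below a threshold) is one of the options recited in the claims, and the whole window is zeroed although the disclosure also allows zeroing only part of it.

```python
import numpy as np

def coarse_grained_prune(weights, block=(2, 2), stride=(2, 2), threshold=0.1):
    """Zero every window whose mean absolute value is below `threshold`.

    Returns a pruned copy and a boolean mask (True = weight kept).
    All parameter names and defaults are hypothetical.
    """
    n_in, n_out = weights.shape
    b_in, b_out = block
    s_in, s_out = stride
    pruned = weights.copy()
    mask = np.ones(weights.shape, dtype=bool)
    for i in range(0, n_in - b_in + 1, s_in):
        for j in range(0, n_out - b_out + 1, s_out):
            # The condition is evaluated on the original weights, so
            # overlapping windows (stride < block) do not affect each other.
            window = weights[i:i + b_in, j:j + b_out]
            if np.abs(window).mean() < threshold:
                pruned[i:i + b_in, j:j + b_out] = 0.0
                mask[i:i + b_in, j:j + b_out] = False
    return pruned, mask

w = np.array([[0.01, -0.02, 0.90, 0.80],
              [0.03, 0.00, -0.70, 0.60],
              [0.50, -0.40, 0.02, 0.01],
              [0.30, 0.20, -0.01, 0.03]])
pruned, mask = coarse_grained_prune(w)
```

In this example the top-left and bottom-right windows contain only small weights and are zeroed as whole blocks, which is what makes the resulting sparsity regular enough to skip in hardware.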

20 Claims

1. A data compression method, comprising:
- performing coarse-grained pruning on weights of a neural network, which includes:
  
  selecting M weights from the neural network through a sliding window, and setting all or part of the M weights to 0 when the M weights meet a preset condition, where M is a positive integer greater than 0;
  
  performing a first retraining on the neural network, where each weight that has been set to 0 remains 0 during the retraining process; and
  
  quantizing the weights of the neural network, which includes:
  
  grouping the weights of the neural network; and
  
  performing a clustering operation on each group of weights by using a clustering algorithm, computing a center weight of each class, and replacing all the weights in each class with the center weight of that class.
  - 2. The data compression method of claim 1, wherein, after quantizing the weight of the neural network, the method further includes:
    - encoding the center weight to obtain a weight codebook and a weight dictionary.
  - 3. The data compression method of claim 2, wherein, after encoding the center weight, the method further includes:
    - performing a second retraining on the neural network, wherein only the weight codebook is trained during the second retraining of the neural network, and the weight dictionary remains unchanged.
  - 4. The data compression method of claim 1, wherein the preset condition is:
    - an information quantity of the M weights is less than a first given threshold, where the information quantity of the M weights is an arithmetic mean of the absolute values of the M weights, a geometric mean of the absolute values of the M weights, or a maximum value of the M weights;
      
      the first given threshold is a first threshold, a second threshold, or a third threshold; and
      
      the information quantity of the M weights being less than the first given threshold includes:
      
      the arithmetic mean of the absolute values of the M weights being less than the first threshold, or the geometric mean of the absolute values of the M weights being less than the second threshold, or the maximum value of the M weights being less than the third threshold.
  - 5. The data compression method of claim 1, further comprising:
    - repeating selecting M weights from the neural network through the sliding window, setting all or part of the M weights to 0 when the M weights meet the preset condition; and
      
      performing the first retraining on the neural network until no weight can be set to 0 without losing a preset precision.
  - 6. The data compression method of claim 5, wherein performing coarse-grained pruning on the weight of a fully connected layer of the neural network includes:
    - setting the weight of the fully connected layer to be a two-dimensional matrix (Nin, Nout), where Nin represents a count of input neurons and Nout represents a count of output neurons, and the fully connected layer has Nin*Nout weights;
      
      setting the size of the sliding window to Bin*Bout, where Bin is a positive integer greater than 0 and less than or equal to Nin, and Bout is a positive integer greater than 0 and less than or equal to Nout;
      
      sliding the window with a stride of Sin in the direction of Bin, or with a stride of Sout in the direction of Bout, where Sin is a positive integer greater than 0 and less than or equal to Bin, and Sout is a positive integer greater than 0 and less than or equal to Bout; and
      
      selecting M weights from the Nin*Nout weights through the sliding window, and setting all or part of the M weights to 0 when the M weights meet the preset condition, where M=Bin*Bout.
  - 7. The data compression method of claim 5, wherein performing coarse-grained pruning on the weight of a convolutional layer of the neural network includes:
    - setting the weight of the convolutional layer of the neural network to be a four-dimensional matrix (Nfin, Nfout, Kx, Ky), where Nfin represents a count of input feature maps, Nfout represents a count of output feature maps, (Kx, Ky) is a size of a convolution kernel, and the convolutional layer has Nfin*Nfout*Kx*Ky weights;
      
      setting the sliding window to be a four-dimensional sliding window with a size of Bfin*Bfout*Bx*By, where Bfin is a positive integer greater than 0 and less than or equal to Nfin, Bfout is a positive integer greater than 0 and less than or equal to Nfout, Bx is a positive integer greater than 0 and less than or equal to Kx, and By is a positive integer greater than 0 and less than or equal to Ky;
      
      sliding the window with a stride of Sfin in the direction of Bfin, or with a stride of Sfout in the direction of Bfout, or with a stride of Sx in the direction of Bx, or with a stride of Sy in the direction of By, where Sfin is a positive integer greater than 0 and less than or equal to Bfin, Sfout is a positive integer greater than 0 and less than or equal to Bfout, Sx is a positive integer greater than 0 and less than or equal to Bx, and Sy is a positive integer greater than 0 and less than or equal to By; and
      
      selecting M weights from the Nfin*Nfout*Kx*Ky weights through the sliding window, and setting all or part of the M weights to 0 when the M weights meet the preset condition, where M=Bfin*Bfout*Bx*By.
  - 8. The data compression method of claim 5, wherein performing coarse-grained pruning on the weight of an LSTM layer of the neural network includes:
    - setting the weight of the LSTM layer to be composed of m fully connected layer weights, where m is a positive integer greater than 0, and an i-th fully connected layer weight is a two-dimensional matrix (Nin_i, Nout_i), where i is a positive integer greater than 0 and less than or equal to m, Nin_i represents a count of input neurons of the i-th fully connected layer weight, and Nout_i represents a count of output neurons of the i-th fully connected layer weight;
      
      setting the size of the sliding window to Bin_i*Bout_i, where Bin_i is a positive integer greater than 0 and less than or equal to Nin_i, and Bout_i is a positive integer greater than 0 and less than or equal to Nout_i;
      
      sliding the window with a stride of Sin_i in the direction of Bin_i, or with a stride of Sout_i in the direction of Bout_i, where Sin_i is a positive integer greater than 0 and less than or equal to Bin_i, and Sout_i is a positive integer greater than 0 and less than or equal to Bout_i; and
      
      selecting M weights from Bin_i*Bout_i weights through the sliding window, and setting all or part of the M weights to 0 when the M weights meet the preset condition, where M=Bin_i*Bout_i.
  - 9. The data compression method of claim 1, wherein the first retraining adopts a back propagation algorithm, and the weight which has been set to 0 in the retraining process remains 0.
  - 10. The data compression method of claim 1, wherein the grouping method of the weights of the neural network includes grouping into a group, layer-type-based grouping, inter-layer-based grouping, and/or intra-layer-based grouping, wherein:
    - grouping into a group includes grouping all the weights of the neural network into one group;
      
      grouping the weights of the neural network by the layer-type-based grouping method includes grouping the weights of all convolutional layers, the weights of all fully connected layers, and the weights of all LSTM layers in the neural network into one group respectively;
      
      grouping the weights of the neural network by the inter-layer-based grouping method includes grouping the weights of one or a plurality of convolutional layers, one or a plurality of fully connected layers, and one or a plurality of LSTM layers in the neural network into one group respectively; and
      
      grouping the weights of the neural network by the intra-layer-based grouping method includes segmenting the weights in one layer of the neural network, where each segmented part forms a group.
  - 11. The data compression method of claim 1, wherein the clustering algorithm includes K-means, K-medoids, Clara, and/or Clarans.
  - 12. The data compression method of claim 1, wherein a center weight selection method of a class is:
    - minimizing the cost function J(w, w₀).
  - 14. The data compression method of claim 3, wherein the second retraining performed on the neural network after clustering and encoding includes:
    - retraining the neural network, after clustering and encoding, by using the back propagation algorithm, where each weight that has been set to 0 remains 0 throughout the retraining process, and only the weight codebook is retrained while the weight dictionary is not retrained.
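The quantization, encoding, and codebook-only retraining steps of the method claims can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: it reads the "weight codebook" as the array of center weights and the "weight dictionary" as the per-position class index, it uses plain K-means (one of the recited clustering algorithms) with the arithmetic mean as the center, which minimizes a squared-error cost (the form of J is not shown in this record), and it treats the gradient of a codebook entry as the sum of the gradients of the weights assigned to it. The helper names `kmeans_1d`, `quantize_group`, and `codebook_gradient` are hypothetical, and handling of pruned zeros is left out.

```python
import numpy as np

def kmeans_1d(values, k, iters=20):
    """Plain 1-D K-means with evenly spaced initial centers."""
    centers = np.linspace(values.min(), values.max(), k)
    labels = np.zeros(values.shape, dtype=int)
    for _ in range(iters):
        # Assign each weight to its nearest center.
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        for c in range(k):
            members = values[labels == c]
            if members.size:
                # The mean minimizes the summed squared distance (assumed cost).
                centers[c] = members.mean()
    return centers, labels

def quantize_group(weights, k):
    """Cluster one group of weights; return (codebook, dictionary)."""
    codebook, labels = kmeans_1d(weights.ravel(), k)
    dictionary = labels.reshape(weights.shape)  # class index per position
    return codebook, dictionary

def codebook_gradient(weight_grad, dictionary, k):
    """Sum per-weight gradients over each class (dictionary stays fixed)."""
    return np.bincount(dictionary.ravel(), weights=weight_grad.ravel(),
                       minlength=k)

w = np.array([[0.10, 0.12, 0.90],
              [0.88, 0.11, 0.92]])
codebook, dictionary = quantize_group(w, k=2)
quantized = codebook[dictionary]          # every weight replaced by its center

# One codebook-only update step with a made-up gradient of all ones.
grad = codebook_gradient(np.ones_like(w), dictionary, k=2)
updated_codebook = codebook - 0.01 * grad
```

Because only `codebook` changes during the update, the stored model stays as compact as it was after encoding: a small table of centers plus one index per weight.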

15. A data compression device, comprising:
- a memory configured to store an operation instruction; and
  
  a processor configured to:
  
  perform coarse-grained pruning on weights of a neural network, which includes:
  
  selecting M weights from the neural network through a sliding window, and setting all or part of the M weights to 0 when the M weights meet a preset condition, where M is a positive integer greater than 0;
  
  perform a first retraining on the neural network, where each weight that has been set to 0 remains 0 during the retraining process; and
  
  quantize the weights of the neural network, wherein the processor is further configured to:
  
  group the weights of the neural network; and
  
  perform a clustering operation on each group of weights by using a clustering algorithm, compute a center weight of each class, and replace all the weights in each class with the center weight of that class.
  - 13. The data compression method of claim 16, wherein the cost function meets a condition:
  - 16. The data compression device of claim 15, wherein, after quantizing the weight of the neural network, the processor is configured to encode the center weight to obtain a weight codebook and a weight dictionary.
  - 17. The data compression device of claim 16, wherein the processor is further configured to perform a second retraining on the neural network, wherein only the weight codebook is trained during the second retraining of the neural network, and the weight dictionary remains unchanged.
  - 18. The data compression device of claim 15, wherein the preset condition is:
    - an information quantity of the M weights is less than a first given threshold, where the information quantity of the M weights is an arithmetic mean of the absolute values of the M weights, a geometric mean of the absolute values of the M weights, or a maximum value of the M weights;
      
      the first given threshold is a first threshold, a second threshold, or a third threshold; and
      
      the information quantity of the M weights being less than the first given threshold includes:
      
      the arithmetic mean of the absolute values of the M weights being less than the first threshold, or the geometric mean of the absolute values of the M weights being less than the second threshold, or the maximum value of the M weights being less than the third threshold.
  - 19. The data compression device of claim 15, wherein the processor is further configured to:
    - repeat selecting M weights from the neural network through the sliding window, and setting all or part of the M weights to 0 when the M weights meet the preset condition; and
      
      perform the first retraining on the neural network until no weight can be set to 0 without losing a preset precision.
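The repeated prune-and-retrain loop recited above can be illustrated on a toy linear layer. This is a sketch under assumptions rather than the claimed procedure: "preset precision" is modeled as a maximum mean squared error `max_loss`, the first retraining is plain gradient descent with a mask that holds pruned weights at 0, and the windows do not overlap. All function names, the (2, 1) window, the threshold, and the learning rate are hypothetical.

```python
import numpy as np

def mse(W, X, Y):
    return float(np.mean((X @ W - Y) ** 2))

def masked_retrain(W, mask, X, Y, lr=0.1, steps=200):
    """Gradient descent in which pruned positions are held at 0."""
    W = W * mask
    for _ in range(steps):
        # Gradient of the squared error, averaged over the samples.
        grad = 2.0 * X.T @ (X @ W - Y) / len(X)
        W = (W - lr * grad) * mask
    return W

def prune_round(W, mask, block, threshold):
    """Mark every still-active window whose mean |w| is below threshold."""
    new_mask = mask.copy()
    for i in range(0, W.shape[0] - block[0] + 1, block[0]):
        for j in range(0, W.shape[1] - block[1] + 1, block[1]):
            sl = (slice(i, i + block[0]), slice(j, j + block[1]))
            if mask[sl].any() and np.abs(W[sl]).mean() < threshold:
                new_mask[sl] = 0.0
    return new_mask

def iterative_prune(W, X, Y, block=(2, 1), threshold=0.1,
                    max_loss=1e-3, max_rounds=10):
    mask = np.ones_like(W)
    for _ in range(max_rounds):
        new_mask = prune_round(W, mask, block, threshold)
        if np.array_equal(new_mask, mask):
            break  # no window meets the preset condition any more
        W_new = masked_retrain(W, new_mask, X, Y)
        if mse(W_new, X, Y) > max_loss:
            break  # pruning further would lose the preset precision
        W, mask = W_new, new_mask
    return W, mask

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
W_true = np.array([[1.0, 0.0], [-0.8, 0.0], [0.0, 0.7], [0.0, -0.9]])
Y = X @ W_true
W0 = W_true + 0.01 * rng.normal(size=W_true.shape)  # small noise to prune away
W, mask = iterative_prune(W0, X, Y)
```

Here the two windows that hold only noise are pruned in the first round, retraining recovers the remaining weights, and the loop stops in the second round because no further window meets the condition.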

20. An electronic device, comprising:
- a data compression device that includes;
  
  a memory configured to store an operation instruction; and
  
  a processor configured to:
  
  perform coarse-grained pruning on weights of a neural network, which includes:
  
  selecting M weights from the neural network through a sliding window, and setting all or part of the M weights to 0 when the M weights meet a preset condition, where M is a positive integer greater than 0;
  
  perform a first retraining on the neural network, where each weight that has been set to 0 remains 0 during the retraining process; and
  
  quantize the weights of the neural network, wherein the processor is further configured to:
  
  group the weights of the neural network; and
  
  perform a clustering operation on each group of weights by using a clustering algorithm, compute a center weight of each class, and replace all the weights in each class with the center weight of that class.


Current Assignee
Shanghai Cambricon Information Technology Co., Ltd.
Original Assignee
Shanghai Cambricon Information Technology Co., Ltd.
Inventors
Du, Zidong; Zhou, Xuda; Wang, Zai; Chen, Tianshi

Application Number
US16/699,049
Publication Number
US 20200134460A1
CPC Class Codes

G06F 1/3296   by lowering the supply or o...

G06F 12/0848   Partitioned cache, e.g. sep...

G06F 12/0875   with dedicated cache, e.g. ...

G06F 13/16   for access to memory bus G0...

G06F 16/285   Clustering or classification

G06F 2212/1008   Correctness of operation, e...

G06F 2212/1032   Reliability improvement, da...

G06F 2212/1041   Resource optimization

G06F 2212/452   Instruction code

G06F 2212/454   Vector or matrix data

G06F 2213/0026   PCI express

G06F 9/3877   using a slave processor, e....

G06N 3/04   Architecture, e.g. intercon...

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06N 3/048   Activation functions

G06N 3/063   using electronic means

G06N 3/082   modifying the architecture,...

G06N 3/084   Backpropagation, e.g. using...

Y02D 10/00   Energy efficient computing,...
