Training method and apparatus for convolutional neural network model

US 9,977,997 B2
Filed: 04/12/2017
Issued: 05/22/2018
Est. Priority Date: 04/02/2015
Status: Active Grant

Abstract

Disclosed are a training method and apparatus for a CNN model, belonging to the field of image recognition. The method comprises: performing a convolution operation, a maximal pooling operation and a horizontal pooling operation on training images to obtain second feature images; determining feature vectors according to the second feature images; processing the feature vectors to obtain category probability vectors; calculating a category error according to the category probability vectors and an initial category; adjusting model parameters based on the category error; and, based on the adjusted model parameters, continuing the parameter adjustment process and using the model parameters obtained when the number of iterations reaches a preset number as the model parameters of the trained CNN model. The horizontal pooling operation is performed after the convolution operation and maximal pooling operation on the training images on the convolution layer of each level. Because the horizontal pooling operation extracts, from the feature images, feature images that identify features in the horizontal direction of an image, the trained CNN model can recognize an image of any size, which expands its applicable range in image recognition.
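The following is a minimal NumPy sketch of why horizontal pooling can decouple the feature vector length from the image width. It assumes that a feature image is stored as an array of shape (number of maps, height, width) and that horizontal pooling takes the maximum of each row, as the dependent claims describe. The function names `horizontal_pool` and `feature_vector` and the random inputs are hypothetical, chosen only for illustration.

```python
import numpy as np

def horizontal_pool(feature_image: np.ndarray) -> np.ndarray:
    """Take the maximum of every row of every map.

    feature_image has shape (num_maps, height, width); the result has
    shape (num_maps, height), so the width dimension disappears.
    """
    return feature_image.max(axis=2)

def feature_vector(second_feature_images) -> np.ndarray:
    """Join the rows of each pooled array head to tail into one vector."""
    return np.concatenate([s.ravel() for s in second_feature_images])

# Two hypothetical feature images: same height and map count, different widths.
narrow = np.random.default_rng(0).random((3, 4, 5))
wide = np.random.default_rng(1).random((3, 4, 11))

vec_narrow = feature_vector([horizontal_pool(narrow)])
vec_wide = feature_vector([horizontal_pool(wide)])
print(vec_narrow.shape, vec_wide.shape)  # (12,) (12,)
```

In this sketch the vector length depends only on the number of maps and the height, which is consistent with the claims fixing a specified height while allowing the width to vary.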

21 Claims

1. A method for training a Convolutional Neural Network (CNN) model, comprising:
- acquiring, by a server, initial model parameters of a CNN model to be trained, the initial model parameters comprising initial convolution kernels and initial bias matrixes of convolution layers of respective levels, and an initial weight matrix and an initial bias vector of a fully connected layer;
  
  acquiring a plurality of training images;
  
  on the convolution layer of each level, performing, by the server, convolution operation and maximal pooling operation on each of the training images to obtain a first feature image of each of the training images on the convolution layer of each level by using the initial convolution kernel and initial bias matrix of the convolution layer of each level;
  
  performing, by the server, horizontal pooling operation on the first feature image of each of the training images on the convolution layer of at least one of the levels to obtain a second feature image of each of the training images on the convolution layer of each level;
  
  determining, by the server, a feature vector of each of the training images according to the second feature image of each of the training images on the convolution layer of each level;
  
  processing, by the server, each feature vector to obtain a classification probability vector of each of the training images according to the initial weight matrixes and the initial bias vectors;
  
  calculating, by the server, a classification error according to the classification probability vector and initial classification of each of the training images;
  
  regulating, by the server, the model parameters of the CNN model to be trained on the basis of the classification errors;
  
  on the basis of the regulated model parameters and the plurality of training images, continuing, by the server, the process of regulating the model parameters, until the number of iterations reaches a preset number; and
  
  determining, by the server, model parameters obtained when the number of iterations reaches the preset number as the model parameters of the trained CNN model.
- - 2. The method according to claim 1, wherein acquiring the plurality of training images comprises:
    - acquiring, by the server, a plurality of initial training images;
      
      for each of the initial training images, keeping, by the server, a width-height ratio of the initial training image, and processing, by the server, the initial training image to obtain a first image with a specified height; and
      
      processing, by the server, the first image to obtain a second image with a specified width, and determining, by the server, the image with the specified height and the specified width as the training image corresponding to the initial training image.
  - 3. The method according to claim 1, wherein acquiring, by the server, the plurality of training images comprises:
    - acquiring, by the server, a plurality of initial training images; and
      
      for each of the initial training images, keeping, by the server, a width-height ratio of the initial training image, processing, by the server, the initial training image to obtain an image with a specified height, and determining, by the server, a width corresponding to the specified height as width of the initial training image.
  - 4. The method according to claim 2, wherein processing, by the server, the first image to obtain the second image with the specified width comprises:
    - when the width of the first image is smaller than the specified width, uniformly filling, by the server, left and right sides of the first image with pixels having a specified gray-scale value, and obtaining, by the server, the second image when the width of the first image reaches the specified width; and
      
      when the width of the first image is larger than the specified width, uniformly cropping, by the server, pixels on the left and right sides of the first image, and obtaining, by the server, the second image when the width of the first image reaches the specified width.
  - 5. The method according to claim 1, wherein performing, by the server, the convolution operation and the maximal pooling operation on each of the training images to obtain the first feature image of each of the training images on the convolution layer of each level by using the initial convolution kernel and initial bias matrix on the convolution layer of each level comprises:
    - for each of the training images, inputting, by the server, the first feature image on the convolution layer of a previous level to a current convolution layer, and performing, by the server, the convolution operation on the first feature image on the convolution layer of the previous level to obtain a convolutional image on the current convolution layer by using the initial convolution kernel and initial bias matrix of the current convolution layer, wherein the first feature image on the convolution layer of the previous level is the training image if the current convolution layer is the convolution layer of the first level; and
      
      after the maximal pooling operation is performed on the convolutional image on the current convolution layer to obtain the first feature image of the training image on the current convolution layer, continuously transmitting, by the server, the first feature image on the current convolution layer to the convolution layer of a next level, and performing, by the server, the convolution operation and the maximal pooling operation on the convolution layer of the next level until the convolution operation and the maximal pooling operation are performed on the convolution layer of a last level to obtain the first feature image on the convolution layer of the last level.
  - 6. The method according to claim 1, wherein performing, by the server, the horizontal pooling operation on the first feature image of each of the training images on the convolution layer of at least one of the levels to obtain the second feature image of each of the training images on the convolution layer of each level comprises:
    - for the first feature image of each training image on the convolution layer of each level, extracting, by the server, a maximum value of elements of each of rows of each of the images in the first feature image on the convolution layer, wherein the first feature image comprises a preset number of images, and the preset number is the same as each of the numbers of the convolution kernels and bias matrixes of the convolution layer;
      
      arranging, by the server, the maximum values extracted from all the rows of each image into a one-dimensional vector according to arrangement of pixels of each image; and
      
      combining, by the server, the one-dimensional vectors of all the images in the first feature image on the convolution layer to obtain the second feature image on the convolution layer.
  - 7. The method according to claim 6, wherein determining, by the server, the feature vector of each of the training images according to the second feature image of each of the training images on the convolution layer of each level comprises:
    - for each of the training images, connecting, by the server, elements of all rows of the second feature image of the training image on the convolution layer of each level head to tail to obtain the feature vector of the training image.
  - 8. The method according to claim 1, wherein calculating, by the server, the classification error according to the classification probability vector and initial classification of each of the training images comprises:
    - acquiring, by the server, the initial classification of each of the training images;
      
      calculating, by the server, the classification error of each of the training images according to the classification probability vector and initial classification of each of the training images by using the following formula:
      
      $\mathrm{Loss} = -\ln y_{\mathrm{label}}$, where Loss represents the classification error of each of the training images, label represents the initial classification of each of the training images, $y_i$ represents an element of the classification probability vector of each of the training images, and $y_{\mathrm{label}}$ represents a classification probability corresponding to the initial classification; and
      
      calculating, by the server, a mean of the classification errors of all the training images, and determining the mean of the classification errors as a classification error.
  - 9. The method according to claim 1, wherein the training images are images in a natural scene, the images in the natural scene comprise characters in different languages, and the CNN model to be trained is a language recognition classifier.
  - 10. The method according to claim 1, wherein the CNN model to be trained comprises four levels of convolution layers and two fully connected layers, and the convolution layers of the respective levels comprise the same or different numbers of convolution kernels and bias matrixes;
    - performing, by the server, the step that the horizontal pooling operation on the first feature image of each of the training images on the convolution layer of at least one of the levels to obtain the second feature image of each of the training images on the convolution layer of each level comprises:
      
      performing, by the server, the horizontal pooling operation on the first feature image of each of the training images on the convolution layer of a second level, the first feature image of each of the training images on the convolution layer of a third level and the first feature image of each of the training images on the convolution layer of a fourth level to obtain the second feature image of each of the training images on the convolution layer of the second level, the second feature image of each of the training images on the convolution layer of the third level and the second feature image of each of the training images on the convolution layer of the fourth level, respectively; and
      
      determining, by the server, the feature vector of each of the training images according to the second feature image of each of the training images on the convolution layer of each level comprises:
      
      for each of the training images, determining, by the server, the feature vector of the training image according to the second feature image of the training image on the convolution layer of the second level, the second feature image of the training image on the convolution layer of the third level and the second feature image of the training image on the convolution layer of the fourth level.
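Claim 8 defines the per-image classification error as the negative natural logarithm of the probability assigned to the image's initial classification, and then averages over all training images. A minimal NumPy sketch of that calculation follows. The function name `classification_error` and the example probabilities are hypothetical, and the sketch assumes classifications are encoded as integer indices into each probability vector.

```python
import numpy as np

def classification_error(prob_vectors, labels) -> float:
    """Mean of -ln(y_label) over all training images.

    prob_vectors: one classification probability vector per image.
    labels: integer index of each image's initial classification (assumed encoding).
    """
    probs = np.asarray(prob_vectors, dtype=float)
    labels = np.asarray(labels)
    # Pick y_label, the probability assigned to the initial classification.
    y_label = probs[np.arange(len(labels)), labels]
    per_image_loss = -np.log(y_label)
    return float(per_image_loss.mean())

# Hypothetical probability vectors for two images and three classes.
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1]]
labels = [0, 1]
error = classification_error(probs, labels)
print(error)
```

A probability of 1 for the correct classification gives an error of 0, and the error grows as that probability falls toward 0.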

11. A device for training a Convolutional Neural Network (CNN) model, comprising:
- one or more processors, and a memory connected with the one or more processors, the memory being configured to store instructions executable for the one or more processors, wherein the one or more processors are configured to execute the instructions stored in the memory to:
  
  acquire initial model parameters of a CNN model to be trained, the initial model parameters comprising initial convolution kernels and initial bias matrixes of convolution layers of respective levels, and an initial weight matrix and an initial bias vector of a fully connected layer;
  
  acquire a plurality of training images;
  
  on the convolution layer of each level, perform convolution operation and maximal pooling operation on each of the training images to obtain a first feature image of each of the training images on the convolution layer of each level by using the initial convolution kernel and initial bias matrix of the convolution layer of each level;
  
  perform horizontal pooling operation on the first feature image of each of the training images on the convolution layer of at least one of the levels to obtain a second feature image of each of the training images on the convolution layer of each level;
  
  determine a feature vector of each of the training images according to the second feature image of each of the training images on the convolution layer of each level;
  
  process each feature vector to obtain a classification probability vector of each of the training images according to the initial weight matrixes and the initial bias vectors;
  
  calculate a classification error according to the classification probability vector and initial classification of each of the training images;
  
  regulate the model parameters of the CNN model to be trained on the basis of the classification errors;
  
  continue, on the basis of the regulated model parameters and the plurality of training images, the process of regulating the model parameters until the number of iterations reaches a preset number; and
  
  determine model parameters obtained when the number of iterations reaches the preset number as the model parameters of the trained CNN model.
- - 12. The device according to claim 11, wherein when acquiring a plurality of training images, the one or more processors are configured to execute the instructions stored in the memory to:
    - acquire a plurality of initial training images;
      
      for each of the initial training images, keep a width-height ratio of the initial training image, and process the initial training image to obtain a first image with a specified height; and
      
      process the first image to obtain a second image with a specified width; and
      
      determine the image with the specified height and the specified width as the training image corresponding to the initial training image.
  - 13. The device according to claim 11, wherein when acquiring the plurality of training images, the one or more processors are configured to execute the instructions stored in the memory to:
    - acquire a plurality of initial training images; and
      
      for each of the initial training images, keep a width-height ratio of the initial training image, process the initial training image to obtain an image with a specified height, and determine a width corresponding to the specified height as the width of the initial training image.
  - 14. The device according to claim 12, wherein when processing the first image to obtain the second image with the specified width, the one or more processors are configured to execute the instructions stored in the memory to:
    - when the width of the first image is smaller than the specified width, uniformly fill left and right sides of the first image with pixels having a specified gray-scale value until the width of the first image reaches the specified width; and
      
      when the width of the first image is larger than the specified width, uniformly crop pixels on the left and right sides of the first image until the width of the first image reaches the specified width.
  - 15. The device according to claim 11, wherein when performing the convolution operation and the maximal pooling operation on each of the training images to obtain the first feature image of each of the training images on the convolution layer of each level by using the initial convolution kernel and initial bias matrix on the convolution layer of each level, the one or more processors are configured to execute the instructions stored in the memory to:
    - for each of the training images, input the first feature image on the convolution layer of a previous level to a current convolution layer, and perform the convolution operation on the first feature image on the convolution layer of the previous level to obtain a convolutional image on the current convolution layer by using the initial convolution kernel and initial bias matrix of the current convolution layer, wherein the first feature image on the convolution layer of the previous level is the training image if the current convolution layer is the convolution layer of the first level;
      
      perform the maximal pooling operation on the convolutional image on the current convolution layer to obtain the first feature image of the training image on the current convolution layer; and
      
      continue transmitting the first feature image on the current convolution layer to the convolution layer of a next level, and perform the convolution operation and the maximal pooling operation on the convolution layer of the next level until the convolution operation and the maximal pooling operation are performed on the convolution layer of a last level to obtain the first feature image on the convolution layer of the last level.
  - 16. The device according to claim 11, wherein when performing the horizontal pooling operation on the first feature image of each of the training images on the convolution layer of at least one of the levels to obtain the second feature image of each of the training images on the convolution layer of each level, the one or more processors are configured to execute the instructions stored in the memory to:
    - for the first feature image of each training image on the convolution layer of each level, extract a maximum value of elements of each of rows of each of the images in the first feature image on the convolution layer, wherein the first feature image comprises a preset number of images, and the preset number is the same as the numbers of the convolution kernels and bias matrixes of the convolution layer;
      
      arrange the maximum values extracted from all the rows of each image into a one-dimensional vector according to arrangement of pixels of each image; and
      
      combine the one-dimensional vectors of all the images in the first feature image on the convolution layer to obtain the second feature image on the convolution layer.
  - 17. The device according to claim 16, wherein when determining the feature vector of each of the training images according to the second feature image of each of the training images on the convolution layer of each level, the one or more processors are configured to execute the instructions stored in the memory to:
    - for each of the training images, connect elements of all rows of the second feature image of the training image on the convolution layer of each level head to tail to obtain the feature vector of the training image.
  - 18. The device according to claim 11, wherein when calculating the classification error according to the classification probability vector and initial classification of each of the training images, the one or more processors are configured to execute the instructions stored in the memory to:
    - acquire initial classification of each of the training images;
      
      calculate the classification error of each of the training images according to the classification probability vector and the initial classification of each of the training images by using the following formula:
      
      $\mathrm{Loss} = -\ln y_{\mathrm{label}}$, where Loss represents the classification error of each of the training images, label represents the initial classification of each of the training images, $y_i$ represents an element of the classification probability vector of each of the training images, and $y_{\mathrm{label}}$ represents a classification probability corresponding to the initial classification; and
      
      calculate a mean of the classification errors of all the training images, and determine the mean of the classification errors as a classification error.
  - 19. The device according to claim 11, wherein the training images are images in a natural scene, the images in the natural scene comprise characters in different languages, and the CNN model to be trained is a language recognition classifier.
  - 20. The device according to claim 11, wherein the CNN model to be trained comprises four levels of convolution layers and two fully connected layers, and the convolution layers of respective levels comprise the same or different numbers of convolution kernels and bias matrixes;
    - the one or more processors are configured to execute the instructions stored in the memory to:
      
      perform the horizontal pooling operation on the first feature image of each of the training images on the convolution layer of a second level, the first feature image of each of the training images on the convolution layer of a third level and the first feature image of each of the training images on the convolution layer of a fourth level to obtain the second feature image of each of the training images on the convolution layer of the second level, the second feature image of each of the training images on the convolution layer of the third level and the second feature image of each of the training images on the convolution layer of the fourth level, respectively; and
      
      for each of the training images, determine the feature vector of the training image according to the second feature image of the training image on the convolution layer of the second level, the second feature image of the training image on the convolution layer of the third level and the second feature image of the training image on the convolution layer of the fourth level.
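Claims 12 and 14 describe scaling each initial training image to a specified height while keeping its width-height ratio, then padding or cropping both sides evenly to reach a specified width. The sketch below illustrates those steps under several assumptions that are not in the claims: images are 2-D grayscale NumPy arrays, resizing uses nearest-neighbour sampling, and the fill value of 128 is an arbitrary placeholder for the "specified gray-scale value". The function names are hypothetical.

```python
import numpy as np

def resize_to_height(image: np.ndarray, target_height: int) -> np.ndarray:
    """Nearest-neighbour resize that keeps the width-height ratio (assumed method)."""
    height, width = image.shape
    target_width = max(1, round(width * target_height / height))
    # Map each output row/column back to a source row/column.
    rows = np.arange(target_height) * height // target_height
    cols = np.arange(target_width) * width // target_width
    return image[rows][:, cols]

def fit_width(image: np.ndarray, target_width: int, fill_value: int = 128) -> np.ndarray:
    """Pad or crop the left and right sides evenly to reach target_width."""
    height, width = image.shape
    if width < target_width:
        total = target_width - width
        left = total // 2
        right = total - left  # any odd extra pixel goes to the right side
        return np.pad(image, ((0, 0), (left, right)), constant_values=fill_value)
    if width > target_width:
        left = (width - target_width) // 2
        return image[:, left:left + target_width]
    return image

# Hypothetical 8x20 grayscale image.
initial = np.arange(8 * 20, dtype=np.uint8).reshape(8, 20)
first_image = resize_to_height(initial, 4)   # shape (4, 10)
padded = fit_width(first_image, 16)          # shape (4, 16)
cropped = fit_width(first_image, 6)          # shape (4, 6)
print(first_image.shape, padded.shape, cropped.shape)
```

When the width difference is odd, the two sides cannot be exactly equal; this sketch places the extra pixel on the right, which is a design choice of the example rather than something the claims specify.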

21. A server, comprising:
- one or more processors, and a memory connected with the one or more processors, the memory being configured to store instructions executable for the one or more processors, wherein the one or more processors are configured to execute the instructions stored in the memory to execute a method for training the Convolutional Neural Network (CNN) model, the method comprising:
  
  acquiring initial model parameters of a CNN model to be trained, the initial model parameters comprising initial convolution kernels and initial bias matrixes of convolution layers of respective levels, and an initial weight matrix and an initial bias vector of a fully connected layer;
  
  acquiring a plurality of training images;
  
  on the convolution layer of each level, performing convolution operation and maximal pooling operation on each of the training images to obtain a first feature image of each of the training images on the convolution layer of each level by using the initial convolution kernel and initial bias matrix of the convolution layer of each level;
  
  performing horizontal pooling operation on the first feature image of each of the training images on the convolution layer of at least one of the levels to obtain a second feature image of each of the training images on the convolution layer of each level;
  
  determining a feature vector of each of the training images according to the second feature image of each of the training images on the convolution layer of each level;
  
  processing each feature vector to obtain a classification probability vector of each of the training images according to the initial weight matrixes and the initial bias vectors;
  
  calculating a classification error according to the classification probability vector and initial classification of each of the training images;
  
  regulating the model parameters of the CNN model to be trained on the basis of the classification errors;
  
  on the basis of the regulated model parameters and the plurality of training images, continuing the process of regulating the model parameters, until the number of iterations reaches a preset number; and
  
  determining model parameters obtained when the number of iterations reaches the preset number as the model parameters of the trained CNN model.
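The independent claims describe an iterative loop: compute classification probabilities, compute the classification error, regulate the model parameters, and stop when the number of iterations reaches a preset number. The sketch below shows only the shape of that loop. As a loudly stated simplification, it trains a single fully connected softmax layer on already-extracted feature vectors using plain gradient descent; the claims do not name an update rule, and the convolution kernels and bias matrixes that the claimed method also regulates are omitted here. All names, the learning rate, and the toy data are hypothetical.

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def train_fully_connected(features, labels, num_classes,
                          preset_iterations=100, learning_rate=0.1, seed=0):
    """Regulate a weight matrix and bias vector for a preset number of iterations."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    weights = rng.normal(scale=0.01, size=(d, num_classes))  # initial weight matrix
    bias = np.zeros(num_classes)                              # initial bias vector
    one_hot = np.eye(num_classes)[labels]
    losses = []
    for _ in range(preset_iterations):
        probs = softmax(features @ weights + bias)            # classification probability vectors
        losses.append(float(-np.log(probs[np.arange(n), labels]).mean()))
        # Gradient of the mean loss with respect to the scores (assumed update rule).
        grad_scores = (probs - one_hot) / n
        weights -= learning_rate * (features.T @ grad_scores)
        bias -= learning_rate * grad_scores.sum(axis=0)
    # Parameters held when the preset iteration count is reached are the result.
    return weights, bias, losses

# Hypothetical toy feature vectors for two well-separated classes.
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(-1.0, 0.5, (20, 4)),
                      rng.normal(1.0, 0.5, (20, 4))])
labels = np.array([0] * 20 + [1] * 20)
weights, bias, losses = train_fully_connected(features, labels, num_classes=2)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

The loop stops on the iteration count alone, with no convergence test, which mirrors the stopping condition stated in the claims.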

Current Assignee: Tencent Technology Shenzhen Company Limited (Tencent Holdings Limited)
Original Assignee: Tencent Technology Shenzhen Company Limited (Tencent Holdings Limited)
Inventors: Bai, Xiang; Huang, Feiyue; Guo, Xiaowei; Yao, Cong; Shi, Baoguang
Primary Examiner(s): Wu, Jingge
Application Number: US15/486,102
Publication Number: US 20170220904A1
Time in Patent Office: 405 Days

CPC Class Codes

G06F 18/214   Generating training pattern...

G06F 18/24   Classification techniques

G06F 18/2413   based on distances to train...

G06V 10/32   Normalisation of the patter...

G06V 10/764   using classification, e.g. ...

G06V 10/774   Generating sets of training...

G06V 10/82   using neural networks
