METHOD AND DEVICE FOR PARALLEL PROCESSING IN MODEL TRAINING

US 20150019214A1
Filed: 12/16/2013
Published: 01/15/2015
Est. Priority Date: 07/10/2013
Status: Active Grant

Abstract

A method and a device for training a DNN model include, at a device including one or more processors and memory: establishing an initial DNN model; dividing a training data corpus into a plurality of disjoint data subsets; for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating in parallel, wherein the respective training processing unit applies a Stochastic Gradient Descent (SGD) process to update the initial DNN model to generate a respective DNN sub-model based on the data subset; and merging the respective DNN sub-models generated by the plurality of training processing units to obtain an intermediate DNN model, wherein the intermediate DNN model is established as either the initial DNN model for a next training iteration or a final DNN model in accordance with a preset convergence condition.
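
The loop described in the abstract can be sketched on a toy problem. The following is only an illustration under stated assumptions, not the patented implementation: a two-parameter linear model stands in for the DNN, threads stand in for the training processing units, the merge is a plain parameter average, and the convergence condition is a small change between successive merged models. All function names (`split_disjoint`, `sgd_submodel`, `merge`, `train`) are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def split_disjoint(corpus, n_units, seed=0):
    """Shuffle the corpus and deal it into n_units disjoint subsets."""
    items = list(corpus)
    random.Random(seed).shuffle(items)
    return [items[i::n_units] for i in range(n_units)]

def sgd_submodel(model, subset, lr=0.05):
    """One SGD pass over a subset, starting from the shared model."""
    w, b = model
    for x, y in subset:
        err = (w * x + b) - y          # derivative of 0.5 * err**2
        w -= lr * err * x
        b -= lr * err
    return (w, b)

def merge(submodels):
    """Merge sub-models by plain parameter averaging (an assumed rule)."""
    n = len(submodels)
    return tuple(sum(p) / n for p in zip(*submodels))

def train(corpus, n_units=4, max_iters=200, tol=1e-6):
    model = (0.0, 0.0)                             # initial model
    subsets = split_disjoint(corpus, n_units)
    for _ in range(max_iters):
        with ThreadPoolExecutor(max_workers=n_units) as pool:
            subs = list(pool.map(lambda s: sgd_submodel(model, s), subsets))
        new_model = merge(subs)                    # intermediate model
        change = max(abs(a - b) for a, b in zip(new_model, model))
        model = new_model                          # start of next iteration
        if change < tol:                           # assumed convergence test
            break
    return model

# Noise-free samples of y = 2x + 1 serve as a stand-in training corpus.
corpus = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-20, 21)]
w, b = train(corpus)
```

On this noise-free data the merged parameters approach the generating values 2 and 1. A real system would replace the linear model with DNN layers and the threads with separate GPUs or machines.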

20 Claims

1. A method of training a Deep Neural Network (DNN) model, comprising:
- at a device comprising one or more processors and memory:
  
  establishing an initial DNN model;
  
  dividing a training data corpus into a plurality of disjoint data subsets;
  
  for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating in parallel, wherein the respective training processing unit applies a Stochastic Gradient Descent (SGD) process to update the initial DNN model to generate a respective DNN sub-model based on the data subset; and
  
  merging the respective DNN sub-models generated by the plurality of training processing units to obtain an intermediate DNN model, wherein the intermediate DNN model is established as either the initial DNN model for a next training iteration or a final DNN model in accordance with a preset convergence condition.
  - 2. The method of claim 1, wherein the initial and final DNN models are acoustic models for speech recognition and the training data corpus comprises a plurality of randomized speech files.
  - 3. The method of claim 1, wherein merging the respective DNN sub-models generated by the plurality of training processing units further comprises:
    - using a respective shared merging weight for all layers of each DNN sub-model during the merging.
  - 4. The method of claim 1, wherein merging the respective DNN sub-models generated by the plurality of training processing units further comprises:
    - using a respective merging weight for each layer of each DNN sub-model during the merging.
  - 5. The method of claim 1, further comprising:
    - identifying a plurality of decoding processing units operating in parallel, each decoding processing unit utilizing a respective final DNN model;
      
      providing a same test sample to each of the plurality of decoding processing units operating in parallel, wherein each decoding processing unit generates a respective posterior probability sequence for the same test sample based on the respective final DNN model of the decoding processing unit; and
      
      merging the respective posterior probability sequences generated by the plurality of decoding processing units to obtain a recognition result for the same test sample.
  - 6. The method of claim 1, wherein merging respective posterior probability sequences generated by the plurality of decoding processing units further comprises:
    - using a respective shared merging weight for all phoneme binding states of each respective posterior probability sequence during the merging of the respective posterior probability sequences generated by the plurality of decoding processing units.
  - 7. The method of claim 1, wherein merging respective posterior probability sequences generated by the plurality of decoding processing units further comprises:
    - using a respective merging weight for each phoneme binding state of each DNN sub-model during the merging of the respective posterior probability sequences generated by the plurality of decoding processing units.
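
Claims 3 and 4 distinguish two ways of weighting sub-models during the merge: one weight per sub-model shared across all of its layers, or a separate weight for each layer of each sub-model. The claims do not give a formula, so the sketch below assumes a linear weighted sum and represents each layer as a flat list of parameters. The names `merge_shared` and `merge_per_layer` are hypothetical.

```python
def merge_per_layer(submodels, weights):
    """weights[k][l] is the merging weight of layer l of sub-model k."""
    merged = []
    for l, layer in enumerate(submodels[0]):
        merged.append([
            sum(weights[k][l] * sub[l][i] for k, sub in enumerate(submodels))
            for i in range(len(layer))
        ])
    return merged

def merge_shared(submodels, weights):
    """weights[k] is one weight applied to every layer of sub-model k."""
    n_layers = len(submodels[0])
    return merge_per_layer(submodels, [[w] * n_layers for w in weights])

# Two toy sub-models, each with a 2-parameter layer and a 1-parameter layer.
sub_a = [[1.0, 2.0], [10.0]]
sub_b = [[3.0, 4.0], [20.0]]

shared = merge_shared([sub_a, sub_b], [0.5, 0.5])
# shared == [[2.0, 3.0], [15.0]]

per_layer = merge_per_layer([sub_a, sub_b], [[0.25, 1.0], [0.75, 0.0]])
# per_layer == [[2.5, 3.5], [10.0]]
```

The shared-weight case is just the per-layer case with each sub-model's weight repeated for every layer, which is why one function delegates to the other here.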

8. A system for training a Deep Neural Network (DNN) model, comprising:
- one or more processors; and
  
  memory having instructions stored thereon, the instructions, when executed by the one or more processors, cause the processors to perform operations comprising:
  
  establishing an initial DNN model;
  
  dividing a training data corpus into a plurality of disjoint data subsets;
  
  for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating in parallel, wherein the respective training processing unit applies a Stochastic Gradient Descent (SGD) process to update the initial DNN model to generate a respective DNN sub-model based on the data subset; and
  
  merging the respective DNN sub-models generated by the plurality of training processing units to obtain an intermediate DNN model, wherein the intermediate DNN model is established as either the initial DNN model for a next training iteration or a final DNN model in accordance with a preset convergence condition.
  - 9. The system of claim 8, wherein the initial and final DNN models are acoustic models for speech recognition and the training data corpus includes a plurality of randomized speech files.
  - 10. The system of claim 8, wherein merging the respective DNN sub-models generated by the plurality of training processing units further comprises:
    - using a respective shared merging weight for all layers of each DNN sub-model during the merging.
  - 11. The system of claim 8, wherein merging the respective DNN sub-models generated by the plurality of training processing units further comprises:
    - using a respective merging weight for each layer of each DNN sub-model during the merging.
  - 12. The system of claim 8, wherein the operations further comprise:
    - identifying a plurality of decoding processing units operating in parallel, each decoding processing unit utilizing a respective final DNN model;
      
      providing a same test sample to each of the plurality of decoding processing units operating in parallel, wherein each decoding processing unit generates a respective posterior probability sequence for the same test sample based on the respective final DNN model of the decoding processing unit; and
      
      merging the respective posterior probability sequences generated by the plurality of decoding processing units to obtain a recognition result for the same test sample.
  - 13. The system of claim 8, wherein merging respective posterior probability sequences generated by the plurality of decoding processing units further comprises:
    - using a respective shared merging weight for all phoneme binding states of each respective posterior probability sequence during the merging of the respective posterior probability sequences generated by the plurality of decoding processing units.
  - 14. The system of claim 8, wherein merging respective posterior probability sequences generated by the plurality of decoding processing units further comprises:
    - using a respective merging weight for each phoneme binding state of each DNN sub-model during the merging of the respective posterior probability sequences generated by the plurality of decoding processing units.
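
The decoding-side merge in claims 12 to 14 (and the parallel method claims 5 to 7) can be illustrated in the same spirit. This sketch assumes that each decoding unit emits, per frame, a probability for each phoneme binding state, that merging is a weighted sum followed by renormalisation, and that the per-frame best state stands in for the recognition result. A real recogniser would pass the merged posteriors to a decoder instead. `merge_posteriors` and `shared_weights` are hypothetical names.

```python
def merge_posteriors(sequences, weights):
    """Combine per-frame posteriors from several decoding units.

    sequences[k][t][s]: posterior of state s at frame t from unit k.
    weights[k][s]: merging weight of unit k for state s.
    """
    n_frames, n_states = len(sequences[0]), len(sequences[0][0])
    merged = []
    for t in range(n_frames):
        frame = [
            sum(weights[k][s] * seq[t][s] for k, seq in enumerate(sequences))
            for s in range(n_states)
        ]
        total = sum(frame)
        merged.append([p / total for p in frame])  # renormalise (assumed)
    return merged

def shared_weights(per_unit, n_states):
    """Expand one weight per unit into the same weight for every state."""
    return [[w] * n_states for w in per_unit]

# Two units, two frames, two states, all scoring the same test sample.
dec_a = [[0.5, 0.5], [0.75, 0.25]]
dec_b = [[0.25, 0.75], [0.5, 0.5]]

merged = merge_posteriors([dec_a, dec_b], shared_weights([0.5, 0.5], 2))
# merged == [[0.375, 0.625], [0.625, 0.375]]

# Per-frame best state as a simplified stand-in for the recognition result.
best_states = [max(range(len(f)), key=f.__getitem__) for f in merged]
# best_states == [1, 0]
```

Passing a full `weights[k][s]` table instead of `shared_weights(...)` gives the per-state variant, where a unit can be trusted more for some states than for others.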

15. A non-transitory computer-readable storage medium having instructions stored thereon, the instructions, when executed by one or more processors, cause the processors to perform operations comprising:
- establishing an initial DNN model;
  
  dividing a training data corpus into a plurality of disjoint data subsets;
  
  for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating in parallel, wherein the respective training processing unit applies a Stochastic Gradient Descent (SGD) process to update the initial DNN model to generate a respective DNN sub-model based on the data subset; and
  
  merging the respective DNN sub-models generated by the plurality of training processing units to obtain an intermediate DNN model, wherein the intermediate DNN model is established as either the initial DNN model for a next training iteration or a final DNN model in accordance with a preset convergence condition.
  - 16. The computer-readable storage medium of claim 15, wherein merging the respective DNN sub-models generated by the plurality of training processing units further comprises:
    - using a respective shared merging weight for all layers of each DNN sub-model during the merging.
  - 17. The computer-readable storage medium of claim 15, wherein merging the respective DNN sub-models generated by the plurality of training processing units further comprises:
    - using a respective merging weight for each layer of each DNN sub-model during the merging.
  - 18. The computer-readable storage medium of claim 15, wherein the operations further comprise:
    - identifying a plurality of decoding processing units operating in parallel, each decoding processing unit utilizing a respective final DNN model;
      
      providing a same test sample to each of the plurality of decoding processing units operating in parallel, wherein each decoding processing unit generates a respective posterior probability sequence for the same test sample based on the respective final DNN model of the decoding processing unit; and
      
      merging the respective posterior probability sequences generated by the plurality of decoding processing units to obtain a recognition result for the same test sample.
  - 19. The computer-readable storage medium of claim 15, wherein merging respective posterior probability sequences generated by the plurality of decoding processing units further comprises:
    - using a respective shared merging weight for all phoneme binding states of each respective posterior probability sequence during the merging of the respective posterior probability sequences generated by the plurality of decoding processing units.
  - 20. The computer-readable storage medium of claim 15, wherein merging respective posterior probability sequences generated by the plurality of decoding processing units further comprises:
    - using a respective merging weight for each phoneme binding state of each DNN sub-model during the merging of the respective posterior probability sequences generated by the plurality of decoding processing units.

Current Assignee
Tencent Technology Company Limited (Tencent Holdings Limited)
Original Assignee
Tencent Technology Company Limited (Tencent Holdings Limited)
Inventors
WANG, Eryu; LU, Li; ZHANG, Xiang; LIU, Haibo; RAO, Feng; LI, Lou; YUE, Shuai; CHEN, Bo

Granted Patent

US 9,508,347 B2
US Class Current

704/232
CPC Class Codes

G06N 3/02   Neural networks

G10L 15/063   Training

G10L 15/16   using artificial neural net...

G10L 15/34   Adaptation of a single reco...
