DEEP NEURAL NETWORKS TRAINING FOR SPEECH AND PATTERN RECOGNITION
Abstract
The use of a pipelined algorithm that performs parallelized computations to train deep neural networks (DNNs) for performing data analysis may reduce training time. The DNNs may be one of context-independent DNNs or context-dependent DNNs. The training may include partitioning training data into sample batches of a specific batch size. The partitioning may be performed based on rates of data transfers between processors that execute the pipelined algorithm, considerations of accuracy and convergence, and the execution speed of each processor. Other techniques for training may include grouping layers of the DNNs for processing on a single processor, distributing a layer of the DNNs to multiple processors for processing, or modifying an execution order of steps in the pipelined algorithm.
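The layer-grouping technique mentioned at the end of the abstract can be pictured as partitioning consecutive DNN layers into roughly load-balanced groups, one group per processor. The greedy sketch below is a minimal illustration under assumed per-layer costs; the function name and cost values are hypothetical, not the patent's method:

```python
# Sketch: grouping DNN layers into pipeline stages so that each
# processor receives roughly equal compute. Layer costs are
# hypothetical illustration values.

def group_layers(layer_costs, num_processors):
    """Greedily assign consecutive layers to processors, balancing load."""
    target = sum(layer_costs) / num_processors
    groups, current, current_cost = [], [], 0.0
    for i, cost in enumerate(layer_costs):
        current.append(i)
        current_cost += cost
        # Close the group once it reaches the per-processor target,
        # keeping at least one layer for each remaining processor.
        remaining_procs = num_processors - len(groups) - 1
        remaining_layers = len(layer_costs) - i - 1
        if (current_cost >= target and remaining_procs > 0
                and remaining_layers >= remaining_procs):
            groups.append(current)
            current, current_cost = [], 0.0
    groups.append(current)
    return groups

# Example: 7 layers with uneven costs spread over 3 processors.
print(group_layers([1.0, 1.0, 2.0, 2.0, 1.0, 3.0, 2.0], 3))
```

Consecutive (rather than arbitrary) grouping matters here: a pipeline stage must receive its inputs from the previous stage and hand activations to the next, so each processor owns a contiguous slice of the network.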
20 Claims
1. A computer-readable medium storing computer-executable instructions that are executable to cause one or more processors to perform acts comprising:

    providing a pipelined algorithm to train deep neural networks (DNNs) for performing data analysis based on training data, the DNNs being one of context-dependent DNNs or context-independent DNNs;

    partitioning the training data into sample batches of a specific batch size based on rates of data transfers between processors for executing the pipelined algorithm and an execution speed of each processor; and

    pipelining an execution of the pipelined algorithm on the DNNs through the processors to train the DNNs using the sample batches.

    Dependent claims: 2-9.
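Claim 1 ties the batch size to inter-processor transfer rates and per-processor execution speed. The sketch below illustrates that trade-off with an assumed cost model (fixed per-batch transfer latency plus payload time); the formula, constants, and function names are illustrative assumptions, not the patent's actual method, and the bounded candidate list stands in for the accuracy/convergence considerations the abstract mentions:

```python
# Sketch of the batch-size trade-off: larger batches amortize the fixed
# per-batch transfer latency, while the candidate list caps the size
# (standing in for accuracy/convergence constraints). All constants are
# illustrative assumptions.

def pick_batch_size(transfer_rate_mb_s, proc_speed_samples_s, sample_mb,
                    candidates=(64, 128, 256, 512, 1024)):
    """Choose the candidate batch size minimizing modeled time per sample."""
    latency_s = 0.001  # assumed fixed latency per inter-processor transfer
    return min(
        candidates,
        key=lambda b: (b / proc_speed_samples_s              # compute time
                       + latency_s                           # transfer latency
                       + b * sample_mb / transfer_rate_mb_s  # transfer payload
                       ) / b)

def partition(data, batch_size):
    """Split training samples into sample batches of the chosen size."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

data = list(range(10_000))
bs = pick_batch_size(transfer_rate_mb_s=500.0,
                     proc_speed_samples_s=20_000,
                     sample_mb=0.01)
batches = partition(data, bs)
print(bs, len(batches))
```

With these assumed numbers the per-sample cost falls monotonically as the latency is amortized, so the largest allowed candidate wins; a slower link or stricter convergence cap would shift the choice downward.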
10. A computer-implemented method, comprising:

    providing a pipelined algorithm to train deep neural networks (DNNs) for performing data analysis based on training data, the DNNs being one of context-dependent DNNs or context-independent DNNs and including multiple layers;

    distributing a top layer of the DNNs across multiple processors through model striping for parallelized processing by the pipelined algorithm; and

    pipelining an execution of the pipelined algorithm on the DNNs through a plurality of processors to train the DNNs using sample batches from the training data.

    Dependent claims: 11-15.
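The "model striping" of claim 10 can be pictured as splitting the top layer's weight matrix column-wise, one stripe per processor, with each processor computing a stripe of the output. The sketch below uses illustrative shapes and a serial loop as a stand-in for parallel execution; it verifies that concatenating the partial outputs reproduces the full layer:

```python
import numpy as np

# Sketch of model striping: the large top (output) layer is split
# column-wise across processors; each computes a stripe of the output,
# and the stripes are concatenated. Shapes are illustrative (a wide
# output layer such as many context-dependent states).

rng = np.random.default_rng(0)
hidden_dim, output_dim, num_procs = 512, 9000, 4

W = rng.standard_normal((hidden_dim, output_dim)) * 0.01
stripes = np.array_split(W, num_procs, axis=1)  # one weight stripe per processor

h = rng.standard_normal((32, hidden_dim))       # a sample batch of hidden activations

# Each "processor" computes its partial output (serially here for clarity).
partial = [h @ Wi for Wi in stripes]
logits = np.concatenate(partial, axis=1)

assert logits.shape == (32, output_dim)
assert np.allclose(logits, h @ W)   # striping reproduces the full layer
```

Only the hidden activations (batch x hidden_dim) need to reach every processor; the much larger weight matrix stays put, which is why striping the widest layer pays off.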
16. A system, comprising:

    a plurality of processors; and

    a memory that includes a plurality of computer-executable components that are executable by the plurality of processors, the components comprising:

        a batch generation component that partitions training data into sample batches of a specific batch size; and

        an algorithm execution component that pipelines an execution of a pipelined algorithm through the plurality of processors to train deep neural networks (DNNs) using the sample batches, the execution including executing a model update prior to an input data forward propagation in a computation iteration of the pipelined algorithm, the DNNs being one of context-dependent DNNs or context-independent DNNs.

    Dependent claims: 17-20.
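The execution-order detail in claim 16 — applying the model update before the forward propagation within a computation iteration — can be sketched for a single pipeline stage. The toy scalar "model" and all names below are illustrative assumptions, not the patented system:

```python
# Sketch of one computation iteration at a pipeline stage: the pending
# model update (gradient from an earlier backward pass) is applied
# BEFORE the incoming sample batch is forward-propagated, so the batch
# sees the freshest weights. Toy scalar model for illustration.

def stage_iteration(weight, pending_grad, batch, lr=0.1):
    """One pipeline iteration for a single stage."""
    if pending_grad is not None:            # 1. model update first
        weight -= lr * pending_grad
    outputs = [weight * x for x in batch]   # 2. then forward propagation
    return weight, outputs

w = 1.0
w, out = stage_iteration(w, pending_grad=0.5, batch=[1.0, 2.0])
print(w, out)  # the batch is propagated with the already-updated weight
```

Reversing the two steps would forward-propagate the batch through stale weights, increasing the effective staleness that pipelined training already introduces; updating first keeps each stage at most one iteration behind.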