Deep neural networks training for speech and pattern recognition
First Claim
1. A system comprising:
one or more processors; and
one or more computer storage media storing computer-executable instructions that are executable to cause the one or more processors to perform acts comprising:
providing a pipelined algorithm to train deep neural networks (DNNs) for performing data analysis based on training data, the DNNs being one of context-dependent DNNs or context-independent DNNs;
partitioning the training data into sample batches of a specific batch size based on rates of data transfers between the one or more processors for executing the pipelined algorithm and an execution speed of each of the one or more processors; and
pipelining an execution of the pipelined algorithm on the DNNs through the one or more processors to train the DNNs using the sample batches.
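The claim ties the batch size to inter-processor transfer rates and per-processor execution speed. A minimal sketch of one plausible selection rule, assuming hypothetical names and a made-up overhead heuristic (the patent does not give a formula): pick the smallest candidate batch size whose per-batch transfer cost is amortized below a fixed fraction of compute time, so small batches (which favor convergence) are preferred whenever the pipeline can afford them.

```python
def choose_batch_size(link_rate_bytes_s, link_latency_s, proc_rate_samples_s,
                      bytes_per_sample, candidates=(64, 128, 256, 512, 1024),
                      max_overhead=0.1):
    """Return the smallest candidate batch size whose transfer overhead
    (fixed link latency plus payload time) stays below `max_overhead`
    of the compute time for that batch on the slowest processor."""
    for b in sorted(candidates):
        compute_s = b / proc_rate_samples_s
        transfer_s = link_latency_s + b * bytes_per_sample / link_rate_bytes_s
        if transfer_s <= max_overhead * compute_s:
            return b
    # No candidate amortizes the link latency well enough; take the largest.
    return max(candidates)
```

With a 1 GB/s link, 1 ms latency, 10,000 samples/s of compute, and 4 KB activations per sample, batches of 64 or 128 leave the latency under-amortized, so the rule settles on 256.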
Abstract
The use of a pipelined algorithm that performs parallelized computations to train deep neural networks (DNNs) for performing data analysis may reduce training time. The DNNs may be one of context-independent DNNs or context-dependent DNNs. The training may include partitioning training data into sample batches of a specific batch size. The partitioning may be performed based on rates of data transfers between processors that execute the pipelined algorithm, considerations of accuracy and convergence, and the execution speed of each processor. Other techniques for training may include grouping layers of the DNNs for processing on a single processor, distributing a layer of the DNNs to multiple processors for processing, or modifying an execution order of steps in the pipelined algorithm.
68 Citations
20 Claims
1. A system comprising:
one or more processors; and
one or more computer storage media storing computer-executable instructions that are executable to cause the one or more processors to perform acts comprising:
providing a pipelined algorithm to train deep neural networks (DNNs) for performing data analysis based on training data, the DNNs being one of context-dependent DNNs or context-independent DNNs;
partitioning the training data into sample batches of a specific batch size based on rates of data transfers between the one or more processors for executing the pipelined algorithm and an execution speed of each of the one or more processors; and
pipelining an execution of the pipelined algorithm on the DNNs through the one or more processors to train the DNNs using the sample batches.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
10. A computer-implemented method, comprising:
providing a pipelined algorithm to train deep neural networks (DNNs) for performing data analysis based on training data, the DNNs being one of context-dependent DNNs or context-independent DNNs and including multiple layers;
determining that a ratio between a size of a top layer and a size of one or more of the multiple layers exceeds a predetermined threshold;
based at least in part on the determining, distributing the top layer of the DNNs across multiple processors through model striping for parallelized processing by the pipelined algorithm; and
pipelining an execution of the pipelined algorithm on the DNNs through the multiple processors to train the DNNs using sample batches of the training data.
View Dependent Claims (11, 12, 13, 14, 15)
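Model striping as claimed here splits an oversized top layer across processors. A minimal NumPy sketch, assuming the common column-wise split (the patent does not fix the split axis, and all function names are illustrative): each processor holds a stripe of the top-layer weight matrix and computes only its own slice of the output, and concatenating the partial outputs reproduces the full-layer result. The threshold test mirrors the claimed size-ratio check.

```python
import numpy as np

def should_stripe(top_size, hidden_size, threshold=4.0):
    # Claimed trigger: stripe when the top layer is disproportionately
    # large relative to the other layers (threshold value is illustrative).
    return top_size / hidden_size > threshold

def stripe_top_layer(weight, n_procs):
    # Split the top-layer weight matrix column-wise, one stripe per processor.
    return np.array_split(weight, n_procs, axis=1)

def striped_forward(hidden, stripes):
    # Each "processor" computes the output units for its own stripe;
    # concatenating the partial outputs equals the unstriped product.
    return np.concatenate([hidden @ w for w in stripes], axis=1)
```

In context-dependent speech DNNs the output (senone) layer can be several times wider than the hidden layers, which is exactly when the ratio test fires.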
16. A system, comprising:
a plurality of processors;
a memory that includes a plurality of computer-executable components that are executable by the plurality of processors, comprising:
a batch generation component that partitions training data into sample batches of a specific batch size; and
an algorithm execution component that pipelines an execution of a pipelined algorithm through the plurality of processors to train deep neural networks (DNNs) using the sample batches, the execution including executing a model update prior to an input data forward propagation in a computation iteration of the pipelined algorithm, the DNNs being one of context-dependent DNNs or context-independent DNNs,
wherein the algorithm execution component trains the DNNs based at least in part on performing gradient descent techniques,
wherein the DNNs include multiple layers, and
wherein the execution further includes streaming output data from a computation at a first processor of the plurality of processors that processes an upper layer to a second processor of the plurality of processors that processes a lower layer following a performance of an error back propagation of the computation iteration, the streaming of the output data occurring at least partially in parallel with one or more of the model update or the input data forward propagation.
View Dependent Claims (17, 18, 19, 20)
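Claim 16's modified execution order, sketched as an event trace (all names are hypothetical; this is an ordering illustration, not the patent's implementation): within each computation iteration a processor first applies the model update carried over from the previous iteration, then forward-propagates the new batch, then back-propagates the error, and finally streams output to the neighboring processor, where the streaming can overlap the next iteration's update and forward pass.

```python
def run_iteration(proc_id, pending_gradient, log):
    """One pipelined iteration on one processor, in the claimed order."""
    if pending_gradient is not None:
        log.append((proc_id, "model_update"))        # apply last iteration's gradient first
    log.append((proc_id, "forward_propagation"))     # then push the new batch forward
    log.append((proc_id, "error_back_propagation"))  # compute this batch's gradient
    log.append((proc_id, "stream_output"))           # stream results toward the neighbor,
                                                     # overlappable with the next update/forward
    return "gradient"  # held back and applied at the start of the next iteration

log, grad = [], None
for _ in range(2):
    grad = run_iteration(0, grad, log)
```

After two iterations the trace shows the key reordering: the second iteration's `model_update` precedes its `forward_propagation` instead of trailing the first iteration's backward pass.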
Specification