Mobile speech recognition hardware accelerator

US 9,153,230 B2
Filed: 10/23/2012
Issued: 10/06/2015
Est. Priority Date: 10/23/2012
Status: Active Grant

Abstract

A method for executing a mobile speech recognition software application based on a multi-layer neural network model includes utilizing a hardware accelerator in the mobile device to classify one or more frames of an audio signal. The hardware accelerator includes a multiplier-accumulator (MAC) unit to perform matrix multiplication operations involved in computing the neural network output.
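
As a rough illustration of the MAC-based layer computation the abstract refers to, the following is a minimal software sketch, not the patented hardware design. The function names `mac_layer` and `sigmoid`, the list-of-lists weight layout, and the choice to seed the accumulator with the bias term are all assumptions made for this example.

```python
import math

def sigmoid(z):
    """Logistic activation, one of the functions named in the claims."""
    return 1.0 / (1.0 + math.exp(-z))

def mac_layer(frame, weights, biases, activation):
    """Compute activation(sum_i frame[i] * weights[i][j] + biases[j]) per output j.

    Hypothetical layout: weights[i][j] connects input i to output j.
    """
    outputs = []
    for j in range(len(biases)):
        acc = biases[j]  # assumption: accumulator starts at the bias term
        for i, x in enumerate(frame):
            acc += x * weights[i][j]  # one multiply-accumulate step
        outputs.append(activation(acc))
    return outputs

# Tiny made-up example: two inputs, two outputs.
frame = [1.0, 2.0]
weights = [[0.5, -1.0], [0.25, 0.5]]
biases = [0.0, 0.0]
layer_out = mac_layer(frame, weights, biases, sigmoid)
```

In a multi-layer network, `layer_out` would serve as the input frame for the next layer.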

50 Citations

31 Claims

1. A mobile computing device, comprising:
- a processor configured to execute a speech recognition application that uses a multi-layered neural network as an acoustic model; and
  
  a hardware accelerator comprising:
  
  circuitry configured to receive matrix data representing one or more frames of an audio signal as input data for a first layer of the neural network;
  
  a multiplier-accumulator (MAC) unit configured to:
  
  multiply the received matrix data representing one or more frames of the audio signal with a weight matrix;
  
  add a bias matrix to the multiplication results; and
  
  accumulate the addition results;
  
  circuitry configured to pass the accumulated results through an activation function to generate an output matrix representing an output of the first layer of the neural network for the frame; and
  
  a data transceiver configured to receive and decode weights and bias terms data, the data transceiver including:
  
  a decompression unit configured to:
  
  decompress compressed weight and bias terms data, and
  
  double buffer decompressed weights and bias terms data to allow for parallel decompression and MAC unit operations.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The mobile computing device of claim 1, further comprising, a streaming unit configured to stream weights and bias terms data to the hardware accelerator.
  - 3. The mobile computing device of claim 1, wherein the hardware accelerator includes RAM buffers to store the decoded weights and bias terms.
  - 4. The mobile computing device of claim 1, wherein the data transceiver circuitry is configured to transmit decoded weighing coefficients and bias coefficients data on a first in first out basis via a FIFO register to the MAC unit.
  - 5. The mobile computing device of claim 1, wherein the activation function is one of a sigmoid function, a signum function, a threshold function, a piecewise-linear function, a step function, and a tanh function.
  - 6. The mobile computing device of claim 1, wherein the circuitry configured to pass the accumulated results through an activation function is configured to apply the activation function in a piecewise quadratic approximation.
  - 7. The mobile computing device of claim 1, wherein the hardware accelerator is configured to feed the output of the first layer of the neural network as input data for a next layer of the neural network.
  - 8. The mobile computing device of claim 1, wherein the hardware accelerator is further configured to raise an interrupt after computing all layers of the neural network.
  - 9. The mobile computing device of claim 1, wherein the hardware accelerator is implemented as an Application Specific Integrated Circuit (ASIC) core.
  - 10. The mobile computing device of claim 1, wherein the hardware accelerator is implemented as a field-programmable gate array (FPGA).
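
Claims 5 and 6 refer to a sigmoid activation and a piecewise quadratic approximation of the activation function. The claims do not give the segments or coefficients, so the sketch below uses one commonly cited four-segment approximation purely as an assumption; the name `pq_sigmoid` and the breakpoints at -4, 0, and 4 are illustrative, not taken from the patent.

```python
import math

def pq_sigmoid(x):
    """Piecewise quadratic stand-in for the logistic sigmoid (assumed coefficients).

    Saturates outside [-4, 4]; inside, each half is a single quadratic that
    matches the sigmoid's value (0.5) and slope (0.25) at x = 0.
    """
    if x <= -4.0:
        return 0.0
    if x >= 4.0:
        return 1.0
    if x < 0.0:
        t = 1.0 + x / 4.0
        return 0.5 * t * t
    t = 1.0 - x / 4.0
    return 1.0 - 0.5 * t * t

def exact_sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Worst-case absolute error over a grid from -8 to 8 in steps of 0.01.
grid = [k / 100.0 for k in range(-800, 801)]
max_abs_error = max(abs(pq_sigmoid(x) - exact_sigmoid(x)) for x in grid)
```

An approximation of this kind needs only comparisons, one multiply-by-constant, and one squaring, which is the usual motivation for avoiding an exponential in fixed-function hardware.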

11. A method for executing a speech recognition software application on a mobile device, the method comprising:
- utilizing a hardware accelerator in the mobile device to perform neural network calculations to classify an audio signal, wherein utilizing the hardware accelerator includes:
  
  sending matrix data representing one or more frames of an audio signal as input data for a first layer of a neural network to the hardware accelerator;
  
  using a multiplier-accumulator (MAC) unit in the hardware accelerator to:
  
  multiply the received matrix data representing one or more frames of the audio signal with a weight matrix;
  
  add a bias matrix to the multiplication results; and
  
  accumulate the addition results; and
  
  using circuitry in the hardware accelerator to pass the accumulated results through an activation function to generate an output matrix representing an output of the first layer of the neural network;
  
  receive and decode weights and bias terms data;
  
  decompress compressed weight and bias terms data; and
  
  double buffer decompressed weights and bias terms data to allow for parallel decompression and MAC unit operations.
- Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
  - 12. The method of claim 11 further comprising:
    - streaming compressed and/or uncompressed weight and bias terms data to the hardware accelerator.
  - 13. The method of claim 11 further comprising:
    - storing the weights and bias terms data in RAM buffers.
  - 14. The method of claim 11 further comprising:
    - transmitting weighing coefficients and bias coefficients data on a first in first out basis to the MAC unit.
  - 15. The method of claim 11, wherein the activation function is one of a sigmoid function, a signum function, a threshold function, a piecewise-linear function, a step function, and a tanh function.
  - 16. The method of claim 11 further comprising:
    - applying the activation function in a piecewise quadratic approximation.
  - 17. The method of claim 11 further comprising:
    - feeding the output of the first layer of the neural network as input data for a next layer of the neural network.
  - 18. The method of claim 11 further comprising:
    - raising an interrupt after computing all layers of the neural network through the hardware accelerator.
  - 19. The method of claim 11 further comprising:
    - implementing the hardware accelerator as an Application Specific Integrated Circuit (ASIC) core.
  - 20. The method of claim 11 further comprising:
    - implementing the hardware accelerator as a field-programmable gate array (FPGA).
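
The double-buffering step in claim 11 (decompressing the next set of weights while the MAC unit works on the current set) can be pictured with a small software analogy. This is only a sketch under stated assumptions: `zlib` and JSON stand in for the unspecified compression and encoding scheme, the helper names `compress_layer`, `decompress_layer`, `mac`, and `run_layers` are hypothetical, the activation function is omitted for brevity, and at least one layer is assumed.

```python
import json
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_layer(weights, biases):
    """Hypothetical storage format: JSON text compressed with zlib."""
    return zlib.compress(json.dumps({"w": weights, "b": biases}).encode("utf-8"))

def decompress_layer(blob):
    params = json.loads(zlib.decompress(blob).decode("utf-8"))
    return params["w"], params["b"]

def mac(frame, weights, biases):
    """Multiply-accumulate with bias (activation left out of this sketch)."""
    return [biases[j] + sum(x * weights[i][j] for i, x in enumerate(frame))
            for j in range(len(biases))]

def run_layers(frame, blobs):
    """Ping-pong between two buffers: decode layer k+1 while layer k is computed."""
    buffers = [None, None]
    with ThreadPoolExecutor(max_workers=1) as decoder:
        pending = decoder.submit(decompress_layer, blobs[0])
        for k in range(len(blobs)):
            buffers[k % 2] = pending.result()      # buffer for layer k is ready
            if k + 1 < len(blobs):                 # start filling the other buffer
                pending = decoder.submit(decompress_layer, blobs[k + 1])
            weights, biases = buffers[k % 2]
            frame = mac(frame, weights, biases)    # runs while the decode is in flight
    return frame

blobs = [compress_layer([[1, 2], [3, 4]], [0, 1]),
         compress_layer([[1], [1]], [10])]
result = run_layers([1, 1], blobs)
```

Python threads only illustrate the scheduling pattern here; in hardware the decompression unit and the MAC unit would be separate circuits running concurrently.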

21. A hardware accelerator configured to compute a multi-layered neural network of a mobile speech recognition application, the hardware accelerator comprising:
- circuitry configured to receive matrix data representing one or more frames of an audio signal as input data for a first layer of the neural network;
  
  a multiplier-accumulator (MAC) unit comprising:
  
  circuitry configured to multiply the matrix data representing one or more frames of the audio signal with a weight matrix;
  
  circuitry configured to add a bias matrix to the multiplication results;
  
  circuitry configured to accumulate the addition results;
  
  circuitry to pass the accumulated results through an activation function to generate an output matrix representing an output of the first layer of the neural network for the frame; and
  
  a data transceiver configured to receive and decode weights and bias terms data, the data transceiver including:
  
  a decompression unit configured to:
  
  decompress compressed weight and bias terms data, and
  
  double buffer decompressed weights and bias terms data to allow for parallel decompression and MAC unit operations.
- Dependent Claims (22, 23, 24, 25, 26, 27, 28, 29, 30, 31)
  - 22. The hardware accelerator of claim 21 further configured to use the output of the first layer of the neural network as input to the MAC unit for computing the next layer of the neural network.
  - 23. The hardware accelerator of claim 21, further comprising:
    - memory buffers to store weights and bias terms data.
  - 24. The hardware accelerator of claim 21, wherein the decompression unit is configured to decompress a stream of compressed weight and bias terms to extract weight matrix coefficients row-by-row or column-by-column.
  - 25. The hardware accelerator of claim 24, wherein the decompression unit is further configured to transmit the extracted row-by-row or column-by-column weight matrix coefficients to the MAC unit via a FIFO register.
  - 26. The hardware accelerator of claim 21, wherein the circuitry to pass the accumulated results through an activation function is configured to use a piecewise quadratic approximation to the activation function.
  - 27. The hardware accelerator of claim 21, wherein the activation function is one of a sigmoid function, a signum function, a threshold function, a piecewise-linear function, a step function, and a tanh function.
  - 28. The hardware accelerator of claim 21 implemented as an Application Specific Integrated Circuit (ASIC) core.
  - 29. The hardware accelerator of claim 21 implemented as a field-programmable gate array (FPGA).
  - 30. The hardware accelerator of claim 21 further configured to raise an interrupt after computing all layers of the neural network.
  - 31. The hardware accelerator of claim 21 further comprising a plurality of MAC units configured to operate in parallel to perform at least 1 Giga multiplications per second.
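
Claims 24 and 25 describe weight coefficients being extracted row-by-row and handed to the MAC unit through a FIFO register. A minimal sketch of that data flow, assuming a software queue in place of the register, might look as follows; `stream_rows`, `mac_from_fifo`, and the `fifo_depth` parameter are hypothetical names introduced only for this illustration.

```python
from collections import deque

def stream_rows(weights):
    """Stand-in for the decompression unit: yields one weight row at a time."""
    for row in weights:
        yield list(row)

def mac_from_fifo(frame, row_source, biases, fifo_depth=2):
    """Consume weight rows from a bounded FIFO and accumulate frame[i] * row[j]."""
    fifo = deque()
    rows = iter(row_source)
    acc = list(biases)       # accumulators start at the bias terms
    consumed = 0             # index of the input element paired with the next row
    done = False
    while not done or fifo:
        # Producer side: top up the FIFO until it is full or the stream ends.
        while not done and len(fifo) < fifo_depth:
            row = next(rows, None)
            if row is None:
                done = True
            else:
                fifo.append(row)
        # Consumer side: the MAC pops the oldest row (first in, first out).
        if fifo:
            row = fifo.popleft()
            x = frame[consumed]
            for j, w in enumerate(row):
                acc[j] += x * w
            consumed += 1
    return acc

fifo_result = mac_from_fifo([1, 1], stream_rows([[1, 2], [3, 4]]), [0, 1])
```

Because rows arrive in order, the MAC never needs the whole weight matrix in memory at once, only as many rows as the FIFO holds.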

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Maaninen, Juha-Pekka
Primary Examiner(s): ALBERTALLI, BRIAN LOUIS
Application Number: US13/658,654
Publication Number: US 20150199963A1
Time in Patent Office: 1,078 Days
Field of Search
US Class Current: 1/1
CPC Class Codes:
G10L 15/16 using artificial neural net...
G10L 15/28 Constructional details of s...