CONVOLUTIONAL, LONG SHORT-TERM MEMORY, FULLY CONNECTED DEEP NEURAL NETWORKS

US 20200135227A1
Filed: 12/31/2019
Published: 04/30/2020
Est. Priority Date: 10/03/2014
Status: Active Grant

First Claim

Patent Images

1. A method comprising:

receiving input features, wherein the input features include respective segment features for each of a plurality of segments; and

processing the input features using a model, wherein the processing comprises;

for each of the segments;

generating first features for the segment by processing the segment features for the segment using one or more convolutional neural network (CNN) layers, wherein the convolutional neural network (CNN) layers perform spatial modeling on the input feature;

generating second features for the segment by processing the first features using one or more long short-term memory network (LSTM) layers to perform temporal modeling over the first features; and

determining an output feature based on at least the second features for the plurality of segments.

View all claims

2 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying the language of a spoken utterance. One of the methods includes receiving input features of an utterance; and processing the input features using an acoustic model that comprises one or more convolutional neural network (CNN) layers, one or more long short-term memory network (LSTM) layers, and one or more fully connected neural network layers to generate a transcription for the utterance.

4 Citations

View as Search Results

13 Claims

1. A method comprising:
- receiving input features, wherein the input features include respective segment features for each of a plurality of segments; and
  
  processing the input features using a model, wherein the processing comprises;
  
  for each of the segments;
  
  generating first features for the segment by processing the segment features for the segment using one or more convolutional neural network (CNN) layers, wherein the convolutional neural network (CNN) layers perform spatial modeling on the input feature;
  
  generating second features for the segment by processing the first features using one or more long short-term memory network (LSTM) layers to perform temporal modeling over the first features; and
  
  determining an output feature based on at least the second features for the plurality of segments.
- View Dependent Claims (2, 3, 4, 5)
- - 2. The method of claim 1,wherein processing the first features using the one or more LSTM layers to generate the second features comprises:
    - processing the first features using a linear layer to generate reduced features having a reduced dimension from a dimension of the first features; and
      
      processing the reduced features using the one or more LSTM layers to generate the second features.
  - 3. The method of claim 1, further comprising:
    - generating, based on the input features, short-term features having a first number of contextual frames,wherein features generated using the one or more CNN layers include long-term features having a second number of contextual frames that are more than the first number of contextual frames of the short-term features; and
      
      providing the short-term features and the long-term features as input to the one or more LSTM layers.
  - 4. The method of claim 1, wherein the one or more CNN layers, the one or more LSTM layers, and the one or more fully connected neural network layers have been jointly trained to determine trained values of parameters of the one or more CNN layers, the one or more LSTM layers, and the one or more fully connected neural network layers.
  - 5. The method of claim 1, wherein the input features include log-mel features having multiple dimensions.

6. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:
- receiving input features, wherein the input features include respective segment features for each of a plurality of segments; and
  
  processing the input features using a model, wherein the processing comprises;
  
  for each of the segments;
  
  generating first features for the segment by processing the segment features for the segment using one or more convolutional neural network (CNN) layers, wherein the convolutional neural network (CNN) layers perform spatial modeling on the input feature;
  
  generating second features for the segment by processing the first features using one or more long short-term memory network (LSTM) layers to perform temporal modeling over the first features; and
  
  determining an output feature based on at least the second features for the plurality of segments.
- View Dependent Claims (7, 8, 9)
- - 7. The system of claim 6,wherein processing the first features using the one or more LSTM layers to generate the second features comprises:
    - processing the first features using a linear layer to generate reduced features having a reduced dimension from a dimension of the first features; and
      
      processing the reduced features using the one or more LSTM layers to generate the second features.
  - 8. The system of claim 6, wherein the operations further comprise:
    - generating, based on the input features, short-term features having a first number of contextual frames,wherein features generated using the one or more CNN layers include long-term features having a second number of contextual frames that are more than the first number of contextual frames of the short-term features; and
      
      providing the short-term features and the long-term features as input to the one or more LSTM layers.
  - 9. The system of claim 6, wherein the one or more CNN layers, the one or more LSTM layers, and the one or more fully connected neural network layers have been jointly trained to determine trained values of parameters of the one or more CNN layers, the one or more LSTM layers, and the one or more fully connected neural network layers.

10. A computer program product encoded on one or more non-transitory computer storage media, the computer program product comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- receiving input features, wherein the input features include respective segment features for each of a plurality of segments; and
  
  processing the input features using a model, wherein the processing comprises;
  
  for each of the segments;
  
  generating first features for the segment by processing the segment features for the segment using one or more convolutional neural network (CNN) layers, wherein the convolutional neural network (CNN) layers perform spatial modeling on the input feature;
  
  generating second features for the segment by processing the first features using one or more long short-term memory network (LSTM) layers to perform temporal modeling over the first features; and
  
  determining an output feature based on at least the second features for the plurality of segments.
- View Dependent Claims (11, 12, 13)
- - 11. The computer program product of claim 10,wherein processing the first features using the one or more LSTM layers to generate the second features comprises:
    - processing the first features using a linear layer to generate reduced features having a reduced dimension from a dimension of the first features; and
      
      processing the reduced features using the one or more LSTM layers to generate the second features.
  - 12. The computer program product of claim 10, wherein the operations further comprise:
    - generating, based on the input features, short-term features having a first number of contextual frames,wherein features generated using the one or more CNN layers include long-term features having a second number of contextual frames that are more than the first number of contextual frames of the short-term features; and
      
      providing the short-term features and the long-term features as input to the one or more LSTM layers.
  - 13. The computer program product of claim 10, wherein the one or more CNN layers, the one or more LSTM layers, and the one or more fully connected neural network layers have been jointly trained to determine trained values of parameters of the one or more CNN layers, the one or more LSTM layers, and the one or more fully connected neural network layers.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google LLC (Alphabet Inc.)
Inventors
Sainath, Tara N., Senior, Andrew W., Vinyals, Oriol, Sak, Hasim

Granted Patent

US 11,715,486 B2
Time in Patent Office

Days
Field of Search
US Class Current
CPC Class Codes

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G10L 15/02   Feature extraction for spee...

G10L 15/16   using artificial neural net...

G10L 25/30   using neural networks

CONVOLUTIONAL, LONG SHORT-TERM MEMORY, FULLY CONNECTED DEEP NEURAL NETWORKS

First Claim

2 Assignments

0 Petitions

Accused Products

Abstract

4 Citations

13 Claims

Specification

Solutions

Use Cases

Quick Links

CONVOLUTIONAL, LONG SHORT-TERM MEMORY, FULLY CONNECTED DEEP NEURAL NETWORKS

First Claim

2 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

4 Citations

13 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links