CONVOLUTIONAL, LONG SHORT-TERM MEMORY, FULLY CONNECTED DEEP NEURAL NETWORKS
First Claim
Patent Images
1. A method comprising:
- receiving input features, wherein the input features include respective segment features for each of a plurality of segments; and
processing the input features using a model, wherein the processing comprises;
for each of the segments;
generating first features for the segment by processing the segment features for the segment using one or more convolutional neural network (CNN) layers, wherein the convolutional neural network (CNN) layers perform spatial modeling on the input feature;
generating second features for the segment by processing the first features using one or more long short-term memory network (LSTM) layers to perform temporal modeling over the first features; and
determining an output feature based on at least the second features for the plurality of segments.
2 Assignments
0 Petitions
Accused Products
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying the language of a spoken utterance. One of the methods includes receiving input features of an utterance; and processing the input features using an acoustic model that comprises one or more convolutional neural network (CNN) layers, one or more long short-term memory network (LSTM) layers, and one or more fully connected neural network layers to generate a transcription for the utterance.
4 Citations
13 Claims
-
1. A method comprising:
-
receiving input features, wherein the input features include respective segment features for each of a plurality of segments; and processing the input features using a model, wherein the processing comprises; for each of the segments; generating first features for the segment by processing the segment features for the segment using one or more convolutional neural network (CNN) layers, wherein the convolutional neural network (CNN) layers perform spatial modeling on the input feature; generating second features for the segment by processing the first features using one or more long short-term memory network (LSTM) layers to perform temporal modeling over the first features; and determining an output feature based on at least the second features for the plurality of segments. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:
-
receiving input features, wherein the input features include respective segment features for each of a plurality of segments; and processing the input features using a model, wherein the processing comprises; for each of the segments; generating first features for the segment by processing the segment features for the segment using one or more convolutional neural network (CNN) layers, wherein the convolutional neural network (CNN) layers perform spatial modeling on the input feature; generating second features for the segment by processing the first features using one or more long short-term memory network (LSTM) layers to perform temporal modeling over the first features; and determining an output feature based on at least the second features for the plurality of segments. - View Dependent Claims (7, 8, 9)
-
-
10. A computer program product encoded on one or more non-transitory computer storage media, the computer program product comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
-
receiving input features, wherein the input features include respective segment features for each of a plurality of segments; and processing the input features using a model, wherein the processing comprises; for each of the segments; generating first features for the segment by processing the segment features for the segment using one or more convolutional neural network (CNN) layers, wherein the convolutional neural network (CNN) layers perform spatial modeling on the input feature; generating second features for the segment by processing the first features using one or more long short-term memory network (LSTM) layers to perform temporal modeling over the first features; and determining an output feature based on at least the second features for the plurality of segments. - View Dependent Claims (11, 12, 13)
-
Specification