CONTEXT-DEPENDENT STATE TYING USING A NEURAL NETWORK
First Claim
1. A computer-implemented method comprising:
receiving an audio signal encoding a portion of an utterance;
providing, to a first neural network, data corresponding to the audio signal; and
generating, by a processor, data representing a transcription for the utterance based on an output of the first neural network, wherein the first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
Abstract
The technology described herein can be embodied in a method that includes receiving an audio signal encoding a portion of an utterance, and providing, to a first neural network, data corresponding to the audio signal. The method also includes generating, by a processor, data representing a transcription for the utterance based on an output of the first neural network. The first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
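The pipeline the abstract describes can be sketched in code. The following is an illustrative sketch only, not the patent's actual implementation: a stand-in "second" network produces context-independent (CI) state posteriors, contexts with similar CI-posterior profiles are tied into shared context-dependent (CD) states, and those CD state IDs would serve as training targets for the "first" network. All names, the linear stand-in network, and the greedy distance-threshold tying rule are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CI = 4   # context-independent states (e.g., monophone HMM states)
DIM = 8    # acoustic feature dimension

# Stand-in for the second neural network: a fixed linear map from
# acoustic frames to CI-state posteriors.
W2 = rng.normal(size=(DIM, N_CI))

def second_network(frames):
    logits = frames @ W2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # CI posteriors per frame

# Each context is a (left, center, right) phone triple; frames observed
# in that context are summarized by their mean CI posterior vector.
contexts = [("a", "x", "b"), ("a", "x", "c"), ("d", "x", "b"), ("d", "y", "c")]
profiles = {ctx: second_network(rng.normal(size=(20, DIM))).mean(axis=0)
            for ctx in contexts}

def tie_states(profiles, threshold=0.1):
    """Greedy tying: a context whose CI-posterior profile lies within
    `threshold` (Euclidean) of an existing cluster centroid shares that
    cluster's CD state; otherwise a new CD state is created."""
    centroids, mapping = [], {}
    for ctx, p in profiles.items():
        for sid, c in enumerate(centroids):
            if np.linalg.norm(p - c) < threshold:
                mapping[ctx] = sid
                break
        else:
            mapping[ctx] = len(centroids)
            centroids.append(p)
    return mapping

cd_state_of = tie_states(profiles)
# The first network would then be trained on (frame, cd_state_of[context])
# pairs rather than on raw CI-state targets.
print(cd_state_of)
```

In a production acoustic model the tying step is typically a decision-tree clustering over many thousands of triphone contexts; the threshold-based grouping above only illustrates that CD states are derived from, and fewer in number than, the raw contexts.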
63 Citations
20 Claims
1. A computer-implemented method comprising:
receiving an audio signal encoding a portion of an utterance;
providing, to a first neural network, data corresponding to the audio signal; and
generating, by a processor, data representing a transcription for the utterance based on an output of the first neural network, wherein the first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system comprising:
a speech recognition engine comprising a processor, the speech recognition engine configured to:
receive an audio signal encoding a portion of an utterance; and
generate data representing a transcription for the utterance based on an output of a first neural network, which accepts as input data corresponding to the audio signal, wherein the first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer readable storage device encoding one or more computer readable instructions, which upon execution by one or more processors cause operations comprising:
receiving an audio signal encoding a portion of an utterance;
providing, to a first neural network, data corresponding to the audio signal; and
generating data representing a transcription for the utterance based on an output of the first neural network, wherein the first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
Dependent claims: 16, 17, 18, 19, 20
Specification