Method and Apparatus for Speech Recognition Using Neural Networks with Speaker Adaptation
Abstract
In a speech recognition system, deep neural networks (DNNs) are employed in phoneme recognition. While DNNs typically provide better phoneme recognition performance than other techniques, such as Gaussian mixture models (GMMs), adapting a DNN to a particular speaker remains challenging. According to at least one example embodiment, speech data and corresponding speaker data are both applied as input to a DNN. In response, the DNN generates a prediction of a phoneme based on the input speech data and the corresponding speaker data. The speaker data may be generated from the corresponding speech data.
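The arrangement described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes that the speech data and the speaker data are combined by simple concatenation at the input layer, that the hidden layers use ReLU activations, and that the output layer is a softmax over phoneme classes. All names and sizes (`predict_phoneme_posteriors`, `SPEECH_DIM`, `SPEAKER_DIM`, and so on) are hypothetical, and the weights are random rather than trained, so the predicted index carries no meaning.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for one fully connected layer (random, untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(layer, x):
    """Compute W x + b for one layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_phoneme_posteriors(layers, speech_frame, speaker_vector):
    """Feed speech data and speaker data to the same network input.

    Assumption: the two inputs are joined by concatenation; the source only
    states that both are applied as input to the DNN.
    """
    x = list(speech_frame) + list(speaker_vector)
    for layer in layers[:-1]:
        x = [max(0.0, v) for v in affine(layer, x)]  # ReLU hidden layer (assumed)
    return softmax(affine(layers[-1], x))  # posterior over phoneme classes


# Hypothetical dimensions chosen only for illustration.
SPEECH_DIM, SPEAKER_DIM, HIDDEN_DIM, NUM_PHONEMES = 13, 4, 16, 5

rng = random.Random(0)
layers = [
    make_layer(SPEECH_DIM + SPEAKER_DIM, HIDDEN_DIM, rng),
    make_layer(HIDDEN_DIM, HIDDEN_DIM, rng),
    make_layer(HIDDEN_DIM, NUM_PHONEMES, rng),
]

speech_frame = [rng.gauss(0.0, 1.0) for _ in range(SPEECH_DIM)]      # stand-in acoustic features
speaker_vector = [rng.gauss(0.0, 1.0) for _ in range(SPEAKER_DIM)]   # stand-in speaker data

posteriors = predict_phoneme_posteriors(layers, speech_frame, speaker_vector)
predicted = max(range(NUM_PHONEMES), key=posteriors.__getitem__)
print("predicted phoneme index:", predicted)
```

Because the speaker vector enters through the same input layer as the acoustic features, one set of network weights can condition its phoneme prediction on the speaker without per-speaker retraining; how the speaker vector itself is derived from the speech data is left outside this sketch.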
20 Claims
1. A method for speech recognition, the method comprising:
receiving, by a deep neural network, input speech data and corresponding speaker data; and
generating, by the deep neural network, a prediction of a phoneme corresponding to the input speech data based on the corresponding speaker data.
11. An apparatus for speech recognition comprising:
at least one processor; and
at least one memory with computer code instructions stored thereon, the at least one processor and the at least one memory with computer code instructions being configured to cause the apparatus to:
receive, by an input layer of a deep neural network, input speech data and corresponding speaker data; and
generate, at an output layer of the deep neural network, a prediction of a phoneme corresponding to the input speech data based on the corresponding speaker data.
20. A non-transitory computer-readable medium with computer code instructions stored thereon, the computer code instructions being configured, when executed by a processor, to cause an apparatus to:
receive, by a deep neural network, input speech data and corresponding speaker data; and
generate, by the deep neural network, a prediction of a phoneme corresponding to the input speech data based on the corresponding speaker data.
Specification