Deep belief network for large vocabulary continuous speech recognition
First Claim
1. A method executed by a processor, the method comprising:
receiving a sample at a context-dependent combination of a Deep Belief Network (DBN) and a Hidden Markov Model (HMM), wherein the sample is a spoken utterance;
outputting, at the DBN, a posterior probability distribution over labeled senones;
outputting, at the HMM, transition probabilities between the labeled senones, the transition probabilities based upon the posterior probability distribution over the labeled senones; and
decoding the sample based at least in part upon the posterior probability distribution over the labeled senones and the transition probabilities between the labeled senones.
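For orientation only, the following is a minimal sketch of the decoding step recited in claim 1: in the common hybrid setup, the DBN's senone posteriors are rescaled by senone priors to serve as frame-level acoustic scores, and a Viterbi search over the HMM transition probabilities recovers the best senone sequence. The array names (`senone_posteriors`, `senone_priors`, `log_transitions`) and the prior-division step are assumptions drawn from standard hybrid DBN-HMM practice, not language from the claims; a full recognizer would further constrain the search with a lexicon and language model, omitted here.

```python
import numpy as np

def viterbi_decode(senone_posteriors, senone_priors, log_transitions, log_initial):
    """Hybrid DBN-HMM decoding sketch (illustrative): DBN senone posteriors
    p(s|x) are rescaled by senone priors p(s) to act as frame likelihoods,
    then the best senone path is found with Viterbi over the HMM transitions."""
    # Scaled log-likelihoods: log p(x|s) ~ log p(s|x) - log p(s) (up to a constant)
    log_obs = np.log(senone_posteriors + 1e-10) - np.log(senone_priors + 1e-10)

    T, S = log_obs.shape
    delta = np.full((T, S), -np.inf)   # best path score ending in senone s at frame t
    psi = np.zeros((T, S), dtype=int)  # back-pointers

    delta[0] = log_initial + log_obs[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_transitions   # indexed (from, to)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_obs[t]

    # Backtrace the most likely senone sequence
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path
```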
Abstract
A method is disclosed herein that includes an act of causing a processor to receive a sample, wherein the sample is one of a spoken utterance, an online handwriting sample, or a moving image sample. The method also comprises the act of causing the processor to decode the sample based at least in part upon an output of a combination of a deep structure and a context-dependent Hidden Markov Model (HMM), wherein the deep structure is configured to output a posterior probability of a context-dependent unit. The deep structure is a Deep Belief Network consisting of many layers of nonlinear units, with connecting weights between layers trained by a pretraining step followed by a fine-tuning step.
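As an illustration of the abstract's description of the deep structure, here is a minimal sketch of a forward pass through a stack of nonlinear layers topped by a softmax that yields a posterior distribution over context-dependent units (senones). The layer sizes, function names, and the choice of sigmoid units are assumptions for illustration; the patent does not prescribe this particular code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dbn_forward(features, weights, biases):
    """Forward pass through a stack of nonlinear (sigmoid) layers, with a
    final softmax that yields a posterior distribution over senone labels
    for each input frame."""
    h = features
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigmoid(h @ W + b)              # hidden layers of nonlinear units
    logits = h @ weights[-1] + biases[-1]
    return softmax(logits)                  # posterior over senones per frame
```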
20 Claims
1. A method executed by a processor, the method comprising:
receiving a sample at a context-dependent combination of a Deep Belief Network (DBN) and a Hidden Markov Model (HMM), wherein the sample is a spoken utterance;
outputting, at the DBN, a posterior probability distribution over labeled senones;
outputting, at the HMM, transition probabilities between the labeled senones, the transition probabilities based upon the posterior probability distribution over the labeled senones; and
decoding the sample based at least in part upon the posterior probability distribution over the labeled senones and the transition probabilities between the labeled senones.
(Dependent claims: 2-12)
13. A computer-implemented speech recognition system comprising:
a processor; and
a plurality of components that are executable by the processor, the plurality of components comprising:
a computer-executable combination of a Deep Belief Network (DBN) and a Hidden Markov Model (HMM) that is configured to receive an input sample, wherein the input sample is based upon a spoken utterance, wherein the DBN is configured to output a posterior probability distribution over labeled senones, and wherein the HMM is configured to output transition probabilities between states, the states corresponding to the labeled senones; and
a decoder component that is configured to decode a word sequence from the input sample based at least in part upon the posterior probability distribution over the labeled senones and the transition probabilities between the states.
(Dependent claims: 14-19)
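A hypothetical wiring of the components recited in claim 13, reusing the `dbn_forward` and `viterbi_decode` sketches above: a DBN-HMM acoustic component feeding a decoder component. The class name and interface are invented for illustration only; mapping the senone path to a word sequence would additionally require a pronunciation lexicon and language model, which are not shown.

```python
class DbnHmmRecognizer:
    """Illustrative composition of the claimed system: a DBN-HMM acoustic
    component plus a decoder component (hypothetical interface)."""

    def __init__(self, weights, biases, senone_priors, log_transitions, log_initial):
        self.weights = weights
        self.biases = biases
        self.senone_priors = senone_priors
        self.log_transitions = log_transitions
        self.log_initial = log_initial

    def recognize(self, frames):
        # DBN component: posterior distribution over labeled senones per frame
        posteriors = dbn_forward(frames, self.weights, self.biases)
        # Decoder component: best senone path under the HMM transition model;
        # lexicon and language-model constraints are omitted in this sketch.
        return viterbi_decode(posteriors, self.senone_priors,
                              self.log_transitions, self.log_initial)
```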
20. A computer-readable memory comprising instructions that, when executed by a processor, cause the processor to perform acts comprising:
receiving a Gaussian Mixture Model (GMM)-Hidden Markov Model (HMM) system that is trained to undertake automatic speech recognition;
converting the GMM-HMM to a Deep Belief Network (DBN)-HMM system, wherein the DBN comprises a plurality of layers of stochastic hidden units above a bottom layer of observed variables that represent a data vector, wherein the DBN comprises a plurality of undirected weighted connections between an uppermost two layers and directed weighted connections at other layers, wherein the DBN is configured to output posterior probabilities of senones pertaining to spoken utterances and the HMM is configured to output transition probabilities between the senones;
utilizing an unsupervised training algorithm to initialize weights of the connections in the DBN;
utilizing back-propagation to refine the weights of the connections in the DBN; and
deploying the DBN-HMM in an automatic speech recognition system.
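The unsupervised initialization step of claim 20 is commonly realized as greedy layer-wise training of restricted Boltzmann machines with one-step contrastive divergence (CD-1); the sketch below shows only that step. The function names, hyper-parameters, and the CD-1 choice are assumptions, and the subsequent back-propagation fine-tuning against senone labels, the GMM-HMM conversion, and deployment are not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hidden, lr=0.01, epochs=10, seed=0):
    """Unsupervised initialization of one layer's connection weights with
    CD-1 (one-step contrastive divergence) on an RBM (illustrative)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        v0 = data
        h0 = sigmoid(v0 @ W + b_h)                        # positive phase
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h0_sample @ W.T + b_v)               # reconstruction
        h1 = sigmoid(v1 @ W + b_h)                        # negative phase
        W += lr * (v0.T @ h0 - v1.T @ h1) / len(data)
        b_h += lr * (h0 - h1).mean(axis=0)
        b_v += lr * (v0 - v1).mean(axis=0)
    return W, b_h, sigmoid(data @ W + b_h)                # weights, biases, hidden activations

def pretrain_stack(features, layer_sizes):
    """Greedy layer-wise pretraining: each RBM is trained on the hidden
    activations of the previous layer; the resulting weights initialize the
    DBN before back-propagation fine-tuning."""
    weights, biases, h = [], [], features
    for size in layer_sizes:
        W, b, h = pretrain_rbm(h, size)
        weights.append(W)
        biases.append(b)
    return weights, biases
```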
Specification