Deep belief network for large vocabulary continuous speech recognition
First Claim
1. A method executed by a processor, the method comprising:
receiving a sample at a context-dependent combination of a Deep Belief Network (DBN) and a Hidden Markov Model (HMM), wherein the sample is a spoken utterance;
outputting, at the DBN, a posterior probability distribution over labeled senones;
outputting, at the HMM, transition probabilities between the labeled senones, the transition probabilities based upon the posterior probability distribution over the labeled senones; and
decoding the sample based at least in part upon the posterior probability distribution over the labeled senones and the transition probabilities between the labeled senones.
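For orientation only, the following is a minimal sketch of the decoding step recited in claim 1: in the common hybrid setup, the DBN's senone posteriors are rescaled by senone priors to serve as frame-level acoustic scores, and a Viterbi search over the HMM transition probabilities recovers the best senone sequence. The array names (`senone_posteriors`, `senone_priors`, `log_transitions`) and the prior-division step are assumptions drawn from standard hybrid DBN-HMM practice, not language from the claims; a full recognizer would further constrain the search with a lexicon and language model, omitted here.

```python
import numpy as np

def viterbi_decode(senone_posteriors, senone_priors, log_transitions, log_initial):
    """Hybrid DBN-HMM decoding sketch (illustrative): DBN senone posteriors
    p(s|x) are rescaled by senone priors p(s) to act as frame likelihoods,
    then the best senone path is found with Viterbi over the HMM transitions."""
    # Scaled log-likelihoods: log p(x|s) ~ log p(s|x) - log p(s) (up to a constant)
    log_obs = np.log(senone_posteriors + 1e-10) - np.log(senone_priors + 1e-10)

    T, S = log_obs.shape
    delta = np.full((T, S), -np.inf)   # best path score ending in senone s at frame t
    psi = np.zeros((T, S), dtype=int)  # back-pointers

    delta[0] = log_initial + log_obs[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_transitions   # indexed (from, to)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_obs[t]

    # Backtrace the most likely senone sequence
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path
```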
Abstract
A method is disclosed herein that includes an act of causing a processor to receive a sample, wherein the sample is one of a spoken utterance, an online handwriting sample, or a moving image sample. The method also comprises the act of causing the processor to decode the sample based at least in part upon an output of a combination of a deep structure and a context-dependent Hidden Markov Model (HMM), wherein the deep structure is configured to output a posterior probability of a context-dependent unit. The deep structure is a Deep Belief Network consisting of many layers of nonlinear units, with connecting weights between layers trained by a pretraining step followed by a fine-tuning step.
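As an illustration of the abstract's description of the deep structure, here is a minimal sketch of a forward pass through a stack of nonlinear layers topped by a softmax that yields a posterior distribution over context-dependent units (senones). The layer sizes, function names, and the choice of sigmoid units are assumptions for illustration; the patent does not prescribe this particular code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dbn_forward(features, weights, biases):
    """Forward pass through a stack of nonlinear (sigmoid) layers, with a
    final softmax that yields a posterior distribution over senone labels
    for each input frame."""
    h = features
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigmoid(h @ W + b)              # hidden layers of nonlinear units
    logits = h @ weights[-1] + biases[-1]
    return softmax(logits)                  # posterior over senones per frame
```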
20 Claims
1. A method executed by a processor, the method comprising:
receiving a sample at a context-dependent combination of a Deep Belief Network (DBN) and a Hidden Markov Model (HMM), wherein the sample is a spoken utterance;
outputting, at the DBN, a posterior probability distribution over labeled senones;
outputting, at the HMM, transition probabilities between the labeled senones, the transition probabilities based upon the posterior probability distribution over the labeled senones; and
decoding the sample based at least in part upon the posterior probability distribution over the labeled senones and the transition probabilities between the labeled senones.
(Dependent claims: 2-12)
13. A computer-implemented speech recognition system comprising:
a processor; and
a plurality of components that are executable by the processor, the plurality of components comprising:
a computer-executable combination of a Deep Belief Network (DBN) and a Hidden Markov Model (HMM) that is configured to receive an input sample, wherein the input sample is based upon a spoken utterance, wherein the DBN is configured to output a posterior probability distribution over labeled senones, and wherein the HMM is configured to output transition probabilities between states, the states corresponding to the labeled senones; and
a decoder component that is configured to decode a word sequence from the input sample based at least in part upon the posterior probability distribution over the labeled senones and the transition probabilities between the states.
(Dependent claims: 14-19)
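A hypothetical wiring of the components recited in claim 13, reusing the `dbn_forward` and `viterbi_decode` sketches above: a DBN-HMM acoustic component feeding a decoder component. The class name and interface are invented for illustration only; mapping the senone path to a word sequence would additionally require a pronunciation lexicon and language model, which are not shown.

```python
class DbnHmmRecognizer:
    """Illustrative composition of the claimed system: a DBN-HMM acoustic
    component plus a decoder component (hypothetical interface)."""

    def __init__(self, weights, biases, senone_priors, log_transitions, log_initial):
        self.weights = weights
        self.biases = biases
        self.senone_priors = senone_priors
        self.log_transitions = log_transitions
        self.log_initial = log_initial

    def recognize(self, frames):
        # DBN component: posterior distribution over labeled senones per frame
        posteriors = dbn_forward(frames, self.weights, self.biases)
        # Decoder component: best senone path under the HMM transition model;
        # lexicon and language-model constraints are omitted in this sketch.
        return viterbi_decode(posteriors, self.senone_priors,
                              self.log_transitions, self.log_initial)
```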
20. A computer-readable memory comprising instructions that, when executed by a processor, cause the processor to perform acts comprising:
receiving a Gaussian Mixture Model (GMM)-Hidden Markov Model (HMM) system that is trained to undertake automatic speech recognition;
converting the GMM-HMM to a Deep Belief Network (DBN)-HMM system, wherein the DBN comprises a plurality of layers of stochastic hidden units above a bottom layer of observed variables that represent a data vector, wherein the DBN comprises a plurality of undirected weighted connections between an uppermost two layers and directed weighted connections at other layers, wherein the DBN is configured to output posterior probabilities of senones pertaining to spoken utterances and the HMM is configured to output transition probabilities between the senones;
utilizing an unsupervised training algorithm to initialize weights of the connections in the DBN;
utilizing back-propagation to refine the weights of the connections in the DBN; and
deploying the DBN-HMM in an automatic speech recognition system.
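The unsupervised initialization step of claim 20 is commonly realized as greedy layer-wise training of restricted Boltzmann machines with one-step contrastive divergence (CD-1); the sketch below shows only that step. The function names, hyper-parameters, and the CD-1 choice are assumptions, and the subsequent back-propagation fine-tuning against senone labels, the GMM-HMM conversion, and deployment are not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hidden, lr=0.01, epochs=10, seed=0):
    """Unsupervised initialization of one layer's connection weights with
    CD-1 (one-step contrastive divergence) on an RBM (illustrative)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        v0 = data
        h0 = sigmoid(v0 @ W + b_h)                        # positive phase
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h0_sample @ W.T + b_v)               # reconstruction
        h1 = sigmoid(v1 @ W + b_h)                        # negative phase
        W += lr * (v0.T @ h0 - v1.T @ h1) / len(data)
        b_h += lr * (h0 - h1).mean(axis=0)
        b_v += lr * (v0 - v1).mean(axis=0)
    return W, b_h, sigmoid(data @ W + b_h)                # weights, biases, hidden activations

def pretrain_stack(features, layer_sizes):
    """Greedy layer-wise pretraining: each RBM is trained on the hidden
    activations of the previous layer; the resulting weights initialize the
    DBN before back-propagation fine-tuning."""
    weights, biases, h = [], [], features
    for size in layer_sizes:
        W, b, h = pretrain_rbm(h, size)
        weights.append(W)
        biases.append(b)
    return weights, biases
```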
Specification