Full-sequence training of deep structures for speech recognition
First Claim
1. A method comprising the following computer-executable acts:
- accessing a deep belief network (DBN) retained in computer-readable data storage, wherein the DBN comprises:
a plurality of stacked hidden layers, each hidden layer comprising a respective plurality of stochastic units, each stochastic unit in each layer connected to stochastic units in an adjacent hidden layer of the DBN by way of connections, the connections assigned weights learned during a pretraining procedure; and
a linear-chain conditional random field (CRF), the CRF comprising:
a hidden layer that comprises a plurality of stochastic units; and
a plurality of output units that are representative of output states, each state in the output states being one of a phone or senone, the plurality of stochastic units connected to the plurality of output units by way of second connections, the second connections having weights learned during the pretraining procedure, the output units having transition probabilities corresponding thereto that are indicative of probabilities of transitioning between output states represented by the output units; and
jointly optimizing the weights assigned to the connections, the weights assigned to the second connections, the transition probabilities, and language model scores of the DBN based upon training data, wherein a processor performs the jointly optimizing of the weights.
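The sequence-level criterion behind this joint optimization can be made concrete: with per-frame unary scores produced through the second connections and transition scores between output states (phones or senones), the conditional log-likelihood of a label sequence under the linear-chain CRF is the reference-path score minus the log partition function, computed with the forward algorithm. A minimal NumPy sketch, with illustrative function names, shapes, and scores (not taken from the patent):

```python
import numpy as np

def logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = a.max(axis=axis)
    return m + np.log(np.exp(a - np.expand_dims(m, axis)).sum(axis=axis))

def sequence_log_likelihood(unary, trans, labels):
    """log P(labels | input) under a linear-chain CRF.

    unary:  (T, S) per-frame scores for each output state, e.g. top-layer
            activations passed through the "second connection" weights.
    trans:  (S, S) transition scores between output states.
    labels: (T,) reference sequence of output states (phones/senones).
    """
    T = unary.shape[0]
    # Score of the reference path: unary terms plus transition terms.
    path = unary[np.arange(T), labels].sum() + trans[labels[:-1], labels[1:]].sum()
    # Log partition function over all label sequences (forward algorithm).
    alpha = unary[0].copy()
    for t in range(1, T):
        alpha = unary[t] + logsumexp(alpha[:, None] + trans, axis=0)
    return path - logsumexp(alpha, axis=0)
```

Joint optimization in the claim then amounts to ascending this quantity with respect to the connection weights (through `unary`), the transition probabilities, and the language model scores together, rather than optimizing a frame-by-frame criterion.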
Abstract
A method includes an act of causing a processor to access a deep-structured model retained in a computer-readable medium, wherein the deep-structured model includes a plurality of layers with respective weights assigned to the plurality of layers, transition probabilities between states, and language model scores. The method further includes the act of jointly substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using an optimization criterion that is based on a sequence rather than a set of unrelated frames.
20 Claims
1. A method comprising the computer-executable acts set forth above as the First Claim. (Dependent claims: 2, 3, 4, 5, 6, 7.)
8. A computer-implemented system comprising:
a processor; and
a memory that comprises a plurality of components that are executable by the processor, the components comprising:
a receiver component that receives a pretrained deep belief network (DBN), wherein the DBN comprises a plurality of hidden layers, weights between the hidden layers, a linear conditional random field (CRF) that comprises output units that each represent possible output states, transition probabilities between output units, and language model scores, the transition probabilities representative of probabilities of transitioning between output states represented by the output units, each output state being one of a phone or senone; and
a trainer component that jointly optimizes weights of the pretrained DBN, the transition probabilities of the pretrained DBN, and language model scores of the pretrained DBN based upon a set of training data.
(Dependent claims: 9, 10, 11, 12, 13, 14, 15.)
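The joint optimization the trainer component performs needs gradients of the sequence log-likelihood with respect to both the per-frame unary scores (which backpropagate into the DBN weights) and the transition scores. For a linear-chain CRF these come from the forward-backward algorithm, in the familiar "empirical counts minus expected counts" form. A hedged, self-contained NumPy sketch with illustrative names and shapes:

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = a.max(axis=axis)
    return m + np.log(np.exp(a - np.expand_dims(m, axis)).sum(axis=axis))

def crf_gradients(unary, trans, labels):
    """Gradients of log P(labels | input) w.r.t. the unary scores and the
    transition scores of a linear-chain CRF, via forward-backward.
    The unary gradient would be backpropagated into the DBN weights;
    the transition gradient updates the transition scores."""
    T, S = unary.shape
    alpha = np.zeros((T, S))
    beta = np.zeros((T, S))
    alpha[0] = unary[0]
    for t in range(1, T):
        alpha[t] = unary[t] + logsumexp(alpha[t - 1][:, None] + trans, axis=0)
    for t in range(T - 2, -1, -1):
        beta[t] = logsumexp(trans + (unary[t + 1] + beta[t + 1])[None, :], axis=1)
    log_Z = logsumexp(alpha[-1], axis=0)

    # Unary gradient: indicator of the reference label minus the
    # posterior state probability P(y_t = s | input).
    g_unary = -np.exp(alpha + beta - log_Z)
    g_unary[np.arange(T), labels] += 1.0

    # Transition gradient: reference bigram counts minus expected
    # pairwise posteriors P(y_t = i, y_{t+1} = j | input).
    g_trans = np.zeros((S, S))
    for t in range(T - 1):
        g_trans -= np.exp(alpha[t][:, None] + trans
                          + (unary[t + 1] + beta[t + 1])[None, :] - log_Z)
        g_trans[labels[t], labels[t + 1]] += 1.0
    return g_unary, g_trans
```

A trainer component in the sense of the claim would apply these gradients in the same update step, so that the hidden-layer weights and the transition probabilities are optimized against the one sequence-level criterion.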
16. A computing device comprising a computer-readable medium, the computer-readable medium comprising instructions that, when executed by a processor, cause the processor to perform acts comprising:
greedily learning weights between hidden layers of a deep belief network (DBN) that is configured for employment in an automatic speech recognition (ASR) system, wherein the DBN is temporally parameter-tied and an uppermost layer in the DBN is a linear-chain conditional random field (CRF), the linear-chain CRF comprising a plurality of output units that are representative of respective output states, each output state being one of a phone or senone;
providing training data to the DBN to optimize a log of conditional probabilities of output sequences of the DBN, an output sequence comprising a sequence of output states represented by the output units; and
jointly optimizing the weights between the hidden layers of the DBN, transition probabilities between the output units in the CRF, and language model scores in the DBN based upon the log of the conditional probabilities of output sequences produced by the DBN.
(Dependent claims: 17, 18, 19, 20.)
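The "greedily learning weights" act is conventionally realized as layer-wise pretraining: each adjacent pair of layers is trained as a restricted Boltzmann machine with contrastive divergence, and the trained layer's hidden probabilities become the next layer's input. A minimal sketch under those assumptions (CD-1, binary units, biases omitted for brevity; layer sizes and learning rate are illustrative, not from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.05):
    """One-step contrastive divergence (CD-1) for a binary RBM."""
    n_visible = data.shape[1]
    W = 0.01 * rng.normal(size=(n_visible, n_hidden))
    for _ in range(epochs):
        v0 = data
        h0 = sigmoid(v0 @ W)                         # positive phase
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h_sample @ W.T)                 # reconstruction
        h1 = sigmoid(v1 @ W)                         # negative phase
        W += lr * (v0.T @ h0 - v1.T @ h1) / len(data)
    return W

def greedy_pretrain(data, layer_sizes):
    """Stack RBMs greedily: each layer's hidden probabilities
    feed the next layer as its visible data."""
    weights, x = [], data
    for n_hidden in layer_sizes:
        W = train_rbm(x, n_hidden)
        weights.append(W)
        x = sigmoid(x @ W)
    return weights
```

The weights returned this way would serve only as the initialization; the claim's final act then fine-tunes them jointly with the CRF transition scores and language model scores against the log of the conditional probabilities of output sequences.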