FULL-SEQUENCE TRAINING OF DEEP STRUCTURES FOR SPEECH RECOGNITION
First Claim
1. A method comprising the following computer-executable acts:
- causing a processor to access a deep-structured model retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto, transition probabilities between states, and language model scores; and
- jointly optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model.
Abstract
A method is disclosed herein that includes an act of causing a processor to access a deep-structured model retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto, transition probabilities between states, and language model scores. The method can further include the act of jointly substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using an optimization criterion that is based on a sequence rather than a set of unrelated frames.
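One way to make the sequence-level criterion concrete (in our own CRF-style notation, not language taken from the patent): with top-layer activations h(v_t) for frame v_t, per-state weight vectors λ_s, transition scores γ, and partition function Z, the log conditional probability of a full state sequence ℓ_{1:T} given the observations v_{1:T} can be written as

\[
\log p(\ell_{1:T} \mid v_{1:T}) = \sum_{t=1}^{T} \Bigl( \gamma_{\ell_{t-1},\,\ell_t} + \lambda_{\ell_t}^{\top} h(v_t) \Bigr) - \log Z(v_{1:T})
\]

Because the frame scores, the transition scores, and (after interpolation with the language model scores) the word-level scores all enter one objective, a gradient step on this log-probability updates all three parameter groups together, which is what distinguishes full-sequence training from frame-by-frame training.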
20 Claims
1. A method comprising the following computer-executable acts:

- causing a processor to access a deep-structured model retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto, transition probabilities between states, and language model scores; and
- jointly optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model.

Dependent claims: 2-10.
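A minimal sketch of what claim 1's joint optimization could look like in code, assuming a PyTorch-style setup (the network shape, the single learned language-model scale, and the unnormalized sequence score are all our assumptions, not the patent's; a full criterion would also subtract the log partition function computed with the forward algorithm):

```python
import torch

n_states, n_hidden, n_feats = 42, 256, 39   # toy sizes, chosen arbitrarily

# "plurality of layers with weights assigned thereto"
net = torch.nn.Sequential(
    torch.nn.Linear(n_feats, n_hidden), torch.nn.Sigmoid(),
    torch.nn.Linear(n_hidden, n_states),
)
# "transition probabilities between states" (kept as unnormalized scores)
trans = torch.nn.Parameter(torch.zeros(n_states, n_states))
# "language model scores" (here, a single learned scale on fixed LM log-probs)
lm_scale = torch.nn.Parameter(torch.tensor(1.0))

# One optimizer over all three parameter groups: the joint part of the claim.
opt = torch.optim.SGD(list(net.parameters()) + [trans, lm_scale], lr=1e-3)

def sequence_loss(feats, states, lm_logprob):
    """Negative unnormalized log-score of one state sequence.
    A real criterion would subtract log Z from the forward algorithm."""
    emis = net(feats)                                   # (T, n_states)
    score = emis[torch.arange(len(states)), states].sum()   # frame scores
    score = score + trans[states[:-1], states[1:]].sum()    # transition scores
    score = score + lm_scale * lm_logprob                   # LM contribution
    return -score

feats = torch.randn(100, n_feats)            # toy 100-frame utterance
states = torch.randint(n_states, (100,))     # toy state alignment
loss = sequence_loss(feats, states, torch.tensor(-17.3))
loss.backward()                              # gradients reach all 3 groups
opt.step()
```

The design point the claim turns on is visible in the optimizer construction: one parameter list spanning network weights, transition scores, and language model scores, so a single sequence-level loss drives every update.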
11. A computer-implemented system comprising:
- a processor; and
- a memory that comprises a plurality of components that are executable by the processor, the components comprising:
  - a receiver component that receives a pretrained deep-structured model, wherein the deep-structured model comprises a plurality of layers, weights between the layers, transition parameters, and language model scores; and
  - a trainer component that jointly substantially optimizes weights of the pretrained deep-structured model, state transition parameters of the pretrained deep-structured model, and language model scores of the pretrained deep-structured model.

Dependent claims: 12-19.
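Read structurally, claim 11 separates the component that holds the pretrained model from the component that updates it. A hypothetical sketch of that decomposition (class and field names are ours, not the patent's):

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class PretrainedDeepModel:
    layer_weights: List[Any]   # "weights between the layers"
    transitions: Any           # "transition parameters"
    lm_scores: Any             # "language model scores"

class ReceiverComponent:
    def receive(self, source) -> PretrainedDeepModel:
        # In practice: deserialize a pretrained model from memory or disk.
        return source

class TrainerComponent:
    def train(self, model: PretrainedDeepModel, utterances) -> PretrainedDeepModel:
        for utt in utterances:
            # One sequence-level loss per utterance; its gradient updates
            # all three parameter groups together (see the sketch after claim 1).
            pass
        return model
```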
20. A non-transitory computer-readable medium comprising instructions that, when executed by a processor, cause the processor to perform acts comprising:
- greedily learning each layer of a deep belief network (DBN) that is configured for employment in an automatic speech recognition (ASR) system, wherein the DBN is temporally parameter-tied;
- providing training data to the DBN to optimize the log of the conditional probabilities of output states of the DBN; and
- jointly optimizing weights in the DBN, transition probabilities in a top layer of the DBN, and language model scores in the DBN based at least in part upon the log of the conditional probabilities of output sequences produced by the DBN.
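The first act, greedy layer-wise learning, is commonly realized by stacking restricted Boltzmann machines, each trained on the activations of the layer below. A rough numpy sketch under that assumption (CD-1 contrastive divergence; sizes, learning rate, and epoch count are arbitrary; the same weights apply to every frame, consistent with temporal parameter tying):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.05):
    """Train one RBM with one-step contrastive divergence (CD-1)."""
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hidden))
    b_v, b_h = np.zeros(n_vis), np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data.
        h_prob = sigmoid(data @ W + b_h)
        h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step back down and up.
        v_recon = sigmoid(h_samp @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        # Approximate log-likelihood gradient.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        b_v += lr * (data - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_h

def pretrain_dbn(data, layer_sizes):
    """Greedy pretraining: each RBM sees the hidden activations of the
    layer below, so layers are learned one at a time, never jointly."""
    weights, x = [], data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(x, n_hidden)
        weights.append((W, b_h))
        x = sigmoid(x @ W + b_h)       # feed activations upward
    return weights

features = rng.random((256, 39))        # toy 39-dim acoustic frames
stack = pretrain_dbn(features, [128, 128, 128])
```

After this stage, the stacked weights initialize the deep network; the second and third acts then fine-tune everything against the sequence-level conditional log-probability, per the joint-optimization sketch after claim 1.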