Discriminative language modeling for automatic speech recognition with a weak acoustic model and distributed training
Abstract
Training data from a plurality of utterance-to-text-string mappings of an automatic speech recognition (ASR) system may be selected. Parameters of the ASR system that characterize the utterances and their respective mappings may be determined through application of a first acoustic model and a language model. A second acoustic model and the language model may be applied to the selected training data utterances to determine a second set of utterance-to-text-string mappings. The first set of utterance-to-text-string mappings may be compared to the second set of utterance-to-text-string mappings, and the parameters of the ASR system may be updated based on the comparison.
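The abstract's compare-and-update step (a weak model's hypothesis is compared against the strong model's reference, and the parameters are adjusted in favor of the reference) can be illustrated with a minimal structured-perceptron-style sketch. The n-gram count features, the `perceptron_update` function, and the learning rate below are illustrative assumptions, not taken from the patent itself:

```python
from collections import Counter

def perceptron_update(weights, reference_features, hypothesis_features, learning_rate=1.0):
    """One discriminative update: move the weight vector toward the
    reference transcription's features and away from the weak model's
    differing hypothesis."""
    updated = dict(weights)
    for feat, count in reference_features.items():
        updated[feat] = updated.get(feat, 0.0) + learning_rate * count
    for feat, count in hypothesis_features.items():
        updated[feat] = updated.get(feat, 0.0) - learning_rate * count
    return updated

def score(weights, features):
    """Linear score of a transcription under the current weight vector."""
    return sum(weights.get(f, 0.0) * c for f, c in features.items())

# Toy bigram-count features for a reference and a weak-model hypothesis.
reference = Counter({("the", "cat"): 1, ("cat", "sat"): 1})
hypothesis = Counter({("the", "cap"): 1, ("cap", "sat"): 1})

w = perceptron_update({}, reference, hypothesis)
# After one update, the reference outscores the differing hypothesis.
assert score(w, reference) > score(w, hypothesis)
```

After the update, re-scoring with the new weights prefers the reference, mirroring the claim language in which the updated weight vector yields a tertiary transcription with a higher confidence level.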
81 Citations
23 Claims
1. A method comprising:

determining, by a computing system, a reference transcription of a reference utterance, wherein the reference transcription is derived using a strong acoustic model, a language model and a weight vector, and wherein the reference transcription has a confidence level of at least 70%;

based on the reference transcription having the confidence level of at least 70%, determining a secondary transcription of the reference utterance, wherein the secondary transcription is derived using a weak acoustic model, the language model and the weight vector, wherein the secondary transcription has a secondary confidence level, wherein the weak acoustic model has a higher error rate than the strong acoustic model, and wherein the secondary transcription is different from the reference transcription; and

based on the secondary transcription being different from the reference transcription, updating the weight vector so that transcribing the reference utterance using the weak acoustic model, the language model and the updated weight vector results in a tertiary transcription with a tertiary confidence level that is greater than the secondary confidence level.

Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An article of manufacture including a non-transitory computer-readable storage medium, having stored thereon program instructions that, upon execution by a computing device, cause the computing device to perform operations comprising:

determining a reference transcription of a reference utterance, wherein the reference transcription is derived using a strong acoustic model, a language model and a weight vector, and wherein the reference transcription has a confidence level of at least 70%;

based on the reference transcription having the confidence level of at least 70%, determining a secondary transcription of the reference utterance, wherein the secondary transcription is derived using a weak acoustic model, the language model and the weight vector, wherein the secondary transcription has a secondary confidence level, wherein the weak acoustic model has a higher error rate than the strong acoustic model, and wherein the secondary transcription is different from the reference transcription; and

based on the secondary transcription being different from the reference transcription, updating the weight vector so that transcribing the reference utterance using the weak acoustic model, the language model and the updated weight vector results in a tertiary transcription with a tertiary confidence level that is greater than the secondary confidence level.

Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computing system comprising:

a plurality of storage computing devices, each configured to store a respective set of reference transcriptions of reference utterances, and a respective set of feature vectors, and to have access to a weight vector, wherein the respective sets of reference transcriptions were derived using a strong acoustic model, a language model and the weight vector, and wherein each feature vector is pairwise associated with a reference utterance stored on the same storage computing device;

a plurality of training computing devices, each configured to select a respective partition of the reference utterances, wherein each reference utterance in the respective partition is associated with a respective confidence level of at least 70%, wherein each training computing device is configured to, based on the reference transcriptions having the confidence levels of at least 70%, apply a weak acoustic model, the language model, and the weight vector to the reference utterances in the respective partition to determine a set of respective secondary transcriptions, wherein the respective secondary transcriptions have respective secondary confidence levels, wherein the weak acoustic model has a higher error rate than the strong acoustic model, and wherein at least some respective secondary transcriptions are different from the respective reference transcriptions; and

at least one combining computing device configured to, based on the feature vectors associated with the reference utterances of the selected partitions, update the weight vector so that transcribing the respective reference utterances using the weak acoustic model, the language model and the updated weight vector results in respective tertiary transcriptions with respective tertiary confidence levels that are greater than the respective secondary confidence levels.

Dependent claims: 18, 19, 20, 21, 22, 23
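Claim 17 describes training devices that each update the weight vector on their own partition of utterances, with a combining device merging the results. One common way to realize such a combining step, assumed here for illustration (the patent does not specify the combination rule), is to average the per-partition weight vectors, as in iterative parameter mixing:

```python
def combine_weight_vectors(partial_weights):
    """Average per-partition weight vectors (one per training device)
    into a single updated weight vector on the combining device."""
    combined = {}
    for weights in partial_weights:
        for feat, value in weights.items():
            combined[feat] = combined.get(feat, 0.0) + value
    n = len(partial_weights)
    return {feat: value / n for feat, value in combined.items()}

# Three training devices each produce a locally updated weight vector.
partials = [
    {"a": 3.0, "b": 0.0},
    {"a": 0.0, "b": 3.0},
    {"a": 3.0, "b": 3.0},
]
merged = combine_weight_vectors(partials)
assert merged == {"a": 2.0, "b": 2.0}
```

The merged vector would then be redistributed to the training devices for the next pass over their partitions.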
Specification