Discriminative language modeling for automatic speech recognition with a weak acoustic model and distributed training
Abstract
Training data from a plurality of utterance-to-text-string mappings of an automatic speech recognition (ASR) system may be selected. Parameters of the ASR system that characterize the utterances and their respective mappings may be determined through application of a first acoustic model and a language model. A second acoustic model and the language model may be applied to the selected training data utterances to determine a second set of utterance-to-text-string mappings. The first set of utterance-to-text-string mappings may be compared to the second set of utterance-to-text-string mappings, and the parameters of the ASR system may be updated based on the comparison.
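The abstract's compare-and-update step (a weak model's hypothesis is compared against the strong model's reference, and the parameters are adjusted in favor of the reference) can be illustrated with a minimal structured-perceptron-style sketch. The n-gram count features, the `perceptron_update` function, and the learning rate below are illustrative assumptions, not taken from the patent itself:

```python
from collections import Counter

def perceptron_update(weights, reference_features, hypothesis_features, learning_rate=1.0):
    """One discriminative update: move the weight vector toward the
    reference transcription's features and away from the weak model's
    differing hypothesis."""
    updated = dict(weights)
    for feat, count in reference_features.items():
        updated[feat] = updated.get(feat, 0.0) + learning_rate * count
    for feat, count in hypothesis_features.items():
        updated[feat] = updated.get(feat, 0.0) - learning_rate * count
    return updated

def score(weights, features):
    """Linear score of a transcription under the current weight vector."""
    return sum(weights.get(f, 0.0) * c for f, c in features.items())

# Toy bigram-count features for a reference and a weak-model hypothesis.
reference = Counter({("the", "cat"): 1, ("cat", "sat"): 1})
hypothesis = Counter({("the", "cap"): 1, ("cap", "sat"): 1})

w = perceptron_update({}, reference, hypothesis)
# After one update, the reference outscores the differing hypothesis.
assert score(w, reference) > score(w, hypothesis)
```

After the update, re-scoring with the new weights prefers the reference, mirroring the claim language in which the updated weight vector yields a tertiary transcription with a higher confidence level.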
81 Citations
23 Claims
1. A method comprising:

determining, by a computing system, a reference transcription of a reference utterance, wherein the reference transcription is derived using a strong acoustic model, a language model and a weight vector, and wherein the reference transcription has a confidence level of at least 70%;

based on the reference transcription having the confidence level of at least 70%, determining a secondary transcription of the reference utterance, wherein the secondary transcription is derived using a weak acoustic model, the language model and the weight vector, wherein the secondary transcription has a secondary confidence level, wherein the weak acoustic model has a higher error rate than the strong acoustic model, and wherein the secondary transcription is different from the reference transcription; and

based on the secondary transcription being different from the reference transcription, updating the weight vector so that transcribing the reference utterance using the weak acoustic model, the language model and the updated weight vector results in a tertiary transcription with a tertiary confidence level that is greater than the secondary confidence level.

Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An article of manufacture including a non-transitory computer-readable storage medium, having stored thereon program instructions that, upon execution by a computing device, cause the computing device to perform operations comprising:

determining a reference transcription of a reference utterance, wherein the reference transcription is derived using a strong acoustic model, a language model and a weight vector, and wherein the reference transcription has a confidence level of at least 70%;

based on the reference transcription having the confidence level of at least 70%, determining a secondary transcription of the reference utterance, wherein the secondary transcription is derived using a weak acoustic model, the language model and the weight vector, wherein the secondary transcription has a secondary confidence level, wherein the weak acoustic model has a higher error rate than the strong acoustic model, and wherein the secondary transcription is different from the reference transcription; and

based on the secondary transcription being different from the reference transcription, updating the weight vector so that transcribing the reference utterance using the weak acoustic model, the language model and the updated weight vector results in a tertiary transcription with a tertiary confidence level that is greater than the secondary confidence level.

Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computing system comprising:

a plurality of storage computing devices, each configured to store a respective set of reference transcriptions of reference utterances, and a respective set of feature vectors, and to have access to a weight vector, wherein the respective sets of reference transcriptions were derived using a strong acoustic model, a language model and the weight vector, and wherein each feature vector is pairwise associated with a reference utterance stored on the same storage computing device;

a plurality of training computing devices, each configured to select a respective partition of the reference utterances, wherein each reference utterance in the respective partition is associated with a respective confidence level of at least 70%, wherein each training computing device is configured to, based on the reference transcriptions having the confidence levels of at least 70%, apply a weak acoustic model, the language model, and the weight vector to the reference utterances in the respective partition to determine a set of respective secondary transcriptions, wherein the respective secondary transcriptions have respective secondary confidence levels, wherein the weak acoustic model has a higher error rate than the strong acoustic model, and wherein at least some respective secondary transcriptions are different from the respective reference transcriptions; and

at least one combining computing device configured to, based on the feature vectors associated with the reference utterances of the selected partitions, update the weight vector so that transcribing the respective reference utterances using the weak acoustic model, the language model and the updated weight vector results in respective tertiary transcriptions with respective tertiary confidence levels that are greater than the respective secondary confidence levels.

Dependent claims: 18, 19, 20, 21, 22, 23
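Claim 17 describes training devices that each update the weight vector on their own partition of utterances, with a combining device merging the results. One common way to realize such a combining step, assumed here for illustration (the patent does not specify the combination rule), is to average the per-partition weight vectors, as in iterative parameter mixing:

```python
def combine_weight_vectors(partial_weights):
    """Average per-partition weight vectors (one per training device)
    into a single updated weight vector on the combining device."""
    combined = {}
    for weights in partial_weights:
        for feat, value in weights.items():
            combined[feat] = combined.get(feat, 0.0) + value
    n = len(partial_weights)
    return {feat: value / n for feat, value in combined.items()}

# Three training devices each produce a locally updated weight vector.
partials = [
    {"a": 3.0, "b": 0.0},
    {"a": 0.0, "b": 3.0},
    {"a": 3.0, "b": 3.0},
]
merged = combine_weight_vectors(partials)
assert merged == {"a": 2.0, "b": 2.0}
```

The merged vector would then be redistributed to the training devices for the next pass over their partitions.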
Specification