Transparent monitoring and intervention to improve automatic adaptation of speech models

US 7,660,715 B1
Filed: 01/12/2004
Issued: 02/09/2010
Est. Priority Date: 01/12/2004
Status: Active Grant

27 Assignments

0 Petitions

Abstract

A system and method to improve the automatic adaptation of one or more speech models in automatic speech recognition systems. After a dialog begins, for example, the dialog asks the customer to provide spoken input, which is recorded. If the speech recognizer determines that it may not have correctly transcribed the verbal response, i.e., the voice input, the invention uses monitoring and, if necessary, intervention to guarantee that the next transcription of the verbal response is correct. The dialog asks the customer to repeat the verbal response, which is recorded, and a transcription of the input is sent to a human monitor, i.e., an agent or operator. If the transcription of the spoken input is correct, the human does not intervene and the transcription remains unmodified. If the transcription is incorrect, the human intervenes and corrects the transcription of the misrecognized word. In both cases, the dialog asks the customer to confirm the unmodified or corrected transcription. If the customer confirms it, the dialog continues, and the customer does not hang up in frustration because in most cases only one misrecognition has occurred. Finally, the invention uses the first and second customer recordings of the misrecognized word or utterance, along with the corrected or unmodified transcription, to automatically adapt one or more speech models, which improves the performance of the speech recognition system.
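
The dialog flow described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: the callables `recognize`, `reprompt`, `human_review`, and `confirm`, the `TurnOutcome` type, and the confidence `threshold` are all hypothetical names introduced here, and a confidence cut-off is only one assumed way for a recognizer to decide that it "may not have correctly transcribed" the input.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class TurnOutcome(NamedTuple):
    text: str                # transcription the dialog ends up with
    accepted: bool           # recognizer was confident, or the customer confirmed
    recordings: List[bytes]  # recordings to pair with `text` for adaptation


def recover_turn(
    first_recording: bytes,
    recognize: Callable[[bytes], Tuple[str, float]],      # assumed: (text, confidence)
    reprompt: Callable[[], bytes],                         # assumed: asks for a repeat
    human_review: Callable[[bytes, str], Optional[str]],   # assumed: correction or None
    confirm: Callable[[str], bool],                        # assumed: customer yes/no
    threshold: float = 0.8,                                # assumed cut-off
) -> TurnOutcome:
    text, confidence = recognize(first_recording)
    if confidence >= threshold:
        # No suspected misrecognition: nothing to monitor or adapt.
        return TurnOutcome(text, True, [])

    # Suspected misrecognition: ask the customer to repeat and re-transcribe.
    second_recording = reprompt()
    second_text, _ = recognize(second_recording)

    # The human monitor sees the transcription and intervenes only if it is wrong.
    correction = human_review(second_recording, second_text)
    final_text = second_text if correction is None else correction

    # In both cases the customer is asked to confirm the result.
    if confirm(final_text):
        return TurnOutcome(final_text, True, [first_recording, second_recording])
    # The abstract does not say what follows a rejection; this sketch stops here.
    return TurnOutcome(final_text, False, [])
```

When the outcome is accepted and `recordings` is non-empty, the two recordings and `text` are the material the abstract says is used to adapt the speech models.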

324 Citations

24 Claims

1. A method to retrain an automatic speech recognition system, in which automatic speech recognition system a plurality of speech models is stored, the method comprising:
- (a) extracting, by the automatic speech recognition system, a first user utterance from a sampled first input voice stream received from a user in response to a query;
  
  (b) selecting, by the automatic speech recognition system and based on the first user utterance, a first speech model from among the plurality of speech models, the first speech model producing a first tentative recognition result corresponding to the first user utterance;
  
  (c) informing the user of the first tentative recognition result;
  
  (d) determining, by the automatic speech recognition system, from the user's response whether the first tentative recognition result was correct;
  
  (e) performing the following steps when the first tentative recognition result is not correct;
  
  (i) requesting the user to repeat the response to the query;
  
  (ii) extracting, by the automatic speech recognition system, a second user utterance from a sampled second input voice stream received from the user in response to the requesting step;
  
  (iii) selecting, by the automatic speech recognition system and based on the second user utterance, a second speech model, different than the first speech model, the second speech model producing a second tentative recognition result corresponding to the second user utterance; and
  
  (iv) determining, by a human operator, when the second speech model correctly corresponds to at least one of the first and second user utterances, wherein the first and second speech models are selected from a plurality of speech models.
- - 2. The method of claim 1, wherein the plurality of speech models are developed from a large vocabulary stored in a speech recognition adaptation database;
    - and wherein step (e) further comprises;
      
      (v) retraining the first speech model using the second speech model.
  - 3. The method of claim 1, wherein the first tentative recognition result is at least one word.
  - 4. The method of claim 1, wherein, when the first tentative recognition result is correct, steps (i)-(v) are not performed.
  - 5. The method of claim 1, wherein the automatic speech recognition system provides a transcription of at least one of the first and second user utterances and wherein the informing step comprises:
    - converting, by a text-to-speech resource, the transcription into speech; and
      
      communicating, by an interactive voice response unit, the speech to the user.
  - 6. The method of claim 5, wherein the determining, by a human operator, step comprises:
    - displaying the transcription to the human operator;
      
      playing a recording of the at least one of the first and second user utterances to the human operator; and
      
      selecting, by the human operator, a third speech model as correctly corresponding to the recording, based on the transcription and recording.
  - 7. The method of claim 1, further comprising:
    - selecting, by the human operator, a third speech model that correctly corresponds to the second user utterance, when the second speech model does not correctly correspond to the second user utterance.
  - 8. The method of claim 7, further comprising:
    - retraining at least one speech model using said third speech model.
  - 9. A computer readable medium comprising processor executable instructions that, when executed, perform the steps of claim 1.
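
Steps (b) and (e)(iii) of claim 1 both select a speech model from a stored plurality, with the second selection constrained to differ from the first. A minimal sketch of that constraint is shown below. Everything in it is an assumption for illustration: the claims do not say how a model is scored against an utterance, so `select_model`, the `Scorer` type, and the toy scoring functions are hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence, Set

# Assumed representation: each speech model is a label plus a function that
# scores how well the model matches an utterance (higher is better).
Scorer = Callable[[Sequence[float]], float]


def select_model(
    utterance: Sequence[float],
    models: Dict[str, Scorer],
    exclude: Optional[Set[str]] = None,
) -> str:
    """Return the label of the best-scoring model that is not in `exclude`."""
    excluded = exclude or set()
    candidates = {name: scorer for name, scorer in models.items() if name not in excluded}
    if not candidates:
        raise ValueError("no speech model left to select")
    return max(candidates, key=lambda name: candidates[name](utterance))


# Toy models: each one prefers utterances whose samples sum to a target value.
models: Dict[str, Scorer] = {
    "austin": lambda u: -abs(sum(u) - 1.0),
    "boston": lambda u: -abs(sum(u) - 1.2),
    "houston": lambda u: -abs(sum(u) - 3.0),
}

first = select_model([0.5, 0.65], models)                     # step (b)
second = select_model([0.5, 0.63], models, exclude={first})   # step (e)(iii)
```

Passing the first label in `exclude` is one simple way to ensure the second model differs from the first; a human operator's choice of a third model (claims 6 and 7) would sit outside this function.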

10. A method to retrain an automatic speech recognition system, in which automatic speech recognition system a plurality of speech models is stored, the method comprising:
- (a) extracting, by the automatic speech recognition system, a first user utterance from a first input voice stream from a user, the first user utterance being a response to a query;
  
  (b) selecting, by the automatic speech recognition system, a first speech model, the first speech model producing a first tentative recognition result based on the first user utterance;
  
  (c) determining, by the automatic speech recognition system, that the first tentative recognition result does not correctly characterize the first user utterance;
  
  (d) selecting, by a human operator and based on at least one of the first user utterance and a second user utterance received from the user, a second speech model as correctly characterizing the first user utterance, the second speech model producing a second tentative recognition result; and
  
  (e) retraining the first speech model using at least one of the first and second user utterances and the second tentative recognition result.
- - 11. The method of claim 10, wherein, when the first tentative recognition result correctly characterizes the first user utterance, not performing the selecting step (d).
  - 12. The method of claim 10, wherein the plurality of speech models are developed from a large vocabulary stored in a speech recognition adaptation database and wherein the determining step comprises:
    - (C1) informing the user of the first tentative recognition result; and
      
      (C2) determining from the first user's response whether the first tentative recognition result correctly characterizes the first user utterance.
  - 13. The method of claim 12, further comprising before the human operator selecting step (d):
    - (f) requesting the first user to repeat the response to the query;
      
      (g) extracting the second user utterance from a sampled second input voice stream received from the first user in response to the requesting step (e); and
      
      (h) selecting the second speech model producing the second tentative recognition result corresponding to the second user utterance.
  - 14. The method of claim 10, wherein the automatic speech recognition system generates a transcription of the first user utterance and wherein the determining step (c) comprises:
    - (C1) converting, by a text-to-speech resource, the transcription into speech; and
      
      (C2) communicating, by an interactive voice response unit, the speech to the user.
  - 15. The method of claim 14, wherein the selecting step (d) comprises:
    - (D1) displaying the transcription to the human operator;
      
      (D2) playing a recording of the first user utterance to the human operator; and
      
      (D3) selecting, by the human operator and based on the transcription and recording, a third speech model as correctly corresponding to the recording.
  - 16. The method of claim 14, wherein an adaptation agent is operable to provide an adaptation engine improved data to retrain at least one speech model, the improved data comprising said first user utterance and at least one of (i) a corrected transcription of said first user utterance when said human operator corrects said transcription;
    - (ii) an unmodified transcription of said first user utterance when said human operator does not correct said transcription.
  - 17. A computer readable medium comprising instructions that, when executed, perform the steps of claim 10.
  - 18. The method of claim 10, wherein the first tentative recognition result is at least one word.
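
Claim 16 describes the "improved data" passed to the adaptation engine: the first user utterance paired with either a corrected transcription (when the operator corrects it) or the unmodified one (when the operator does not). The sketch below shows one way to package that choice. The `ImprovedData` record and `build_improved_data` helper are hypothetical names, and representing "no correction" as `None` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImprovedData:
    utterance: bytes          # recording of the first user utterance
    transcription: str        # text the adaptation engine would train toward
    operator_corrected: bool  # True only if the operator changed the text


def build_improved_data(
    utterance: bytes,
    asr_transcription: str,
    operator_correction: Optional[str],
) -> ImprovedData:
    """Pair the utterance with the corrected or the unmodified transcription."""
    if operator_correction is not None and operator_correction != asr_transcription:
        # Case (i): the operator corrected the transcription.
        return ImprovedData(utterance, operator_correction, True)
    # Case (ii): the operator left the transcription unmodified.
    return ImprovedData(utterance, asr_transcription, False)
```

Keeping the `operator_corrected` flag lets a downstream adaptation step distinguish operator-corrected samples from merely reviewed ones, although the claims do not require such a distinction.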

19. A speech recognition system comprising:
- a speech recognition resource operable to extract a first user utterance from a first input voice stream from a user, the first user utterance being a response to a query;
  
  select a first speech model producing a first tentative recognition result characterizing the first user utterance; and
  
  determine that the first tentative recognition result does not correctly characterize the first user utterance;
  
  a model adaptation agent operable, when the first tentative recognition result does not correctly characterize the first user utterance, to alert a human operator, based on the first user utterance, to select a second speech model, different than the first speech model, to produce a second tentative recognition result correctly characterizing the first user utterance, wherein the first and second speech models are selected from a plurality of speech models.
- - 20. The system of claim 19, further comprising:
    - an interactive voice response unit operable to inform the user of the first tentative recognition result and wherein the speech recognition resource is operable to determine from the first user's response whether the first tentative recognition result correctly characterizes the first user utterance; and
      
      further comprising;
      
      an adaptation engine operable to retrain at least one speech model using at least the second tentative recognition result.
  - 21. The system of claim 19, further comprising:
    - an interactive voice response unit operable to request the first user to repeat the response to the query; and
      
      wherein the automatic speech recognition system is operable to extract a second user utterance from a sampled second input voice stream received from the first user in response to the request and select a third speech model to produce a third tentative recognition result corresponding to the second user utterance.
  - 22. The system of claim 19 wherein, when the first tentative recognition result correctly characterizes the first user utterance, the adaptation engine does not alert the human operator.
  - 23. The system of claim 19, wherein the speech recognition resource generates a transcription of at the first user utterance and further comprising:
    - a text-to-speech resource operable to convert the transcription into speech; and
      
      an interactive voice response unit operable to communicate the speech to the user.
  - 24. The system of claim 23, wherein the adaptation agent is operable to display the transcription to the human operator and play a recording of the first user utterance to the human operator and wherein the human operator, based on the transcription and recording, selects a third speech model as correctly corresponding to the recording.
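
The division of labor in claims 19 and 22 (a recognition resource that flags an incorrect result, and an agent that alerts the operator only in that case) can be sketched as two small components. The class names, the `ask_operator` hook, and the `alerts` counter are hypothetical, and the operator is stubbed as a plain callable that returns a model label.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RecognitionResult:
    model: str     # label of the first speech model selected
    text: str      # first tentative recognition result
    correct: bool  # outcome of the recognition resource's own check


@dataclass
class ModelAdaptationAgent:
    # Assumed hook: presents the utterance and result to a human operator and
    # returns the label of the model the operator selects.
    ask_operator: Callable[[bytes, RecognitionResult], str]
    alerts: int = 0

    def handle(self, utterance: bytes, result: RecognitionResult) -> Optional[str]:
        """Alert the operator only when the tentative result is incorrect."""
        if result.correct:
            return None  # correct result: the operator is not alerted
        self.alerts += 1
        chosen = self.ask_operator(utterance, result)
        if chosen == result.model:
            # Claim 19 requires a second model different than the first.
            raise ValueError("operator must select a different speech model")
        return chosen
```

For example, an agent built with `ask_operator=lambda u, r: "austin"` returns `None` for a result marked correct and `"austin"` for an incorrect result whose model was `"boston"`.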

Current Assignee: Arlington Technologies, LLC (Dominion Harbor Enterprises, LLC)
Original Assignee: Avaya Incorporated
Inventors: Thambiratnam, David Preshan
Primary Examiner(s): Opsasnick; Michael N
Application Number: US10/756,669
Time in Patent Office: 2,220 Days
Field of Search: 704/244-247
US Class Current: 704/244
CPC Class Codes:

G10L 15/063   Training

G10L 15/065   Adaptation

G10L 15/22   Procedures used during a sp...
