Transparent monitoring and intervention to improve automatic adaptation of speech models
First Claim
1. A method to retrain an automatic speech recognition system, in which automatic speech recognition system a plurality of speech models is stored, the method comprising:
(a) extracting, by the automatic speech recognition system, a first user utterance from a sampled first input voice stream received from a user in response to a query;
(b) selecting, by the automatic speech recognition system and based on the first user utterance, a first speech model from among the plurality of speech models, the first speech model producing a first tentative recognition result corresponding to the first user utterance;
(c) informing the user of the first tentative recognition result;
(d) determining, by the automatic speech recognition system, from the user's response whether the first tentative recognition result was correct;
(e) performing the following steps when the first tentative recognition result is not correct;
(i) requesting the user to repeat the response to the query;
(ii) extracting, by the automatic speech recognition system, a second user utterance from a sampled second input voice stream received from the user in response to the requesting step;
(iii) selecting, by the automatic speech recognition system and based on the second user utterance, a second speech model, different than the first speech model, the second speech model producing a second tentative recognition result corresponding to the second user utterance; and
(iv) determining, by a human operator, when the second speech model correctly corresponds to at least one of the first and second user utterances, wherein the first and second speech models are selected from a plurality of speech models.
Abstract
A system and method to improve the automatic adaptation of one or more speech models in automatic speech recognition systems. After a dialog begins, for example, the dialog asks the customer to provide spoken input, which is recorded. If the speech recognizer determines that it may not have correctly transcribed the verbal response (i.e., the voice input), the invention uses monitoring and, if necessary, intervention to guarantee that the next transcription of the verbal response is correct. The dialog asks the customer to repeat the verbal response, which is recorded, and a transcription of the input is sent to a human monitor (i.e., an agent or operator). If the transcription of the spoken input is correct, the human does not intervene and the transcription remains unmodified. If the transcription is incorrect, the human intervenes and corrects the transcription of the misrecognized word. In both cases, the dialog asks the customer to confirm the unmodified or corrected transcription. If the customer confirms it, the dialog continues, and the customer does not hang up in frustration because in most cases only one misrecognition has occurred. Finally, the invention uses the first and second customer recordings of the misrecognized word or utterance, along with the corrected or unmodified transcription, to automatically adapt one or more speech models, which improves the performance of the speech recognition system.
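The monitoring-and-intervention flow described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the names `Dialog`, `AdaptationRecord`, `recognize`, `human_review`, `confirm`, and `ask_repeat` are hypothetical, and the use of a numeric confidence score with a fixed threshold to decide that a transcription "may not be correct" is an assumption that the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class AdaptationRecord:
    """Both recordings plus the final transcription, kept for later adaptation."""
    recordings: List[bytes]
    transcription: str


@dataclass
class Dialog:
    # All callables are hypothetical stand-ins for real components.
    recognize: Callable[[bytes], Tuple[str, float]]  # -> (transcription, confidence)
    human_review: Callable[[bytes, str], str]        # -> transcription, corrected if needed
    confirm: Callable[[str], bool]                   # asks the customer to confirm
    threshold: float = 0.8                           # assumed confidence cutoff
    adaptation_log: List[AdaptationRecord] = field(default_factory=list)

    def handle(self, first: bytes, ask_repeat: Callable[[], bytes]) -> Optional[str]:
        text, confidence = self.recognize(first)
        if confidence >= self.threshold:
            return text  # recognizer is confident; no monitoring needed

        # Possible misrecognition: ask the customer to repeat the response.
        second = ask_repeat()
        retry_text, _ = self.recognize(second)

        # The human monitor either leaves the transcription alone or corrects it.
        final = self.human_review(second, retry_text)

        # In both cases the customer is asked to confirm the result.
        if self.confirm(final):
            self.adaptation_log.append(AdaptationRecord([first, second], final))
            return final
        return None
```

The `adaptation_log` entries pair the first and second recordings with the unmodified or corrected transcription, which is the material the abstract says is used to adapt the speech models.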
24 Claims
1. A method to retrain an automatic speech recognition system, in which automatic speech recognition system a plurality of speech models is stored, the method comprising:
(a) extracting, by the automatic speech recognition system, a first user utterance from a sampled first input voice stream received from a user in response to a query;
(b) selecting, by the automatic speech recognition system and based on the first user utterance, a first speech model from among the plurality of speech models, the first speech model producing a first tentative recognition result corresponding to the first user utterance;
(c) informing the user of the first tentative recognition result;
(d) determining, by the automatic speech recognition system, from the user's response whether the first tentative recognition result was correct;
(e) performing the following steps when the first tentative recognition result is not correct;
(i) requesting the user to repeat the response to the query;
(ii) extracting, by the automatic speech recognition system, a second user utterance from a sampled second input voice stream received from the user in response to the requesting step;
(iii) selecting, by the automatic speech recognition system and based on the second user utterance, a second speech model, different than the first speech model, the second speech model producing a second tentative recognition result corresponding to the second user utterance; and
(iv) determining, by a human operator, when the second speech model correctly corresponds to at least one of the first and second user utterances, wherein the first and second speech models are selected from a plurality of speech models.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
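Steps (b) and (e)(iii) of claim 1 both select a speech model from the stored plurality, with the second selection constrained to differ from the first. One way this might look is sketched below. The claim does not say how a model is chosen, so the per-model score, the `select_model` function, and the `exclude` parameter are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Assumption: a "speech model" is any callable mapping utterance features
# to a (tentative_result, score) pair. The claim does not define this.
SpeechModel = Callable[[Sequence[float]], Tuple[str, float]]


def select_model(
    models: Dict[str, SpeechModel],
    utterance: Sequence[float],
    exclude: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (model_name, tentative_result) for the best-scoring model.

    `exclude` names a model to skip, so that a second selection can be
    forced to differ from the first.
    """
    best_name: Optional[str] = None
    best_text = ""
    best_score = float("-inf")
    for name, model in models.items():
        if name == exclude:
            continue
        text, score = model(utterance)
        if score > best_score:
            best_name, best_text, best_score = name, text, score
    if best_name is None:
        raise ValueError("no candidate speech model available")
    return best_name, best_text


# Hypothetical stored models and a made-up feature vector.
models: Dict[str, SpeechModel] = {
    "general": lambda u: ("forty", 0.9),
    "accented": lambda u: ("fourteen", 0.7),
}
first_name, first_text = select_model(models, [0.1, 0.2])
second_name, second_text = select_model(models, [0.1, 0.2], exclude=first_name)
```

In this sketch the second call is made on the repeated utterance after the user rejects the first tentative result; a human operator would then judge whether the second model's output is correct, as in step (e)(iv).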
10. A method to retrain an automatic speech recognition system, in which automatic speech recognition system a plurality of speech models is stored, the method comprising:
(a) extracting, by the automatic speech recognition system, a first user utterance from a first input voice stream from a user, the first user utterance being a response to a query;
(b) selecting, by the automatic speech recognition system, a first speech model, the first speech model producing a first tentative recognition result based on the first user utterance;
(c) determining, by the automatic speech recognition system, that the first tentative recognition result does not correctly characterize the first user utterance;
(d) selecting, by a human operator and based on at least one of the first user utterance and a second user utterance received from the user, a second speech model as correctly characterizing the first user utterance, the second speech model producing a second tentative recognition result; and
(e) retraining the first speech model using at least one of the first and second user utterances and the second tentative recognition result.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
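Step (e) of claim 10 retrains the first speech model from the user utterances and the second tentative recognition result. The claim does not describe the model or the training procedure, so the following is a toy illustration only: a nearest-centroid classifier over made-up feature vectors stands in for a speech model, and "retraining" moves the centroid of the corrected word toward the recorded utterances. `CentroidModel`, `retrain`, and the `rate` parameter are hypothetical names.

```python
from typing import Dict, List, Sequence


class CentroidModel:
    """Toy stand-in for a speech model: one feature centroid per word."""

    def __init__(self, centroids: Dict[str, List[float]]):
        # Copy the lists so retraining does not mutate the caller's data.
        self.centroids = {word: list(c) for word, c in centroids.items()}

    def recognize(self, features: Sequence[float]) -> str:
        """Return the word whose centroid is nearest to the features."""
        def squared_distance(centroid: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(centroid, features))

        return min(self.centroids, key=lambda w: squared_distance(self.centroids[w]))

    def retrain(
        self,
        utterances: Sequence[Sequence[float]],
        label: str,
        rate: float = 0.5,
    ) -> None:
        """Move the centroid for `label` toward each utterance by `rate`."""
        if not utterances:
            return
        # A previously unseen label starts at the first utterance.
        centroid = self.centroids.setdefault(label, list(utterances[0]))
        for utterance in utterances:
            for i, x in enumerate(utterance):
                centroid[i] += rate * (x - centroid[i])


# Hypothetical data: the first model mishears "fourteen" as "forty".
first_model = CentroidModel({"forty": [0.0, 0.0], "fourteen": [1.0, 1.0]})
first_utterance = [0.4, 0.4]
second_utterance = [0.45, 0.4]
before = first_model.recognize(first_utterance)

# The operator-selected second model produced "fourteen"; use it as the label.
first_model.retrain([first_utterance, second_utterance], label="fourteen")
after = first_model.recognize(first_utterance)
```

With these made-up numbers, `before` is "forty" and `after` is "fourteen", showing how the operator-confirmed result, used as the training label, changes the first model's output for the same speaker.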
19. A speech recognition system comprising:
a speech recognition resource operable to
extract a first user utterance from a first input voice stream from a user, the first user utterance being a response to a query;
select a first speech model producing a first tentative recognition result characterizing the first user utterance; and
determine that the first tentative recognition result does not correctly characterize the first user utterance;
a model adaptation agent operable, when the first tentative recognition result does not correctly characterize the first user utterance, to alert a human operator, based on the first user utterance, to select a second speech model, different than the first speech model, to produce a second tentative recognition result correctly characterizing the first user utterance, wherein the first and second speech models are selected from a plurality of speech models.
Dependent claims: 20, 21, 22, 23, 24
Specification