Methods and systems for considering information about an expected response when performing speech recognition
First Claim
1. A method for improving a speech recognition system comprising the steps of:
initiating a speech dialog with at least one point in the dialog where there is a grammar of possible responses and a set of at least one expected response;
wherein the set is a subset of the grammar and the set includes the most likely response or responses expected to be uttered by a user at the at least one point in the speech dialog, the set of at least one expected response for the at least one point being known in the speech recognition system before receiving input speech from the user;
progressing through the speech dialog until arriving at the at least one point;
receiving input speech from the user;
generating acoustic features of the input speech using an apparatus with at least one hardware-implemented processor;
comparing the generated input speech acoustic features to acoustic models associated with words in the grammar to generate a hypothesis;
comparing the hypothesis with at least one expected response in the set to determine if the hypothesis matches the at least one expected response in the set;
if the hypothesis matches the at least one expected response in the set, adapting at least one acoustic model corresponding to the matched expected response using the acoustic features of the input speech to use the at least one adapted model with future input speech in the speech recognition system, otherwise, not adapting the at least one acoustic model corresponding to the expected response.
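The steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `WordModel` class, the running-mean adaptation rule, the squared-distance scoring and the single-feature-vector "utterance" are all assumptions chosen to keep the example small. The only behavior taken from the claim is the gating: a model is adapted only when the hypothesis matches an expected response that was known before the speech was received.

```python
from dataclasses import dataclass


@dataclass
class WordModel:
    """Toy acoustic model (assumption): a running mean of feature vectors."""
    mean: list
    count: int = 1

    def distance(self, features):
        # Squared Euclidean distance stands in for a real acoustic score.
        return sum((m - f) ** 2 for m, f in zip(self.mean, features))

    def adapt(self, features):
        # Incremental mean update using the new utterance's features.
        self.count += 1
        self.mean = [m + (f - m) / self.count
                     for m, f in zip(self.mean, features)]


def recognize_and_adapt(features, models, grammar, expected):
    """Generate a hypothesis from the grammar, then adapt only on a match.

    `expected` is a subset of `grammar`, fixed before any speech arrives.
    """
    # Compare the features to the model of every word in the grammar.
    hypothesis = min(grammar, key=lambda w: models[w].distance(features))
    # Compare the hypothesis with the set of expected responses.
    matched = hypothesis in expected
    if matched:
        # Match: adapt the model for use with future input speech.
        models[hypothesis].adapt(features)
    # No match: the models are left unchanged.
    return hypothesis, matched


models = {"yes": WordModel([1.0, 0.0]), "no": WordModel([0.0, 1.0])}
grammar = ["yes", "no"]
expected = {"yes"}

first = recognize_and_adapt([0.8, 0.2], models, grammar, expected)
second = recognize_and_adapt([0.1, 0.9], models, grammar, expected)
```

In this sketch the first utterance is recognized as the expected word, so the "yes" model moves toward the new features. The second is recognized as "no", which is in the grammar but not in the expected set, so no model changes.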
Abstract
A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system according to the invention has several embodiments including: comparing the observed speech features to the models of the expected response separately from the usual hypothesis search in order to speed up the recognition system; modifying the usual hypothesis search to emphasize the expected response; updating and adapting the models when the recognized speech matches the expected response to improve the accuracy of the recognition system.
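One of the embodiments named in the abstract, modifying the hypothesis search to emphasize the expected response, might be sketched as follows. The abstract does not say how the emphasis is applied, so the additive score bonus, the `pick_hypothesis` function and its `bonus` parameter are hypothetical and serve only to show the idea of biasing the search toward a response known in advance.

```python
def pick_hypothesis(scores, expected, bonus=2.0):
    """Return the best-scoring hypothesis after favoring expected responses.

    scores:   mapping of candidate hypothesis -> log-likelihood-style score
    expected: set of responses known in advance for this dialog point
    bonus:    hypothetical additive boost given to expected responses
    """
    biased = {
        hyp: score + (bonus if hyp in expected else 0.0)
        for hyp, score in scores.items()
    }
    return max(biased, key=biased.get)


scores = {"one nine": -10.0, "one five": -9.0}
unbiased = pick_hypothesis(scores, expected=set())
biased = pick_hypothesis(scores, expected={"one nine"})
```

With no expected response the slightly better-scoring "one five" wins. When "one nine" is expected, the bonus lifts it above the competitor.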
28 Claims
1. Independent method claim, recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. An apparatus for recognizing speech and implementing a speech recognition function, the apparatus comprising:
circuitry for initiating a speech dialog with at least one point in the dialog where there is a grammar of possible responses and a set of at least one expected response and wherein the set is a subset of the grammar and the set includes the most likely response or responses expected to be uttered by a user at the at least one point in the speech dialog, the set of at least one expected response for the at least one point being known in the speech recognition system before receiving input speech from the user;
circuitry operable for receiving input speech from the user for progressing through the speech dialog;
circuitry configured for generating acoustic features of the input speech received from a user;
processing circuitry including a match/search algorithm having acoustic models, the acoustic models including acoustic models that are associated with the set of at least one expected response;
the processing circuitry operable for comparing the generated input speech acoustic features to acoustic models associated with words in the grammar to generate a hypothesis and further operable for comparing the hypothesis with at least one expected response in the set to determine if the hypothesis matches the at least one expected response in the set;
the processing circuitry further operable, if the hypothesis matches the at least one expected response in the set to adapt at least one acoustic model corresponding to the matched expected response using the acoustic features of the input speech to use the at least one adapted model with future input speech in the speech recognition system, otherwise, not adapting the at least one acoustic model corresponding to the expected response.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
Specification