Method and system for considering information about an expected response when performing speech recognition
Abstract
A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system according to the invention has several embodiments including: comparing the observed speech features to the models of the expected response separately from the usual hypothesis search in order to speed up the recognition system; modifying the usual hypothesis search to emphasize the expected response; updating and adapting the models when the recognized speech matches the expected response to improve the accuracy of the recognition system.
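The first embodiment in the abstract (scoring the observed features against the expected-response model separately from the usual hypothesis search) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and does not come from the patent: the template-style model, the toy `confidence` measure, the `recognize` helper, the 0.6 threshold, and the fallback to a full grammar search when the expected response is rejected.

```python
import math

def confidence(features, model):
    """Toy confidence factor in (0, 1] (hypothetical measure).

    Computes exp(-mean squared distance) between the input frames and a
    template of equal length; returns 0.0 when the lengths differ.
    """
    if not features or len(features) != len(model):
        return 0.0
    total = 0.0
    for frame, ref in zip(features, model):
        total += sum((a - b) ** 2 for a, b in zip(frame, ref))
    return math.exp(-total / len(features))

def recognize(features, expected, grammar_models, threshold=0.6):
    """Score the expected response first; search the grammar only if needed.

    `expected` is a (word, model) pair known before the user speaks.
    The fallback search is an assumption of this sketch.
    """
    word, model = expected
    score = confidence(features, model)
    if score >= threshold:
        # Accepted without touching any other model in the grammar.
        return word, score, "expected-path"
    # Expected response rejected: score every response in the grammar.
    scored = {w: confidence(features, m) for w, m in grammar_models.items()}
    best = max(scored, key=scored.get)
    return best, scored[best], "full-search"

# Hypothetical two-word grammar with two-frame, two-dimensional templates.
grammar = {
    "yes": [[1.0, 0.0], [1.0, 0.0]],
    "no": [[0.0, 1.0], [0.0, 1.0]],
}
expected = ("yes", grammar["yes"])
match = recognize([[0.9, 0.1], [1.0, 0.0]], expected, grammar)
mismatch = recognize([[0.1, 0.9], [0.0, 1.0]], expected, grammar)
```

In this sketch the speed-up comes from the early return: when the utterance matches the expected response, only one model is scored instead of the whole grammar.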
18 Claims
1. A method for recognizing speech in a speech recognition system comprising the steps of:
- receiving input speech from a user in a speech dialog where there is at least one point in the speech dialog where there is a grammar of possible responses and at least one expected response, the at least one expected response being a subset of the grammar and known to be the at least one expected response in the speech recognition system before receiving the user input speech, the at least one expected response including the most likely response or responses expected to be uttered by the user at the at least one point in the speech dialog and having a model associated therewith;
- generating acoustic features of the input speech;
- directly comparing the input speech acoustic features only to the model that is associated with the at least one expected response, rather than to models associated with additional possible responses, to generate an associated confidence factor;
- comparing the confidence factor to an acceptance threshold for accepting the expected response as the result of the speech recognition.
Dependent claims: 2, 3, 4
5. An apparatus for recognizing speech comprising:
- circuitry for receiving input speech from a user as part of a speech dialog where there is at least one point in the speech dialog where there is a grammar of possible responses and at least one expected response, the at least one expected response being a subset of the grammar and known to be the at least one expected response in the speech recognition system before receiving the user input speech, the at least one expected response including the most likely response or responses expected to be uttered by the user at the at least one point in the speech dialog and having a model associated therewith, the circuitry configured for generating acoustic features of the input speech;
- processing circuitry operable for directly comparing the input speech acoustic features to the model that is associated with the at least one expected response, rather than to models associated with additional possible responses, to generate an associated confidence factor and for comparing the confidence factor to an acceptance threshold for accepting the expected response as the recognized speech.
Dependent claims: 6, 7, 8
9. A method for recognizing speech in a speech recognition system comprising the steps of:
- in a speech dialog where there is a grammar of possible responses and at least one expected response, determining an expected response to be received as input speech from a user in at least one point in the speech dialog, the at least one expected response being a subset of the grammar and known to be the at least one expected response in the speech recognition system before receiving the user input speech, the at least one expected response including the most likely response or responses expected to be uttered by the user at the at least one point in the speech dialog and having a model associated therewith;
- based upon the determined expected response, modifying a speech recognition match/search algorithm with respect to the model that is associated with the expected response so that the match/search algorithm is configured to favor the expected response and boost a confidence factor associated with a hypothesis generated by the modified speech recognition match/search algorithm;
- processing input speech from a user using the modified match/search algorithm to generate a hypothesis and confidence factor for the input speech.
Dependent claims: 10, 11, 12, 13
14. An apparatus for recognizing speech comprising:
- processing circuitry including a match/search algorithm for performing speech recognition in a speech recognition system having a speech dialog with a grammar of possible responses and at least one expected response, the expected response to be received in user input speech in at least one point in the speech dialog, the at least one expected response being a subset of the grammar and known to be the at least one expected response in the speech recognition system before receiving the user input speech, the at least one expected response including the most likely response or responses expected to be uttered by the user at the at least one point in the speech dialog and having a model associated therewith;
- the processing circuitry configured for modifying the speech recognition match/search algorithm with respect to the model that is associated with the expected response to be received as input speech from a user so that the match/search algorithm is configured to favor the expected response and boost a confidence factor associated with a hypothesis generated by the modified speech recognition match/search algorithm;
- the processing circuitry further being configured for processing input speech from a user using the modified match/search algorithm to generate a hypothesis and confidence factor for the input speech.
Dependent claims: 15, 16, 17, 18
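Claims 9 and 14 recite a match/search algorithm modified to favor the expected response and boost the confidence factor of the resulting hypothesis. One way this might look, offered only as a rough sketch: add a fixed bonus to the log-score of any expected response before picking the best hypothesis. The `biased_search` function, the additive `bonus`, the per-response log-scores supplied as input, and the softmax-style confidence are all hypothetical choices for illustration, not the mechanism the claims or specification define.

```python
import math

def biased_search(log_scores, expected, bonus=1.0):
    """Pick the best hypothesis after favoring the expected responses.

    log_scores: dict mapping each response in the grammar to an acoustic
                log-score (assumed to be computed elsewhere).
    expected:   set of responses known in advance to be most likely.
    bonus:      hypothetical log-domain reward added to expected responses.
    Returns (hypothesis, confidence), where confidence is the normalized
    share of the adjusted score held by the winning hypothesis.
    """
    adjusted = {
        word: score + (bonus if word in expected else 0.0)
        for word, score in log_scores.items()
    }
    best = max(adjusted, key=adjusted.get)
    # Normalize with the max subtracted for numerical stability.
    top = adjusted[best]
    denom = sum(math.exp(score - top) for score in adjusted.values())
    return best, 1.0 / denom

# Hypothetical scores in which "one" and "nine" are acoustically close.
scores = {"one": -10.0, "nine": -9.8, "five": -14.0}
unbiased = biased_search(scores, expected=set())
biased = biased_search(scores, expected={"one"})
# A clearly different utterance still wins despite the bonus.
far_off = biased_search({"one": -20.0, "nine": -9.8}, expected={"one"})
```

With these made-up numbers the bonus is enough to tip a near tie toward the expected response and raise its confidence, while a response that scores far below the alternatives is not forced through. How large such a bias should be is a tuning question the sketch does not answer.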
Specification