Supporting multi-lingual user interaction with a multimodal application
Abstract
Methods, apparatus, and products are disclosed for supporting multi-lingual user interaction with a multimodal application, the application including a plurality of VoiceXML dialogs, each dialog characterized by a particular language, supporting multi-lingual user interaction implemented with a plurality of speech engines, each speech engine having a grammar and characterized by a language corresponding to one of the dialogs, with the application operating on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the application operatively coupled to the speech engines through a VoiceXML interpreter, the VoiceXML interpreter: receiving a voice utterance from a user; determining in parallel, using the speech engines, recognition results for each dialog in dependence upon the voice utterance and the grammar for each speech engine; administering the recognition results for the dialogs; and selecting a language for user interaction in dependence upon the administered recognition results.
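The parallel recognition step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy engines, the `make_engine` and `recognize_in_parallel` names, and the use of text strings in place of audio are all assumptions made for illustration. Each hypothetical engine has its own grammar and language, and one utterance is submitted to all of them concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Set

# Hypothetical stand-in for a speech engine: it takes an utterance (text here,
# audio in a real system) and returns the phrase it recognized, or None.
Engine = Callable[[str], Optional[str]]


def make_engine(grammar: Set[str]) -> Engine:
    """Build a toy engine that recognizes an utterance only if its grammar lists it."""
    def recognize(utterance: str) -> Optional[str]:
        text = utterance.strip().lower()
        return text if text in grammar else None
    return recognize


def recognize_in_parallel(utterance: str,
                          engines: Dict[str, Engine]) -> Dict[str, Optional[str]]:
    """Submit the same utterance to every engine at once; collect results by language."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {lang: pool.submit(engine, utterance)
                   for lang, engine in engines.items()}
        return {lang: future.result() for lang, future in futures.items()}


# Illustrative grammars only; real grammars would be supplied by each dialog.
ENGINES: Dict[str, Engine] = {
    "en-US": make_engine({"yes", "no", "help"}),
    "fr-FR": make_engine({"oui", "non", "aide"}),
}

results = recognize_in_parallel("Oui", ENGINES)
```

In this sketch, `results` maps each language to that engine's recognition result, which is the input a later step would administer to select the language for further interaction.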
21 Claims
1. A method comprising:
receiving a voice utterance from a user;
determining, using at least one speech engine operating on at least one processor and a plurality of grammars that each specifies a limited set of one or more acceptable inputs in a language of a plurality of languages, a plurality of speech recognition results for the voice utterance and a plurality of confidence levels, the at least one speech engine determining each of the plurality of speech recognition results by using at least one of the plurality of grammars and matching the voice utterance to the limited set of acceptable inputs identified by the at least one grammar of the plurality of grammars, each confidence level of the plurality of confidence levels corresponding to a respective speech recognition result of the plurality of speech recognition results and each of the plurality of speech recognition results corresponding to a respective language of the plurality of languages, wherein each of the plurality of confidence levels determined using the at least one speech engine indicates a confidence of the at least one speech engine that the voice utterance matches a matched input of the limited set of acceptable inputs identified by the at least one grammar used to determine the speech recognition result;
evaluating the plurality of confidence levels for the plurality of speech recognition results to determine a speech recognition result of the plurality of speech recognition results having a highest confidence level of the plurality of confidence levels determined by the at least one speech engine; and
selecting one of the plurality of languages for use in subsequently interacting with the user by selecting a language corresponding to the speech recognition result having the highest confidence level of the plurality of confidence levels determined by the at least one speech engine.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. An apparatus comprising:
at least one computer processor programmed to:
receive a voice utterance from a user;
determine, using at least one speech engine and a plurality of grammars that each specifies a limited set of one or more acceptable inputs in a language of a plurality of languages, a plurality of speech recognition results for the voice utterance and a plurality of confidence levels, the at least one speech engine determining each of the plurality of speech recognition results by using at least one of the plurality of grammars and matching the voice utterance to the limited set of acceptable inputs identified by the at least one grammar of the plurality of grammars, each confidence level of the plurality of confidence levels corresponding to a respective speech recognition result of the plurality of speech recognition results and each of the plurality of speech recognition results corresponding to a respective language of the plurality of languages, wherein each of the plurality of confidence levels determined using the at least one speech engine indicates a confidence of the at least one speech engine that the voice utterance matches a matched input of the limited set of acceptable inputs identified by the at least one grammar used to determine the speech recognition result;
evaluate the plurality of confidence levels for the plurality of speech recognition results to determine a speech recognition result of the plurality of speech recognition results having a highest confidence level of the plurality of confidence levels determined by the at least one speech engine; and
select one of the plurality of languages for use in subsequently interacting with the user by selecting a language corresponding to the speech recognition result having the highest confidence level of the plurality of confidence levels determined by the at least one speech engine.
Dependent claims: 10, 11, 12, 13, 14.
15. At least one non-transitory recordable medium encoded with computer-executable instructions that, when executed by a computer, cause the computer to carry out a method comprising:
receiving a voice utterance from a user;
determining, using at least one speech engine and a plurality of grammars that each specifies a limited set of one or more acceptable inputs in a language of a plurality of languages, a plurality of speech recognition results for the voice utterance and a plurality of confidence levels, the at least one speech engine determining each of the plurality of speech recognition results by using at least one of the plurality of grammars and matching the voice utterance to the limited set of acceptable inputs identified by the at least one grammar of the plurality of grammars, each confidence level of the plurality of confidence levels corresponding to a respective speech recognition result of the plurality of speech recognition results and each of the plurality of speech recognition results corresponding to a respective language of the plurality of languages, wherein each of the plurality of confidence levels determined using the at least one speech engine indicates a confidence of the at least one speech engine that the voice utterance matches a matched input of the limited set of acceptable inputs identified by the at least one grammar used to determine the speech recognition result;
evaluating the plurality of confidence levels for the plurality of speech recognition results to determine a speech recognition result of the plurality of speech recognition results having a highest confidence level of the plurality of confidence levels determined using the at least one speech engine; and
selecting one of the plurality of languages for use in subsequently interacting with the user by selecting a language corresponding to the speech recognition result having the highest confidence level of the plurality of confidence levels determined using the at least one speech engine.
Dependent claims: 16, 17, 18, 19, 20, 21.
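The steps common to the independent claims (match an utterance against one limited grammar per language, attach a confidence level to each result, and select the language of the highest-confidence result) can be sketched as follows. This is a hedged illustration, not the claimed implementation: the `RecognitionResult` type, the function names, and the sample grammars are hypothetical, utterances are text rather than audio, and string similarity from Python's `difflib` stands in for a real engine's acoustic confidence score. Each grammar is assumed to be non-empty.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List


@dataclass
class RecognitionResult:
    language: str        # language of the grammar that produced this result
    matched_input: str   # acceptable input from the grammar that matched best
    confidence: float    # stand-in confidence level in the range [0.0, 1.0]


def recognize_with_grammar(utterance: str, language: str,
                           grammar: List[str]) -> RecognitionResult:
    """Match the utterance against one grammar's limited set of acceptable inputs."""
    text = utterance.strip().lower()
    # Assumption: text similarity approximates an engine's confidence score.
    scored = [(SequenceMatcher(None, text, phrase).ratio(), phrase)
              for phrase in grammar]
    confidence, matched = max(scored)
    return RecognitionResult(language, matched, confidence)


def select_language(utterance: str,
                    grammars: Dict[str, List[str]]) -> RecognitionResult:
    """Produce one result per language, then keep the highest-confidence one."""
    results = [recognize_with_grammar(utterance, lang, grammar)
               for lang, grammar in grammars.items()]
    return max(results, key=lambda result: result.confidence)


# Illustrative grammars only (hypothetical data).
GRAMMARS: Dict[str, List[str]] = {
    "en-US": ["one", "two", "three"],
    "de-DE": ["eins", "zwei", "drei"],
}

best = select_language("zwei", GRAMMARS)
# best.language is the language to use for subsequent interaction.
```

Selecting by maximum confidence keeps the decision independent of how many languages or grammars are configured, so adding a language in this sketch only means adding another entry to `GRAMMARS`.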
Specification