Inferring switching conditions for switching between modalities in a speech application environment extended for interactive text exchanges
Abstract
The disclosed solution includes a method for dynamically switching modalities based upon inferred conditions in a dialog session involving a speech application. The method establishes a dialog session between a user and the speech application. During the dialog session, the user interacts using an original modality and a second modality. The speech application interacts using a speech modality only. A set of conditions indicative of interaction problems using the original modality can be inferred. Responsive to the inferring step, the original modality can be changed to the second modality. A modality transition to the second modality can be transparent to the speech application and can occur without interrupting the dialog session. The original modality and the second modality can be different modalities: one including a text exchange modality and another including a speech modality.
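The switching behavior described in the abstract can be illustrated with a minimal sketch. The abstract does not say which conditions are inferred, so this example assumes one plausible signal: a run of low-confidence speech recognition results. The names `DialogSession`, `Modality`, `MIN_CONFIDENCE`, and `MAX_STREAK`, and the threshold values, are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from enum import Enum


class Modality(Enum):
    SPEECH = "speech"
    TEXT = "text"


@dataclass
class DialogSession:
    """Hypothetical session state; names and thresholds are assumptions."""
    modality: Modality = Modality.SPEECH   # the "original modality"
    low_confidence_streak: int = 0
    history: list = field(default_factory=list)

    # Assumed thresholds, chosen only for illustration.
    MIN_CONFIDENCE = 0.5
    MAX_STREAK = 2

    def observe(self, utterance: str, confidence: float) -> Modality:
        """Record one user turn and switch modality if problems are inferred."""
        self.history.append((self.modality, utterance))
        if self.modality is Modality.SPEECH:
            if confidence < self.MIN_CONFIDENCE:
                self.low_confidence_streak += 1
            else:
                self.low_confidence_streak = 0
            if self.low_confidence_streak >= self.MAX_STREAK:
                # Inferred interaction problem: change the modality while
                # keeping the same session object (no interruption).
                self.modality = Modality.TEXT
                self.low_confidence_streak = 0
        return self.modality


session = DialogSession()
session.observe("check balance", 0.9)   # clear speech, stays in SPEECH
session.observe("???", 0.2)             # first low-confidence turn
print(session.observe("???", 0.3))      # second in a row triggers the switch
```

The session object survives the switch and the turn history is kept, which mirrors the abstract's requirement that the transition not interrupt the dialog session.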
17 Claims
1. A method for allowing multimodal communication with a first device executing a speech-enabled application during a communication session with a user, comprising:
- with the first device, at a first time during the communication session, receiving a first signal from a voice server via a communication channel, the first signal corresponding to speech recognition results generated by the voice server of a voice input originating from a voice interface operated by the user during the communication session;
- with the first device, processing the first signal with the speech-enabled application during the communication session with the user to generate a first responsive output;
- with the first device, at a second time during the communication session with the user, receiving a second signal from the voice server via the communication channel, the second signal corresponding to a text input originating from a text interface operated by the user during the communication session, the voice server having received the text input via a communications network; and
- with the first device, processing the second signal with the speech-enabled application during the communication session with the user to generate a second responsive output, and communicating a third signal corresponding to the second responsive output to the text interface via a communication path exclusive of the voice server.
Dependent claims: 2, 3, 4, 5, 6
7. A system in which a user is able to engage in multiple modes of communication with a speech-enabled application, comprising:
- a first device configured to execute the speech-enabled application so as to engage in a communication session with a user;
- a voice interface operable by the user during the communication session; and
- a text interface operable by the user during the communication session;
- wherein the system is configured such that the first device is able to receive, at a first time during the communication session, a first signal from a voice server via a communication channel, the first signal corresponding to speech recognition results generated by the voice server of a voice input originating from the voice interface, and to process the first signal with the speech-enabled application during the communication session with the user to generate a first responsive output; and
- wherein the system is further configured such that the first device is able to receive, at a second time during the communication session with the user, a second signal from the voice server via the communication channel, the second signal corresponding to a text input originating from the text interface operated by the user during the communication session, to process the second signal with the speech-enabled application during the communication session with the user to generate a second responsive output, and to communicate a third signal corresponding to the second responsive output to the text interface via a communication path exclusive of the voice server; and
- wherein the system is further configured such that the voice server is able to receive the text input via a communications network.
Dependent claims: 8, 9, 10, 11, 12, 14, 15
13. A system for allowing multimodal communication during a communication session with a user, comprising:
- a device configured to execute a speech-enabled application; and
- a voice server comprising first means for, at a first time during the communication session, communicating a first signal corresponding to speech recognition results generated by the voice server of a voice input from a voice interface to an input of the device executing the speech-enabled application for processing thereby to generate a first responsive output, and second means for, at a second time during the communication session, receiving text input from a text interface via a communications network and communicating a second signal corresponding to the text input to the input of the device executing the speech-enabled application for processing thereby to generate a second responsive output;
- wherein the device configured to execute the speech-enabled application is further configured to communicate a third signal corresponding to the second responsive output to the text interface via a communication path exclusive of the voice server.
Dependent claims: 16, 17
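The signal routing recited in the independent claims can be sketched as a toy in-memory model. This is an illustration under stated assumptions, not the patented implementation: the class names `VoiceServer`, `SpeechApp`, and `TextInterface`, the dictionary-shaped signals, and the fake "recognition" step are all hypothetical. The claims also do not say where the first responsive output is sent, so the sketch simply stores it.

```python
from dataclasses import dataclass, field


@dataclass
class TextInterface:
    """Hypothetical stand-in for the user's text client (e.g. a chat window)."""
    received: list = field(default_factory=list)

    def deliver(self, message: str) -> None:
        self.received.append(message)


@dataclass
class SpeechApp:
    """Hypothetical stand-in for the device running the speech-enabled application."""
    text_interface: TextInterface
    voice_outputs: list = field(default_factory=list)

    def handle(self, signal: dict) -> None:
        response = f"You said: {signal['content']}"
        if signal["origin"] == "text":
            # Third signal: sent straight to the text interface,
            # on a path that does not pass through the voice server.
            self.text_interface.deliver(response)
        else:
            # Assumption: the claims leave the destination of the first
            # responsive output open, so it is only stored here.
            self.voice_outputs.append(response)


@dataclass
class VoiceServer:
    """Relays both voice-derived and text-derived input over one channel."""
    app: SpeechApp
    relayed: list = field(default_factory=list)

    def on_voice_input(self, spoken_words: str) -> None:
        # Assumption: recognition is faked by lower-casing the input.
        signal = {"origin": "voice", "content": spoken_words.lower()}
        self.relayed.append(signal)
        self.app.handle(signal)          # first signal

    def on_text_input(self, text: str) -> None:
        signal = {"origin": "text", "content": text}
        self.relayed.append(signal)
        self.app.handle(signal)          # second signal


text_ui = TextInterface()
app = SpeechApp(text_ui)
server = VoiceServer(app)
server.on_voice_input("Check Balance")   # first time: voice
server.on_text_input("transfer funds")   # second time: text
print(text_ui.received)
```

Both inputs reach the application through the voice server, while the reply to the text input is delivered by the application itself, which is the asymmetry the phrase "exclusive of the voice server" describes.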
Specification