Methods and apparatus for implementing distributed multi-modal applications
First Claim
1. A method performed by a client device, the method comprising the steps of:
rendering a visual display that includes at least one multi-modal display element for which input data is receivable by the client device through a visual modality and a voice modality, wherein the client device maintains knowledge of a visual view focus, which initially is set to a first multi-modal display element of the at least one multi-modal display element;
sending a first voice event request to an application server to establish a connection between the client device and the application server, wherein the first voice event request is an asynchronous hypertext transfer protocol (HTTP) request that will remain pending at the application server until a voice event occurs so that the connection remains established;
after sending the first voice event request, receiving an audio signal that may represent a user utterance via the voice modality;
sending uplink audio data representing the audio signal to a speech recognizer that interprets the uplink audio data based on a voice view focus, wherein the voice view focus initially is set to a portion of a speech dialog associated with the first multi-modal display element;
receiving a voice event response from the application server in response to the first voice event request and in response to the application server having received an indication that the voice event has occurred;
in response to receiving the voice event response, updating the visual view focus to a new visual view focus; and
sending a second voice event request to the application server in response to receiving the voice event response, wherein the second voice event request will remain pending at the application server until a second voice event occurs.
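The client-side method above amounts to a long-poll loop: keep an asynchronous request pending at the application server, block until a voice event resolves it, update the visual view focus from the response, and immediately re-issue the request. The sketch below models one iteration of that loop in-process; it is an illustrative assumption, not the patented implementation. A blocking queue stands in for the pending HTTP request, and the stub class, function names, and focus values (`name_field`, `city_field`) are invented for the example.

```python
import queue
import threading

class ApplicationServerStub:
    """Hypothetical stand-in for the application server. A blocking queue
    models the asynchronous HTTP voice event request that remains pending
    until a voice event occurs."""

    def __init__(self):
        self._events = queue.Queue()

    def voice_event_request(self):
        # Remains "pending" (blocks) until a voice event is posted,
        # keeping the logical connection established in the meantime.
        return self._events.get()

    def post_voice_event(self, new_focus):
        # Models the server learning that the speech recognizer produced
        # a result for the current voice view focus.
        self._events.put({"new_focus": new_focus})

def run_client_once(server):
    """One iteration of the claimed client loop (focus names assumed)."""
    visual_focus = "name_field"  # initial visual view focus
    # Simulate a recognition result arriving while the request pends:
    threading.Timer(0.01, server.post_voice_event, args=("city_field",)).start()
    response = server.voice_event_request()   # first voice event request
    visual_focus = response["new_focus"]      # update the visual view focus
    # The claim's final step, re-issuing a second voice event request so
    # that one is always pending, would follow here.
    return visual_focus
```

In a real deployment the blocking call would be an HTTP request held open by the server, and the timer would be replaced by the recognizer reporting a result for the user's utterance.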
Abstract
Embodiments include methods and apparatus for synchronizing data and focus between visual and voice views associated with distributed multi-modal applications. An embodiment includes a client device adapted to render a visual display that includes at least one multi-modal display element for which input data is receivable through a visual modality and a voice modality. When the client detects a user utterance via the voice modality, the client sends uplink audio data representing the utterance to a speech recognizer. An application server receives a speech recognition result generated by the speech recognizer, and sends a voice event response to the client. The voice event response is sent as a response to an asynchronous HTTP voice event request previously sent to the application server by the client. The client may then send another voice event request to the application server in response to receiving the voice event response.
21 Claims
1. A method performed by a client device, the method comprising the steps of:
rendering a visual display that includes at least one multi-modal display element for which input data is receivable by the client device through a visual modality and a voice modality, wherein the client device maintains knowledge of a visual view focus, which initially is set to a first multi-modal display element of the at least one multi-modal display element;
sending a first voice event request to an application server to establish a connection between the client device and the application server, wherein the first voice event request is an asynchronous hypertext transfer protocol (HTTP) request that will remain pending at the application server until a voice event occurs so that the connection remains established;
after sending the first voice event request, receiving an audio signal that may represent a user utterance via the voice modality;
sending uplink audio data representing the audio signal to a speech recognizer that interprets the uplink audio data based on a voice view focus, wherein the voice view focus initially is set to a portion of a speech dialog associated with the first multi-modal display element;
receiving a voice event response from the application server in response to the first voice event request and in response to the application server having received an indication that the voice event has occurred;
in response to receiving the voice event response, updating the visual view focus to a new visual view focus; and
sending a second voice event request to the application server in response to receiving the voice event response, wherein the second voice event request will remain pending at the application server until a second voice event occurs.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
11. A method performed by an application server, the method comprising the steps of:
receiving, from a client device that has rendered a visual display that includes at least one multi-modal display element for which input data is receivable by the client device through a visual modality and a voice modality, a first voice event request to establish a connection between the client device and the application server, wherein the first voice event request is an asynchronous hypertext transfer protocol (HTTP) request that will remain pending at the application server until a voice event occurs so that the connection remains established;
after the first voice event request is received, receiving a speech recognition result from a voice server, wherein the speech recognition result represents a result of a speech recognition process performed on uplink audio data sent by the client device to a speech recognizer that interprets the uplink audio data based on a voice view focus, wherein the voice view focus initially is set to a portion of a speech dialog associated with a first multi-modal display element of the at least one multi-modal display element;
sending a voice event response to the client device in response to the first voice event request and in response to the application server having received the speech recognition result, wherein the voice event response causes the client device to update a visual view focus; and
receiving a second voice event request from the client device in response to sending the voice event response, wherein the second voice event request will remain pending at the application server until a second voice event occurs.
- View Dependent Claims (12, 13, 14, 15)
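On the server side, the key behavior recited in claim 11 is withholding the HTTP response until the speech recognition result arrives from the voice server. The sketch below models that with a condition variable; it is an assumed illustration, not the claimed apparatus, and the class and method names are invented for the example.

```python
import threading

class VoiceEventChannel:
    """Hypothetical model of the application-server side: a condition
    variable holds the asynchronous voice event request open until a
    speech recognition result arrives from the voice server."""

    def __init__(self):
        self._cond = threading.Condition()
        self._result = None

    def handle_voice_event_request(self):
        # Called when the client's HTTP request arrives. The request is
        # not answered immediately; it pends here, so the connection
        # between client and application server remains established.
        with self._cond:
            while self._result is None:
                self._cond.wait()
            result, self._result = self._result, None
            # The response instructs the client to update its visual
            # view focus to match the recognized input.
            return {"recognition": result, "update_focus": True}

    def on_recognition_result(self, result):
        # Called when the voice server delivers the recognizer's output;
        # this is the "voice event" that resolves the pending request.
        with self._cond:
            self._result = result
            self._cond.notify()
```

After responding, the server would expect the client to issue a second voice event request, restoring a pending request for the next voice event.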
16. A system comprising:
a client device adapted to render a visual display that includes at least one multi-modal display element for which input data is receivable by the client device through a visual modality and a voice modality, wherein the client device maintains knowledge of a visual view focus, which initially is set to a first multi-modal display element of the at least one multi-modal display element,
send a first voice event request to an application server to establish a connection between the client device and the application server before an audio signal is received via the voice modality, wherein the first voice event request is an asynchronous hypertext transfer protocol (HTTP) request that will remain pending at the application server until a voice event occurs so that the connection remains established,
receive an audio signal that may represent a user utterance via the voice modality,
send uplink audio data representing the audio signal to a speech recognizer that interprets the uplink audio data based on a voice view focus, wherein the voice view focus initially is set to a portion of a speech dialog associated with the first multi-modal display element,
receive a voice event response from the application server in response to the first voice event request and in response to the application server having received an indication that the voice event has occurred,
in response to receiving the voice event response, update the visual view focus to a new visual view focus, and
send a second voice event request to the application server in response to receiving the voice event response, wherein the second voice event request will remain pending at the application server until a second voice event occurs.
- View Dependent Claims (17, 18, 19, 20, 21)
Specification