Methods and Apparatus for Implementing Distributed Multi-Modal Applications
First Claim
1. A method performed by an application server, the method comprising the steps of:
receiving, over an application server/voice server control path between the application server and a voice server, an indication from the voice server that speech has been recognized based on uplink audio data sent from a client device to the voice server over an audio data path between the client device and the voice server, wherein the uplink audio data represents a user utterance received through a voice modality of the client device, and wherein the voice server is distinct from the application server; and
sending, over an application server/client control path between the application server and the client device, a message to the client device that includes a recognition result for the speech and that causes the client device to update a visual display to reflect the recognition result.
Abstract
Embodiments of a system include a client device (102), a voice server (106), and an application server (104). The voice server is distinct from the application server. The client device renders (316) a visual display that includes at least one display element for which input data is receivable through a visual modality and a voice modality. The client device may receive speech through the voice modality and send (502) uplink audio data representing the speech to the voice server over an audio data path (124). The application server receives (514) a speech recognition result from the voice server over an application server/voice server control path (122). The application server sends (514), over an application server/client control path (120), a message to the client device that includes the speech recognition result. The client device updates (516) one or more of the display elements according to the speech recognition result.
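The message flow in the abstract can be illustrated with a minimal in-process sketch. It is not the patented implementation: the class names, method names, message fields, and the stand-in "recognizer" (which simply decodes the audio bytes as text) are all hypothetical, chosen only to show that audio travels from the client to the voice server while the recognition result reaches the client through the application server.

```python
from dataclasses import dataclass, field


@dataclass
class ClientDevice:
    """Hypothetical client holding display elements keyed by name."""
    display: dict = field(default_factory=lambda: {"city": ""})

    def on_control_message(self, message: dict) -> None:
        # Application server/client control path (120):
        # update the named display element with the recognition result.
        self.display[message["element"]] = message["result"]


@dataclass
class ApplicationServer:
    client: ClientDevice

    def on_recognition(self, indication: dict) -> None:
        # Application server/voice server control path (122):
        # an indication arrives that speech was recognized, and the
        # result is forwarded to the client in a control message.
        self.client.on_control_message(
            {"element": indication["element"], "result": indication["text"]}
        )


@dataclass
class VoiceServer:
    app_server: ApplicationServer

    def on_uplink_audio(self, element: str, audio: bytes) -> bool:
        # Audio data path (124). Assumption: a real recognizer is replaced
        # here by decoding the bytes as UTF-8 text.
        text = audio.decode("utf-8", errors="ignore").strip()
        if not text:
            return False  # not recognized; nothing is sent onward
        self.app_server.on_recognition({"element": element, "text": text})
        return True


def run_demo() -> dict:
    client = ClientDevice()
    voice = VoiceServer(ApplicationServer(client))
    voice.on_uplink_audio("city", b"Chicago")
    return client.display
```

In this sketch the voice server never calls the client directly, which mirrors the separation between the audio data path and the two control paths described above.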
20 Claims
1. A method performed by an application server, the method comprising the steps of:
receiving, over an application server/voice server control path between the application server and a voice server, an indication from the voice server that speech has been recognized based on uplink audio data sent from a client device to the voice server over an audio data path between the client device and the voice server, wherein the uplink audio data represents a user utterance received through a voice modality of the client device, and wherein the voice server is distinct from the application server; and
sending, over an application server/client control path between the application server and the client device, a message to the client device that includes a recognition result for the speech and that causes the client device to update a visual display to reflect the recognition result.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method performed by a client device, the method comprising the steps of:
rendering a visual display based on interpretation of machine code that causes the client device to render the visual display, wherein the visual display includes at least one display element for which input data is receivable by the client device through a visual modality and a voice modality;
receiving a signal representing a user utterance through the voice modality;
digitizing the signal to generate uplink audio data corresponding to one or more display elements of the at least one display element;
sending the uplink audio data to a voice server over an audio data path between the client device and the voice server;
receiving a speech recognition result from an application server over an application server/client control path between the application server and the client device, wherein the speech recognition result is based on the voice server having performed a speech recognition process on the uplink audio data, and wherein the audio data path is distinct from the application server/client control path, and wherein the voice server is distinct from the application server; and
updating the one or more display elements of the visual display according to the speech recognition result.
Dependent claims: 11, 12, 13, 14, 15, 16
17. A system comprising:
a client device adapted to display at least one display element for which input data is receivable through a visual modality and a voice modality and, when the input data is received through the voice modality as speech, to send uplink audio data representing the speech to a voice server over an audio data path between the client device and the voice server;
the voice server adapted to determine, based on the uplink audio data, whether the speech is recognized, and when the speech is recognized, to send an indication that the speech is recognized to an application server over an application server/voice server control path between the application server and the voice server; and
the application server adapted to receive the indication that the speech was recognized, and based on the indication, to send a speech recognition result to the client device over an application server/client control path between the application server and the client device, wherein the application server/client control path is distinct from the audio data path.
Dependent claims: 18, 19, 20
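The client-side steps recited in claim 10 (digitizing a signal into uplink audio data, then updating display elements from a returned result) can be sketched as follows. This is an illustration under stated assumptions, not the claimed method itself: the 16-bit little-endian PCM encoding, the function names, and the dictionary fields that tag the audio with a display element are hypothetical, since the claims do not specify an audio format or a message layout.

```python
import struct


def digitize(samples, element_id):
    """Convert float samples in [-1.0, 1.0] into 16-bit little-endian PCM.

    Assumption: the uplink audio data is tagged with the display element
    it corresponds to; the claims do not prescribe this representation.
    """
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    pcm = struct.pack("<%dh" % len(ints), *ints)
    return {"element": element_id, "audio": pcm}


def apply_result(display, result):
    """Return a copy of the display with the named element updated."""
    updated = dict(display)
    updated[result["element"]] = result["text"]
    return updated


# Usage: digitize a short signal for the hypothetical "city" element, then
# apply a recognition result as if it had arrived from the application server.
uplink = digitize([0.0, 1.0, -1.0], "city")
display = apply_result({"city": ""}, {"element": "city", "text": "Chicago"})
```

Returning a new dictionary from `apply_result` rather than mutating the input is a design choice of this sketch; it keeps the display state before and after the update easy to compare.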
Specification