Methods and Apparatus for Implementing Distributed Multi-Modal Applications

US 20090171669A1
Filed: 12/17/2008
Published: 07/02/2009
Est. Priority Date: 12/31/2007
Status: Active Grant

4 Assignments

0 Petitions

Abstract

Embodiments of a system include a client device (102), a voice server (106), and an application server (104). The voice server is distinct from the application server. The client device renders (316) a visual display that includes at least one display element for which input data is receivable through a visual modality and a voice modality. The client device may receive speech through the voice modality and send (502) uplink audio data representing the speech to the voice server over an audio data path (124). The application server receives (514) a speech recognition result from the voice server over an application server/voice server control path (122). The application server sends (514), over an application server/client control path (120), a message to the client device that includes the speech recognition result. The client device updates (516) one or more of the display elements according to the speech recognition result.
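
The three-party flow described in the abstract can be sketched as a small in-process simulation. This is a minimal illustration under stated assumptions, not the patented implementation: the class and method names (`ClientDevice`, `VoiceServer`, `ApplicationServer`, `speak`, `on_uplink_audio`, `on_speech_recognized`, `on_control_message`) are hypothetical, each path is modeled as a direct method call rather than a network connection, the per-element grammar is invented for the example, and UTF-8 text stands in for digitized audio so that "recognition" reduces to a grammar lookup.

```python
class ClientDevice:
    def __init__(self, elements):
        # Rendered visual display: element id -> current value (None = empty).
        self.display = {e: None for e in elements}
        self.voice_server = None  # peer on the (simulated) audio data path

    def speak(self, utterance: str) -> None:
        # Stand-in for digitizing speech: the uplink "audio" is UTF-8 text.
        self.voice_server.on_uplink_audio(utterance.encode("utf-8"))

    def on_control_message(self, message: dict) -> None:
        # Application server/client control path: update the named element.
        self.display[message["element"]] = message["result"]


class VoiceServer:
    def __init__(self, grammar: dict, app_server: "ApplicationServer"):
        # Hypothetical per-element grammars: element id -> accepted phrases.
        self.grammar = grammar
        self.app_server = app_server
        self.focus = None  # display element the speech dialog is listening for

    def on_uplink_audio(self, audio: bytes) -> None:
        text = audio.decode("utf-8")
        if text in self.grammar.get(self.focus, ()):
            # Application server/voice server control path: report recognition.
            self.app_server.on_speech_recognized(self.focus, text)
        # Unrecognized speech produces no message at all in this sketch.


class ApplicationServer:
    def __init__(self, client: ClientDevice):
        self.client = client

    def on_speech_recognized(self, element: str, result: str) -> None:
        # Relay the recognition result; the client then updates its display.
        self.client.on_control_message({"element": element, "result": result})


client = ClientDevice(["city", "state"])
app = ApplicationServer(client)
voice = VoiceServer({"city": {"Chicago", "Boston"}}, app)
client.voice_server = voice
voice.focus = "city"
client.speak("Chicago")
print(client.display)  # {'city': 'Chicago', 'state': None}
```

In this sketch the recognition result reaches the client only through the application server; the voice server never touches the display, mirroring the separation of the audio data path from the application server/client control path.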

20 Claims

1. A method performed by an application server, the method comprising the steps of:
- receiving, over an application server/voice server control path between the application server and a voice server, an indication from the voice server that speech has been recognized based on uplink audio data sent from a client device to the voice server over an audio data path between the client device and the voice server, wherein the uplink audio data represents a user utterance received through a voice modality of the client device, and wherein the voice server is distinct from the application server; and
  
  sending, over an application server/client control path between the application server and the client device, a message to the client device that includes a recognition result for the speech and that causes the client device to update a visual display to reflect the recognition result.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1, further comprising the step of:
    - establishing the audio data path by receiving client audio path information from the client device over the application server/client control path, wherein the client audio path information includes address information for the voice server to send downlink audio data to the client device over the audio data path;
      
      receiving voice server audio path information from the voice server over the application server/voice server control path, wherein the voice server audio path information includes address information for the client device to send the uplink audio data to the voice server over the audio data path;
      
      sending the client audio path information to the voice server over the application server/voice server control path; and
      
      sending the voice server audio path information to the client device over the application server/client control path.
  - 3. The method of claim 1, further comprising:
    - sending a multi-modal page to the client device over the application server/client control path, wherein the multi-modal page, when interpreted, causes the client device to render the visual display that includes at least one display element for which input data is receivable by the client device through a visual modality and the voice modality.
  - 4. The method of claim 1, further comprising sending a reference to a speech dialog to the voice server over the application server/voice server control path.
  - 5. The method of claim 1, further comprising sending a speech dialog to the voice server over the application server/voice server control path.
  - 6. The method of claim 1, further comprising the steps of:
    - receiving an indication from the client device, over the application server/client control path, that the client device has initiated interpretation of machine code that causes the client device to render the visual display that includes at least one display element for which input data is receivable by the client device through a visual modality and the voice modality; and
      
      sending an instruction to the voice server, over the application server/voice server control path, for the voice server to begin interpreting a speech dialog associated with the machine code being interpreted by the client device.
  - 7. The method of claim 1, further comprising the steps of:
    - receiving an indication from the client device, over the application server/client control path, that the client device has updated the visual display according to the recognition result; and
      
      sending a message to the voice server, over the application server/voice server control path, to indicate that the client device has updated the visual display.
  - 8. The method of claim 1, further comprising the steps of:
    - receiving an indication from the client device, over the application server/client control path, that a current focus within the visual display rendered on the client device has changed to a different focus, wherein the different focus indicates a display element of the visual display for which input data currently is receivable by the client device through a visual modality and a voice modality; and
      
      sending a message to the voice server, over the application server/voice server control path, which includes information that will cause the voice server to execute machine code corresponding to the different focus.
  - 9. The method of claim 1, further comprising the steps of:
    - receiving an indication from the client device, over the application server/client control path, that a client-generated event has occurred which warrants an update to the visual display rendered on the client device;
      
      sending information to the client device, over the application server/client control path, to cause the client device to update the visual display based on the client-generated event; and
      
      sending an instruction to the voice server, over the application server/voice server control path, which includes information that indicates the client-generated event.
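
Dependent claims 6 through 9 describe the application server relaying client-side events to the voice server, while claim 1 covers relaying recognition results in the opposite direction. A hedged sketch of that dispatch logic follows; the message `type` strings, the dictionary fields, and the `ControlPath` recorder are assumptions made for illustration, since the claims do not define a message format.

```python
from dataclasses import dataclass, field


@dataclass
class ControlPath:
    """Records messages sent over one control path (hypothetical stand-in)."""
    sent: list = field(default_factory=list)

    def send(self, message: dict) -> None:
        self.sent.append(message)


class ApplicationServer:
    def __init__(self, to_client: ControlPath, to_voice_server: ControlPath):
        self.to_client = to_client        # application server/client control path
        self.to_voice = to_voice_server   # application server/voice server control path

    def on_client_message(self, message: dict) -> None:
        kind = message["type"]
        if kind == "page_started":
            # Claim 6: client began interpreting a page; start its speech dialog.
            self.to_voice.send({"type": "start_dialog", "dialog": message["dialog"]})
        elif kind == "display_updated":
            # Claim 7: tell the voice server the client applied a result.
            self.to_voice.send({"type": "client_updated"})
        elif kind == "focus_changed":
            # Claim 8: voice server should run the dialog code for the new focus.
            self.to_voice.send({"type": "set_focus", "element": message["element"]})
        elif kind == "client_event":
            # Claim 9: update the client display, then inform the voice server.
            event = message["event"]
            self.to_client.send({"type": "update_display", "event": event})
            self.to_voice.send({"type": "client_event", "event": event})
        else:
            raise ValueError(f"unknown client message type: {kind!r}")

    def on_voice_server_message(self, message: dict) -> None:
        if message["type"] == "speech_recognized":
            # Claim 1: forward the recognition result to the client.
            self.to_client.send({"type": "recognition_result",
                                 "result": message["result"]})


to_client, to_voice = ControlPath(), ControlPath()
app = ApplicationServer(to_client, to_voice)
app.on_client_message({"type": "focus_changed", "element": "city"})
app.on_voice_server_message({"type": "speech_recognized", "result": "Chicago"})
print(to_voice.sent)   # [{'type': 'set_focus', 'element': 'city'}]
print(to_client.sent)  # [{'type': 'recognition_result', 'result': 'Chicago'}]
```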

10. A method performed by a client device, the method comprising the steps of:
- rendering a visual display based on interpretation of machine code that causes the client device to render the visual display, wherein the visual display includes at least one display element for which input data is receivable by the client device through a visual modality and a voice modality;
  
  receiving a signal representing a user utterance through the voice modality;
  
  digitizing the signal to generate uplink audio data corresponding to one or more display elements of the at least one display element;
  
  sending the uplink audio data to a voice server over an audio data path between the client device and the voice server;
  
  receiving a speech recognition result from an application server over an application server/client control path between the application server and the client device, wherein the speech recognition result is based on the voice server having performed a speech recognition process on the uplink audio data, and wherein the audio data path is distinct from the application server/client control path, and wherein the voice server is distinct from the application server; and
  
  updating the one or more display elements of the visual display according to the speech recognition result.
- Dependent Claims (11, 12, 13, 14, 15, 16)
  - 11. The method of claim 10, further comprising:
    - receiving a multi-modal page from the application server over the application server/client control path, wherein the multi-modal page includes the machine code, and wherein rendering the visual display is performed by interpreting machine code in the form of markup within the multi-modal page.
  - 12. The method of claim 10, further comprising:
    - receiving downlink audio data from the voice server over the audio data path, wherein the downlink audio data includes an audio prompt; and
      
      rendering the audio prompt on an audio output device of the client device.
  - 13. The method of claim 10, further comprising:
    - sending client audio path information to the application server over the application server/client control path, wherein the client audio path information includes address information for the voice server to send downlink audio data to the client device over the audio data path; and
      
      receiving voice server audio path information from the application server over the application server/client control path, wherein the voice server audio path information includes address information for the client device to send the uplink audio data to the voice server over the audio data path.
  - 14. The method of claim 10, further comprising:
    - receiving a user input that warrants an update to the visual display rendered on the client device;
      
      based on receiving the user input, sending an indication to the application server, over the application server/client control path, that a client-generated event has occurred; and
      
      receiving information from the application server, over the application server/client control path, which causes the client device to update the visual display based on the client-generated event.
  - 15. The method of claim 14, wherein receiving the user input comprises:
    - receiving an indication that the user has selected another display element that is different from a display element upon which the visual view currently is focused.
  - 16. The method of claim 10, further comprising:
    - receiving an indication that the user has entered text into a data entry field for one or more display elements using a keypad of the client device.
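
Claim 10 recites digitizing the utterance into uplink audio data but does not specify an encoding. The sketch below assumes 16-bit signed little-endian linear PCM at an 8 kHz sample rate, split into 20 ms frames (8000 samples/s × 0.020 s = 160 samples, or 320 bytes, per frame). These parameters and the `digitize` helper are hypothetical choices common in telephony, not details taken from the patent.

```python
import math
import struct


def digitize(samples, frame_samples=160):
    """Quantize samples in [-1.0, 1.0] to 16-bit little-endian PCM frames.

    frame_samples=160 corresponds to 20 ms at an assumed 8 kHz sample rate.
    """
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))                  # clip out-of-range input
        pcm += struct.pack("<h", round(s * 32767))  # signed 16-bit, little-endian
    frame_bytes = frame_samples * 2                 # 2 bytes per sample
    # The final frame may be shorter than frame_bytes.
    return [bytes(pcm[i:i + frame_bytes]) for i in range(0, len(pcm), frame_bytes)]


# 50 ms of a 440 Hz tone at half amplitude, sampled at 8 kHz (400 samples).
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(400)]
frames = digitize(tone)
print([len(f) for f in frames])  # [320, 320, 160]
```

Each frame would then be sent to the voice server over the audio data path; framing keeps per-packet latency bounded rather than waiting for the whole utterance.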

17. A system comprising:
- a client device adapted to display at least one display element for which input data is receivable through a visual modality and a voice modality and, when the input data is received through the voice modality as speech, to send uplink audio data representing the speech to a voice server over an audio data path between the client device and the voice server;
  
  the voice server adapted to determine, based on the uplink audio data, whether the speech is recognized, and when the speech is recognized, to send an indication that the speech is recognized to an application server over an application server/voice server control path between the application server and the voice server; and
  
  the application server adapted to receive the indication that the speech was recognized, and based on the indication, to send a speech recognition result to the client device over an application server/client control path between the application server and the client device, wherein the application server/client control path is distinct from the audio data path.
- Dependent Claims (18, 19, 20)
  - 18. The system of claim 17, wherein the application server is further adapted to receive first audio path information from the client device over the application server/client control path and to receive second audio path information from the voice server over the application server/voice server control path, wherein the first audio path information includes a client internet protocol (IP) address to be used for downlink audio data sent to the client device from the voice server over the audio data path, and wherein the second audio path information includes a voice server IP address to be used for the uplink audio data sent to the voice server from the client device over the audio data path, and wherein the application server is further adapted to initiate establishment of the audio data path by sending the first audio path information to the voice server over the application server/voice server control path, and by sending the second audio path information to the client device over the application server/client control path.
  - 19. The system of claim 17, wherein the client device is a device selected from a group of devices that includes a cellular telephone, a radio, a pager, a personal data assistant, a personal navigation device, a mobile computer system, an automotive computer system, an airplane computer system, a computer, a laptop computer, a notebook computer, a desktop computer, and a voice over internet protocol (VoIP) phone implemented on a computer.
  - 20. The system of claim 17, wherein the application server and the voice server are distinct from each other in that the application server and the voice server perform distinct processes, and exchange control messages that affect performance of the distinct processes over the application server/voice server control path.
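
Claims 2, 13, and 18 describe the application server brokering audio path setup: each endpoint reports the address at which it wants to receive audio, and the application server forwards each report to the other endpoint. A minimal sketch, assuming an IP address plus a port as the address information (claim 18 names only an IP address; the port, the `AudioPathInfo` and `Endpoint` names, and `establish_audio_path` are hypothetical). The example addresses come from the RFC 5737 documentation ranges.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioPathInfo:
    """Address at which an endpoint wants to receive audio (hypothetical)."""
    ip: str
    port: int

    def __post_init__(self):
        ipaddress.ip_address(self.ip)  # raises ValueError if malformed
        if not 0 < self.port < 65536:
            raise ValueError(f"invalid port: {self.port}")


class Endpoint:
    """Client device or voice server on the audio data path."""

    def __init__(self, name: str, local: AudioPathInfo):
        self.name = name
        self.local = local                          # where this endpoint listens
        self.peer: Optional[AudioPathInfo] = None   # where it should send audio


def establish_audio_path(client: Endpoint, voice_server: Endpoint) -> None:
    """Application-server side of the exchange in claims 2 and 18."""
    # Collected over the AS/client and AS/VS control paths, respectively.
    first = client.local         # client address for downlink audio
    second = voice_server.local  # voice server address for uplink audio
    # Forward each endpoint's information to the other endpoint.
    voice_server.peer = first
    client.peer = second


client = Endpoint("client", AudioPathInfo("192.0.2.10", 40000))
voice = Endpoint("voice server", AudioPathInfo("198.51.100.5", 5004))
establish_audio_path(client, voice)
print(client.peer)  # AudioPathInfo(ip='198.51.100.5', port=5004)
```

Once both endpoints hold each other's address, audio can flow directly between them while the application server remains on the control paths only.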

Current Assignee
Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee
Motorola, Inc. (Motorola Solutions, Inc.)
Inventors
Ennai, Anuraj Kunnummel; Ferrans, James C.; Engelsma, Jonathan R.

Granted Patent

US 8,386,260 B2
Field of Search
US Class Current

704/275
CPC Class Codes

G10L 15/22 Procedures used during a sp...

H04M 3/4938 comprising a voice browser ...
