Voice control of a user interface to service applications
Abstract
Voice control of a service application provided to a terminal from a remote server is distributed between the terminal and a remote application part. A relatively low power automatic speech recognition system (ASR) is provided in the terminal for recognizing those portions of user-supplied audio input that relate to terminal functions or functions defined by a predefined markup language. Recognized words may be used to control the terminal functions, or may alternatively be converted to text and forwarded to the remote server. Unrecognized portions of the audio input may be encoded and forwarded to the remote application part which includes a more powerful ASR. The remote application part may use its ASR to recognize words defined by the application. Recognized words may be converted to text and supplied as input to the remote server. In the reverse direction, text received by the remote application part from the remote server may be converted to an encoded audio output signal, and forwarded to the terminal, which can then generate a signal to be supplied to a loudspeaker. In this way, a voice control mechanism is used in place of the remote server's visual display output and keyboard input.
34 Claims
-
1. A method of controlling a service application provided to a terminal from a remote server, the method comprising the steps of:
-
receiving an audio input signal representing audio information;
using a first automatic speech recognition system located in the terminal to determine whether the audio input signal includes one or more words defined by a first vocabulary, wherein portions of the audio input signal not corresponding to the one or more words defined by the first vocabulary constitute an unrecognized portion of the audio input signal;
if the audio input signal includes one or more words defined by the first vocabulary, then using a terminal application part of an application protocol service logic to determine what to do with the one or more words defined by the first vocabulary;
formatting the unrecognized portion of the audio input signal for inclusion in a data unit whose structure is defined by a first predefined markup language;
communicating the data unit to a remote application part via a first digital data link that operates in accordance with a first application protocol; and
in the remote application part, extracting the formatted unrecognized portion of the audio input signal from the data unit and using a remote application part service logic to determine what to do with the formatted unrecognized portion of the audio input signal.
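The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the vocabulary, the `dataunit` and `audio` element names, the base64 encoding, and the representation of audio as labelled byte segments are all assumptions made for illustration. A real terminal ASR would operate on the audio itself rather than on labels.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical first vocabulary known to the terminal's ASR.
FIRST_VOCABULARY = {"next", "back", "select"}


def terminal_recognize(segments):
    """Stand-in for the terminal ASR.

    Each segment is a (label, raw_bytes) pair. Labels found in the first
    vocabulary count as recognized words; the raw bytes of every other
    segment are concatenated into the unrecognized portion.
    """
    recognized, unrecognized = [], []
    for label, raw in segments:
        if label in FIRST_VOCABULARY:
            recognized.append(label)
        else:
            unrecognized.append(raw)
    return recognized, b"".join(unrecognized)


def format_data_unit(audio):
    """Wrap the unrecognized audio in a data unit defined by a markup language."""
    root = ET.Element("dataunit")  # element names are assumptions
    elem = ET.SubElement(root, "audio", encoding="base64")
    elem.text = base64.b64encode(audio).decode("ascii")
    return ET.tostring(root)


def extract_audio(data_unit):
    """Remote application part: extract the formatted audio from the data unit."""
    root = ET.fromstring(data_unit)
    return base64.b64decode(root.find("audio").text or "")


segments = [("next", b"\x01\x02"), ("weather", b"\x03\x04"), ("oslo", b"\x05")]
words, rest = terminal_recognize(segments)   # terminal side
unit = format_data_unit(rest)                # sent over the first data link
forwarded = extract_audio(unit)              # remote application part
```

In this sketch the word "next" is handled locally, and the bytes for the two remaining segments travel to the remote application part inside the data unit.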
a current menu item is associated with a first selection; and
the one or more terminal functions include associating the current menu item with a second selection that is not the same as the first selection.
-
-
6. The method of claim 1, wherein if the audio input signal includes one or more words defined by the first vocabulary, then the terminal application part of the application protocol service logic causes a corresponding message to be generated and communicated to the remote application part via the first digital data link.
-
7. The method of claim 6, wherein the corresponding message includes state information.
-
8. The method of claim 6, wherein the corresponding message includes text.
-
9. The method of claim 6, wherein the corresponding message includes binary data.
-
10. The method of claim 6, wherein the remote application part forwards the corresponding message to the remote server.
-
11. The method of claim 10, wherein the remote application part forwards the corresponding message to the remote server via a second digital data link that operates in accordance with a second application protocol.
-
12. The method of claim 11, wherein the first application protocol is the same as the second application protocol.
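Claims 6 through 12 describe a message that the terminal application part generates for a recognized word and that the remote application part forwards to the remote server. The following Python sketch shows one way this could be modelled. The class name, the field names, the word-to-message mapping, and the callable standing in for the second digital data link are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CorrespondingMessage:
    # Claims 7 to 9: the message may carry state information, text, or binary data.
    state: Optional[str] = None
    text: Optional[str] = None
    binary: Optional[bytes] = None


def message_for(word: str) -> CorrespondingMessage:
    """Terminal application part: hypothetical mapping from word to message."""
    if word == "select":
        return CorrespondingMessage(state="item-selected")
    return CorrespondingMessage(text=word)


def forward(message: CorrespondingMessage,
            second_link: Callable[[CorrespondingMessage], None]) -> None:
    """Remote application part: relay the message over the second data link."""
    second_link(message)


received = []                                   # stands in for the remote server
forward(message_for("select"), received.append)
forward(message_for("back"), received.append)
```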
-
13. The method of claim 1, further comprising the steps of:
-
using a second automatic speech recognition system located in the remote application part to determine whether the unrecognized portion of the audio input signal includes one or more words defined by a second vocabulary; and
if the unrecognized portion of the audio input signal includes one or more words defined by the second vocabulary, then using the remote application part service logic to determine what to do with the one or more words defined by the second vocabulary.
-
-
14. The method of claim 13, wherein:
-
the first vocabulary exclusively includes words defined by a syntax of the first predefined markup language; and
the second vocabulary exclusively includes words associated with the remote server.
-
-
15. The method of claim 13, wherein if the unrecognized portion of the audio input signal includes one or more words defined by the second vocabulary, then the remote application part service logic causes a corresponding keyboard emulation response to be generated and sent to the remote server.
-
16. The method of claim 13, wherein if the unrecognized portion of the audio input signal includes one or more words defined by the second vocabulary, then the remote application part service logic causes a remote application part service logic state to be changed.
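The remote-side handling in claims 13, 15, and 16 can be sketched as follows. This is an illustrative Python model only. The second vocabulary is hypothetical, the second ASR is replaced by a list of already-decoded tokens, and the rule that "help" changes state while other words produce keyboard emulation is an arbitrary assumption.

```python
from dataclasses import dataclass, field

# Hypothetical second vocabulary associated with the remote server.
SECOND_VOCABULARY = {"weather", "news", "help"}


@dataclass
class RemoteServiceLogic:
    state: str = "idle"
    sent_to_server: list = field(default_factory=list)

    def handle(self, tokens):
        """Process the output of the second ASR, modelled as plain strings."""
        words = [t for t in tokens if t in SECOND_VOCABULARY]
        for word in words:
            if word == "help":
                # Outcome in the style of claim 16: only the service logic state changes.
                self.state = "help"
            else:
                # Outcome in the style of claim 15: emulate typed keyboard input.
                self.sent_to_server.append(word + "\n")
        return words


logic = RemoteServiceLogic()
found = logic.handle(["um", "weather", "help"])
```

Tokens outside the second vocabulary ("um" here) are ignored in this sketch; the claims do not say what happens to them.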
-
17. The method of claim 1, further comprising the steps of:
-
in the remote application part, receiving text from the remote server;
in the remote application part, generating a corresponding audio output signal representing audio information;
formatting the audio output signal for inclusion in a second data unit whose structure is defined by the first predefined markup language;
communicating the second data unit to the terminal via the first digital data link; and
in the terminal, extracting the audio output signal from the second data unit and generating therefrom a loudspeaker signal.
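The reverse direction described in claim 17 can be sketched in the same spirit. In this Python illustration the text-to-speech step is a placeholder that emits one byte per character, and the element names, base64 encoding, and normalisation to drive levels are assumptions rather than anything the claims specify.

```python
import base64
import xml.etree.ElementTree as ET


def synthesize(text):
    """Placeholder for a text-to-speech engine: one byte per ASCII character."""
    return text.encode("ascii")


def build_second_data_unit(text):
    """Remote application part: text in, markup-wrapped audio out."""
    root = ET.Element("dataunit")  # element names are assumptions
    audio = ET.SubElement(root, "audio", encoding="base64")
    audio.text = base64.b64encode(synthesize(text)).decode("ascii")
    return ET.tostring(root)


def terminal_playback(data_unit):
    """Terminal: extract the audio and derive a loudspeaker signal from it."""
    root = ET.fromstring(data_unit)
    samples = base64.b64decode(root.find("audio").text or "")
    # Scale each 8-bit sample to the range 0.0 to 1.0 as a stand-in drive level.
    return [s / 255.0 for s in samples]


unit = build_second_data_unit("OK")
signal = terminal_playback(unit)
```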
-
-
18. An apparatus for controlling a service application provided to a terminal from a remote server, the apparatus comprising:
-
means for receiving an audio input signal representing audio information;
a first automatic speech recognition system located in the terminal for determining whether the audio input signal includes one or more words defined by a first vocabulary, wherein portions of the audio input signal not corresponding to the one or more words defined by the first vocabulary constitute an unrecognized portion of the audio input signal;
a terminal application part of an application protocol service logic for determining what to do with the one or more words defined by the first vocabulary if the audio input signal includes one or more words defined by the first vocabulary;
means for formatting the unrecognized portion of the audio input signal for inclusion in a data unit whose structure is defined by a first predefined markup language;
means for communicating the data unit to a remote application part via a first digital data link that operates in accordance with a first application protocol; and
the remote application part, comprising:
means for extracting the formatted unrecognized portion of the audio input signal from the data unit; and
a remote application part service logic for determining what to do with the formatted unrecognized portion of the audio input signal.
means for causing the one or more words to be used to select one or more terminal functions to be performed if the audio input signal includes one or more words defined by the first vocabulary.
-
-
21. The apparatus of claim 20, wherein the one or more terminal functions include selecting a current menu item as a response to be supplied to the remote server.
-
22. The apparatus of claim 20, wherein:
-
a current menu item is associated with a first selection; and
the one or more terminal functions include associating the current menu item with a second selection that is not the same as the first selection.
-
-
23. The apparatus of claim 18, wherein the terminal application part of the application protocol service logic comprises:
means for causing a corresponding message to be generated and communicated to the remote application part via the first digital data link if the audio input signal includes one or more words defined by the first vocabulary.
-
24. The apparatus of claim 23, wherein the corresponding message includes state information.
-
25. The apparatus of claim 23, wherein the corresponding message includes text.
-
26. The apparatus of claim 23, wherein the corresponding message includes binary data.
-
27. The apparatus of claim 23, wherein the remote application part includes means for forwarding the corresponding message to the remote server.
-
28. The apparatus of claim 27, wherein the remote application part includes means for forwarding the corresponding message to the remote server via a second digital data link that operates in accordance with a second application protocol.
-
29. The apparatus of claim 28, wherein the first application protocol is the same as the second application protocol.
-
30. The apparatus of claim 18, further comprising:
-
a second automatic speech recognition system located in the remote application part for determining whether the unrecognized portion of the audio input signal includes one or more words defined by a second vocabulary, and wherein the remote application part service logic includes means for determining what to do with the one or more words defined by the second vocabulary if the unrecognized portion of the audio input signal includes one or more words defined by the second vocabulary.
-
-
31. The apparatus of claim 30, wherein:
-
the first vocabulary exclusively includes words defined by a syntax of the first predefined markup language; and
the second vocabulary exclusively includes words associated with the remote server.
-
-
32. The apparatus of claim 30, wherein the remote application part service logic comprises:
means for causing a corresponding keyboard emulation response to be generated and sent to the remote server if the unrecognized portion of the audio input signal includes one or more words defined by the second vocabulary.
-
33. The apparatus of claim 30, wherein the remote application part service logic comprises:
means for causing a remote application part service logic state to be changed if the unrecognized portion of the audio input signal includes one or more words defined by the second vocabulary.
-
34. The apparatus of claim 18, further comprising:
-
means, in the remote application part, for receiving text from the remote server;
means, in the remote application part, for generating a corresponding audio output signal representing audio information;
means for formatting the audio output signal for inclusion in a second data unit whose structure is defined by the first predefined markup language;
means for communicating the second data unit to the terminal via the first digital data link; and
means, in the terminal, for extracting the audio output signal from the second data unit and generating therefrom a loudspeaker signal.
-
Specification