Multi-modal content and automatic speech recognition in wireless telecommunication systems
Abstract
A communication architecture for delivery of grammar and speech related information such as text-to-speech (TTS) data to a speech recognition server operating with a wireless telecommunication system for use with automatic speech recognition and interactive voice-based applications. In the invention, a mobile client retrieves a Web page containing multi-modal content hosted on an origin server via a WAP gateway. The content may include a grammar file and/or TTS strings embedded in the content, or reference URL(s) pointing to their storage locations. The client then sends the grammar and/or TTS strings to a speech recognition server (SRS) via a wireless packet streaming protocol channel. When URL(s) are received by the client and sent to the SRS, the grammar file and/or TTS strings are obtained by the SRS via a high-speed HTTP connection. The speech processing results and the synthesized speech are returned to the client over the established wireless UDP connection.
16 Claims
1. In a speech recognition capable wireless telecommunication system comprising a mobile client (300) in wireless communication with a proxy gateway (320), a speech recognition server (SRS, 360) that includes a speech recognizer (362) and a text-to-speech (TTS) synthesizer (368), a method of retrieval and delivery of multi-modal content (344) from a remotely located origin server for presentation and playback on said mobile client comprising the steps of:
sending a request for a Web page from the client to the gateway;
retrieving the Web page from the origin server to the gateway;
returning the Web page to the client;
determining whether the Web page contains multi-modal components;
sending the multi-modal components from the client to the speech recognition server (360) using a wireless packet streaming protocol connection;
obtaining a grammar file or TTS markup strings by the speech recognition server (360) from a remotely located server using an established HTTP network connection (370) from URL references sent from the client;
loading the received grammars in the speech recognizer for performing speech recognition and TTS markup strings into the speech synthesizer for producing synthesized speech; and
returning speech recognition results from the speech recognizer and produced synthesized speech to the client over said wireless packet streaming protocol connection.
Dependent claims: 2, 3, 4, 5
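The ordering of the steps in claim 1 can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the patented implementation: the names `Page`, `SpeechServer`, `run_flow`, `gateway_fetch` and `http_get` are hypothetical, the network hops are replaced by plain function calls, and the recognizer and synthesizer are reduced to placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Page:
    """A retrieved Web page; the field names are illustrative only."""
    body: str
    grammar_urls: List[str] = field(default_factory=list)
    tts_urls: List[str] = field(default_factory=list)

    def is_multimodal(self) -> bool:
        # The "determining" step: does the page carry speech components?
        return bool(self.grammar_urls or self.tts_urls)


class SpeechServer:
    """Stand-in for the SRS: fetches referenced resources, then loads them."""

    def __init__(self, http_get: Callable[[str], str]):
        self.http_get = http_get  # models the SRS-side HTTP connection
        self.grammars: List[str] = []
        self.tts_strings: List[str] = []

    def receive_components(self, grammar_urls, tts_urls) -> Dict[str, object]:
        # Obtain grammars and TTS markup from the URL references sent by the client.
        self.grammars = [self.http_get(u) for u in grammar_urls]
        self.tts_strings = [self.http_get(u) for u in tts_urls]
        # Placeholder results: a real recognizer and synthesizer would run here.
        return {
            "grammars_loaded": len(self.grammars),
            "synthesized": [f"<audio for: {t}>" for t in self.tts_strings],
        }


def run_flow(url: str,
             gateway_fetch: Callable[[str], Page],
             srs: SpeechServer) -> Tuple[List[str], Optional[Dict[str, object]]]:
    """Walk the claimed steps in order and record a trace of each hop."""
    trace = [f"client -> gateway: request {url}"]
    page = gateway_fetch(url)  # gateway retrieves the page from the origin server
    trace.append("gateway -> client: page returned")
    if not page.is_multimodal():
        trace.append("client: no multi-modal components")
        return trace, None
    trace.append("client -> SRS: components over packet streaming connection")
    result = srs.receive_components(page.grammar_urls, page.tts_urls)
    trace.append("SRS -> client: results over packet streaming connection")
    return trace, result
```

In this sketch the client forwards only URL references, so the bulky grammar and TTS data travel over the server-side HTTP connection rather than the wireless link, which is the division of labor the claim describes.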
6. A wireless telecommunication system comprising a mobile client (300), a proxy gateway (320) in wireless communication with the mobile client, wherein said gateway hosts an HTTP network connection (330), and a speech recognition server (360) in wireless communication with the mobile client, the system being characterized in that a wireless packet streaming protocol connection (354) is established between the mobile client and the speech recognition server for the transfer of audio related packet data, and wherein the speech recognition server possesses an HTTP network connection for retrieving grammar and text-to-speech information from a remotely located server.
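The claim does not specify how audio related data is framed on the packet streaming connection. As a hedged illustration only, the sketch below splits an audio buffer into datagram-sized packets and reassembles them. The header layout (a 32-bit sequence number followed by a 16-bit payload length), the 160-byte payload size, and the function names `packetize` and `reassemble` are all assumptions made for this example and do not come from the patent.

```python
import struct
from typing import Iterable, List

# Assumed header layout: 32-bit sequence number, 16-bit payload length,
# network byte order, no padding (6 bytes in total).
HEADER = struct.Struct("!IH")


def packetize(audio: bytes, payload_size: int = 160) -> List[bytes]:
    """Split an audio buffer into sequence-numbered packets."""
    packets = []
    for seq, start in enumerate(range(0, len(audio), payload_size)):
        chunk = audio[start:start + payload_size]
        packets.append(HEADER.pack(seq, len(chunk)) + chunk)
    return packets


def reassemble(packets: Iterable[bytes]) -> bytes:
    """Rebuild the audio buffer, tolerating out-of-order arrival."""
    parsed = []
    for packet in packets:
        seq, length = HEADER.unpack_from(packet)
        parsed.append((seq, packet[HEADER.size:HEADER.size + length]))
    parsed.sort(key=lambda item: item[0])  # restore the original order
    return b"".join(chunk for _, chunk in parsed)
```

A sequence number is included because a datagram transport such as UDP does not guarantee ordering; a production protocol would also need to handle loss and timing, which this sketch leaves out.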
11. A mobile client device comprising:
means for interfacing with a proxy gateway via a data protocol standard;
means for retrieving a Web page located on an origin server;
means for extracting multi-modal components from said Web page for transmission to a speech recognition server;
means for generating speech parameters for use with said speech recognition server; and
means for establishing a packet streaming protocol connection for wireless communication with said speech recognition server (SRS).
Dependent claims: 12, 13, 14, 15, 16
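The "means for extracting multi-modal components" element can be pictured as a small markup scan on the client. The sketch below is a minimal illustration using Python's standard `html.parser`. It assumes, purely for the example, that grammars are referenced by a `<grammar src="...">` element and that TTS text is carried in a `<prompt>` element; the claims do not name the markup, and `MultiModalExtractor` and `extract_components` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class MultiModalExtractor(HTMLParser):
    """Collects grammar URL references and TTS prompt text from a page."""

    def __init__(self):
        super().__init__()
        self.grammar_urls: List[str] = []
        self.tts_strings: List[str] = []
        self._in_prompt = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "grammar" and attributes.get("src"):
            # Assumed convention: the grammar is referenced by URL.
            self.grammar_urls.append(attributes["src"])
        elif tag == "prompt":
            self._in_prompt = True

    def handle_endtag(self, tag):
        if tag == "prompt":
            self._in_prompt = False

    def handle_data(self, data):
        # Only text inside a <prompt> element is treated as TTS content.
        if self._in_prompt and data.strip():
            self.tts_strings.append(data.strip())


def extract_components(markup: str) -> Tuple[List[str], List[str]]:
    """Return (grammar URLs, TTS strings) found in the markup."""
    parser = MultiModalExtractor()
    parser.feed(markup)
    parser.close()
    return parser.grammar_urls, parser.tts_strings
```

Text outside the assumed `<prompt>` element is ignored, so ordinary visual content stays on the client while only the speech-related parts are prepared for transmission to the SRS.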
Specification