Multi-modal content and automatic speech recognition in wireless telecommunication systems
First Claim
1. A method comprising:
sending a request for a Web page from a mobile client to a gateway, wherein the mobile client is in wireless communication with the gateway;
retrieving the Web page from an origin server to the gateway;
returning the Web page to the client;
determining whether the Web page contains multi-modal components;
sending multi-modal components from the client to a speech recognition server using a direct wireless packet streaming protocol connection, wherein the speech recognition server includes a speech recognizer and a text-to-speech synthesizer;
obtaining a grammar file or text-to-speech markup strings by the speech recognition server from a remotely located server using an established hypertext transport protocol network connection from at least one universal resource locator reference sent from the client;
loading the received grammars in a speech recognizer for performing speech recognition and text-to-speech markup strings into the speech synthesizer for producing synthesized speech; and
returning speech recognition results from the speech recognizer and produced synthesized speech to the client over said wireless packet streaming protocol connection.
Abstract
A communication architecture for delivering grammar and speech-related information, such as text-to-speech (TTS) data, to a speech recognition server (SRS) operating with a wireless telecommunication system, for use with automatic speech recognition and interactive voice-based applications. In the invention, a mobile client retrieves a Web page containing multi-modal content hosted on an origin server via a WAP gateway. The content may include a grammar file and/or TTS strings embedded in the content, or reference URL(s) pointing to their storage locations. The client then sends the grammar and/or TTS strings to the speech recognition server via a wireless packet streaming protocol channel. When URL(s) are received by the client and sent to the SRS, the SRS obtains the grammar file and/or TTS strings via a high-speed HTTP connection. The speech processing results and the synthesized speech are returned to the client over the established wireless UDP connection.
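The embedded-versus-referenced choice described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the `MultiModalContent` container, the message keys, and the `http_get` callable, which stands in for the HTTP fetch performed by the speech recognition server.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class MultiModalContent:
    """Hypothetical container for the speech-related parts of a Web page."""
    grammar: Optional[str] = None          # grammar embedded in the page
    tts_strings: List[str] = field(default_factory=list)  # embedded TTS strings
    grammar_url: Optional[str] = None      # reference URL for the grammar
    tts_url: Optional[str] = None          # reference URL for the TTS strings


def build_client_message(content: MultiModalContent) -> dict:
    """Client side: forward embedded data as-is, otherwise forward the URL."""
    message = {}
    if content.grammar is not None:
        message["grammar"] = content.grammar
    elif content.grammar_url is not None:
        message["grammar_url"] = content.grammar_url
    if content.tts_strings:
        message["tts_strings"] = list(content.tts_strings)
    elif content.tts_url is not None:
        message["tts_url"] = content.tts_url
    return message


def resolve_on_server(message: dict, http_get: Callable[[str], str]) -> dict:
    """Server side: fetch over HTTP anything that arrived only as a URL."""
    resolved = {}
    if "grammar" in message:
        resolved["grammar"] = message["grammar"]
    elif "grammar_url" in message:
        resolved["grammar"] = http_get(message["grammar_url"])
    if "tts_strings" in message:
        resolved["tts_strings"] = message["tts_strings"]
    elif "tts_url" in message:
        # Assumption: one TTS string per line in the fetched document.
        resolved["tts_strings"] = http_get(message["tts_url"]).splitlines()
    return resolved
```

In this sketch the client never downloads a referenced grammar itself; it only passes the URL along, so the larger transfer happens on the server's HTTP connection rather than over the wireless link.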
15 Claims
1. A method comprising:
sending a request for a Web page from a mobile client to a gateway, wherein the mobile client is in wireless communication with the gateway;
retrieving the Web page from an origin server to the gateway;
returning the Web page to the client;
determining whether the Web page contains multi-modal components;
sending multi-modal components from the client to a speech recognition server using a direct wireless packet streaming protocol connection, wherein the speech recognition server includes a speech recognizer and a text-to-speech synthesizer;
obtaining a grammar file or text-to-speech markup strings by the speech recognition server from a remotely located server using an established hypertext transport protocol network connection from at least one universal resource locator reference sent from the client;
loading the received grammars in a speech recognizer for performing speech recognition and text-to-speech markup strings into the speech synthesizer for producing synthesized speech; and
returning speech recognition results from the speech recognizer and produced synthesized speech to the client over said wireless packet streaming protocol connection.
Dependent claims: 2, 3, 4, 5
6. An apparatus comprising:
a processor configured to control operations of the apparatus; and
memory storing executable instructions that, when executed by the processor, cause the apparatus to perform:
interfacing with a proxy gateway via a data protocol standard to retrieve a Web page located on an origin server;
extracting multi-modal components from said Web page for transmission to a speech recognition server;
generating speech parameters for use with said speech recognition server;
establishing a direct wireless packet streaming protocol connection for wireless communication with said speech recognition server;
sending the multi-modal components to the speech recognition server using the established direct wireless packet streaming protocol connection; and
receiving speech recognition results and produced synthesized speech from the speech recognition server via said wireless packet streaming protocol connection.
Dependent claims: 7, 8, 9, 10
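The extraction step recited in claim 6 might look roughly like the sketch below. The claim does not name a markup language, so the `<grammar src="...">` and `<prompt>` elements, the class name, and both function names are hypothetical and chosen only to make the example concrete.

```python
from html.parser import HTMLParser


class MultiModalExtractor(HTMLParser):
    """Collects hypothetical <grammar src=...> and <prompt> elements."""

    def __init__(self):
        super().__init__()
        self.grammar_urls = []   # URL references to grammar files
        self.prompts = []        # text to hand to a TTS synthesizer
        self._in_prompt = False

    def handle_starttag(self, tag, attrs):
        if tag == "grammar":
            src = dict(attrs).get("src")
            if src:
                self.grammar_urls.append(src)
        elif tag == "prompt":
            self._in_prompt = True
            self.prompts.append("")

    def handle_endtag(self, tag):
        if tag == "prompt":
            self._in_prompt = False

    def handle_data(self, data):
        # Only text inside a <prompt> element is treated as TTS content.
        if self._in_prompt:
            self.prompts[-1] += data


def extract_components(page: str) -> dict:
    """Return the speech-related parts found in the page markup."""
    parser = MultiModalExtractor()
    parser.feed(page)
    parser.close()
    return {
        "grammar_urls": parser.grammar_urls,
        "prompts": [p.strip() for p in parser.prompts],
    }


def has_multimodal_components(page: str) -> bool:
    """True when the page carries at least one grammar reference or prompt."""
    found = extract_components(page)
    return bool(found["grammar_urls"] or found["prompts"])
```

A page with neither element yields empty lists, which corresponds to the case where no speech recognition server connection is needed.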
11. An apparatus, comprising:
a processor configured to control operations of the apparatus; and
memory storing executable instructions that, when executed by the processor, cause the apparatus to perform:
establishing a direct wireless packet streaming protocol connection with a mobile client;
receiving multi-modal components from the client via the established direct wireless packet streaming protocol connection;
obtaining a grammar file or text-to-speech markup strings from a remotely located server using an established hypertext transport protocol network connection from at least one universal resource locator reference sent from the mobile client;
loading the received grammars in a speech recognizer for performing speech recognition and text-to-speech markup strings into the speech synthesizer for producing synthesized speech; and
returning speech recognition results from the speech recognizer and produced synthesized speech to the mobile client over said wireless packet streaming protocol connection.
Dependent claims: 12, 13, 14, 15
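Returning synthesized speech over a packet streaming connection implies splitting the audio into datagram-sized pieces. The sketch below shows one possible framing. The 4-byte header (sequence number plus total packet count), the 512-byte chunk size, and the function names are assumptions for illustration; the claims do not specify a packet layout.

```python
import struct

# Hypothetical header: 16-bit sequence number, 16-bit total packet count,
# both in network byte order.
HEADER = struct.Struct("!HH")


def packetize(payload: bytes, max_chunk: int = 512) -> list:
    """Split a payload into sequence-numbered packets."""
    chunks = [payload[i:i + max_chunk]
              for i in range(0, len(payload), max_chunk)] or [b""]
    total = len(chunks)
    return [HEADER.pack(seq, total) + chunk
            for seq, chunk in enumerate(chunks)]


def reassemble(packets) -> bytes:
    """Rebuild the payload from packets that may arrive out of order."""
    parts = {}
    total = None
    for packet in packets:
        seq, total = HEADER.unpack_from(packet)
        parts[seq] = packet[HEADER.size:]
    if total is None or len(parts) != total:
        raise ValueError("missing packets")
    return b"".join(parts[seq] for seq in range(total))
```

Because datagram transports such as UDP do not guarantee ordering or delivery, the receiver in this sketch sorts by sequence number and reports a gap instead of returning truncated audio.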
Specification