Multi-modal content and automatic speech recognition in wireless telecommunication systems

US 20030161298A1
Filed: 02/27/2003
Published: 08/28/2003
Est. Priority Date: 08/30/2000
Status: Active Grant

Abstract

A communication architecture for delivery of grammar and speech related information such as text-to-speech (TTS) data to a speech recognition server operating with a wireless telecommunication system for use with automatic speech recognition and interactive voice-based applications. In the invention, a mobile client retrieves a Web page containing multi-modal content hosted on an origin server via a WAP gateway. The content may include a grammar file and/or TTS strings embedded in the content or reference URL(s) pointing to their storage locations. The client then sends the grammar and/or TTS strings to a speech recognition server (SRS) via a wireless packet streaming protocol channel. When URL(s) are received by the client and sent to the SRS, the grammar file and/or TTS strings are obtained via a high-speed HTTP connection. The speech processing results and the synthesized speech are returned to the client over the established wireless UDP connection.

145 Citations

16 Claims

1. In a speech recognition capable wireless telecommunication system comprising a mobile client (300) in wireless communication with a proxy gateway (320), a speech recognition server (SRS, 360) that includes a speech recognizer (362) and a text-to-speech (TTS) synthesizer (368), a method of retrieval and delivery of multi-modal content (344) from a remotely located origin server for presentation and playback on said mobile client comprising the steps of:
- sending a request for a Web page from the client to the gateway;
  
  retrieving the Web page from the origin server to the gateway;
  
  returning the Web page to the client;
  
  determining whether the Web page contains multi-modal components;
  
  sending the multi-modal components from the client to the speech recognition server (360) using a wireless packet streaming protocol connection;
  
  obtaining a grammar file or TTS markup strings by the speech recognition server (360) from a remotely located server using an established HTTP network connection (370) from URL references sent from the client;
  
  loading the received grammars in the speech recognizer for performing speech recognition and TTS markup strings into the speech synthesizer for producing synthesized speech; and
  
  returning speech recognition results from the speech recognizer and produced synthesized speech to the client over said wireless packet streaming protocol connection.
- Dependent Claims (2, 3, 4, 5)
  - 2. A method according to claim 1 wherein said wireless telecommunication system operates in accordance with Wireless Application Protocol (WAP).
  - 3. A method according to claim 1 wherein the multi-modal components include grammar, TTS markup strings, pre-recorded audio, video, or music markup, or URL references of any of those mentioned.
  - 4. A method according to claim 3 wherein the grammar and TTS markup strings are embedded in the Web page.
  - 5. A method according to claim 1 wherein the wireless packet streaming protocol connection is a wireless UDP connection.
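
The server-side steps of claim 1 (loading embedded components directly and fetching URL-referenced ones over HTTP) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `Component` and `SpeechRecognitionServer` classes, the `deliver` function, and the injected `http_get` callable that stands in for the HTTP connection (370). The recognizer and synthesizer are reduced to plain lists so that only the routing logic is shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Component:
    """One multi-modal component found in a Web page (hypothetical model)."""
    kind: str       # "grammar" or "tts"
    body: str = ""  # embedded content, if the page carried it inline
    url: str = ""   # URL reference, if the content is stored elsewhere


@dataclass
class SpeechRecognitionServer:
    """Stand-in for the SRS: keeps what was loaded, does no real recognition."""
    http_get: Callable[[str], str]  # assumed fetcher for URL references
    grammars: List[str] = field(default_factory=list)
    tts_strings: List[str] = field(default_factory=list)

    def load(self, component: Component) -> None:
        # Embedded content is used as-is; otherwise the SRS fetches it itself.
        content = component.body or self.http_get(component.url)
        if component.kind == "grammar":
            self.grammars.append(content)
        elif component.kind == "tts":
            self.tts_strings.append(content)
        else:
            raise ValueError(f"unsupported component kind: {component.kind}")


def deliver(components: List[Component], srs: SpeechRecognitionServer) -> int:
    """Client side: forward components to the SRS, return how many were sent."""
    if not components:  # the page contains no multi-modal components
        return 0
    for component in components:
        srs.load(component)
    return len(components)
```

In this sketch the client forwards only a short URL over the wireless link, and the server performs the larger download itself.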

6. A wireless telecommunication system comprising a mobile client (300), a proxy gateway (320) in wireless communication with the mobile client, wherein said gateway hosts an HTTP network connection (330), and a speech recognition server (360) in wireless communication with the mobile client, the system being characterized in that a wireless packet streaming protocol connection (354) is established between the mobile client and the speech recognition server for the transfer of audio related packet data, and wherein the speech recognition server possesses an HTTP network connection for retrieving grammar and text-to-speech information from a remotely located server.
- Dependent Claims (7, 8, 9, 10)
  - 7. A wireless telecommunication system according to claim 6 characterized in that the wireless packet streaming protocol connection (354) is a wireless UDP connection.
  - 8. A wireless telecommunication system according to claim 7 characterized in that the mobile client and speech recognition server each possesses a UDP port and associated hardware and software to facilitate communication via a wireless UDP connection.
  - 9. A wireless telecommunication system according to claim 6 characterized in that the speech recognition server further comprises a speech recognizer, a text-to-speech processor, and security hardware and software for ensuring the secure transfer of communications data.
  - 10. A wireless telecommunication system according to claim 6 characterized in that the HTTP network connection is a high speed Internet connection.
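
As an illustration of the packet streaming connection (354) in its UDP form (claims 7 and 8), the sketch below exchanges one datagram in each direction between a client socket and a server socket on the loopback interface. The frame layout (a 32-bit sequence number followed by a 16-bit payload length) is an assumption made for this example; the claims do not specify a packet format. The function names are likewise hypothetical.

```python
import socket
import struct
from typing import Tuple

# Assumed frame header: sequence number (uint32) and payload length (uint16),
# both in network byte order. This layout is illustrative only.
HEADER = struct.Struct("!IH")


def pack_frame(seq: int, payload: bytes) -> bytes:
    """Prefix a payload with its sequence number and length."""
    return HEADER.pack(seq, len(payload)) + payload


def unpack_frame(datagram: bytes) -> Tuple[int, bytes]:
    """Split a datagram into (sequence number, payload)."""
    seq, length = HEADER.unpack_from(datagram)
    payload = datagram[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated datagram")
    return seq, payload


def loopback_demo() -> Tuple[int, bytes]:
    """Send one frame to a local 'SRS' socket and read back its reply."""
    srs = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        srs.bind(("127.0.0.1", 0))  # let the OS pick a free UDP port
        srs.settimeout(2.0)
        client.settimeout(2.0)
        client.sendto(pack_frame(1, b"speech-parameters"), srs.getsockname())
        datagram, client_addr = srs.recvfrom(2048)
        seq, payload = unpack_frame(datagram)
        # A real SRS would run recognition here; this one only echoes.
        srs.sendto(pack_frame(seq, b"result:" + payload), client_addr)
        reply, _ = client.recvfrom(2048)
        return unpack_frame(reply)
    finally:
        client.close()
        srs.close()
```

Because UDP does not guarantee delivery or ordering, a sequence number like the one assumed here is a common way for the receiver to detect lost or reordered packets.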

11. A mobile client device comprising:
- means for interfacing with a proxy gateway via a data protocol standard;
  
  means for retrieving a Web page located on an origin server;
  
  means for extracting multi-modal components from said Web page for transmission to a speech recognition server;
  
  means for generating speech parameters for use with said speech recognition server; and
  
  means for establishing a packet streaming protocol connection for wireless communication with said speech recognition server (SRS).
- Dependent Claims (12, 13, 14, 15, 16)
  - 12. A mobile client device according to claim 11 wherein the data protocol standard is Wireless Application Protocol (WAP).
  - 13. A mobile client device according to claim 11 wherein multi-modal components includes any one of grammar, TTS markup strings, pre-recorded audio, video, or music markup, or URL references of any of those mentioned.
  - 14. A mobile client device according to claim 11 wherein the generated speech parameters in the client are used together with a distributed speech recognition system (DSR) comprising a remote SRS.
  - 15. A mobile client device according to claim 11 wherein the packet streaming protocol connection is a wireless UDP connection.
  - 16. A mobile client device according to claim 11 wherein the packet streaming protocol connection is used for returning synthesized speech to the client from the SRS.
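
The "means for extracting multi-modal components" of claim 11 could, under stated assumptions, look like the following sketch. The claims do not name a markup vocabulary, so the `<grammar>` and `<tts>` tags and the `src` attribute used here are hypothetical, as are the class and function names. The parser collects each component either with its embedded text or with a URL reference.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class MultiModalExtractor(HTMLParser):
    """Collect hypothetical <grammar> and <tts> elements from a page."""

    KINDS = {"grammar", "tts"}  # assumed tag names, not from the patent

    def __init__(self) -> None:
        super().__init__()
        self.components: List[Dict[str, str]] = []
        self._open: Optional[Dict[str, str]] = None

    def handle_starttag(self, tag, attrs):
        if tag in self.KINDS:
            url = dict(attrs).get("src") or ""  # URL reference, if present
            self._open = {"kind": tag, "url": url, "body": ""}

    def handle_data(self, data):
        if self._open is not None:
            self._open["body"] += data  # embedded grammar or TTS text

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["kind"]:
            self._open["body"] = self._open["body"].strip()
            self.components.append(self._open)
            self._open = None


def extract_components(page: str) -> List[Dict[str, str]]:
    """Return the multi-modal components found in a page, in document order."""
    parser = MultiModalExtractor()
    parser.feed(page)
    parser.close()
    return parser.components
```

An empty result corresponds to a page without multi-modal components, in which case the client would have nothing to send to the SRS.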

Current Assignee
Nokia Technologies Oy (Nokia Corporation)
Original Assignee
Nokia Corporation
Inventors
Kapanen, Pekka; Bergman, Janne

Granted Patent

US 7,382,770 B2
US Class Current

370/352
CPC Class Codes

G06F 40/211 Syntactic parsing, e.g. bas...

G06F 40/279 Recognition of textual enti...
