Multi-modal content and automatic speech recognition in wireless telecommunication systems

US 7,382,770 B2
Filed: 02/27/2003
Issued: 06/03/2008
Est. Priority Date: 08/30/2000
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A communication architecture for delivering grammar and speech-related information, such as text-to-speech (TTS) data, to a speech recognition server (SRS) operating with a wireless telecommunication system, for use with automatic speech recognition and interactive voice-based applications. In the invention, a mobile client retrieves a Web page containing multi-modal content, hosted on an origin server, via a WAP gateway. The content may include a grammar file and/or TTS strings embedded in the content, or URL(s) referencing their storage locations. The client then sends the grammar and/or TTS strings to the SRS over a wireless packet streaming protocol channel, such as UDP. When the client receives URL(s) and forwards them to the SRS, the SRS obtains the grammar file and/or TTS strings over a high-speed HTTP connection. The speech processing results and the synthesized speech are returned to the client over the established wireless UDP connection.
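Per the abstract, grammar and TTS data may arrive either embedded in the page or as URL references that the SRS resolves later. The following is a minimal sketch of how a client might detect and separate the two cases (the "determining whether the Web page contains multi-modal components" step of claim 1). It assumes hypothetical `<grammar>` and `<tts>` elements with an optional `src` attribute; the patent does not prescribe a markup vocabulary, and `Component` and `extract_components` are illustrative names, not part of any described implementation.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional

# Hypothetical tag names: the patent does not fix a markup vocabulary.
MULTIMODAL_TAGS = {"grammar", "tts"}


@dataclass
class Component:
    kind: str                  # "grammar" or "tts"
    url: Optional[str] = None  # set when the page only references the content
    inline: str = ""           # set when the content is embedded in the page


class MultiModalExtractor(HTMLParser):
    """Collects grammar/TTS components from a page, embedded or by URL."""

    def __init__(self) -> None:
        super().__init__()
        self.components: list[Component] = []
        self._open: Optional[Component] = None

    def handle_starttag(self, tag, attrs):
        if tag in MULTIMODAL_TAGS:
            # A src attribute means "fetch this later"; otherwise expect inline text.
            self._open = Component(kind=tag, url=dict(attrs).get("src"))
            self.components.append(self._open)

    def handle_data(self, data):
        # Text inside an open, non-referenced component is embedded content.
        if self._open is not None and self._open.url is None:
            self._open.inline += data

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open.kind:
            self._open.inline = self._open.inline.strip()
            self._open = None


def extract_components(page: str) -> list[Component]:
    """Return the page's multi-modal components; empty if there are none."""
    parser = MultiModalExtractor()
    parser.feed(page)
    parser.close()
    return parser.components
```

An empty list means the page has no multi-modal content and can be rendered normally; otherwise each `Component` carries either a URL for the SRS to fetch or the embedded text itself.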

15 Claims

1. A method comprising:
- sending a request for a Web page from a mobile client to a gateway, wherein the mobile client is in wireless communication with the gateway;
- retrieving the Web page from an origin server to the gateway;
- returning the Web page to the client;
- determining whether the Web page contains multi-modal components;
- sending the multi-modal components from the client to a speech recognition server using a direct wireless packet streaming protocol connection, wherein the speech recognition server includes a speech recognizer and a text-to-speech synthesizer;
- obtaining a grammar file or text-to-speech markup strings by the speech recognition server from a remotely located server using an established hypertext transport protocol network connection from at least one universal resource locator reference sent from the client;
- loading the received grammars into a speech recognizer for performing speech recognition and the text-to-speech markup strings into the speech synthesizer for producing synthesized speech; and
- returning speech recognition results from the speech recognizer and produced synthesized speech to the client over said wireless packet streaming protocol connection.

2. The method according to claim 1, wherein said wireless telecommunication system operates in accordance with Wireless Application Protocol.

3. The method according to claim 1, wherein the multi-modal components include grammar, text-to-speech markup strings, pre-recorded audio, video, or music markup, or universal resource locator references of any of those mentioned.

4. The method according to claim 3, wherein the grammar and text-to-speech markup strings are embedded in the Web page.

5. The method according to claim 1, wherein the wireless packet streaming protocol connection is a wireless user datagram protocol connection.

6. An apparatus comprising:
- a processor configured to control operations of the apparatus; and
- memory storing executable instructions that, when executed by the processor, cause the apparatus to perform:
  - interfacing with a proxy gateway via a data protocol standard to retrieve a Web page located on an origin server;
  - extracting multi-modal components from said Web page for transmission to a speech recognition server;
  - generating speech parameters for use with said speech recognition server;
  - establishing a direct wireless packet streaming protocol connection for wireless communication with said speech recognition server;
  - sending the multi-modal components to the speech recognition server using the established direct wireless packet streaming protocol connection; and
  - receiving speech recognition results and produced synthesized speech from the speech recognition server via said wireless packet streaming protocol connection.

7. The apparatus according to claim 6, wherein the data protocol standard is Wireless Application Protocol.

8. The apparatus according to claim 6, wherein said multi-modal components include any one of grammar, text-to-speech markup strings, pre-recorded audio, video, or music markup, or URL references of any of those mentioned.

9. The apparatus according to claim 6, wherein the generated speech parameters in the client are used together with a distributed speech recognition system comprising a remote speech recognition server.

10. The apparatus according to claim 6, wherein the packet streaming protocol connection is a wireless user datagram protocol connection.

11. An apparatus, comprising:
- a processor configured to control operations of the apparatus; and
- memory storing executable instructions that, when executed by the processor, cause the apparatus to perform:
  - establishing a direct wireless packet streaming protocol connection with a mobile client;
  - receiving multi-modal components from the client via the established direct wireless packet streaming protocol connection;
  - obtaining a grammar file or text-to-speech markup strings from a remotely located server using an established hypertext transport protocol network connection from at least one universal resource locator reference sent from the mobile client;
  - loading the received grammars into a speech recognizer for performing speech recognition and the text-to-speech markup strings into the speech synthesizer for producing synthesized speech; and
  - returning speech recognition results from the speech recognizer and produced synthesized speech to the mobile client over said wireless packet streaming protocol connection.

12. The apparatus according to claim 11, wherein the wireless packet streaming protocol connection is a wireless user datagram protocol connection.

13. The apparatus according to claim 12, wherein the mobile client and apparatus each possess a user datagram protocol port and associated hardware and software to facilitate communication via a wireless user datagram protocol connection.

14. The apparatus according to claim 11, wherein the speech recognition server further comprises a speech recognizer, a text-to-speech processor, and security hardware and software for ensuring the secure transfer of communications data.

15. The apparatus according to claim 11, wherein the hypertext transport protocol network connection is a high speed Internet connection.
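Claims 1, 5, and 11 through 13 describe the server side of the exchange: receive components over a direct UDP connection, resolve any URL references over HTTP, load grammars and TTS markup, and reply over the same connection. The sketch below runs that round trip on the loopback interface with Python's standard `socket` module. Everything beyond the claim steps is an assumption made for illustration: the JSON message layout, the `|`-separated toy grammar format, the dictionary lookup standing in for the HTTP fetch, the text `utterance` standing in for captured audio or client-side speech parameters, and the stub `recognize`/`synthesize` functions. The security hardware and software of claim 14 are not modeled.

```python
import json
import socket
import threading
from typing import Callable, Optional

# Illustrative only: message format, grammar format and "engines" are hypothetical.
# Stand-in for the SRS's HTTP connection: maps a URL to a document body.
Fetcher = Callable[[str], str]


def recognize(grammars: list[str], utterance: str) -> Optional[str]:
    """Toy recognizer: each grammar is a "|"-separated list of allowed phrases."""
    for grammar in grammars:
        if utterance in grammar.split("|"):
            return utterance
    return None


def synthesize(markup: str) -> str:
    """Toy synthesizer: a real one would return audio, not a text label."""
    return f"<audio: {markup}>"


def serve_one_request(srs: socket.socket, fetch: Fetcher) -> None:
    """Handle a single client datagram on an already-bound UDP socket."""
    data, client_addr = srs.recvfrom(65535)
    request = json.loads(data)
    grammars, tts_strings = [], []
    for comp in request["components"]:
        # URL references are resolved by the server, not by the client.
        body = fetch(comp["url"]) if comp.get("url") else comp["inline"]
        (grammars if comp["kind"] == "grammar" else tts_strings).append(body)
    reply = {
        "recognition_result": recognize(grammars, request["utterance"]),
        "synthesized_speech": [synthesize(t) for t in tts_strings],
    }
    # Reply to the address the request came from (same UDP "connection").
    srs.sendto(json.dumps(reply).encode(), client_addr)


def client_request(srs_addr, components: list[dict], utterance: str) -> dict:
    """Send components plus a text stand-in for speech; wait for the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(2.0)
        sock.connect(srs_addr)  # fixes the peer; UDP itself stays connectionless
        payload = {"components": components, "utterance": utterance}
        sock.send(json.dumps(payload).encode())
        return json.loads(sock.recv(65535))


def demo() -> dict:
    web = {"http://example.com/cities.txt": "helsinki|tampere|oulu"}
    components = [
        {"kind": "grammar", "url": "http://example.com/cities.txt"},
        {"kind": "tts", "inline": "Which city?"},
    ]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as srs:
        srs.settimeout(2.0)
        srs.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        worker = threading.Thread(
            target=serve_one_request, args=(srs, web.__getitem__)
        )
        worker.start()
        reply = client_request(srs.getsockname(), components, "tampere")
        worker.join()
    return reply
```

Calling `demo()` returns the recognized phrase (`"tampere"`) and one placeholder audio label. A real deployment would carry audio or speech parameters in the datagrams and would have to cope with UDP loss and reordering, which this sketch does not attempt.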

Current Assignee: Nokia Technologies Oy (Nokia Corporation)
Original Assignee: Nokia Corporation
Inventors: Kapanen, Pekka; Bergman, Janne
Primary Examiner(s): Pham, Chi
Assistant Examiner(s): Boakye, Alexander O.
Application Number: US10/374,262
Publication Number: US 20030161298A1
Time in Patent Office: 1,923 Days
Field of Search: 370/352, 370/329, 370/401, 370/338, 370/349, 370/357, 370/389, 370/400, 709/219, 709/203, 709/229, 719/328, 455/563, 379/88.16, 704/270.1, 704/251
US Class Current: 370/352

CPC Class Codes

G06F 40/211 Syntactic parsing, e.g. bas...

G06F 40/279 Recognition of textual enti...
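The "Time in Patent Office" figure matches the number of calendar days from the filing date (02/27/2003) to the issue date (06/03/2008), assuming that is how the figure is counted. A quick check with Python's standard `datetime` module:

```python
from datetime import date

# Filing and issue dates as listed in this record (MM/DD/YYYY).
filed = date(2003, 2, 27)
issued = date(2008, 6, 3)

# Subtracting dates yields a timedelta; .days counts calendar days.
days_pending = (issued - filed).days
print(f"{days_pending:,} days")  # prints "1,923 days"
```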
