Dynamically extending the speech prompts of a multimodal application

US 9,530,411 B2
Filed: 08/26/2013
Issued: 12/27/2016
Est. Priority Date: 06/24/2009
Status: Active Grant


2 Assignments

0 Petitions

Abstract

A prompt generation engine operates to dynamically extend prompts of a multimodal application. The prompt generation engine receives a media file having a metadata container. The prompt generation engine operates on a multimodal device that supports a voice mode and a non-voice mode for interacting with the multimodal device. The prompt generation engine retrieves from the metadata container a speech prompt related to content stored in the media file for inclusion in the multimodal application. The prompt generation engine modifies the multimodal application to include the speech prompt.
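The receive, retrieve, and modify steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class names `MediaFile`, `MultimodalApplication`, and `PromptGenerationEngine`, the `extend` method, and the `speech_prompt` metadata key are all hypothetical, and the metadata container is reduced to a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class MediaFile:
    """A media file reduced to a name and its metadata container."""
    name: str
    metadata: dict = field(default_factory=dict)


@dataclass
class MultimodalApplication:
    """Hypothetical stand-in for an application that holds speech prompts."""
    prompts: list = field(default_factory=list)


class PromptGenerationEngine:
    # Assumed key name; the abstract does not say how the prompt is labeled.
    PROMPT_KEY = "speech_prompt"

    def extend(self, app, media):
        """Receive a media file, retrieve its prompt, and modify the app.

        Returns True when the file carried a prompt, False otherwise.
        """
        prompt = media.metadata.get(self.PROMPT_KEY)  # retrieve
        if prompt is None:
            return False                              # nothing to add
        if prompt not in app.prompts:
            app.prompts.append(prompt)                # modify
        return True
```

For example, calling `extend` with a `MediaFile` whose metadata holds `{"speech_prompt": "Now playing a jazz track"}` appends that phrase to `app.prompts`, while a file without the key leaves the application unchanged.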

92 Citations

14 Claims

1. A method of dynamically extending the speech prompts of a multimodal application, the method comprising:
- receiving, by a prompt generation engine, a media file having a metadata container, wherein the prompt generation engine operates on one or more voice servers;
- retrieving, by the prompt generation engine from the metadata container, a speech prompt related to content stored in the media file for inclusion in the multimodal application, wherein the speech prompt is an audio phrase played by the multimodal application, wherein retrieving a speech prompt includes retrieving a speech artifact having a grammar rule or a pronunciation rule and wherein retrieving a speech prompt includes retrieving a speech artifact having an XML document; and
- modifying, by the prompt generation engine, the multimodal application to include the speech prompt.

Dependent claims (2, 3, 4, 5, 6, 7):
- 2. The method of claim 1 wherein retrieving, by the prompt generation engine, from the metadata container a speech prompt related to content stored in the media file for inclusion in the multimodal application further comprises retrieving a text string prompt for execution by a text to speech engine.
- 3. The method of claim 1 wherein retrieving, by the prompt generation engine, from the metadata container a speech prompt related to content stored in the media file for inclusion in the multimodal application further comprises retrieving an audio prompt to be played by a multimodal device.
- 4. The method of claim 1 wherein retrieving, by the prompt generation engine, from the metadata container a speech prompt related to content stored in the media file for inclusion in the multimodal application further comprises identifying a tag for prompts in the metadata container.
- 5. The method of claim 4 wherein identifying a tag for prompts in the metadata container further comprises identifying a frame for prompts in an ID3 container of an MPEG media file.
- 6. The method of claim 1 wherein modifying, by the prompt generation engine, the multimodal application to include the speech prompt further comprises updating a prompt document with the retrieved speech prompt.
- 7. The method of claim 1, further comprising:
  - modifying the grammar of the speech engine located on the voice server to include at least one of the grammar rule and the pronunciation rule.
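Claims 4 and 5 describe identifying a frame for prompts in an ID3 container. The claims do not say which frame is used, so the sketch below assumes, purely for illustration, a user-defined text frame (`TXXX`) whose description is `PROMPT`. It reads an ID3v2.3 tag by hand and ignores extended headers, unsynchronisation, and non-Latin-1 encodings. The helper names `build_tag` and `find_prompt` are hypothetical.

```python
import struct


def syncsafe_encode(n):
    """Encode an int as the 4-byte syncsafe size used in the ID3v2 header."""
    return bytes((n >> shift) & 0x7F for shift in (21, 14, 7, 0))


def syncsafe_decode(b):
    """Decode a 4-byte syncsafe integer (7 significant bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def build_tag(description, text):
    """Build a tiny ID3v2.3 tag holding one TXXX frame (fixture for the demo)."""
    body = b"\x00" + description.encode("latin-1") + b"\x00" + text.encode("latin-1")
    frame = b"TXXX" + struct.pack(">I", len(body)) + b"\x00\x00" + body
    return b"ID3" + b"\x03\x00" + b"\x00" + syncsafe_encode(len(frame)) + frame


def find_prompt(data, description="PROMPT"):
    """Return the text of the TXXX frame with this description, or None."""
    if len(data) < 10 or data[:3] != b"ID3" or data[3] != 3:
        return None                        # only ID3v2.3 is handled here
    end = 10 + syncsafe_decode(data[6:10])
    pos = 10
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":
            break                          # padding reached
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])
        body = data[pos + 10:pos + 10 + size]
        pos += 10 + size
        if frame_id == b"TXXX" and body[:1] == b"\x00":  # Latin-1 text only
            desc, _, value = body[1:].partition(b"\x00")
            if desc.decode("latin-1") == description:
                return value.decode("latin-1")
    return None
```

A caller would pass the leading bytes of an MPEG file to `find_prompt`; a `None` result means the tag is absent, is not version 2.3, or carries no frame with the assumed description.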

8. A voice server that supports multiple modes for interacting with a multimodal device, the voice server comprising:
- a computer processor;
- a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions configured to:
  - receive a media file having a metadata container;
  - retrieve, from the metadata container, a speech prompt related to content stored in the media file for inclusion in a multimodal application, wherein the speech prompt is an audio phrase played by the multimodal application;
  - modify the grammar of the speech engine to include at least one of the grammar rule and the pronunciation rule;
  - retrieve a speech artifact having an XML document; and
  - modify the multimodal application to include the speech prompt.

Dependent claims (9, 10, 11, 12, 13, 14):
- 9. The voice server of claim 8 wherein computer program instructions configured to retrieve, from the metadata container, a speech prompt related to content stored in the media file for inclusion in the multimodal application further comprise computer program instructions configured to retrieve a text string prompt for execution by a text to speech engine.
- 10. The voice server of claim 8 wherein computer program instructions configured to retrieve, from the metadata container, a speech prompt related to content stored in the media file for inclusion in the multimodal application further comprise computer program instructions configured to retrieve an audio prompt to be played by a multimodal device.
- 11. The voice server of claim 8 wherein computer program instructions configured to retrieve, from the metadata container, a speech prompt related to content stored in the media file for inclusion in the multimodal application further comprise computer program instructions configured to identify a tag for prompts in the metadata container.
- 12. The voice server of claim 11 wherein computer program instructions configured to identify a tag for prompts in the metadata container further comprise computer program instructions configured to identify a frame for prompts in an ID3 container of an MPEG media file.
- 13. The voice server of claim 8 wherein computer program instructions configured to modify the multimodal application to include the speech prompt further comprise computer program instructions configured to update a prompt document with the retrieved speech prompt.
- 14. The voice server of claim 8 wherein computer program instructions configured to retrieve a speech prompt related to content stored in the media file for inclusion in the multimodal application.
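Claims 6 and 13 describe updating a prompt document with the retrieved speech prompt, and claims 2, 3, 9, and 10 distinguish a text string prompt for a text to speech engine from an audio prompt. A rough sketch of that update step follows. The document layout (a root element containing `prompt` children, with an optional nested `audio` element) is an assumption loosely modeled on VoiceXML, and `add_prompt` is a hypothetical helper; the claims do not specify the schema.

```python
import xml.etree.ElementTree as ET


def add_prompt(prompt_doc_xml, prompt_id, text=None, audio_src=None):
    """Return prompt_doc_xml with one extra <prompt> element appended."""
    root = ET.fromstring(prompt_doc_xml)
    prompt = ET.SubElement(root, "prompt", {"id": prompt_id})
    if audio_src is not None:
        # Recorded audio prompt; any text is kept as fallback content.
        audio = ET.SubElement(prompt, "audio", {"src": audio_src})
        audio.text = text
    else:
        # Text string prompt intended for a text to speech engine.
        prompt.text = text
    return ET.tostring(root, encoding="unicode")
```

Returning a new string instead of editing in place keeps the original prompt document available if the update has to be discarded.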

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Agapi, Ciprian; Bodin, William K.; Cross, Charles W. Jr.
Primary Examiner(s): ABEBE, DANIEL DEMELASH
Application Number: US14/010,265
Publication Number: US 20130339033A1
Time in Patent Office: 1,219 Days
Field of Search: 704/275
US Class Current: 1/1
CPC Class Codes:
G10L 13/00   Speech synthesis; Text to s...
G10L 15/22   Procedures used during a sp...
H04M 2201/40   using speech recognition sp...
H04M 3/42204   Arrangements at the exchang...
