Improving speech capabilities of a multimodal application

US 8,380,513 B2
Filed: 05/19/2009
Issued: 02/19/2013
Est. Priority Date: 05/19/2009
Status: Expired due to Fees

Abstract

Improving speech capabilities of a multimodal application including receiving, by the multimodal browser, a media file having a metadata container; retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser; determining whether the speech artifact includes a grammar rule or a pronunciation rule; if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule.
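The flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the artifact layout (a `<speech-artifact>` root holding `<rule>` and `<lexeme>` elements without namespaces), the function name `apply_speech_artifact`, and the in-memory `grammar` and `lexicon` documents are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def apply_speech_artifact(artifact_xml, grammar_doc, lexicon_doc):
    """Merge rules from a speech artifact into the engine's XML documents.

    Returns (grammar_rules_added, pronunciation_rules_added).
    """
    artifact = ET.fromstring(artifact_xml)
    # Hypothetical layout: grammar rules are <rule> elements and
    # pronunciation rules are <lexeme> elements inside the artifact.
    grammar_rules = list(artifact.iter("rule"))
    pronunciation_rules = list(artifact.iter("lexeme"))

    # If the artifact includes grammar rules, extend the grammar document.
    for rule in grammar_rules:
        grammar_doc.append(rule)
    # If it includes pronunciation rules, extend the lexicon document.
    for lexeme in pronunciation_rules:
        lexicon_doc.append(lexeme)
    return len(grammar_rules), len(pronunciation_rules)


# Hypothetical artifact, as it might be retrieved from a metadata container.
ARTIFACT = """
<speech-artifact>
  <rule id="artist"><one-of><item>the beatles</item></one-of></rule>
  <lexeme><grapheme>Beatles</grapheme><phoneme>b iy t ah l z</phoneme></lexeme>
</speech-artifact>
"""

grammar = ET.Element("grammar")   # stands in for the XML grammar document
lexicon = ET.Element("lexicon")   # stands in for the XML lexicon document
added = apply_speech_artifact(ARTIFACT, grammar, lexicon)
print(added)
```

An artifact that contains only one kind of rule leaves the other document unchanged, which mirrors the two conditional steps in the abstract.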

87 Citations

9 Claims

1. A method of improving speech capabilities of a multimodal application, the method implemented with a multimodal browser and a speech engine operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and one or more non-voice modes, wherein the voice mode includes accepting speech input from a user, digitizing the speech, and providing digitized speech to a speech engine available to the multimodal browser for recognition, and wherein the non-voice mode includes accepting input from a user through physical user interaction with a user input device for the multimodal device;
- wherein the multimodal browser comprises a module of automated computing machinery for executing the multimodal application and the multimodal browser supports execution of a media file player, a module of automated computing machinery for playing media files;
  
  the method comprising:
  
  receiving, by the multimodal browser, a media file having a metadata container;
  
  retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser, wherein said retrieving, by the multimodal browser, from the metadata container the speech artifact for inclusion in a speech engine available to the multimodal browser comprises retrieving an XML document from the metadata container;
  
  determining whether the speech artifact includes a grammar rule or a pronunciation rule;
  
  if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule, wherein said modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule includes extracting from the XML document retrieved from the metadata container a grammar rule and including the grammar rule in an XML grammar document in the speech engine; and
  
  if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule, wherein said modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule includes extracting from the XML document retrieved from the metadata container a pronunciation rule and including the pronunciation rule in an XML lexicon document in the speech engine.
- 2. The method of claim 1 wherein retrieving, by the multimodal browser, from the metadata container a speech artifact for inclusion in a speech engine available to the multimodal browser further comprises scanning the metadata container for a tag identifying the speech artifact.
  - 3. The method of claim 2 wherein scanning the metadata container for a tag identifying the speech artifacts further comprises scanning an ID3 container of an MPEG media file for a frame identifying speech artifacts.

4. An apparatus for improving speech capabilities of a multimodal application, the apparatus including a multimodal browser and a multimodal application operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and one or more non-voice modes, the apparatus comprising a computer processor and a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions for:
- receiving, by the multimodal browser, a media file having a metadata container;
  
  retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser, wherein the computer program instructions for retrieving, by the multimodal browser, from the metadata container the speech artifact for inclusion in the speech engine available to the multimodal browser comprises computer program instructions for retrieving an XML document from the metadata container;
  
  determining whether the speech artifact includes a grammar rule or a pronunciation rule;
  
  if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule, wherein the computer program instructions for modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule includes computer program instructions for extracting from the XML document retrieved from the metadata container a grammar rule and including the grammar rule in an XML grammar document in the speech engine; and
  
  if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule, wherein the computer program instructions for modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule include computer program instructions for extracting from the XML document retrieved from the metadata container a pronunciation rule and including the pronunciation rule in an XML lexicon document in the speech engine.
- 5. The apparatus of claim 4 wherein computer program instructions for retrieving, by the multimodal browser, from the metadata container a speech artifact for inclusion in a speech engine available to the multimodal browser further comprise computer program instructions for scanning the metadata container for a tag identifying the speech artifact.
  - 6. The apparatus of claim 5 wherein computer program instructions for scanning the metadata container for a tag identifying the speech artifacts further comprise computer program instructions for scanning an ID3 container of an MPEG media file for a frame identifying speech artifacts.

7. A computer program product for improving speech capabilities of a multimodal application, the computer program product including a multimodal browser for operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and one or more non-voice modes, the computer program product disposed upon a computer-readable, recording medium, the computer program product comprising computer program instructions capable for:
- receiving, by the multimodal browser, a media file having a metadata container;
  
  retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser, wherein the computer program instructions for retrieving, by the multimodal browser, from the metadata container the speech artifact for inclusion in the speech engine available to the multimodal browser comprises computer program instructions for retrieving an XML document from the metadata container;
  
  determining whether the speech artifact includes a grammar rule or a pronunciation rule;
  
  if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule, wherein the computer program instructions for modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule includes computer program instructions for extracting from the XML document retrieved from the metadata container a grammar rule and including the grammar rule in an XML grammar document in the speech engine; and
  
  if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule, wherein the computer program instructions for modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule include computer program instructions for extracting from the XML document retrieved from the metadata container a pronunciation rule and including the pronunciation rule in an XML lexicon document in the speech engine.
- 8. The computer program product of claim 7 wherein computer program instructions for retrieving, by the multimodal browser, from the metadata container a speech artifact for inclusion in a speech engine available to the multimodal browser further comprise computer program instructions for scanning the metadata container for a tag identifying the speech artifact.
  - 9. The computer program product of claim 8 wherein computer program instructions for scanning the metadata container for a tag identifying the speech artifacts further comprise computer program instructions for scanning an ID3 container of an MPEG media file for a frame identifying speech artifacts.
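
The dependent claims describe scanning an ID3 container of an MPEG media file for a frame identifying speech artifacts. The sketch below shows one way such a scan could look for an ID3v2.3 tag. The frame ID `XSPA` is hypothetical (the claims do not name a frame), the helper names are illustrative, and the sketch ignores extended headers, unsynchronisation, and other ID3 versions.

```python
import struct


def synchsafe(b):
    """Decode a 4-byte synchsafe integer (7 usable bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def to_synchsafe(n):
    """Encode an integer as 4 synchsafe bytes (used to build the sample tag)."""
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def find_id3_frame(data, frame_id):
    """Return the payload of the first ID3v2.3 frame named frame_id, or None."""
    if len(data) < 10 or data[:3] != b"ID3" or data[3] != 3:
        return None  # not an ID3v2.3 tag; other versions are not handled here
    end = min(10 + synchsafe(data[6:10]), len(data))
    pos = 10
    while pos + 10 <= end:
        fid = data[pos:pos + 4]
        if fid == b"\x00\x00\x00\x00":
            break  # padding reached, no more frames
        # ID3v2.3 frame header: 4-byte ID, 4-byte big-endian size, 2 flag bytes.
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])
        if fid == frame_id:
            return data[pos + 10:pos + 10 + size]
        pos += 10 + size
    return None


def make_frame(frame_id, payload):
    """Build a minimal ID3v2.3 frame with zeroed flags."""
    return frame_id + struct.pack(">I", len(payload)) + b"\x00\x00" + payload


# Sample tag: a title frame followed by the hypothetical speech-artifact frame.
frames = make_frame(b"TIT2", b"\x00Song") + make_frame(b"XSPA", b"<speech-artifact/>")
tag = b"ID3\x03\x00\x00" + to_synchsafe(len(frames)) + frames
artifact = find_id3_frame(tag + b"audio-bytes", b"XSPA")
print(artifact)
```

The returned payload would then be parsed as the XML document from which grammar and pronunciation rules are extracted.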

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Agapi, Ciprian; Bodin, William K.; Cross, Charles W. Jr.
Primary Examiner(s): Colucci, Michael C
Application Number: US12/468,166
Publication Number: US 20100299146A1
Time in Patent Office: 1,372 Days
Field of Search: 701/36, 704/270.1, 704/9, 704/278, 709/228, 709/206, 719/310, 715/854, 715/764, 715/763, 715/745, 715/205, 715/201, 715/200
US Class Current: 704/270
CPC Class Codes:
G10L 15/187   Phonemic context, e.g. pron...
G10L 15/19   Grammatical context, e.g. d...
G10L 15/22   Procedures used during a sp...
G10L 2015/228   of application context
