Speech Capabilities Of A Multimodal Application

US 20100299146A1
Filed: 05/19/2009
Published: 11/25/2010
Est. Priority Date: 05/19/2009
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Improving speech capabilities of a multimodal application, including: receiving, by a multimodal browser, a media file having a metadata container; retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in a speech engine available to the multimodal browser; determining whether the speech artifact includes a grammar rule or a pronunciation rule; if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule.
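
The control flow recited in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the `SpeechArtifact` and `SpeechEngine` classes, their fields, and the `apply_speech_artifact` function are hypothetical names introduced here, and the grammar and lexicon are modeled as a plain list and dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SpeechArtifact:
    """Hypothetical container for what was retrieved from the media file's metadata."""
    grammar_rules: List[str] = field(default_factory=list)
    pronunciation_rules: Dict[str, str] = field(default_factory=dict)


@dataclass
class SpeechEngine:
    """Hypothetical stand-in for a speech engine with a grammar and a lexicon."""
    grammar: List[str] = field(default_factory=list)
    lexicon: Dict[str, str] = field(default_factory=dict)


def apply_speech_artifact(engine: SpeechEngine, artifact: SpeechArtifact) -> SpeechEngine:
    """Determine what the artifact includes and modify the engine accordingly."""
    # If the artifact includes grammar rules, add them to the engine's grammar.
    if artifact.grammar_rules:
        engine.grammar.extend(artifact.grammar_rules)
    # If the artifact includes pronunciation rules, add them to the engine's lexicon.
    if artifact.pronunciation_rules:
        engine.lexicon.update(artifact.pronunciation_rules)
    return engine
```

An artifact may carry either kind of rule or both, so the two checks are independent rather than an either/or branch.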

88 Citations

15 Claims

1. A method of improving speech capabilities of a multimodal application, the method implemented with a multimodal browser and a speech engine operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and one or more non-voice modes, wherein the voice mode includes accepting speech input from a user, digitizing the speech, and providing digitized speech to a speech engine available to the multimodal browser for recognition, and wherein the non-voice mode includes accepting input from a user through physical user interaction with a user input device for the multimodal device;
- wherein the multimodal browser comprises a module of automated computing machinery for executing the multimodal application and the multimodal browser supports execution of a media file player, a module of automated computing machinery for playing media files;
  
  the method comprising:
  
  receiving, by the multimodal browser, a media file having a metadata container;
  
  retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser;
  
  determining whether the speech artifact includes a grammar rule or a pronunciation rule;
  
  if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and
  
  if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule.
  - 2. The method of claim 1 wherein retrieving, by the multimodal browser, from the metadata container a speech artifact for inclusion in a speech engine available to the multimodal browser further comprises scanning the metadata container for a tag identifying the speech artifact.
  - 3. The method of claim 2 wherein scanning the metadata container for a tag identifying the speech artifacts further comprises scanning an ID3 container of an MPEG media file for a frame identifying speech artifacts.
  - 4. The method of claim 1 wherein retrieving, by the multimodal browser, from the metadata container a speech artifact for inclusion in a speech engine available to the multimodal browser further comprises retrieving an XML document from the metadata container;
    - and modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule includes extracting from the XML document retrieved from the metadata container a grammar rule and including the grammar rule in an XML grammar document in the speech engine.
  - 5. The method of claim 1 wherein retrieving, by the multimodal browser, from the metadata container a speech artifact for inclusion in a speech engine available to the multimodal browser further comprises retrieving an XML document from the metadata container;
    - and modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule includes extracting from the XML document retrieved from the metadata container a pronunciation rule and including the pronunciation rule in an XML lexicon document in the speech engine.
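
Claims 2 and 3 describe scanning an ID3 container for a frame that identifies speech artifacts. A rough sketch of such a scan follows. It assumes an ID3v2.3 tag with no extended header and no unsynchronisation, and the frame ID `XSPA` used in the usage note is a hypothetical placeholder; the claims do not name a specific frame.

```python
import struct
from typing import Optional


def synchsafe_to_int(data: bytes) -> int:
    """Decode an ID3v2 synchsafe integer (7 significant bits per byte)."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def find_id3_frame(tag: bytes, frame_id: bytes) -> Optional[bytes]:
    """Return the body of the first frame whose ID matches, or None.

    Assumes ID3v2.3 layout: a 10-byte tag header, then frames that each
    have a 4-byte ID, a 4-byte big-endian size, and 2 flag bytes.
    """
    if len(tag) < 10 or tag[:3] != b"ID3":
        return None
    tag_size = synchsafe_to_int(tag[6:10])  # size of everything after the header
    end = min(len(tag), 10 + tag_size)
    pos = 10
    while pos + 10 <= end:
        current_id = tag[pos:pos + 4]
        if current_id == b"\x00\x00\x00\x00":
            break  # reached padding, no more frames
        (size,) = struct.unpack(">I", tag[pos + 4:pos + 8])
        if current_id == frame_id:
            return tag[pos + 10:pos + 10 + size]
        pos += 10 + size  # skip this frame's header and body
    return None
```

Under these assumptions, a call such as `find_id3_frame(tag_bytes, b"XSPA")` would return the raw payload of the hypothetical speech-artifact frame, which could then be handed to an XML parser.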

6. An apparatus for improving speech capabilities of a multimodal application, the apparatus including a multimodal browser and a multimodal application operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and one or more non-voice modes, the apparatus comprising a computer processor and a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions for:
- receiving, by the multimodal browser, a media file having a metadata container;
  
  retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser;
  
  determining whether the speech artifact includes a grammar rule or a pronunciation rule;
  
  if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and
  
  if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule.
  - 7. The apparatus of claim 6 wherein computer program instructions for retrieving, by the multimodal browser, from the metadata container a speech artifact for inclusion in a speech engine available to the multimodal browser further comprise computer program instructions for scanning the metadata container for a tag identifying the speech artifact.
  - 8. The apparatus of claim 7 wherein computer program instructions for scanning the metadata container for a tag identifying the speech artifacts further comprise computer program instructions for scanning an ID3 container of an MPEG media file for a frame identifying speech artifacts.
  - 9. The apparatus of claim 6 wherein computer program instructions for retrieving, by the multimodal browser, from the metadata container a speech artifact for inclusion in a speech engine available to the multimodal browser further comprise computer program instructions for retrieving an XML document from the metadata container;
    - and computer program instructions for modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule includes computer program instructions for extracting from the XML document retrieved from the metadata container a grammar rule and including the grammar rule in an XML grammar document in the speech engine.
  - 10. The apparatus of claim 6 wherein computer program instructions for retrieving, by the multimodal browser, from the metadata container a speech artifact for inclusion in a speech engine available to the multimodal browser further comprise computer program instructions for retrieving an XML document from the metadata container;
    - and computer program instructions for modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule include computer program instructions for extracting from the XML document retrieved from the metadata container a pronunciation rule and including the pronunciation rule in an XML lexicon document in the speech engine.

11. A computer program product for improving speech capabilities of a multimodal application, the computer program product including a multimodal browser for operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and one or more non-voice modes, the computer program product disposed upon a computer-readable, recording medium, the computer program product comprising computer program instructions capable of:
- receiving, by the multimodal browser, a media file having a metadata container;
  
  retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser;
  
  determining whether the speech artifact includes a grammar rule or a pronunciation rule;
  
  if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and
  
  if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule.
  - 12. The computer program product of claim 11 wherein computer program instructions for retrieving, by the multimodal browser, from the metadata container a speech artifact for inclusion in a speech engine available to the multimodal browser further comprise computer program instructions for scanning the metadata container for a tag identifying the speech artifact.
  - 13. The computer program product of claim 12 wherein computer program instructions for scanning the metadata container for a tag identifying the speech artifacts further comprise computer program instructions for scanning an ID3 container of an MPEG media file for a frame identifying speech artifacts.
  - 14. The computer program product of claim 11 wherein computer program instructions for retrieving, by the multimodal browser, from the metadata container a speech artifact for inclusion in a speech engine available to the multimodal browser further comprise computer program instructions for retrieving an XML document from the metadata container;
    - and computer program instructions for modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule includes computer program instructions for extracting from the XML document retrieved from the metadata container a grammar rule and including the grammar rule in an XML grammar document in the speech engine.
  - 15. The computer program product of claim 11 wherein computer program instructions for retrieving, by the multimodal browser, from the metadata container a speech artifact for inclusion in a speech engine available to the multimodal browser further comprise computer program instructions for retrieving an XML document from the metadata container;
    - and computer program instructions for modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule include computer program instructions for extracting from the XML document retrieved from the metadata container a pronunciation rule and including the pronunciation rule in an XML lexicon document in the speech engine.
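
The XML-based dependent claims (4, 5, 9, 10, 14, and 15) describe extracting a grammar rule or pronunciation rule from an XML document and including it in an XML grammar or lexicon document in the speech engine. The sketch below shows one way that merge might look using Python's standard `xml.etree.ElementTree`. The element names (`speech-artifact`, `rule`, `lexeme`, `grapheme`, `phoneme`) and the function `merge_artifact` are illustrative assumptions, not a schema taken from the claims.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def merge_artifact(artifact_xml: str, grammar_xml: str, lexicon_xml: str) -> Tuple[str, str]:
    """Copy <rule> elements into the grammar document and <lexeme>
    elements into the lexicon document, returning both as strings."""
    artifact = ET.fromstring(artifact_xml)
    grammar = ET.fromstring(grammar_xml)
    lexicon = ET.fromstring(lexicon_xml)

    # Grammar rules extracted from the artifact go into the grammar document.
    for rule in list(artifact.iter("rule")):
        grammar.append(rule)
    # Pronunciation rules extracted from the artifact go into the lexicon document.
    for lexeme in list(artifact.iter("lexeme")):
        lexicon.append(lexeme)

    return (
        ET.tostring(grammar, encoding="unicode"),
        ET.tostring(lexicon, encoding="unicode"),
    )


# Hypothetical artifact as it might be stored in a media file's metadata.
ARTIFACT = """
<speech-artifact>
  <rule id="artist">Sade</rule>
  <lexeme><grapheme>Sade</grapheme><phoneme>sh aa d ey</phoneme></lexeme>
</speech-artifact>
"""

new_grammar, new_lexicon = merge_artifact(ARTIFACT, "<grammar/>", "<lexicon/>")
```

A real grammar or lexicon document would normally carry namespaces and a root rule, which this sketch leaves out for brevity.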

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Cross, Charles W. JR.; Bodin, William K.; Agapi, Ciprian
Granted Patent: US 8,380,513 B2
US Class Current: 704/235
CPC Class Codes:
G10L 15/187   Phonemic context, e.g. pron...
G10L 15/19   Grammatical context, e.g. d...
G10L 15/22   Procedures used during a sp...
G10L 2015/228   of application context
