Improving speech capabilities of a multimodal application
First Claim
1. A method of improving speech capabilities of a multimodal application, the method implemented with a multimodal browser and a speech engine operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and one or more non-voice modes, wherein the voice mode includes accepting speech input from a user, digitizing the speech, and providing digitized speech to a speech engine available to the multimodal browser for recognition, and wherein the non-voice mode includes accepting input from a user through physical user interaction with a user input device for the multimodal device;
wherein the multimodal browser comprises a module of automated computing machinery for executing the multimodal application and the multimodal browser supports execution of a media file player, a module of automated computing machinery for playing media files;
the method comprising:
receiving, by the multimodal browser, a media file having a metadata container;
retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser, wherein said retrieving, by the multimodal browser, from the metadata container the speech artifact for inclusion in a speech engine available to the multimodal browser comprises retrieving an XML document from the metadata container;
determining whether the speech artifact includes a grammar rule or a pronunciation rule;
if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule, wherein said modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule includes extracting from the XML document retrieved from the metadata container a grammar rule and including the grammar rule in an XML grammar document in the speech engine; and
if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule, wherein said modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule includes extracting from the XML document retrieved from the metadata container a pronunciation rule and including the pronunciation rule in an XML lexicon document in the speech engine.
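The two modifying steps of the claim can be illustrated with a minimal Python sketch. It assumes the XML document has already been retrieved from the metadata container. The element names (`speech-artifact`, `rule`, `lexeme`, `grapheme`, `phoneme`) are simplified, namespace-free assumptions loosely modeled on SRGS and PLS, the function name `apply_speech_artifact` is hypothetical, and the phoneme string is a placeholder. None of this is prescribed by the claim.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical XML document as it might be retrieved from a metadata container.
ARTIFACT_XML = """
<speech-artifact>
  <rule id="artist"><one-of><item>Beyonce</item></one-of></rule>
  <lexeme><grapheme>Beyonce</grapheme><phoneme>placeholder</phoneme></lexeme>
</speech-artifact>
"""

def apply_speech_artifact(artifact_xml, grammar_doc, lexicon_doc):
    """Copy grammar rules and pronunciation rules into the engine's documents.

    Returns (grammar rules added, pronunciation rules added).
    """
    artifact = ET.fromstring(artifact_xml)
    grammar_rules = artifact.findall("rule")          # assumed grammar-rule element
    pronunciation_rules = artifact.findall("lexeme")  # assumed pronunciation-rule element

    # If the artifact includes grammar rules, add them to the XML grammar document.
    for rule in grammar_rules:
        grammar_doc.append(copy.deepcopy(rule))

    # If the artifact includes pronunciation rules, add them to the XML lexicon document.
    for lexeme in pronunciation_rules:
        lexicon_doc.append(copy.deepcopy(lexeme))

    return len(grammar_rules), len(pronunciation_rules)

# Stand-ins for the XML grammar and lexicon documents held by the speech engine.
grammar_doc = ET.fromstring(
    '<grammar root="command"><rule id="command"><item>play</item></rule></grammar>'
)
lexicon_doc = ET.fromstring("<lexicon/>")

added = apply_speech_artifact(ARTIFACT_XML, grammar_doc, lexicon_doc)
```

In this sketch an artifact carrying only one kind of rule leaves the other document untouched, which mirrors the conditional wording of the two "if" steps.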
Abstract
Improving speech capabilities of a multimodal application including receiving, by the multimodal browser, a media file having a metadata container; retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser; determining whether the speech artifact includes a grammar rule or a pronunciation rule; if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule.
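The receiving, retrieving, and determining steps summarized above might look roughly like the following Python sketch. The `MediaFile` class, the `speech-artifact` metadata key, and the `rule` and `lexeme` element names are all assumptions made for illustration. A real metadata container (for example, a tag block inside an audio file) would need a format-specific parser, which is not shown here.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

@dataclass
class MediaFile:
    """Hypothetical stand-in for a media file: content plus a metadata container."""
    content: bytes
    metadata: dict = field(default_factory=dict)

def retrieve_speech_artifact(media_file, key="speech-artifact"):
    """Return the parsed XML speech artifact, or None if the container has none."""
    xml_text = media_file.metadata.get(key)
    return ET.fromstring(xml_text) if xml_text else None

def classify_artifact(artifact):
    """Determine which kinds of rule the artifact includes."""
    kinds = set()
    if artifact is None:
        return kinds
    if artifact.find("rule") is not None:     # assumed grammar-rule element
        kinds.add("grammar")
    if artifact.find("lexeme") is not None:   # assumed pronunciation-rule element
        kinds.add("pronunciation")
    return kinds

# A received media file whose metadata carries a grammar rule only.
song = MediaFile(
    content=b"",
    metadata={
        "speech-artifact": (
            "<speech-artifact>"
            "<rule id='title'><item>Moonlight Sonata</item></rule>"
            "</speech-artifact>"
        )
    },
)
kinds = classify_artifact(retrieve_speech_artifact(song))
```

Returning a set keeps the two checks independent, so an artifact may be reported as containing a grammar rule, a pronunciation rule, both, or neither.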
9 Claims
1. A method of improving speech capabilities of a multimodal application, the method implemented with a multimodal browser and a speech engine operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and one or more non-voice modes, wherein the voice mode includes accepting speech input from a user, digitizing the speech, and providing digitized speech to a speech engine available to the multimodal browser for recognition, and wherein the non-voice mode includes accepting input from a user through physical user interaction with a user input device for the multimodal device;
wherein the multimodal browser comprises a module of automated computing machinery for executing the multimodal application and the multimodal browser supports execution of a media file player, a module of automated computing machinery for playing media files;
the method comprising:
receiving, by the multimodal browser, a media file having a metadata container;
retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser, wherein said retrieving, by the multimodal browser, from the metadata container the speech artifact for inclusion in a speech engine available to the multimodal browser comprises retrieving an XML document from the metadata container;
determining whether the speech artifact includes a grammar rule or a pronunciation rule;
if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule, wherein said modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule includes extracting from the XML document retrieved from the metadata container a grammar rule and including the grammar rule in an XML grammar document in the speech engine; and
if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule, wherein said modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule includes extracting from the XML document retrieved from the metadata container a pronunciation rule and including the pronunciation rule in an XML lexicon document in the speech engine.
Dependent claims: 2, 3
4. An apparatus for improving speech capabilities of a multimodal application, the apparatus including a multimodal browser and a multimodal application operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and one or more non-voice modes, the apparatus comprising a computer processor and a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions for:
receiving, by the multimodal browser, a media file having a metadata container;
retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser, wherein the computer program instructions for retrieving, by the multimodal browser, from the metadata container the speech artifact for inclusion in the speech engine available to the multimodal browser comprises computer program instructions for retrieving an XML document from the metadata container;
determining whether the speech artifact includes a grammar rule or a pronunciation rule;
if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule, wherein the computer program instructions for modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule includes computer program instructions for extracting from the XML document retrieved from the metadata container a grammar rule and including the grammar rule in an XML grammar document in the speech engine; and
if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule, wherein the computer program instructions for modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule include computer program instructions for extracting from the XML document retrieved from the metadata container a pronunciation rule and including the pronunciation rule in an XML lexicon document in the speech engine.
Dependent claims: 5, 6
7. A computer program product for improving speech capabilities of a multimodal application, the computer program product including a multimodal browser for operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and one or more non-voice modes, the computer program product disposed upon a computer-readable, recording medium, the computer program product comprising computer program instructions capable of:
receiving, by the multimodal browser, a media file having a metadata container;
retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser, wherein the computer program instructions for retrieving, by the multimodal browser, from the metadata container the speech artifact for inclusion in the speech engine available to the multimodal browser comprises computer program instructions for retrieving an XML document from the metadata container;
determining whether the speech artifact includes a grammar rule or a pronunciation rule;
if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule, wherein the computer program instructions for modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule includes computer program instructions for extracting from the XML document retrieved from the metadata container a grammar rule and including the grammar rule in an XML grammar document in the speech engine; and
if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule, wherein the computer program instructions for modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule include computer program instructions for extracting from the XML document retrieved from the metadata container a pronunciation rule and including the pronunciation rule in an XML lexicon document in the speech engine.
Dependent claims: 8, 9
Specification