SYSTEM AND METHOD FOR PROVIDING CONTEXT-BASED DYNAMIC SPEECH GRAMMAR GENERATION FOR USE IN SEARCH APPLICATIONS
Abstract
A system and method for context-based dynamic speech recognition grammar generation, suitable for multimodal input in context-based search scenarios. A dynamic context-based grammar is generated for a media stream during a post-processing period. The media stream is fed to an external automatic speech recognizer (ASR) for a specified number of frames. The ASR recognizes words that do not occur in common vocabulary and that may be specific to those media frames. These frame-specific words are sent back to the post processor, which feeds them to a dynamic grammar generator that generates speech grammars from them in a suitable format. This grammar, together with other contextual information, forms a new set of context data for those frames of media. The media, the grammar, and the other context data are stored in a database. The process is repeated for the entire media stream, so that a full speech recognition grammar can be constructed.
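The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `COMMON_WORDS`, `ContextData`, `group_frames`, `frame_specific_words`, `generate_grammar`, and `build_context_database` are hypothetical, the external ASR is modeled as a caller-supplied function, an in-memory dictionary stands in for the database, and a JSGF-style text grammar is assumed as the output format because the abstract leaves the format open.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical stand-in for "common vocabulary"; a real system would use a large lexicon.
COMMON_WORDS = frozenset({"the", "a", "of", "and", "in", "to", "is", "today"})

# Assumed ASR interface: takes a group of frames, returns the recognized words.
Recognizer = Callable[[Sequence[bytes]], Iterable[str]]


@dataclass
class ContextData:
    first_frame: int   # index of the first frame in the group
    words: List[str]   # frame-specific words kept for this group
    grammar: str       # generated grammar text


def group_frames(frames: Sequence[bytes], size: int) -> Iterator[Tuple[int, Sequence[bytes]]]:
    """Yield (start index, group) for consecutive groups of `size` frames."""
    for start in range(0, len(frames), size):
        yield start, frames[start:start + size]


def frame_specific_words(recognized: Iterable[str]) -> List[str]:
    """Keep words outside the common vocabulary, de-duplicated, in order.

    The abstract attributes this selection to the ASR; it is modeled here
    as a separate step so that any recognizer can be plugged in.
    """
    kept: List[str] = []
    for word in recognized:
        w = word.lower()
        if w not in COMMON_WORDS and w not in kept:
            kept.append(w)
    return kept


def generate_grammar(name: str, words: List[str]) -> str:
    """Emit a JSGF-style grammar with one rule listing the words as alternatives."""
    alternatives = " | ".join(words) if words else "<VOID>"
    return f"#JSGF V1.0;\ngrammar {name};\npublic <term> = {alternatives};"


def build_context_database(
    frames: Sequence[bytes], asr: Recognizer, group_size: int
) -> Dict[int, ContextData]:
    """Repeat recognition and grammar generation for every group in the stream."""
    database: Dict[int, ContextData] = {}
    for start, group in group_frames(frames, group_size):
        words = frame_specific_words(asr(group))
        grammar = generate_grammar(f"frames_{start}", words)
        database[start] = ContextData(start, words, grammar)
    return database


if __name__ == "__main__":
    # Toy "frames" that already carry their transcript, so the fake ASR only decodes them.
    frames = [b"the", b"Nokia", b"of", b"multimodal"]
    fake_asr = lambda group: [f.decode() for f in group]
    for entry in build_context_database(frames, fake_asr, group_size=2).values():
        print(entry.grammar)
```

Keying the stored context data by the first frame of each group keeps every grammar tied to the part of the media it was derived from, which is what a later search needs in order to jump to a matching segment.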
61 Citations
35 Claims
1. A method of generating a dynamic contextual speech recognition grammar, comprising:
for each of a plurality of groups of at least one frame of audio content, generating grammars and context data including;
providing the at least one frame of audio content to an automatic speech recognizer (ASR) for performing recognition of words that do not occur in common vocabulary and may be specific to the at least one frame;
receiving from the ASR words that are specific to the at least one frame at a post processor; and
having a dynamic grammar generator generate speech grammars using the words that are specific to the at least one frame, the words being provided from the post processor.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 30
13. An apparatus, comprising:
a processor; and
a memory unit communicatively coupled to the processor and comprising computer code for, for each of a plurality of groups of at least one frame of audio content, generating grammars and context data including;
computer code for providing the at least one frame of audio content to an automatic speech recognizer (ASR) for performing recognition of words that do not occur in common vocabulary and may be specific to the at least one frame;
computer code for receiving from the ASR words that are specific to the at least one frame at a post processor; and
computer code for having a dynamic grammar generator generate speech grammars using the words that are specific to the at least one frame, the words being provided from the post processor.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21
22. A system, comprising:
a post processor configured to process a plurality of groups of at least one frame of audio content;
an external automatic speech recognizer (ASR) communicatively connected to the post processor and configured to perform recognition of words that do not occur in common vocabulary and may be specific to the at least one frame for each group;
a dynamic grammar generator communicatively connected to the post processor and configured to generate speech grammars using the words that are specific to the at least one frame, the words being provided from the external ASR via the post processor; and
a database communicatively configured to store the speech grammars generated by the dynamic grammar generator.
Dependent claims: 23, 24, 25, 26
27. A method of searching for a speech segment within a media item, comprising:
extracting at least one speech token from a received user query;
matching the at least one speech token against an extracted speech grammar associated with the media item; and
proceeding to a segment of the media item that matches the at least one speech token.
Dependent claims: 28, 29, 33
31. An apparatus, comprising:
a processor; and
a memory unit communicatively connected to the processor and including;
computer code for extracting at least one speech token from a received user query;
computer code for matching the at least one speech token against an extracted speech grammar associated with a media item; and
computer code for proceeding to a segment of the media item that matches the at least one speech token.
Dependent claims: 32
34. A system, comprising:
means for processing a plurality of groups of at least one frame of audio content;
means for performing recognition of words that do not occur in common vocabulary and may be specific to the at least one frame for each group;
means for generating speech grammars using the words that are specific to the at least one frame, the words being provided from the external ASR via the post processor; and
means for storing the speech grammars generated by the dynamic grammar generator.
Dependent claims: 35
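The search method of claim 27 (extract speech tokens from a query, match them against the grammar extracted for a media item, proceed to the matching segment) can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the query is assumed to be already transcribed to text, the per-segment grammar is reduced to a list of its words keyed by segment start time in seconds, and the names `SegmentIndex`, `extract_tokens`, and `find_segment` are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical index: segment start time (seconds) -> words in that segment's grammar.
SegmentIndex = Dict[float, List[str]]


def extract_tokens(query: str) -> List[str]:
    """Split a transcribed query into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", query.lower())


def find_segment(query: str, index: SegmentIndex) -> Optional[float]:
    """Return the start time of the segment whose grammar shares the most
    tokens with the query, or None when no token matches any segment.
    Ties go to the earliest segment."""
    tokens = set(extract_tokens(query))
    best_start: Optional[float] = None
    best_hits = 0
    for start in sorted(index):
        hits = len(tokens & set(index[start]))
        if hits > best_hits:
            best_start, best_hits = start, hits
    return best_start


if __name__ == "__main__":
    index = {0.0: ["nokia", "espoo"], 42.5: ["multimodal", "grammar"]}
    start = find_segment("jump to the multimodal grammar part", index)
    if start is not None:
        print(f"seek to {start} s")  # a player would move playback to this position
```

Counting shared tokens is only one possible matching rule; the claim itself does not specify how a match is scored.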
Specification