SYSTEM AND METHOD FOR PROVIDING CONTEXT-BASED DYNAMIC SPEECH GRAMMAR GENERATION FOR USE IN SEARCH APPLICATIONS

US 20080154604A1
Filed: 12/22/2006
Published: 06/26/2008
Est. Priority Date: 12/22/2006
Status: Abandoned Application

First Claim


1. A method of generating a dynamic contextual speech recognition grammar, comprising:

for each of a plurality of groups of at least one frame of audio content, generating grammars and context data including:

providing the at least one frame of audio content to an automatic speech recognizer (ASR) for performing recognition of words that do not occur in common vocabulary and may be specific to the at least one frame;

receiving from the ASR words that are specific to the at least one frame at a post processor; and

having a dynamic grammar generator generate speech grammars using the words that are specific to the at least one frame, the words being provided from the post processor.


1 Assignment

0 Petitions

Abstract

A system and method for context-based dynamic speech recognition grammar generation, suitable for multimodal input in context-based search scenarios. A dynamic, context-based grammar is generated for a media stream during a post-processing period. The media stream is fed to an external automatic speech recognizer (ASR) a specified number of frames at a time. The ASR recognizes words that do not occur in common vocabulary and that may be specific to those media frames. These frame-specific words are sent back to the post processor, which feeds them to a dynamic grammar generator that generates speech grammars from them in a grammar format such as SRGF. This grammar, together with other contextual information, forms a new set of context data for those media frames. The media, the grammar, and the other context data are stored in a database. This is repeated for the entire media stream, after which a full speech recognition grammar can be constructed.
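The per-group loop in the abstract can be pictured with a minimal Python sketch. Everything here is an illustrative assumption, not Nokia's implementation: the ASR is a caller-supplied function (`fake_asr` simply splits UTF-8 text, standing in for a real external recognizer), the "common vocabulary" is a tiny hypothetical set, the post processor only filters and de-duplicates, and the "database" is an in-memory list. It shows the data flow only: frame group, ASR, post processor, per-group grammar words, stored context data, full grammar.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical external ASR interface: takes a group of frames and returns
# the words it recognized in them.
AsrFn = Callable[[Sequence[bytes]], List[str]]

# Assumed stand-in for the "common vocabulary"; a real system would use a large lexicon.
COMMON_VOCABULARY = {"the", "a", "an", "and", "of", "to", "is"}


@dataclass
class ContextRecord:
    """Context data stored for one group of frames (hypothetical schema)."""
    start_frame: int
    end_frame: int
    grammar_words: List[str]


@dataclass
class ContextDatabase:
    """In-memory stand-in for the media/grammar/context database."""
    records: List[ContextRecord] = field(default_factory=list)

    def store(self, record: ContextRecord) -> None:
        self.records.append(record)


def post_process(words: Sequence[str]) -> List[str]:
    """Keep frame-specific words: drop common vocabulary and duplicates, preserve order."""
    specific: List[str] = []
    for word in words:
        word = word.lower()
        if word not in COMMON_VOCABULARY and word not in specific:
            specific.append(word)
    return specific


def build_context_data(frames: Sequence[bytes], group_size: int,
                       asr: AsrFn, db: ContextDatabase) -> List[str]:
    """Process the stream group by group and return the accumulated full grammar."""
    full_grammar: List[str] = []
    for start in range(0, len(frames), group_size):
        group = frames[start:start + group_size]
        words = post_process(asr(group))  # ASR output goes through the post processor
        db.store(ContextRecord(start, start + len(group) - 1, words))
        for word in words:  # append this group's grammar words to the full grammar
            if word not in full_grammar:
                full_grammar.append(word)
    return full_grammar


def fake_asr(group: Sequence[bytes]) -> List[str]:
    """Toy ASR stub: each 'frame' is UTF-8 text, so 'recognition' is just splitting."""
    return [w for frame in group for w in frame.decode("utf-8").split()]
```

For example, with frames `[b"the fjord", b"and aurora", b"fjord glacier"]` and `group_size=2`, the full grammar is `["fjord", "aurora", "glacier"]`, and two context records are stored (frames 0 to 1, and frame 2).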

61 Citations


35 Claims

1. A method of generating a dynamic contextual speech recognition grammar, comprising:
- for each of a plurality of groups of at least one frame of audio content, generating grammars and context data including:
  
  providing the at least one frame of audio content to an automatic speech recognizer (ASR) for performing recognition of words that do not occur in common vocabulary and may be specific to the at least one frame;
  
  receiving from the ASR words that are specific to the at least one frame at a post processor; and
  
  having a dynamic grammar generator generate speech grammars using the words that are specific to the at least one frame, the words being provided from the post processor.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 30)
  - 2. The method of claim 1, wherein the ASR is an external ASR.
  - 3. The method of claim 1, wherein the external ASR is a network-based ASR.
  - 4. The method of claim 1, wherein the speech grammars are generated in a speech recognition grammar format (SRGF).
  - 5. The method of claim 1, further comprising storing the speech grammars and context data in a database.
  - 6. The method of claim 5, wherein all of the generated speech grammars are appended to each other to create a full speech recognition grammar.
  - 7. The method of claim 6, wherein the full speech recognition grammar is added to a global small-vocabulary grammar that is present in a resident ASR.
  - 8. The method of claim 7, wherein words in the full speech recognition grammar are used by the resident ASR as hot words for searching and navigating within the media item.
  - 9. The method of claim 1, further comprising:
    - for each of the plurality of groups of at least one frame of audio content, using a Text-to-Speech (TTS) engine to generate text from the at least one frame of audio content for words that are not recognized by the ASR; and
      
      appending the generated text to the generated speech grammars.
  - 10. A computer program product, embodied in a computer-readable medium, comprising computer code for performing the processes of claim 1.
  - 11. The computer program product of claim 10, further comprising computer code for storing the speech grammars and context data in a database.
  - 12. The computer program product of claim 11, wherein all of the generated speech grammars are appended to each other to create a full speech recognition grammar.
  - 30. A computer program product, embodied in a computer-readable medium, including computer code for performing the processes of claim 1.
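Claims 4 through 6 state that each per-group grammar is generated in a "speech recognition grammar format (SRGF)" and that all generated grammars are appended into a full speech recognition grammar. The claims do not define SRGF, so this sketch assumes the W3C Speech Recognition Grammar Specification (SRGS) XML form. The rule ids (`seg0`, `seg1`), the root rule name `media`, and the `en-US` language tag are hypothetical. Each frame group becomes a public rule whose `<one-of>` lists its words, and the full grammar adds a root rule that references every group rule.

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def words_to_rule(words: Sequence[str], rule_id: str) -> ET.Element:
    """Per-group grammar: one public rule offering each frame-specific word as an alternative."""
    rule = ET.Element("rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for word in words:
        ET.SubElement(one_of, "item").text = word
    return rule


def append_grammars(rules: List[ET.Element], root_id: str = "media") -> ET.Element:
    """Full grammar: a root rule referencing every per-group rule, followed by those rules."""
    # xmlns is written as a plain attribute so element tags serialize without prefixes;
    # the xml:lang value is an assumed placeholder.
    grammar = ET.Element("grammar", {"xmlns": SRGS_NS, "version": "1.0",
                                     "xml:lang": "en-US", "root": root_id})
    root_rule = ET.SubElement(grammar, "rule", {"id": root_id, "scope": "public"})
    choices = ET.SubElement(root_rule, "one-of")
    for rule in rules:
        item = ET.SubElement(choices, "item")
        ET.SubElement(item, "ruleref", {"uri": "#" + rule.get("id")})
        grammar.append(rule)  # append the group's grammar to the full grammar
    return grammar
```

Serializing the result with `ET.tostring(grammar, encoding="unicode")` yields an XML document that can be handed to an SRGS-capable recognizer. Keeping each group as its own rule preserves the link between words and the frames they came from.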

13. An apparatus, comprising:
- a processor; and
  
  a memory unit communicatively coupled to the processor and comprising computer code for, for each of a plurality of groups of at least one frame of audio content, generating grammars and context data including:
  
  computer code for providing the at least one frame of audio content to an automatic speech recognizer (ASR) for performing recognition of words that do not occur in common vocabulary and may be specific to the at least one frame;
  
  computer code for receiving from the ASR words that are specific to the at least one frame at a post processor; and
  
  computer code for having a dynamic grammar generator generate speech grammars using the words that are specific to the at least one frame, the words being provided from the post processor.
- Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21)
  - 14. The apparatus of claim 13, wherein the ASR is an external ASR.
  - 15. The apparatus of claim 13, wherein the external ASR is a network-based ASR.
  - 16. The apparatus of claim 13, wherein the speech grammars are generated in a speech recognition grammar format (SRGF).
  - 17. The apparatus of claim 13, wherein the memory unit further comprises storing the speech grammars and context data in a database.
  - 18. The apparatus of claim 17, wherein all of the generated speech grammars are appended to each other to create a full speech recognition grammar.
  - 19. The apparatus of claim 18, wherein the full speech recognition grammar is added to a global small-vocabulary grammar that is present in a resident ASR.
  - 20. The apparatus of claim 19, wherein words in the full speech recognition grammar are used by the resident ASR as hot words for searching and navigating within the media item.
  - 21. The apparatus of claim 13, wherein the memory unit further comprises:
    - computer code for, for each of the plurality of groups of at least one frame of audio content, using a Text-to-Speech (TTS) engine to generate text from the at least one frame of audio content for words that are not recognized by the ASR; and
      
      computer code for appending the generated text to the generated speech grammars.

22. A system, comprising:
- a post processor configured to process a plurality of groups of at least one frame of audio content;
  
  an external automatic speech recognizer (ASR) communicatively connected to the post processor and configured to perform recognition of words that do not occur in common vocabulary and may be specific to the at least one frame for each group;
  
  a dynamic grammar generator communicatively connected to the post processor and configured to generate speech grammars using the words that are specific to the at least one frame, the words being provided from the external ASR via the post processor; and
  
  a database communicatively configured to store the speech grammars generated by the dynamic grammar generator.
- Dependent Claims (23, 24, 25, 26)
  - 23. The system of claim 22, wherein the database is communicatively connected to a device including a resident ASR, and wherein words in the full speech recognition grammar are used by the resident ASR as hot words for searching and navigating within the audio content.
  - 24. The system of claim 22, wherein the speech grammars are generated in a speech recognition grammar format (SRGF).
  - 25. The system of claim 22, wherein all of the generated speech grammars are appended to each other to create a full speech recognition grammar.
  - 26. The system of claim 25, wherein the full speech recognition grammar is added to a global small-vocabulary grammar that is present in a resident ASR of a device communicatively connected to the database.
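Claims 7, 8, 19, 20, 23 and 26 describe adding the full grammar to a global small-vocabulary grammar in a device's resident ASR and using its words as hot words for searching and navigating. A minimal sketch, assuming segments are identified by hypothetical `(start_frame, end_frame)` pairs and that the resident ASR has already turned the user's utterance into a word: the hot-word index maps each grammar word to the segments whose context data contains it, and an invented command set, `GLOBAL_COMMANDS`, plays the role of the existing global grammar.

```python
from typing import Dict, List, Set, Tuple

Segment = Tuple[int, int]  # hypothetical (start_frame, end_frame) identifier

# Assumed global small-vocabulary grammar already present in the resident ASR.
GLOBAL_COMMANDS: Set[str] = {"play", "pause", "next", "back"}


def build_hot_word_index(segment_words: Dict[Segment, List[str]]) -> Dict[str, List[Segment]]:
    """Map each grammar word to every segment whose context data contains it."""
    index: Dict[str, List[Segment]] = {}
    for segment, words in segment_words.items():
        for word in words:
            index.setdefault(word, []).append(segment)
    return index


def resident_vocabulary(index: Dict[str, List[Segment]]) -> Set[str]:
    """Global commands plus the media-specific hot words."""
    return GLOBAL_COMMANDS | set(index)


def navigate(recognized_word: str, index: Dict[str, List[Segment]]) -> List[Segment]:
    """Segments to offer for a hot word; an empty list means it is not a media hot word."""
    return index.get(recognized_word.lower(), [])
```

Keeping the command words and the media hot words in one vocabulary lets a small resident recognizer handle both navigation commands and content-specific terms without a large-vocabulary model.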

27. A method of searching for a speech segment within a media item, comprising:
- extracting at least one speech token from a received user query;
  
  matching the at least one speech token against an extracted speech grammar associated with the media item; and
  
  proceeding to a segment of the media item that matches the at least one speech token.
- Dependent Claims (28, 29, 33)
  - 28. The method of claim 27, further comprising playing the segment of the media item to the user.
  - 29. The method of claim 27, further comprising:
    - if the at least one speech token cannot be matched with a segment of the media item, requesting a new user query; and
      
      continuing to request new user queries and extract speech tokens until a match is made with a segment of the media item.
  - 33. The method of claim 27, wherein the memory unit further comprises:
    - computer code for, if the at least one speech token cannot be matched with a segment of the media item, requesting a new user query; and
      
      computer code for continuing to request new user queries and extract speech tokens until a match is made with a segment of the media item.
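The search method of claims 27 to 29 can be sketched as follows: extract speech tokens from the query, match them against the media item's grammar, proceed to the matching segment, and request new queries until something matches. Assumptions: queries arrive as already-transcribed text (a real system would extract tokens from audio), `extract_tokens` is a hypothetical whitespace tokenizer, and the grammar is represented as an index from each word to a list of hypothetical `(start_frame, end_frame)` segments. Unlike claim 29, the loop stops when the caller runs out of queries instead of prompting indefinitely.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Segment = Tuple[int, int]  # hypothetical (start_frame, end_frame) identifier


def extract_tokens(query: str) -> List[str]:
    """Hypothetical token extraction: lower-case words with edge punctuation stripped."""
    return [t.strip(".,?!").lower() for t in query.split()]


def match(tokens: List[str], index: Dict[str, List[Segment]]) -> Optional[Segment]:
    """Return the first segment of the first token found in the grammar index."""
    for token in tokens:
        segments = index.get(token)
        if segments:  # skip tokens that are absent or map to no segment
            return segments[0]
    return None


def search(queries: Iterable[str], index: Dict[str, List[Segment]]) -> Optional[Segment]:
    """Keep taking new user queries until one matches; return the segment to proceed to."""
    for query in queries:  # each iteration stands for one (re)prompt to the user
        segment = match(extract_tokens(query), index)
        if segment is not None:
            return segment
    return None  # no further queries available
```

Playing the matched segment to the user (claim 28) would then be a call into the media player with the returned segment.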

31. An apparatus, comprising:
- a processor; and
  
  a memory unit communicatively connected to the processor and including:
  
  computer code for extracting at least one speech token from a received user query;
  
  computer code for matching the at least one speech token against an extracted speech grammar associated with a media item; and
  
  computer code for proceeding to a segment of the media item that matches the at least one speech token.
- Dependent Claims (32)
  - 32. The computer program product of claim 31, wherein the memory unit further comprises computer code for playing the segment of the media item to the user.

34. A system, comprising:
- means for processing a plurality of groups of at least one frame of audio content;
  
  means for performing recognition of words that do not occur in common vocabulary and may be specific to the at least one frame for each group;
  
  means for generating speech grammars using the words that are specific to the at least one frame, the words being provided from the external ASR via the post processor; and
  
  means for storing the speech grammars generated by the dynamic grammar generator.
- Dependent Claims (35)
  - 35. The system of claim 34, wherein all of the generated speech grammars are appended to each other to create a full speech recognition grammar.


Current Assignee
Nokia Corporation
Original Assignee
Nokia Corporation
Inventors
Pavel, Dana; Sathish, Sailesh

Application Number

US11/615,567
Publication Number

US 20080154604A1
US Class Current

704/257
CPC Class Codes

G10L 15/183   using context dependencies,...

G10L 15/19   Grammatical context, e.g. d...

G10L 15/193   Formal grammars, e.g. finit...

G10L 15/197   Probabilistic grammars, e.g...

G10L 2015/228   of application context
