Systems and methods for managing multiple grammars in a speech recognition system

US 20030171929A1
Filed: 02/04/2002
Published: 09/11/2003
Est. Priority Date: 02/04/2002
Status: Active Grant

Abstract

Systems and methods are described for a speech system that manages multiple grammars from one or more speech-enabled applications. The speech system includes a speech server that supports different grammars and different types of grammars by exposing several methods to the speech-enabled applications. The speech server supports static grammars that do not change and dynamic grammars that may change after a commit. The speech server provides persistence by supporting persistent grammars that enable a user to issue a command to an application even when the application is not loaded. In such a circumstance, the application is automatically launched and the command is processed. The speech server may enable or disable a grammar in order to limit confusion between grammars. Global and yielding grammars are also supported by the speech server. Global grammars are always active (e.g., “call 9-1-1”) while yielding grammars may be deactivated when an interaction whose grammar requires priority is active.

37 Claims

1. A speech system, comprising:
- a speech engine configured to recognize commands from a user and make announcements to the user;
  
  a speech server having a speech server interface through which multiple speech-enabled applications communicate with the speech system, and a speech application programming interface through which the speech server communicates with the speech engine; and
  
  wherein the speech server manages concurrent processing of interactions submitted by the speech-enabled applications while allowing each speech-enabled application to utilize a different speech recognition grammar.
  - 2. The speech system as recited in claim 1, wherein the speech server is configured to support persistent grammars for the speech-enabled applications so that a user command belonging to a persistent grammar is recognized whether the associated speech recognition application is running or not.
  - 3. The speech system as recited in claim 2, wherein the speech server is further configured to launch a speech-enabled application if the speech-enabled application uses a persistent grammar and the speech-enabled application is idle when a recognition is made that belongs to the persistent grammar used by the speech-enabled application.
  - 4. The speech system as recited in claim 1, wherein the speech server is further configured to:
    - support yielding grammars for the speech-enabled applications; and
      
      deactivate all but one active yielding grammar.
  - 5. The speech system as recited in claim 1, wherein the speech server is further configured to:
    - support yielding grammars and global grammars for the speech-enabled applications; and
      
      activate the global grammars and continuously monitor for recognitions in the global grammars.
  - 6. The speech system as recited in claim 1, further comprising a grammar table for each grammar used by the speech-enabled applications, each grammar table containing one or more grammar attributes for the grammar with which it is associated.
  - 7. The speech system as recited in claim 6, wherein a grammar attribute in each grammar table is a grammar identifier that uniquely identifies the grammar associated with the grammar table.
  - 8. The speech system as recited in claim 6, wherein:
    - a grammar attribute in each grammar table is an executable command of a speech-enabled application that, when executed, launches the speech-enabled application; and
      
      the speech server is further configured to execute the executable command to launch the speech-enabled application when the speech server recognizes the recognition term issued by a user and the speech-enabled application is not loaded.
  - 9. The speech system as recited in claim 6, wherein a grammar attribute in each grammar table is a global flag that, if set, indicates that the grammar associated with the grammar table is a global grammar that may not be deactivated by the speech server.
  - 10. The speech system as recited in claim 6, wherein:
    - a grammar attribute in each grammar table is a persistent flag that, if set, indicates that the grammar associated with the grammar table is a persistent grammar that may be launched by the speech server when the speech server recognizes a command belonging to the grammar; and
      
      an application associated with the persistent grammar is launched by the speech server when the speech server recognizes a command belonging to the grammar.
  - 11. The speech system as recited in claim 6, wherein a grammar attribute in each grammar table is an active flag that, if set, indicates that the grammar associated with the grammar table is currently active.
  - 12. The speech system as recited in claim 6, wherein a grammar attribute in each grammar table is a static flag that, if set, indicates that the grammar associated with the grammar table is a static grammar that cannot be changed after the grammar table is stored in the speech system.
  - 13. The speech system as recited in claim 6, further comprising a master grammar table that contains each grammar table used by the speech recognition applications.
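
The grammar table described in claims 6 through 13 can be pictured as one small record per grammar. The Python sketch below is illustrative only: the class and field names (`GrammarTable`, `MasterGrammarTable`, `launch_command`, `set_active`) are assumptions made for this example and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GrammarTable:
    """Hypothetical per-grammar record holding the attributes named in claims 7-12."""
    grammar_id: str                       # uniquely identifies the grammar (claim 7)
    launch_command: Optional[str] = None  # command that launches the owning application (claim 8)
    is_global: bool = False               # global grammars may not be deactivated by the server (claim 9)
    is_persistent: bool = False           # usable even when the application is not loaded (claim 10)
    is_active: bool = False               # whether the grammar is currently active (claim 11)
    is_static: bool = False               # static grammars cannot change once stored (claim 12)


class MasterGrammarTable:
    """Hypothetical container for every grammar table in the system (claim 13)."""

    def __init__(self) -> None:
        self._tables: Dict[str, GrammarTable] = {}

    def add(self, table: GrammarTable) -> None:
        # Grammar identifiers must be unique, so reject a second table with the same id.
        if table.grammar_id in self._tables:
            raise ValueError(f"duplicate grammar id: {table.grammar_id}")
        self._tables[table.grammar_id] = table

    def get(self, grammar_id: str) -> GrammarTable:
        return self._tables[grammar_id]

    def set_active(self, grammar_id: str, active: bool) -> bool:
        """Try to change the active flag; refuse to deactivate a global grammar."""
        table = self._tables[grammar_id]
        if table.is_global and not active:
            return False
        table.is_active = active
        return True
```

Under these assumptions, asking the master table to deactivate a global grammar such as an emergency-call grammar is refused, while an ordinary grammar can be switched on and off freely.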

14. A method, comprising:
- receiving a speech interaction from a speech-enabled application;
  
  identifying a speech grammar associated with the speech interaction; and
  
  processing the speech interaction according to grammar attributes contained in a grammar table associated with the identified speech grammar.
  - 15. The method as recited in claim 14, wherein the speech interaction is a first speech interaction, the speech grammar is a first speech grammar, the grammar table is a first grammar table, the speech-enabled application is a first speech recognition application, and the method further comprises:
    - receiving a second speech interaction from a second speech-enabled application while the first speech interaction is being processed;
      
      identifying a second speech grammar associated with the second speech interaction; and
      
      processing the second speech interaction according to grammar attributes contained in a second grammar table that is associated with the second speech grammar.
  - 16. The method as recited in claim 15, wherein the processing the second speech interaction further comprises processing the second speech interaction after the first speech interaction has concluded processing if a grammar attribute in the second grammar table indicates that the second speech grammar is a yielding grammar.
  - 17. The method as recited in claim 14, wherein a grammar attribute included in the grammar table is a grammar identifier that uniquely identifies the speech grammar associated with the grammar table.
  - 18. The method as recited in claim 14, wherein a grammar attribute included in the grammar table is an executable command that, when executed, launches the speech recognition application associated with the grammar table.
  - 19. The method as recited in claim 14, wherein a grammar attribute included in the grammar table is a global flag that, when set, indicates that the speech grammar associated with the grammar table is a global grammar that may only be deactivated by the speech-enabled application that submitted the speech interaction and, when not set, indicates that the speech grammar associated with the grammar table is a yielding grammar.
  - 20. The method as recited in claim 14, wherein a grammar attribute included in the grammar table is a persistent flag that, when set, indicates that the speech grammar associated with the grammar table is a persistent grammar that is active even when the speech recognition application associated with the speech grammar is not loaded.
  - 21. The method as recited in claim 14, wherein a grammar attribute included in the grammar table is an active flag that, when set, indicates that the speech grammar associated with the grammar table is currently active and, when not set, indicates that the speech grammar associated with the grammar table is currently inactive.
  - 22. The method as recited in claim 14, wherein a grammar attribute included in the grammar table is a static flag that, when set, indicates that the speech grammar associated with the grammar table may not be changed after the speech grammar is committed.
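
As a rough illustration of claims 14 through 16, the sketch below identifies the grammar of an incoming interaction, looks up its grammar table, and uses the table's attributes to decide the processing order. All names (`Interaction`, `GrammarTable`, `processing_order`) are hypothetical. The branch for a global grammar is an assumption of this sketch; claims 14 through 16 only state what happens when the second grammar is a yielding grammar.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Interaction:
    app: str         # submitting application (hypothetical field name)
    grammar_id: str  # identifies the grammar the interaction uses


@dataclass
class GrammarTable:
    grammar_id: str
    is_global: bool = False  # per claim 19, an unset global flag means "yielding"


def processing_order(first: Interaction, second: Interaction,
                     tables: Dict[str, GrammarTable]) -> List[str]:
    """Order two interactions when `second` arrives while `first` is in progress."""
    # Identify the grammar of the incoming interaction and read its attributes.
    second_table = tables[second.grammar_id]
    if not second_table.is_global:
        # Yielding grammar: wait until the first interaction concludes (claim 16).
        return [first.app, second.app]
    # Assumption for this sketch only: a global grammar is handled right away.
    return [second.app, first.app]
```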

23. One or more computer-readable media containing computer-executable instructions that, when executed on a computer, perform the following steps:
- receiving a first interaction from a first speech-enabled application that utilizes a first grammar;
  
  processing the first interaction according to properties attributable to the first grammar;
  
  receiving a second interaction from a second speech-enabled application that utilizes a second grammar, the second interaction being received while the first interaction is processing;
  
  processing the second interaction according to properties attributable to the second grammar;
  
  wherein the processing of the first interaction is interrupted and processing of the second interaction is immediately commenced if an indication is detected directing that the second interaction be processed immediately.
  - 24. The one or more computer-readable media as recited in claim 23, wherein the indication directing that the second interaction be processed immediately is a flag contained in the interaction.
  - 25. The one or more computer-readable media as recited in claim 23, wherein the indication directing that the second interaction be processed immediately is a property attributable to the second grammar.
  - 26. The one or more computer-readable media as recited in claim 23, further comprising providing a grace period after processing of the second interaction has completed before beginning processing a third interaction.
  - 27. The one or more computer-readable media as recited in claim 23, further comprising completing processing of the first interaction after the processing of the second interaction has concluded.
  - 28. The one or more computer-readable media as recited in claim 23, further comprising:
    - detecting a self-destruct indicator associated with the first interaction; and
      
      terminating further processing of the first interaction.
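
The interruption behavior of claims 23, 24, 27 and 28 can be traced with a small event log. This is a sketch under stated assumptions, not the patented implementation: the names `Interaction`, `immediate`, `self_destruct` and `run_with_interrupt` are invented for the example, the self-destruct indicator is assumed to take effect once the interrupting interaction has finished, and the grace period of claim 26 is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interaction:
    name: str
    immediate: bool = False      # "process immediately" flag carried by the interaction (claim 24)
    self_destruct: bool = False  # if set, an interrupted interaction is not resumed (claim 28)


def run_with_interrupt(first: Interaction, second: Interaction) -> List[str]:
    """Return an event log for `second` arriving while `first` is being processed."""
    log = [f"start {first.name}"]
    if not second.immediate:
        # No interrupt requested: the first interaction runs to completion.
        log += [f"finish {first.name}", f"start {second.name}", f"finish {second.name}"]
        return log
    # Interrupt the first interaction and begin the second at once (claim 23).
    log += [f"interrupt {first.name}", f"start {second.name}", f"finish {second.name}"]
    if first.self_destruct:
        # Self-destruct indicator detected: drop the first interaction (claim 28).
        log.append(f"terminate {first.name}")
    else:
        # Otherwise complete the first interaction afterwards (claim 27).
        log += [f"resume {first.name}", f"finish {first.name}"]
    return log
```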

29. A speech server interface exposed by a speech system for use by one or more speech-enabled applications, comprising a persist method that a speech-enabled application uses to persist a grammar used by the speech-enabled application.
  - 30. The speech server interface as recited in claim 29, wherein the persist method further comprises a launch path parameter that provides an executable command that the speech system uses to launch the speech-enabled application.
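
One way to picture the persist method and its launch path parameter is a registry that remembers how to start each application. The sketch below is hypothetical throughout: `PersistentGrammars`, `set_loaded`, `on_recognition` and the injected `launcher` callable are assumptions of this example, and the launcher is a plain callback so that running the sketch starts no real process.

```python
from typing import Callable, Dict, Set


class PersistentGrammars:
    """Hypothetical registry of persisted grammars and their launch commands."""

    def __init__(self, launcher: Callable[[str], None]) -> None:
        self._launch_paths: Dict[str, str] = {}  # grammar id -> launch command
        self._loaded: Set[str] = set()           # grammars whose application is running
        self._launcher = launcher                # injected so the sketch starts no real process

    def persist(self, grammar_id: str, launch_path: str) -> None:
        """Persist a grammar together with the command that launches its application."""
        self._launch_paths[grammar_id] = launch_path

    def set_loaded(self, grammar_id: str, loaded: bool) -> None:
        """Record whether the application that owns the grammar is currently loaded."""
        if loaded:
            self._loaded.add(grammar_id)
        else:
            self._loaded.discard(grammar_id)

    def on_recognition(self, grammar_id: str) -> bool:
        """Handle a recognized command; return True if the application had to be launched."""
        if grammar_id in self._loaded:
            return False
        launch_path = self._launch_paths.get(grammar_id)
        if launch_path is None:
            return False  # grammar was never persisted, so there is nothing to launch
        self._launcher(launch_path)
        self._loaded.add(grammar_id)
        return True
```

Calling `on_recognition` for a persisted grammar whose application is not loaded triggers the launcher once; later recognitions find the application already loaded and launch nothing.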

31. A speech server interface exposed by a speech system for use by one or more speech-enabled applications, comprising a create grammar method that is used to load an existing grammar used by a speech-enabled application or to load a new grammar for the speech-enabled application.
  - 32. The speech server interface as recited in claim 31, further comprising a remove grammar method that is used to remove a grammar from the speech system.

33. A speech server interface exposed by a speech system for use by one or more speech-enabled applications, comprising a yield-to-grammar method used by a speech-enabled application to de-activate all yielding grammars in the speech system other than the grammar used by the speech-enabled application.
  - 34. The speech server interface as recited in claim 33, further comprising an unyield-to-grammar method used by a speech-enabled application to make other yielding grammars unyield.
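
The yield-to-grammar and unyield-to-grammar methods can be sketched as a pair of operations over the set of active grammars. The class and method names below (`YieldingGrammars`, `yield_to`, `unyield_to`) are hypothetical, and the choice to reactivate exactly the grammars that the matching yield call deactivated is an assumption of this example rather than something the claims specify.

```python
from typing import Dict, Set


class YieldingGrammars:
    """Hypothetical tracker for which grammars are active, global, or made to yield."""

    def __init__(self) -> None:
        self._active: Dict[str, bool] = {}
        self._global: Set[str] = set()
        self._yielded: Dict[str, Set[str]] = {}  # requesting grammar -> grammars it deactivated

    def add(self, grammar_id: str, is_global: bool = False) -> None:
        self._active[grammar_id] = True
        if is_global:
            self._global.add(grammar_id)

    def yield_to(self, grammar_id: str) -> None:
        """Deactivate every other active yielding grammar; global grammars stay active."""
        deactivated = self._yielded.setdefault(grammar_id, set())
        for other in self._active:
            if other != grammar_id and other not in self._global and self._active[other]:
                self._active[other] = False
                deactivated.add(other)

    def unyield_to(self, grammar_id: str) -> None:
        """Reactivate the grammars that were deactivated by the matching yield_to call."""
        for other in self._yielded.pop(grammar_id, set()):
            self._active[other] = True

    def active(self) -> Set[str]:
        return {gid for gid, on in self._active.items() if on}
```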

35. A speech server interface exposed by a speech system for use by one or more speech-enabled applications, comprising an advise speech events method used by a speech-enabled application to let the speech system know that the speech-enabled application is listening for speech recognition events.
  - 36. The speech server interface as recited in claim 35, further comprising an unadvise speech events method used by a speech-enabled application to let the speech system know that the speech-enabled application is not listening for speech recognition events.
  - 37. The speech server interface as recited in claim 36, further comprising:
    - a create grammar method that is used to load an existing grammar used by a speech-enabled application or to load a new grammar for the speech-enabled application;
      
      a get grammar identifier method that is used to obtain a value uniquely associated with the loaded grammar;
      
      a remove grammar method that is used to remove a grammar from the speech system;
      
      a persist method that a speech-enabled application uses to persist a grammar used by the speech-enabled application;
      
      a yield-to-grammar method used by a speech-enabled application to make yielding grammars in the speech system yield to the grammar used by the speech-enabled application;
      
      an unyield-to-grammar method used by a speech-enabled application to make other yielding grammars unyield;
      
      a commit method used to commit a grammar to the speech system;
      
      a get rule method used by the speech system to construct and control individual rules in a grammar;
      
      a create new state method used by the speech system to create new states in a grammar;
      
      an add word transition method used to add a transition between two states on a word;
      
      an add rule transition method used to add a transition between two states on a rule;
      
      a set rule state method used to activate and de-activate rules;
      
      a set grammar state method used to activate and de-activate grammars;
      
      a get grammar state method used to get a grammar state;
      
      a get recognition method used to get a recognition that has occurred;
      
      a get alternate method used to get alternates to a recognition that has occurred;
      
      a turn speech recognizer on method that is used by a speech-enabled application to activate a speech recognizer in the speech system;
      
      a turn speech recognizer off method that is used by a speech-enabled application to deactivate a speech recognizer in the speech system;
      
      a get recognizer state method used to get a speech recognizer state;
      
      an advise SAPI event method used to pass in a sink that is called when an event that is advised for occurs;
      
      an unadvise SAPI event method used to let the speech system know that a sink used with the advise SAPI event method is no longer interested in SAPI events;
      
      a get recognition context method used to get a speech recognition context pointer from a speech engine in the speech system; and
      
      a get voice method used to get a voice pointer from the speech engine.
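
The advise and unadvise speech events methods of claims 35 and 36 resemble a subscribe and unsubscribe pair. The following minimal sketch assumes a callback-based design; `SpeechEvents`, `RecognitionSink` and `notify` are names invented for this example and are not taken from the patent or from any real speech API.

```python
from typing import Callable, Dict

# Hypothetical sink signature: receives the grammar id and the recognized phrase.
RecognitionSink = Callable[[str, str], None]


class SpeechEvents:
    """Hypothetical event hub tracking which applications listen for recognitions."""

    def __init__(self) -> None:
        self._sinks: Dict[str, RecognitionSink] = {}

    def advise_speech_events(self, app: str, sink: RecognitionSink) -> None:
        """The application tells the speech system it is listening for recognition events."""
        self._sinks[app] = sink

    def unadvise_speech_events(self, app: str) -> None:
        """The application tells the speech system it is no longer listening."""
        self._sinks.pop(app, None)

    def notify(self, grammar_id: str, phrase: str) -> int:
        """Deliver a recognition to every listening application; return how many were called."""
        sinks = list(self._sinks.values())  # copy so a sink may unadvise during delivery
        for sink in sinks:
            sink(grammar_id, phrase)
        return len(sinks)
```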

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Falcon, Steve Russel; Yip, Clement Chun Pong; Miller, David Michael; Banay, Dan
Granted Patent: US 7,167,831 B2
US Class Current: 704/275
CPC Class Codes:
G10L 15/30 Distributed recognition, e....
G10L 2015/228 of application context
