AUTOMATIC SPEECH RECOGNITION WITH A SELECTION LIST

US 20080162136A1
Filed: 01/03/2007
Published: 07/03/2008
Est. Priority Date: 01/03/2007
Status: Active Grant

Abstract

Methods, apparatus, and computer program products are described for automatic speech recognition (‘ASR’) that include accepting by the multimodal application speech input and visual input for selecting or deselecting items in a selection list, the speech input enabled by a speech recognition grammar; providing, from the multimodal application to the grammar interpreter, the speech input and the speech recognition grammar; receiving, by the multimodal application from the grammar interpreter, interpretation results including matched words from the grammar that correspond to items in the selection list and a semantic interpretation token that specifies whether to select or deselect items in the selection list; and determining, by the multimodal application in dependence upon the value of the semantic interpretation token, whether to select or deselect items in the selection list that correspond to the matched words.
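
The final determining step of the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `InterpretationResult` type, the `apply_result` function, and the token values `"select"` and `"deselect"` are all hypothetical names chosen for illustration, and the sketch assumes the grammar interpreter has already returned its matched words and token.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InterpretationResult:
    """Hypothetical shape of the results returned by a grammar interpreter."""
    matched_words: tuple[str, ...]
    token: str  # assumed token values: "select" or "deselect"


def apply_result(items: list[str], selected: set[str],
                 result: InterpretationResult) -> set[str]:
    """Return the new selection after applying one interpretation result."""
    # Only matched words that correspond to items in the selection list count.
    matched = {word for word in result.matched_words if word in items}
    if result.token == "select":
        return selected | matched
    if result.token == "deselect":
        return selected - matched
    raise ValueError(f"unknown semantic interpretation token: {result.token!r}")
```

In this sketch the same function serves both outcomes, so the application needs only the token value to decide whether the matched items are added to or removed from the current selection.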

20 Claims

1. A method of automatic speech recognition (‘ASR’), the method implemented with a speech recognition grammar of a multimodal application, with the multimodal application operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and a visual mode, the multimodal application operatively coupled to a grammar interpreter, the method comprising:
  
  accepting by the multimodal application speech input and visual input for selecting or deselecting items in a selection list, the speech input enabled by a speech recognition grammar, the speech recognition grammar including a semantic interpretation script capable of producing a semantic interpretation token having a value that indicates whether to select or deselect items in the selection list;
  
  providing, from the multimodal application to the grammar interpreter, the speech input and the speech recognition grammar;
  
  receiving, by the multimodal application from the grammar interpreter, interpretation results, the interpretation results including matched words from the grammar that correspond to items in the selection list and a semantic interpretation token that specifies whether to select or deselect items in the selection list; and
  
  determining, by the multimodal application in dependence upon the value of the semantic interpretation token, whether to select or deselect items in the selection list that correspond to the matched words.
  - 2. The method of claim 1 wherein the speech input is synchronized by the multimodal application with the visual input.
  - 3. The method of claim 1 wherein:
    - the semantic interpretation script is further capable of producing a semantic interpretation token specifying that all items in the selection list are to be either selected or deselected;
      
      the received interpretation results include a semantic interpretation token that specifies whether to select or deselect all items in the selection list; and
      
      determining whether to select or deselect items in the selection list further comprises determining in dependence upon the value of the semantic interpretation token whether to select or deselect all items in the selection list, regardless of correspondence of items in the selection list to the matched words.
  - 4. The method of claim 1 further comprising:
    - establishing in the multimodal device a configuration parameter for the multimodal application, the value of the configuration parameter user-editable, the value of the configuration parameter indicating whether to add to existing item selections items that correspond to the matched words or replace existing item selections with items that correspond to the matched words;
      
      wherein determining whether to select or deselect items in the selection list that correspond to the matched words further comprises determining whether to select or deselect items in the selection list that correspond to the matched words in dependence upon the value of the configuration parameter, regardless of the value of the semantic interpretation token.
  - 5. The method of claim 1 wherein the multimodal device further comprises a thick multimodal client device containing the multimodal application, the grammar interpreter, and all the functionality needed to carry out speech recognition and grammar interpretation, including semantic interpretation.
  - 6. The method of claim 1 wherein the multimodal device further comprises a thin multimodal client device that does not contain a grammar interpreter or a speech engine, the thin multimodal client device obtaining grammar interpretation, semantic interpretation, and speech recognition services from a voice server located remotely across a network from the thin multimodal client device.
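
The variations in claims 3 and 4 can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the patent's implementation: the token values `"select-all"` and `"deselect-all"`, the `SelectionMode` enumeration, and both function names are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class SelectionMode(Enum):
    """Hypothetical values for the user-editable configuration parameter."""
    ADD = "add"          # add matched items to the existing selections
    REPLACE = "replace"  # replace the existing selections with matched items


def apply_all_token(items: list[str], selected: set[str], token: str) -> set[str]:
    """Claim 3 sketch: act on every item, regardless of the matched words."""
    if token == "select-all":
        return set(items)
    if token == "deselect-all":
        return set()
    return set(selected)  # any other token leaves the selection unchanged here


def apply_with_mode(items: list[str], selected: set[str],
                    matched_words: list[str], mode: SelectionMode) -> set[str]:
    """Claim 4 sketch: the configuration parameter decides, not the token."""
    matched = {word for word in matched_words if word in items}
    if mode is SelectionMode.ADD:
        return selected | matched
    return matched
```

For example, with items `["red", "green", "blue"]` and `"red"` already selected, matching `"blue"` yields `{"red", "blue"}` in `ADD` mode and `{"blue"}` in `REPLACE` mode.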

7. Apparatus for automatic speech recognition (‘ASR’), the apparatus implemented with a speech recognition grammar of a multimodal application, with the multimodal application operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and a visual mode, the multimodal application operatively coupled to a grammar interpreter, the apparatus comprising a computer processor and a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions capable of:
  
  accepting by the multimodal application speech input and visual input for selecting or deselecting items in a selection list, the speech input enabled by a speech recognition grammar, the speech recognition grammar including a semantic interpretation script capable of producing a semantic interpretation token having a value that indicates whether to select or deselect items in the selection list;
  
  providing, from the multimodal application to the grammar interpreter, the speech input and the speech recognition grammar;
  
  receiving, by the multimodal application from the grammar interpreter, interpretation results, the interpretation results including matched words from the grammar that correspond to items in the selection list and a semantic interpretation token that specifies whether to select or deselect items in the selection list; and
  
  determining, by the multimodal application in dependence upon the value of the semantic interpretation token, whether to select or deselect items in the selection list that correspond to the matched words.
  - 8. The apparatus of claim 7 wherein the speech input is synchronized by the multimodal application with the visual input.
  - 9. The apparatus of claim 7 wherein:
    - the semantic interpretation script is further capable of producing a semantic interpretation token specifying that all items in the selection list are to be either selected or deselected;
      
      the received interpretation results include a semantic interpretation token that specifies whether to select or deselect all items in the selection list; and
      
      determining whether to select or deselect items in the selection list further comprises determining in dependence upon the value of the semantic interpretation token whether to select or deselect all items in the selection list, regardless of correspondence of items in the selection list to the matched words.
  - 10. The apparatus of claim 7 further comprising computer program instructions capable of:
    - establishing in the multimodal device a configuration parameter for the multimodal application, the value of the configuration parameter user-editable, the value of the configuration parameter indicating whether to add to existing item selections items that correspond to the matched words or replace existing item selections with items that correspond to the matched words;
      
      wherein determining whether to select or deselect items in the selection list that correspond to the matched words further comprises determining whether to select or deselect items in the selection list that correspond to the matched words in dependence upon the value of the configuration parameter, regardless of the value of the semantic interpretation token.
  - 11. The apparatus of claim 7 wherein the multimodal device further comprises a thick multimodal client device containing the multimodal application, the grammar interpreter, and all the functionality needed to carry out speech recognition and grammar interpretation, including semantic interpretation.
  - 12. The apparatus of claim 7 wherein the multimodal device further comprises a thin multimodal client device that does not contain a grammar interpreter or a speech engine, the thin multimodal client device obtaining grammar interpretation, semantic interpretation, and speech recognition services from a voice server located remotely across a network from the thin multimodal client device.

13. A computer program product for automatic speech recognition (‘ASR’), the computer program product comprising a multimodal application that includes a speech recognition grammar, the multimodal application capable of operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and a visual mode, the multimodal application operatively coupled to a grammar interpreter, the computer program product disposed upon a computer-readable, signal-bearing medium, the computer program product comprising computer program instructions capable of:
  
  accepting by the multimodal application speech input and visual input for selecting or deselecting items in a selection list, the speech input enabled by a speech recognition grammar, the speech recognition grammar including a semantic interpretation script capable of producing a semantic interpretation token having a value that indicates whether to select or deselect items in the selection list;
  
  providing, from the multimodal application to the grammar interpreter, the speech input and the speech recognition grammar;
  
  receiving, by the multimodal application from the grammar interpreter, interpretation results, the interpretation results including matched words from the grammar that correspond to items in the selection list and a semantic interpretation token that specifies whether to select or deselect items in the selection list; and
  
  determining, by the multimodal application in dependence upon the value of the semantic interpretation token, whether to select or deselect items in the selection list that correspond to the matched words.
  - 14. The computer program product of claim 13 wherein the computer-readable, signal-bearing medium comprises a recordable medium.
  - 15. The computer program product of claim 13 wherein the computer-readable, signal-bearing medium comprises a transmission medium.
  - 16. The computer program product of claim 13 wherein the speech input is synchronized by the multimodal application with the visual input.
  - 17. The computer program product of claim 13 wherein:
    - the semantic interpretation script is further capable of producing a semantic interpretation token specifying that all items in the selection list are to be either selected or deselected;
      
      the received interpretation results include a semantic interpretation token that specifies whether to select or deselect all items in the selection list; and
      
      determining whether to select or deselect items in the selection list further comprises determining in dependence upon the value of the semantic interpretation token whether to select or deselect all items in the selection list, regardless of correspondence of items in the selection list to the matched words.
  - 18. The computer program product of claim 13 further comprising computer program instructions capable of:
    - establishing in the multimodal device a configuration parameter for the multimodal application, the value of the configuration parameter user-editable, the value of the configuration parameter indicating whether to add to existing item selections items that correspond to the matched words or replace existing item selections with items that correspond to the matched words;
      
      wherein determining whether to select or deselect items in the selection list that correspond to the matched words further comprises determining whether to select or deselect items in the selection list that correspond to the matched words in dependence upon the value of the configuration parameter, regardless of the value of the semantic interpretation token.
  - 19. The computer program product of claim 13 wherein the multimodal device further comprises a thick multimodal client device containing the multimodal application, the grammar interpreter, and all the functionality needed to carry out speech recognition and grammar interpretation, including semantic interpretation.
  - 20. The computer program product of claim 13 wherein the multimodal device further comprises a thin multimodal client device that does not contain a grammar interpreter or a speech engine, the thin multimodal client device obtaining grammar interpretation, semantic interpretation, and speech recognition services from a voice server located remotely across a network from the thin multimodal client device.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: McCobb, Gerald M.; Ativanichayaphong, Soonthorn; Agapi, Ciprian; Cross, Charles W.
Granted Patent: US 8,612,230 B2
US Class Current: 704/251
CPC Class Codes:
G10L 15/193   Formal grammars, e.g. finit...
G10L 15/26   Speech to text systems G10L...
H04M 2201/40   using speech recognition
H04M 2203/105   Financial transactions and ...
H04M 3/4938   comprising a voice browser ...
