Automatic speech recognition with a selection list

US 8,612,230 B2
Filed: 01/03/2007
Issued: 12/17/2013
Est. Priority Date: 01/03/2007
Status: Active Grant

Abstract

Methods, apparatus, and computer program products are described for automatic speech recognition (‘ASR’) that include accepting by the multimodal application speech input and visual input for selecting or deselecting items in a selection list, the speech input enabled by a speech recognition grammar; providing, from the multimodal application to the grammar interpreter, the speech input and the speech recognition grammar; receiving, by the multimodal application from the grammar interpreter, interpretation results including matched words from the grammar that correspond to items in the selection list and a semantic interpretation token that specifies whether to select or deselect items in the selection list; and determining, by the multimodal application in dependence upon the value of the semantic interpretation token, whether to select or deselect items in the selection list that correspond to the matched words.
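
The flow in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names `InterpretationResult`, `interpret`, and `apply_result`, the command words, and the use of plain text as a stand-in for speech input are all invented for the example. A real system would pass audio and a speech recognition grammar to a grammar interpreter.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the interpretation results returned by a
# grammar interpreter: matched words plus a separate select/deselect token.
@dataclass(frozen=True)
class InterpretationResult:
    matched_words: tuple
    action: str  # "select" or "deselect"

# Toy "grammar": command words mapped to the token value they produce.
# These words are illustrative assumptions, not taken from the patent.
COMMANDS = {"select": "select", "add": "select",
            "deselect": "deselect", "remove": "deselect"}

def interpret(utterance, items):
    """Match a recognized utterance (plain text here) against the toy grammar."""
    words = utterance.lower().replace(",", " ").split()
    action = next((COMMANDS[w] for w in words if w in COMMANDS), None)
    if action is None:
        raise ValueError("utterance contains no select/deselect command")
    known = {item.lower() for item in items}
    matched = tuple(w for w in words if w in known)
    return InterpretationResult(matched, action)

def apply_result(selected, result):
    """Return a new selection set updated according to the token value."""
    words = set(result.matched_words)
    if result.action == "select":
        return selected | words
    return selected - words

items = ["pepperoni", "mushrooms", "olives", "onions"]
selected = {"onions"}
result = interpret("select pepperoni and olives", items)
selected = apply_result(selected, result)
print(sorted(selected))  # ['olives', 'onions', 'pepperoni']
```

A single utterance names two items, and the token (`action`) travels separately from the matched words, so the application decides what to do with the matches only after interpretation.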

18 Claims

1. A method of automatic speech recognition (‘ASR’), the method implemented with a speech recognition grammar of a multimodal application, with the multimodal application operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and a visual mode, the multimodal application operatively coupled to a grammar interpreter and configured to enable a user of the multimodal application to select or deselect multiple items in a selection list using a single utterance, the method comprising:
  
  accepting, by the multimodal application, speech input corresponding to the single utterance for selecting or deselecting one or more items in the selection list;
  
  providing, from the multimodal application to the grammar interpreter, the speech input and a speech recognition grammar associated with the selection list;
  
  receiving, by the multimodal application from the grammar interpreter, interpretation results, the interpretation results including at least one matched word from the grammar that identifies at least one item in the selection list and a separate indication of whether to select or deselect the at least one item in the selection list, wherein the separate indication is based, at least in part, on the speech input; and
  
  selecting or deselecting based, at least in part, on the separate indication, the at least one item in the selection list that corresponds to the at least one matched word.
  - 2. The method of claim 1 wherein the speech input is synchronized by the multimodal application with a visual input.
  - 3. The method of claim 1 wherein:
    - the separate indication of whether to select or deselect the at least one item in the selection list indicates that all items in the selection list are to be either selected or deselected; and
      
      selecting or deselecting all items in the selection list based, at least in part, on the separate indication, regardless of a correspondence of items in the selection list to the at least one matched-word.
  - 4. The method of claim 1 further comprising:
    - establishing in the multimodal device a configuration parameter for the multimodal application, the value of the configuration parameter being user-editable and indicating whether to add to existing item selections items that correspond to the at least one matched word, or to replace existing item selections with items that correspond to the at least one matched word;
      
      wherein determining whether to select or deselect items in the selection list that correspond to the at least one matched word further comprises determining whether to select or deselect items in the selection list that correspond to the at least one matched word based, at least in part, on the value of the configuration parameter, regardless of the content of the separate indication of whether to select or deselect the at least one item in the selection list.
  - 5. The method of claim 1 wherein the multimodal device further comprises a thick multimodal client device including the multimodal application, the grammar interpreter, and functionality for performing speech recognition and grammar interpretation, including semantic interpretation.
  - 6. The method of claim 1 wherein the multimodal device further comprises a thin multimodal client device, the thin multimodal client device obtaining grammar interpretation, semantic interpretation, and speech recognition services from a voice server located remotely across a network from the thin multimodal client device.
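
Dependent claims 3 and 4 add two behaviors to the basic flow: an "all items" indication that ignores the matched words, and a user-editable configuration parameter that chooses between adding to and replacing the existing selections. The sketch below is one possible reading, not the patented implementation. The function name `update_selection`, the token values (`"select-all"`, `"deselect-all"`), the `mode` values, and the choice to check the "all" indication before the configuration parameter are all assumptions; the claims do not state how the two features interact.

```python
def update_selection(items, selected, matched_words, action, mode=None):
    """Sketch of dependent claims 3 and 4 under assumed token values.

    action: "select", "deselect", "select-all", or "deselect-all" (assumed names).
    mode:   None, "add", or "replace"; a hypothetical user-editable setting.
    """
    all_items = set(items)
    matched = set(matched_words) & all_items
    # Claim 3: an "all" indication ignores which words were matched.
    # Checking it first is an assumption made for this example.
    if action == "select-all":
        return all_items
    if action == "deselect-all":
        return set()
    # Claim 4: when set, the configuration parameter decides the outcome,
    # regardless of the select/deselect indication.
    if mode == "add":
        return set(selected) | matched
    if mode == "replace":
        return matched
    # Otherwise fall back to the indication, as in claim 1.
    if action == "select":
        return set(selected) | matched
    if action == "deselect":
        return set(selected) - matched
    raise ValueError(f"unknown action: {action!r}")

items = ["pepperoni", "mushrooms", "olives"]
print(sorted(update_selection(items, {"olives"}, ["pepperoni"], "select", mode="replace")))
# ['pepperoni']
```

In "replace" mode the previous selection (`olives`) is dropped; in "add" mode the same call would return both items.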

7. Apparatus for automatic speech recognition (‘ASR’) for use with a speech recognition grammar of a multimodal application, with the multimodal application operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and a visual mode, the multimodal application operatively coupled to a grammar interpreter and configured to enable a user of the multimodal application to select or deselect multiple items in a selection list using a single utterance, the apparatus comprising:
  
  a computer processor; and
  
  a computer memory operatively coupled to the computer processor, the computer memory storing a computer program that, when executed by the computer processor, performs a method comprising:
  
  accepting by the multimodal application speech input corresponding to the single utterance for selecting or deselecting one or more items in the selection list;
  
  providing, from the multimodal application to the grammar interpreter, the speech input and a speech recognition grammar associated with the selection list;
  
  receiving, by the multimodal application from the grammar interpreter, interpretation results, the interpretation results including at least one matched word from the grammar that identifies at least one item in the selection list and a separate indication of whether to select or deselect the at least one item in the selection list, wherein the separate indication is based, at least in part, on the speech input; and
  
  selecting or deselecting, based, at least in part, on the separate indication, the at least one item in the selection list that corresponds to the at least one matched word.
  - 8. The apparatus of claim 7 wherein the speech input is synchronized by the multimodal application with a visual input.
  - 9. The apparatus of claim 7 wherein:
    - the separate indication of whether to select or deselect the at least one item in the selection list indicates that all items in the selection list are to be either selected or deselected; and
      
      selecting or deselecting all items in the selection list based, at least in part, on the separate indication, regardless of a correspondence of items in the selection list to the at least one matched-word.
  - 10. The apparatus of claim 7, wherein the method further comprises:
    - establishing in the multimodal device a configuration parameter for the multimodal application, the value of the configuration parameter being user-editable and indicating whether to add to existing item selections, items that correspond to the at least one matched word or replace existing item selections with items that correspond to the at least one matched word;
      
      wherein determining whether to select or deselect items in the selection list that correspond to the at least one matched word further comprises determining whether to select or deselect items in the selection list that correspond to the at least one matched word based, at least in part, on the value of the configuration parameter, regardless of the content of the separate indication of whether to select or deselect the at least one item in the selection list.
  - 11. The apparatus of claim 7 wherein the multimodal device further comprises a thick multimodal client device including the multimodal application, the grammar interpreter, and functionality for performing speech recognition and grammar interpretation, including semantic interpretation.
  - 12. The apparatus of claim 7 wherein the multimodal device further comprises a thin multimodal client device, the thin multimodal client device obtaining grammar interpretation, semantic interpretation, and speech recognition services from a voice server located remotely across a network from the thin multimodal client device.
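
The thick-client and thin-client variants (claims 5 and 6, 11 and 12) differ only in where recognition and interpretation run. One way to picture this, offered as an assumption rather than as the patent's design, is a shared interface with a local and a remote implementation. Every name below (`GrammarInterpreter`, `LocalInterpreter`, `RemoteInterpreter`, `handle_utterance`, the stub engine and transport, and the request format) is hypothetical.

```python
from typing import Callable, Protocol

class GrammarInterpreter(Protocol):
    def interpret(self, speech: bytes, grammar: str) -> dict: ...

class LocalInterpreter:
    """Thick client: recognition and interpretation run on the device."""
    def __init__(self, engine: Callable[[bytes, str], dict]):
        self._engine = engine  # hypothetical on-device recognition engine

    def interpret(self, speech: bytes, grammar: str) -> dict:
        return self._engine(speech, grammar)

class RemoteInterpreter:
    """Thin client: the same request is forwarded to a voice server."""
    def __init__(self, send: Callable[[dict], dict]):
        self._send = send  # hypothetical transport, e.g. an HTTP or RPC call

    def interpret(self, speech: bytes, grammar: str) -> dict:
        return self._send({"speech": speech.hex(), "grammar": grammar})

def handle_utterance(interpreter: GrammarInterpreter, speech: bytes, grammar: str) -> dict:
    # The application code is identical for both deployment styles.
    return interpreter.interpret(speech, grammar)

# Stub engine and transport so the sketch runs without audio or a network.
def fake_engine(speech: bytes, grammar: str) -> dict:
    return {"matched_words": ["olives"], "action": "select"}

def fake_send(request: dict) -> dict:
    return fake_engine(bytes.fromhex(request["speech"]), request["grammar"])

thick = handle_utterance(LocalInterpreter(fake_engine), b"\x00\x01", "<grammar/>")
thin = handle_utterance(RemoteInterpreter(fake_send), b"\x00\x01", "<grammar/>")
print(thick == thin)  # True
```

Because the application depends only on the interface, the select/deselect logic does not change when interpretation moves to a remote voice server.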

13. A computer-readable recordable medium encoded with a plurality of instructions that, when executed by a computer, perform a method comprising:
- accepting, by a multimodal application, speech input for incrementally selecting or deselecting at least one item in a selection list, wherein the speech input includes an indication of whether to select or deselect the at least one item;
  
  providing, from the multimodal application to a grammar interpreter, the speech input and a speech recognition grammar associated with the selection list;
  
  receiving, by the multimodal application from the grammar interpreter, interpretation results, the interpretation results including at least one matched word from the grammar that identifies at least one item in the selection list and a separate indication of whether to select or deselect the at least one item in the selection list, wherein the separate indication is based, at least in part, on the indication in the speech input of whether to select or deselect the at least one item; and
  
  selecting or deselecting based, at least in part, on the separate indication, the at least one item in the selection list that corresponds to the at least one matched word without first deselecting all previously selected items in the selection list.
  - 14. The computer-readable recordable medium of claim 13 wherein the speech input is synchronized by the multimodal application with a visual input.
  - 15. The computer-readable recordable medium of claim 13 wherein:
    - the separate indication of whether to select or deselect the at least one item in the selection list indicates that all items in the selection list are to be either selected or deselected; and
      
      selecting or deselecting all items in the selection list based, at least in part, on the separate indication, regardless of a correspondence of items in the selection list to the at least one matched-word.
  - 16. The computer-readable recordable medium of claim 13, wherein the method further comprises:
    - establishing in the multimodal device a configuration parameter for the multimodal application, the value of the configuration parameter being user-editable and indicating whether to add to existing item selections, items that correspond to the at least one matched word or replace existing item selections with items that correspond to the at least one matched word;
      
      wherein determining whether to select or deselect items in the selection list that correspond to the at least one matched word further comprises determining whether to select or deselect items in the selection list that correspond to the at least one matched word based, at least in part, on the value of the configuration parameter, regardless of the content of the separate indication of whether to select or deselect the at least one item in the selection list.
  - 17. The computer-readable recordable medium of claim 13 wherein the multimodal device further comprises a thick multimodal client device including the multimodal application, the grammar interpreter, and functionality for performing speech recognition and grammar interpretation, including semantic interpretation.
  - 18. The computer-readable recordable medium of claim 13 wherein the multimodal device further comprises a thin multimodal client device, the thin multimodal client device obtaining grammar interpretation, semantic interpretation, and speech recognition services from a voice server located remotely across a network from the thin multimodal client device.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Agapi, Ciprian; Ativanichayaphong, Soonthorn; Cross, Charles W. Jr.; McCobb, Gerald M.
Primary Examiner(s): Godbold, Douglas
Application Number: US11/619,209
Publication Number: US 20080162136A1
Time in Patent Office: 2,540 Days
Field of Search: 704/270, 704/270.1
US Class Current: 704/270

CPC Class Codes
G10L 15/193   Formal grammars, e.g. finit...
G10L 15/26   Speech to text systems G10L...
H04M 2201/40   using speech recognition sp...
H04M 2203/105   Financial transactions and ...
H04M 3/4938   comprising a voice browser ...
