Methods and systems for speech-enabling a human-to-machine interface

US 8,909,536 B2
Filed: 04/20/2012
Issued: 12/09/2014
Est. Priority Date: 04/20/2012
Status: Active Grant

Abstract

Generally, human-to-machine interfaces are configured to accept speech input from a user. However, such interfaces, e.g., web browsers, must be configured to enable acceptance of speech input from the user. Some interfaces, such as mobile browsers, have less configuration adaptability and are not able to be configured to accept speech input from a user. Embodiments of the present invention speech-enable human-to-machine interfaces by loading content of the human-to-machine interface and adding logic configured to enable speech interaction with the content to the interface. The embodiment then activates speech interaction with the content via the logic for the user. Thus, embodiments of the present invention enable speech interaction with interfaces that are not configured to be adapted to allow speech interaction and are able to enable the speech interaction in a seamless manner.
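As a rough illustration of "loading content" and "adding logic" to it, the sketch below treats the content as an HTML string and the added logic as an injected script reference. This is only one possible reading; the abstract does not say how the logic is attached. The function name `add_speech_logic` and the file name `speech-logic.js` are hypothetical.

```python
import re


def add_speech_logic(content: str, script_url: str) -> str:
    """Return a copy of `content` with a script reference for the speech logic added.

    Assumption: the speech-enabling logic is shipped as a script (script_url)
    and can be attached just before the closing body tag.
    """
    tag = f'<script src="{script_url}"></script>'
    # Locate the last closing body tag, ignoring case.
    closers = list(re.finditer(r"</body\s*>", content, flags=re.IGNORECASE))
    if not closers:
        # No closing body tag found: append the logic at the end of the content.
        return content + tag
    cut = closers[-1].start()
    return content[:cut] + tag + content[cut:]


# Hypothetical page content that has already been loaded.
page = "<html><body><form><input id='city'></form></body></html>"
enabled = add_speech_logic(page, "speech-logic.js")
```

The returned string would then be presented to the user in place of the original content, so the interface itself needs no speech-specific configuration.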

19 Claims

1. A method for speech-enabling a human-to-machine interface, the method comprising:
- by a processor, loading content of the human-to-machine interface;
  
  adding logic configured to enable speech interaction with the content to the interface;
  
  presenting the content to a user of the interface; and
  
  activating speech interaction with the content via the logic for the user;
  
  the logic including:
  
  uniquely identifying a plurality of input fields and corresponding input field identifications (IDs) associated with the content;
  
  mapping the input field IDs to grammar slot names to produce a speech-to-field mapping, the grammar slot names associated with speech-to-text synthesis of speech expected to be received by the logic and outputting representations of the speech having correspondence with the grammar slot names; and
  
  enabling a flow of representations of speech to the input fields via the speech-to-field mapping.
  - 2. The method of claim 1 further comprising enabling speech interaction with the input fields, the input fields including interactive elements displayed on a screen view including at least one of the following input field structures:
    - radio buttons, text fields, buttons, and drop down menus.
  - 3. The method of claim 1 wherein adding the logic includes:
    - parsing code associated with the content, the code, when executed by a processor associated with the interface, causes the interface to present the input fields to a user; and
      
      using results of the parsing, identifying input fields uniquely, wherein the results of the parsing include the input fields and corresponding input field IDs.
  - 4. The method of claim 1 wherein mapping the input field IDs to the grammar slot names includes determining keywords from the input field IDs and matching the keywords determined with keywords associated with the grammar slot names.
  - 5. The method of claim 1 wherein mapping the input field IDs to the grammar slot names is based on user selection of at least a subset of the plurality of input fields;
    - and wherein mapping further includes associating the corresponding input field ID(s) with the grammar slot name responsive to speech received subsequent to the user selection.
  - 6. The method of claim 1 wherein mapping the input field IDs to the grammar slot further includes:
    - prompting a user for voice input for a specific input field; and
      
      correlating the corresponding field ID of the specific input field with the grammar slot name corresponding to speech received subsequent to the user selection.
  - 7. The method of claim 1 further comprising prompting the user for speech with representations of values provided therein to be applied to a specific input field of the plurality of input fields.
  - 8. The method of claim 7 wherein prompting the user for speech includes performing at least one of the following actions:
    - highlighting the specific input field, changing a state of a textual display presented via the interface, and presenting an audio indication for speech associated with a specific input field.
  - 9. The method of claim 7 wherein prompting the user for speech includes prompting the user for speech relating to multiple input fields of the plurality of input fields.
  - 12. The system of claim 1 wherein the mapping module is further configured to determine keywords from the input field IDs and match the keywords determined with keywords associated with the grammar slot names.
  - 13. The system of claim 1 wherein the mapping module is further configured to map the input field IDs to the grammar slot names is based on user selection of at least a subset of the plurality of input fields and associate the corresponding input field ID(s) with the grammar slot name responsive to speech received subsequent to the user selection.
  - 14. The system of claim 1 further comprising a prompting module configured to:
    - prompt a user for voice input for a specific input field; and
      
      correlate the corresponding field ID of the specific input field with the grammar slot name corresponding to speech received subsequent to the user selection.
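The parsing step of claim 3 and the keyword matching of claim 4 can be sketched as follows, assuming the content is HTML and the grammar slot names are plain strings. The class and function names, the sample form, and the slot names are all hypothetical, and the overlap-count scoring is an illustrative choice rather than the matching rule the patent prescribes.

```python
import re
from html.parser import HTMLParser

# Assumption: these tags are the input field structures of interest.
FIELD_TAGS = {"input", "select", "textarea", "button"}


class FieldCollector(HTMLParser):
    """Collects the IDs of input fields found while parsing the content's code."""

    def __init__(self):
        super().__init__()
        self.field_ids = []

    def handle_starttag(self, tag, attrs):
        field_id = dict(attrs).get("id")
        # Keep each ID once so every field is identified uniquely.
        if tag in FIELD_TAGS and field_id and field_id not in self.field_ids:
            self.field_ids.append(field_id)


def keywords(name):
    """Split an identifier such as 'departureCity' or 'ARRIVAL_CITY' into lowercase keywords."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)  # break camelCase
    return {w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w}


def map_fields_to_slots(field_ids, slot_names):
    """Map each field ID to the slot name sharing the most keywords with it."""
    mapping = {}
    for field_id in field_ids:
        best, best_score = None, 0
        for slot in slot_names:
            score = len(keywords(field_id) & keywords(slot))
            if score > best_score:
                best, best_score = slot, score
        if best is not None:  # fields with no keyword overlap stay unmapped
            mapping[field_id] = best
    return mapping


# Hypothetical form and grammar slot names.
html = ('<form><input id="departureCity"><input id="arrival_city">'
        '<select id="travel-date"></select><input name="noid"></form>')
collector = FieldCollector()
collector.feed(html)
mapping = map_fields_to_slots(collector.field_ids,
                              ["DEPARTURE_CITY", "ARRIVAL_CITY", "DATE"])
```

Fields without an `id` attribute are skipped in this sketch, and ties between slot names resolve to the first slot listed; a real system would need a policy for both cases.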

10. A system for speech-enabling a human-to-machine interface, the system comprising:
- one or more modules implemented in hardware or as instructions executing on a processor, the one or more modules including:
  
  a loading module configured to load content of the human-to-machine interface;
  
  an applying module configured to apply logic to the content, the logic configured to enable speech interaction with the content;
  
  a presenting module configured to present the content to a user of the interface; and
  
  a speech interaction module configured to activate speech interaction with the content via the logic for the user;
  
  the logic including:
  
  an identifying module configured to uniquely identify a plurality of input fields and corresponding input field identifications (IDs) associated with the content;
  
  a mapping module configured to map the input field IDs to grammar slot names to produce a speech-to-field mapping, the grammar slot names associated with speech-to-text synthesis of speech expected to be received by the logic and output representations of the speech having correspondence with the grammar slot names; and
  
  a transcribing module configured to enable a flow of representations of speech to the input fields via the speech-to-field mapping.
  - 11. The system of claim 10 further comprising:
    - a parsing module configured to parse code associated with the content, the code, when executed by a processor associated with the interface, causes the interface to present the input fields to a user; and
      
      using results from the parsing module, an identifying module configured to identify input fields uniquely, wherein the results of the parsing module include the input fields and corresponding input field IDs.
  - 15. The system of claim 10 further comprising a prompting module configured to prompt the user for speech with representations of values provided therein to be applied to a specific input field of the plurality of input fields.
  - 16. The system of claim 15 wherein the prompting module is further configured to prompt the user for speech by performing at least one of the following actions:
    - highlighting the specific input field, changing a state of a textual display presented via the interface, and presenting an audio indication for speech associated with a specific input field.
  - 17. The system of claim 15 wherein the prompting module is further configured to prompt the user for speech includes prompting the user for speech relating to multiple input fields of the plurality of input fields.
  - 18. The system of claim 15 further comprising a speech interaction module configured to enable speech interaction with the input fields, the input fields including interactive elements displayed on a screen view including at least one of the following input field structures:
    - radio buttons, text fields, buttons, and drop down menus.
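The prompting behavior in claims 15 to 17 (prompting for one field or several, by highlighting, text, or audio) might be modeled as a small data structure that a user interface layer could act on. Everything here is a hypothetical sketch: the names `PromptAction`, `Prompt`, and `build_prompt`, the message wording, and the label lookup are assumptions, and no actual rendering or audio playback is shown.

```python
from dataclasses import dataclass
from enum import Enum


class PromptAction(Enum):
    HIGHLIGHT = "highlight"  # highlight the specific input field
    TEXT = "text"            # change the state of a textual display
    AUDIO = "audio"          # present an audio indication


@dataclass(frozen=True)
class Prompt:
    field_ids: tuple
    actions: tuple
    message: str


def build_prompt(field_ids, labels, actions=(PromptAction.HIGHLIGHT,)):
    """Describe a prompt for speech covering one or more input fields."""
    if not field_ids:
        raise ValueError("at least one input field is required")
    if not actions:
        raise ValueError("at least one prompt action is required")
    # Fall back to the raw field ID when no human-readable label is known.
    names = [labels.get(field_id, field_id) for field_id in field_ids]
    if len(names) == 1:
        spoken = names[0]
    else:
        spoken = ", ".join(names[:-1]) + " and " + names[-1]
    return Prompt(tuple(field_ids), tuple(actions), f"Please say your {spoken}.")


# Hypothetical labels for two fields of a travel form.
labels = {"departureCity": "departure city", "arrival_city": "arrival city"}
single = build_prompt(["departureCity"], labels)
multi = build_prompt(["departureCity", "arrival_city"], labels,
                     actions=(PromptAction.HIGHLIGHT, PromptAction.AUDIO))
```

Here `single.message` is "Please say your departure city." and `multi.message` names both fields, corresponding to a prompt for speech relating to multiple input fields.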

19. A non-transitory computer readable medium having computer readable program codes embodied therein for speech-enabling a human-to-machine interface, the computer readable program codes including instructions that, when executed by a processor, cause the processor to:
- load content of the human-to-machine interface;
  
  apply logic to the content, the logic configured to enable speech interaction with the content;
  
  present the content to a user of the interface; and
  
  activate speech interaction with the content via the logic for the user;
  
  the logic further configured to cause the processor to:
  
  uniquely identify a plurality of input fields and corresponding input field identifications (IDs) associated with the content;
  
  map the input field IDs to grammar slot names to produce a speech-to-field mapping, the grammar slot names associated with speech-to-text synthesis of speech expected to be received by the logic and outputting representations of the speech having correspondence with the grammar slot names; and
  
  enable a flow of representations of speech to the input fields via the speech-to-field mapping.
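The final step, in which representations of speech flow to the input fields via the speech-to-field mapping, can be sketched as a dictionary lookup. This assumes the recognizer returns values keyed by grammar slot name and that the form state is a simple mapping from field ID to value; the function name, field IDs, slot names, and sample values are hypothetical.

```python
def fill_fields(recognized, speech_to_field, form_state=None):
    """Copy recognized slot values into the fields mapped to those slots.

    recognized:      {grammar slot name: recognized text}  (assumed recognizer output)
    speech_to_field: {input field ID: grammar slot name}   (the speech-to-field mapping)
    form_state:      {input field ID: current value}       (left unmodified)
    """
    updated = dict(form_state or {})
    for field_id, slot in speech_to_field.items():
        if slot in recognized:
            updated[field_id] = recognized[slot]
    return updated


# Hypothetical mapping and recognition result for an utterance such as
# "from Boston to Paris".
speech_to_field = {
    "departureCity": "DEPARTURE_CITY",
    "arrival_city": "ARRIVAL_CITY",
    "travel-date": "DATE",
}
recognized = {"DEPARTURE_CITY": "Boston", "ARRIVAL_CITY": "Paris"}
form = fill_fields(recognized, speech_to_field, {"travel-date": "2012-04-20"})
```

Slots that were not recognized leave their fields untouched, so a single utterance can fill several fields while previously entered values remain in place.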

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Mauro, David Andrew; Bouvier, Henri
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US13/452,557
Publication Number: US 20130282381A1
Time in Patent Office: 963 Days
Field of Search: 704/275
US Class Current: 704/275
CPC Class Codes:
G06F 3/167   Audio in a user interface, ...
G10L 15/183   using context dependencies,...
G10L 15/22   Procedures used during a sp...
G10L 15/26   Speech to text systems G10L...
G10L 17/22   Interactive procedures; Man...
