Assisted multi-modal dialogue
First Claim
1. A client-server system for providing assisted multi-modal dialogue, comprising:
a web server for generating client-side markups having recognition and audible prompting for execution on a client having recognition capabilities, the web server further including controls for generating the client side markups, the controls including speech controls inheritance for setting values to properties associated with the controls and organized in collections to construct a dialog for obtaining information pertaining to a plurality of topics, each collection of controls configured to create a separate dialog associated with a separate topic;
a recognition server, coupled to the web server, for providing speech recognition processing to received voice data based on a grammar or language model provided with the received voice data to produce speech recognition results, the speech recognition results being provided to the web server; and
a telephone voice browser, coupled to the web server, for processing voice data, the telephone voice browser including a media server for providing a telephony interface and a voice browser;
wherein the controls of the web server include companion controls associated with corresponding primary controls for providing recognition and audible prompting, the companion controls including a semantic map, wherein the semantic map includes semantic items and forms an association between a visual domain of the primary controls and a non-visual recognition domain of the companion controls; and
wherein the dialog includes at least one question provided by a prompt object and at least one answer, a grammar object is provided to define a grammar for recognition of input data and related processing on the input, and an answer property associates a recognized result with a semantic item in the semantic map.
Abstract
Controls are provided for a web server to generate client-side markups that include recognition and/or audible prompting. The controls are organized in collections to obtain information pertaining to different topics. Each collection of controls creates a separate dialog. In this manner, the collections can be selectively specified to execute the corresponding dialog.
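The one-collection-per-topic structure described in the abstract can be sketched as follows. This is a minimal illustration with invented names, not the patent's implementation: each collection of question controls yields its own dialog, and a caller selectively executes only the dialog for the topic it needs.

```python
# Illustrative sketch: controls grouped into collections, each collection
# producing a separate dialog for one topic; dialogs are selectively run.
# All names are hypothetical.
def make_dialog(topic, questions):
    """Each collection of question controls becomes one dialog."""
    def dialog():
        return [f"[{topic}] {q}" for q in questions]
    return dialog

collections = {
    "flight": make_dialog("flight", ["Departure city?", "Arrival city?"]),
    "hotel":  make_dialog("hotel",  ["Check-in date?", "Number of nights?"]),
}

# Selectively specify which collection's dialog to execute:
prompts = collections["hotel"]()
```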
18 Claims
10. A computer implemented method for performing recognition and/or audible prompting on a client device in a client/server system, the method comprising:
generating client-side markups having recognition and audible prompting at a web server for execution on a client having recognition capabilities;
providing controls, organized in collections, for generating the client side markups at the web server, wherein the providing controls includes setting values to properties associated with the controls according to speech controls inheritance, wherein the providing controls for the web server includes providing companion controls associated with corresponding primary controls and including a semantic map to provide recognition and audible prompting;
obtaining information pertaining to a plurality of topics using a dialog constructed from the controls, each collection of controls configured to create a separate dialog associated with a separate topic;
providing speech recognition processing to received voice data at a recognition server based on a grammar or language model provided with the received voice data to produce speech recognition results;
providing the speech recognition results to the web server;
processing voice data at a telephone voice browser;
providing semantic items for the semantic map;
forming an association between a visual domain of the primary controls and a non-visual recognition domain of the companion controls using the semantic items;
providing at least one question for the dialog via a prompt object;
providing at least one answer;
providing a grammar object to define a grammar for recognition of input data and related processing on the input; and
providing an answer property associating a recognized result with a semantic item in the semantic map.
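The recognition step of the method, in which voice data arrives at the recognition server together with the grammar to decode it against, can be sketched as below. The recognizer here is a stand-in (it treats the bytes as a transcript); in a real system this would be an acoustic speech engine. The function name and result shape are assumptions for illustration only.

```python
# Hedged sketch of the claim-10 recognition step: the recognition server
# receives voice data together with a grammar and returns recognition
# results to the web server. The decoder is a placeholder.
def recognition_server(voice_data: bytes, grammar: list[str]) -> dict:
    # Placeholder for acoustic decoding of the received voice data.
    transcript = voice_data.decode("utf-8")
    # Constrain the hypothesis to the grammar supplied with the request.
    hypotheses = [g for g in grammar if g.lower() == transcript.lower()]
    return {"result": hypotheses[0] if hypotheses else None,
            "grammar_matched": bool(hypotheses)}

response = recognition_server(b"two adults", ["one adult", "two adults"])
```

Passing the grammar with the voice data (rather than configuring it server-side) is what lets each question in the dialog constrain recognition to its own expected answers.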
18. A client-server system for providing assisted multi-modal dialogue, comprising:
a web server for generating client-side markups having recognition and audible prompting for execution on a client having recognition capabilities, the web server also includes controls for generating the client side markups, the controls including speech controls inheritance for setting values to properties associated with the controls and organized in collections to construct a dialog for obtaining information pertaining to a plurality of topics, each collection of controls configured to create a separate dialog associated with a separate topic, the web server further includes an authoring tool for dynamically generating the client-side markups and a specific form of markup for the type of client accessing the web server and a library for providing visual, recognition and audible prompting markup information;
a recognition server, coupled to the web server, for providing speech recognition processing to received voice data based on a grammar or language model provided with the received voice data to produce speech recognition results, the speech recognition results being provided to the web server;
a telephone voice browser, coupled to the web server, for processing voice data, the telephone voice browser including a media server for providing a telephony interface and a voice browser; and
at least one client device for receiving the dialog and providing voice input in response to the dialog;
wherein the controls of the web server include companion controls associated with corresponding primary controls for providing recognition and audible prompting, the companion controls including a semantic map, wherein the semantic map includes semantic items and forms an association between a visual domain of the primary controls and a non-visual recognition domain of the companion controls;
wherein the dialog includes at least one question provided by a prompt object and at least one answer, a grammar object is provided to define a grammar for recognition of input data and related processing on the input, and an answer property associates a recognized result with a semantic item in the semantic map; and
wherein the recognized input is associated with one of the primary controls and when input is received through a graphical user interface, the received input is replicated in a corresponding semantic item and status information for the input is set as being confirmed via a companion control, and wherein a reset is provided by the companion controls to expose semantic items and status information for resetting selected portions of the dialog to remove input associated with semantic items corresponding to the selected portions of the dialog.
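The final wherein clause of claim 18 (GUI input replicated into a semantic item and marked confirmed, plus a reset that clears selected portions of the dialog) can be sketched as follows. Class and method names are hypothetical; the sketch only illustrates the replication, status, and reset behavior the claim describes.

```python
# Illustrative sketch of the claim-18 behavior: typed GUI input is
# replicated into the corresponding semantic item and confirmed; a reset
# exposed by the companion controls clears selected items so that part
# of the dialog can be re-asked. Names are invented for illustration.
class SemanticItem:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.status = "empty"   # "empty" | "filled" | "confirmed"

class CompanionControl:
    def __init__(self, items):
        self.items = {item.name: item for item in items}

    def on_gui_input(self, name, value):
        # Input through the graphical interface is treated as authoritative:
        # replicate it into the semantic item and mark it confirmed.
        item = self.items[name]
        item.value, item.status = value, "confirmed"

    def reset(self, names):
        # Expose semantic items and status so selected portions of the
        # dialog can be cleared and their input removed.
        for name in names:
            item = self.items[name]
            item.value, item.status = None, "empty"

ctrl = CompanionControl([SemanticItem("city"), SemanticItem("date")])
ctrl.on_gui_input("city", "Boston")  # replicated and confirmed
ctrl.reset(["city"])                 # that portion of the dialog re-opens
```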