Inferring switching conditions for switching between modalities in a speech application environment extended for interactive text exchanges

US 8,874,447 B2
Filed: 07/06/2012
Issued: 10/28/2014
Est. Priority Date: 12/19/2006
Status: Active Grant

Abstract

The disclosed solution includes a method for dynamically switching modalities based upon inferred conditions in a dialog session involving a speech application. The method establishes a dialog session between a user and the speech application. During the dialog session, the user interacts using an original modality and a second modality. The speech application interacts using a speech modality only. A set of conditions indicative of interaction problems using the original modality can be inferred. Responsive to the inferring step, the original modality can be changed to the second modality. A modality transition to the second modality can be transparent to the speech application and can occur without interrupting the dialog session. The original modality and the second modality can be different modalities; one including a text exchange modality and another including a speech modality.
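The inference step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which conditions are inferred, so the trigger used here (a run of low-confidence speech recognition results) and every name and threshold in the block (`DialogSession`, `record_turn`, `CONFIDENCE_FLOOR`, `MAX_LOW_CONFIDENCE_TURNS`) are assumptions made for illustration, not the patented method.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Modality(Enum):
    SPEECH = "speech"
    TEXT = "text"


# Hypothetical thresholds: the abstract does not specify the inferred conditions.
CONFIDENCE_FLOOR = 0.5
MAX_LOW_CONFIDENCE_TURNS = 2


@dataclass
class DialogSession:
    """One ongoing dialog; only the user-side modality ever changes."""
    modality: Modality = Modality.SPEECH
    low_confidence_streak: int = 0
    turns: List[Tuple[Modality, str]] = field(default_factory=list)


def record_turn(session: DialogSession, utterance: str, confidence: float) -> Modality:
    """Log a user turn and switch to text if interaction problems are inferred."""
    session.turns.append((session.modality, utterance))
    if session.modality is not Modality.SPEECH:
        return session.modality  # already switched; the session continues as-is
    if confidence < CONFIDENCE_FLOOR:
        session.low_confidence_streak += 1
    else:
        session.low_confidence_streak = 0
    if session.low_confidence_streak >= MAX_LOW_CONFIDENCE_TURNS:
        # Assumed condition met: change modality without ending the session.
        session.modality = Modality.TEXT
    return session.modality
```

In this sketch the same `DialogSession` object persists across the switch, which mirrors the abstract's point that the transition occurs without interrupting the dialog session.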

17 Claims

1. A method for allowing multimodal communication with a first device executing a speech-enabled application during a communication session with a user, comprising:
- with the first device, at a first time during the communication session, receiving a first signal from a voice server via a communication channel, the first signal corresponding to speech recognition results generated by the voice server of a voice input originating from a voice interface operated by the user during the communication session;
  
  with the first device, processing the first signal with the speech-enabled application during the communication session with the user to generate a first responsive output;
  
  with the first device, at a second time during the communication session with the user, receiving a second signal from the voice server via the communication channel, the second signal corresponding to a text input originating from a text interface operated by the user during the communication session, the voice server having received the text input via a communications network; and
  
  with the first device, processing the second signal with the speech-enabled application during the communication session with the user to generate a second responsive output, and communicating a third signal corresponding to the second responsive output to the text interface via a communication path exclusive of the voice server.
  - 2. The method of claim 1, wherein voice interface and the text interface are elements of a mobile communications device.
  - 3. The method of claim 1, wherein the first device comprises an automated response system.
  - 4. The method of claim 1, wherein:
    - the voice server is capable of communicating with each of the voice interface and the text interface, and the method further comprises;
      
      with the first device, receiving each of the first signal and the second signal from the voice server via the communication channel.
  - 5. The method of claim 4, wherein voice interface and the text interface are elements of a mobile communications device.
  - 6. The method of claim 1, wherein:
    - processing the first signal with the speech-enabled application comprises producing the first responsive output in the form of a voice markup segment; and
      
      processing the second signal with the speech-enabled application comprises producing the second responsive output in the form of a voice markup segment.
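The signal flow recited in claim 1 can be sketched in Python as follows. This is an illustrative model only: the class names (`Signal`, `TextInterface`, `FirstDevice`, `VoiceServer`), the echo-style reply, and the string normalization standing in for speech recognition are all hypothetical. The claim does not say how the first responsive output is delivered, so returning it over the voice server channel is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Signal:
    """What the first device receives from the voice server on the shared channel."""
    source: str   # "voice" or "text"
    content: str


@dataclass
class TextInterface:
    """User-side text endpoint (hypothetical stand-in)."""
    received: List[str] = field(default_factory=list)

    def deliver(self, message: str) -> None:
        self.received.append(message)


class FirstDevice:
    """Hosts the speech-enabled application."""

    def __init__(self, text_interface: TextInterface) -> None:
        self.text_interface = text_interface

    def process(self, signal: Signal) -> Optional[str]:
        output = f"You said: {signal.content}"  # placeholder application logic
        if signal.source == "text":
            # Third signal: sent straight to the text interface,
            # on a path that does not include the voice server.
            self.text_interface.deliver(output)
            return None
        return output  # assumption: voice replies go back over the channel


class VoiceServer:
    """Forwards both recognition results and text input to the first device."""

    def __init__(self, device: FirstDevice) -> None:
        self.device = device
        self.sent: List[Signal] = []

    def _send(self, signal: Signal) -> Optional[str]:
        self.sent.append(signal)
        return self.device.process(signal)

    def on_voice_input(self, utterance: str) -> Optional[str]:
        recognized = utterance.strip().lower()  # placeholder for real recognition
        return self._send(Signal("voice", recognized))

    def on_text_input(self, text: str) -> Optional[str]:
        return self._send(Signal("text", text))
```

Both inputs reach `FirstDevice.process` through the same `VoiceServer._send` call, corresponding to the single communication channel in the claim, while only the text reply bypasses the voice server on the way back.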

7. A system in which a user is able to engage in multiple modes of communication with a speech-enabled application, comprising:
- a first device configured to execute the speech-enabled application so as to engage in a communication session with a user;
  
  a voice interface operable by the user during the communication session; and
  
  a text interface operable by the user during the communication session;
  
  wherein the system is configured such that the first device is able to receive, at a first time during the communication session, a first signal from a voice server via a communication channel, the first signal corresponding to speech recognition results generated by the voice server of a voice input originating from the voice interface, and to process the first signal with the speech-enabled application during the communication session with the user to generate a first responsive output; and
  
  wherein the system is further configured such that the first device is able to receive, at a second time during the communication session with the user, a second signal from the voice server via the communication channel, the second signal corresponding to a text input originating from the text interface operated by the user during the communication session, to process the second signal with the speech-enabled application during the communication session with the user to generate a second responsive output, and to communicate a third signal corresponding to the second responsive output to the text interface via a communication path exclusive of the voice server; and
  
  wherein the system is further configured such that the voice server is able to receive the text input via a communications network.
  - 8. The system of claim 7, wherein voice interface and the text interface are elements of a mobile communications device.
  - 9. The system of claim 7, wherein the first device comprises an automated response system.
  - 10. The system of claim 7, wherein the voice server is capable of communicating with each of the voice interface and the text interface, and wherein the communications channel is established between the first device and the voice server.
  - 11. The system of claim 10, wherein the voice interface and the text interface are elements of a mobile communications device.
  - 12. The system of claim 7, wherein the speech-enabled application is adapted to process the first signal so as to produce the first responsive output in the form of a voice markup segment, and to process the second signal so as to produce the second responsive output in the form of a voice markup segment.
  - 14. The system of claim 12, further comprising the voice interface and the text interface, and wherein the voice interface and text interface are elements of a mobile communications device.
  - 15. The system of claim 14, wherein the device configured to execute the speech-enabled application comprises an automated response system.

13. A system for allowing multimodal communication during a communication session with a user, comprising:
- a device configured to execute a speech-enabled application; and
  
  a voice server comprising first means for, at a first time during the communication session, communicating a first signal corresponding to speech recognition results generated by the voice server of a voice input from a voice interface to an input of the device executing the speech-enabled application for processing thereby to generate a first responsive output, and second means for, at a second time during the communication session, receiving text input from a text interface via a communications network and communicating a second signal corresponding to the text input to the input of the device executing the speech-enabled application for processing thereby to generate a second responsive output;
  
  wherein the device configured to execute the speech-enabled application is further configured to communicate a third signal corresponding to the second responsive output to the text interface via a communication path exclusive of the voice server.
  - 16. The system of claim 13, wherein the device configured to execute the speech-enabled application comprises an automated response system.
  - 17. The system of claim 13, wherein the speech-enabled application is adapted to process the first signal so as to produce the first responsive output in the form of a voice markup segment, and to process the second signal so as to produce the second responsive output in the form of a voice markup segment.
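Claims 6, 12, and 17 recite that both responsive outputs take the form of a voice markup segment. A minimal Python sketch of that idea follows. The claims do not name a markup language or describe any conversion step, so the `<prompt>` element, the helper names `to_voice_markup` and `markup_to_text`, and the reduction of markup to plain text for a text interface are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def to_voice_markup(reply_text: str) -> str:
    """Wrap an application reply in a single (assumed) <prompt> markup element."""
    prompt = ET.Element("prompt")
    prompt.text = reply_text
    return ET.tostring(prompt, encoding="unicode")


def markup_to_text(segment: str) -> str:
    """Reduce a markup segment to plain text, collapsing runs of whitespace."""
    root = ET.fromstring(segment)
    return " ".join("".join(root.itertext()).split())
```

Because the application emits the same kind of segment for voice and text input, it needs no awareness of which modality the user is currently using; any modality-specific rendering would happen outside the application.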

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Da Palma, William V.; Mandalia, Baiju D.; Moore, Victor S.; Nusbickel, Wendi L.
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): ADESANYA, OLUJIMI A
Application Number: US13/543,198
Publication Number: US 20120271643A1
Time in Patent Office: 844 Days
Field of Search: 704/270-275
US Class Current: 704/270.1
CPC Class Codes: G10L 15/22 Procedures used during a sp...
