Easy generation and automatic training of spoken dialog systems using text-to-speech

US 20060206332A1
Filed: 06/29/2005
Published: 09/14/2006
Est. Priority Date: 03/08/2005
Status: Active Grant

Abstract

A dialog system training environment and method using text-to-speech (TTS) are provided. The only knowledge a designer requires is a simple specification of when the dialog system has failed or succeeded and, for any state of the dialog, a list of the possible actions the system can take. The training environment simulates a user using TTS varied at adjustable levels; a dialog action model of a dialog system responds to the produced utterance by trying out all possible actions until it has failed or succeeded. From the data accumulated in the training environment, the dialog action model can learn which states to go to when it observes the appropriate speech and dialog features so as to increase the likelihood of success. The data can also be used to improve the speech model.
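
The loop described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the action names, the `recognize` stand-in for TTS plus the speech model, the confidence feature, and the `succeeded` rule are all hypothetical assumptions chosen only to show how trying every action and recording the outcomes yields a simple feature-to-action policy.

```python
import random
from collections import defaultdict

# Hypothetical list of possible actions supplied by the designer.
ACTIONS = ["execute", "ask_repeat"]


def recognize(noise, rng):
    """Stand-in for TTS followed by the speech model (assumed behaviour).

    Returns (correct, feature): whether the utterance was recognized
    correctly, and a coarse confidence feature the dialog model observes.
    """
    correct = rng.random() > noise
    # Assumption: confidence is usually high when recognition is correct.
    high = rng.random() < (0.9 if correct else 0.2)
    return correct, ("high" if high else "low")


def succeeded(action, correct):
    """Designer-supplied success test (hypothetical rule)."""
    if action == "execute":
        return correct        # acting on a misrecognition counts as failure
    return not correct        # asking to repeat only helps after an error


def train(episodes=2000, noise=0.3, seed=0):
    rng = random.Random(seed)
    # feature -> action -> [successes, trials]
    stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for _ in range(episodes):
        correct, feature = recognize(noise, rng)
        for action in ACTIONS:          # try out every possible action
            record = stats[feature][action]
            record[0] += succeeded(action, correct)
            record[1] += 1
    # For each observed feature, pick the action with the best success rate.
    return {
        feature: max(by_action, key=lambda a: by_action[a][0] / by_action[a][1])
        for feature, by_action in stats.items()
    }


policy = train()
print(policy)
```

Under these assumptions the learned policy executes the command when confidence is high and asks the simulated user to repeat when it is low; a real system would replace the frequency table with the dialog action model's own parameters.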

20 Claims

1. A dialog system training environment comprising:
- a user simulator that provides a text to speech output associated with an utterance; and
  
  a dialog system that comprises:
  
  a speech model having a plurality of modifiable parameters, the speech model receives the speech input from the utterance and produces output features; and
  
  a dialog action model having a plurality of modifiable parameters, the dialog model receives the speech output features from the speech model and produces output actions. [...] parameters of the speech model and/or the dialog action model based, at least in part, upon the utterance identified by the speech model and/or the action taken by the dialog action model.
- Dependent claims:
  - 2. A speech model trained by the dialog system training environment of claim 1.
  - 3. A dialog action model trained by the dialog system training environment of claim 1.
  - 4. The environment of claim 1 employed repeatedly to train the speech model via supervised learning.
  - 5. The environment of claim 1 employed repeatedly to train the dialog action model via supervised learning.
  - 6. The environment of claim 1 employed repeatedly to train the speech model and the dialog action model via supervised learning.
  - 7. A dialog action model trained by the environment of claim 1, the dialog action model modified over data collected from a training phase after the training phase has been completed.
  - 8. The environment of claim 1 where the dialog action model provides a probability distribution associated with uncertainty regarding the modifiable parameters of the dialog action model.
  - 9. The environment of claim 1 with the user simulator comprising a speech generator that provides the text to speech output, the speech generator can adjust at least one of the voice, pitch, rate and volume settings of the output.
  - 10. The environment of claim 1 where the user simulator simulates a noisy environment.
  - 11. The environment of claim 1 with the user simulator comprising a language model that stores information associated with utterances for which the dialog action model is being trained to recognize.
  - 12. A speech model trained by the environment of claim 1, the speech model modified over data collected from a training phase after the training phase has been completed.
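
Claim 8 refers to a probability distribution expressing uncertainty about the dialog action model's parameters. The claims do not say which distribution is used; the sketch below assumes, purely for illustration, a Beta distribution over a single success-probability parameter, updated from success and failure outcomes. The class name `BetaParameter` and the uniform prior are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaParameter:
    """Beta(alpha, beta) belief over one success-probability parameter."""

    alpha: float = 1.0  # prior pseudo-count of successes (uniform prior assumed)
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, success: bool) -> None:
        """Fold one observed outcome into the belief."""
        if success:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        n = self.alpha + self.beta
        return self.alpha * self.beta / (n * n * (n + 1))


belief = BetaParameter()
for outcome in [True, True, False, True]:
    belief.update(outcome)
print(belief.mean, belief.variance)
```

The variance shrinks as outcomes accumulate, which is one way a model could report how uncertain it still is about a parameter.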

13. A method of training a learning system either offline or online comprising:
- generating an utterance using text to speech;
  
  identifying the utterance using a speech model;
  
  performing an action; and
  
  updating a dialog action model or speech model or both.
- Dependent claims:
  - 14. The method of claim 13 performed iteratively in order to train the speech model and/or dialog action model.
  - 15. The method of claim 13, qualities of the generated utterance varying between iterations.
  - 16. The method of claim 15, the qualities varied comprising at least one of the voice, pitch, rate and volume settings.
  - 17. A computer readable medium having stored thereon computer executable instructions for carrying out the method of claim 13.
  - 18. A computer readable medium having stored thereon computer executable instructions for a dialog system trained by the method of claim 13.
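
Claims 15 and 16 describe varying the voice, pitch, rate and volume of the generated utterance between iterations. One possible way to schedule such variation is sketched below. The `TtsSettings` type, the value lists, and the units are hypothetical and are not tied to any particular TTS engine.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsSettings:
    voice: str
    pitch: int     # relative offset (assumed unit)
    rate: float    # speaking-rate multiplier (assumed unit)
    volume: int    # percent (assumed unit)


# Hypothetical values; a real engine defines its own ranges.
VOICES = ["voice_a", "voice_b"]
PITCHES = [-2, 0, 2]
RATES = [0.8, 1.0, 1.2]
VOLUMES = [60, 100]


def settings_schedule():
    """Cycle endlessly through every combination, one per training iteration."""
    combos = [
        TtsSettings(*combo)
        for combo in itertools.product(VOICES, PITCHES, RATES, VOLUMES)
    ]
    return itertools.cycle(combos)


schedule = settings_schedule()
first_three = [next(schedule) for _ in range(3)]
for settings in first_three:
    print(settings)
```

Each training iteration would draw the next `TtsSettings` value and pass it to whatever speech generator is in use, so that the speech model and dialog action model see the same utterance rendered in different ways.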

19. A dialog system training environment comprising:
- means for simulating an utterance;
  
  means for identifying the utterance;
  
  means for modeling speech using a plurality of modifiable parameters, the means for modeling speech receiving the utterance and producing output features;
  
  means for modeling dialog actions using a plurality of modifiable parameters, the means for modeling dialog actions receiving the speech output features and producing output actions; and
  
  means for modifying parameters of the means for modeling speech and/or the means for modeling dialog model actions based, at least in part, upon the utterance identified by the means for modeling speech and/or the action taken by the means for modeling dialog actions.
- Dependent claims:
  - 20. The environment of claim 19 performed iteratively during a training session, a different voice employed by the means for simulating an utterance for a particular iteration.

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Paek, Timothy S., Chickering, David M.

Granted Patent

US 7,885,817 B2
Field of Search
US Class Current

704/257
CPC Class Codes

G10L 13/00   Speech synthesis; Text to s...

G10L 15/063   Training

G10L 15/22   Procedures used during a sp...
