Easy generation and automatic training of spoken dialog systems using text-to-speech
First Claim
1. A computer-implemented dialog system training environment, comprising:
a processor to execute components of a dialog system;
a memory coupled to the processor;
a user simulator that during the dialog system training provides at least one text to speech training output associated with an utterance, the output having variable qualities; and
a dialog system that comprises;
a speech model having a plurality of modifiable speech model parameters, the speech model receives the at least one text to speech training output as a speech model input related to the utterance and produces related speech model output features;
a dialog action model having a plurality of modifiable dialog action model parameters, the dialog action model receives the related speech model output features from the speech model and produces related output actions, the plurality of modifiable speech model parameters, the plurality of modifiable dialog action model parameters, or a combination thereof, are based, at least in part, upon the utterance, the action taken by the dialog action model, or a combination thereof; and
the dialog system identifies the utterance that is in need of clarification by initiating a repair dialog, wherein the utterance associated with the repair dialog is identified includes;
determining what states of the repair dialog are reached from other states, the dialog system learns which states to go to when observing an appropriate speech and dialog features by trying all repair paths using the user simulator where a user's voice is generated using various text-to-speech (TTS) engines at adjustable levels; and
determining which states of the repair dialog are failures or successes.
Abstract
A dialog system training environment and method using text-to-speech (TTS) are provided. The only knowledge a designer requires is a simple specification of when the dialog system has failed or succeeded, and for any state of the dialog, a list of the possible actions the system can take.
The training environment simulates a user with TTS varied at adjustable levels, and the dialog action model of a dialog system responds to each produced utterance by trying out all possible actions until it has failed or succeeded. From the data accumulated in the training environment, the dialog action model can learn which states to go to when it observes the appropriate speech and dialog features, so as to increase the likelihood of success. The data can also be used to improve the speech model.
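The training loop summarized in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration, not taken from the patent: the settings grid, the action names, the turn limit, and in particular `simulated_confidence`, which stands in for running a recognizer on synthesized audio (no real TTS engine is called). The designer-supplied knowledge is limited to the list of actions and the success/failure rule, as the abstract describes.

```python
from collections import defaultdict
from itertools import product

# Hypothetical TTS settings the user simulator varies (illustrative values).
VOICES = ["voice_a", "voice_b"]
RATES = [0.8, 1.0, 1.4]
VOLUMES = [0.3, 1.0]

# Designer-supplied knowledge: the actions available in every state.
ACTIONS = ["execute", "confirm", "ask_repeat"]
MAX_TURNS = 3  # assumed limit; running out of turns counts as failure


def simulated_confidence(voice, rate, volume, attempt):
    """Stand-in for speech model output: a confidence score from 0 to 100.

    Assumption: fast/slow or quiet speech lowers the score and each
    repetition by the simulated user raises it.
    """
    score = 90
    if voice == "voice_b":
        score -= 10
    if rate != 1.0:
        score -= 40
    if volume < 0.5:
        score -= 30
    score += 20 * attempt
    return max(0, min(100, score))


def run_dialog(setting, path):
    """Play one repair path; return visited (state, action) pairs and success."""
    voice, rate, volume = setting
    visited = []
    for attempt, action in enumerate(path):
        conf = simulated_confidence(voice, rate, volume, attempt)
        correct = conf >= 50  # assumed: recognition is right at 50 or above
        state = "high" if correct else "low"
        visited.append((state, action))
        if action == "execute":
            return visited, correct  # acting on a misrecognition fails
        if action == "confirm" and correct:
            return visited, True  # simulated user confirms
        # Otherwise the simulated user repeats the utterance.
    return visited, False


def learn_policy():
    """Try every repair path for every TTS setting and tally outcomes."""
    counts = defaultdict(lambda: [0, 0])  # (state, action) -> [successes, total]
    for setting in product(VOICES, RATES, VOLUMES):
        for path in product(ACTIONS, repeat=MAX_TURNS):
            visited, success = run_dialog(setting, path)
            for key in visited:
                counts[key][0] += int(success)
                counts[key][1] += 1
    rates = {key: s / n for key, (s, n) in counts.items()}
    states = {state for state, _ in rates}
    # For each observed state, pick the action with the best success rate.
    policy = {
        state: max(ACTIONS, key=lambda a: rates.get((state, a), 0.0))
        for state in states
    }
    return policy, rates


if __name__ == "__main__":
    learned_policy, success_rates = learn_policy()
    print(learned_policy)
```

In this toy setting the tallies show that executing immediately never succeeds from the low-confidence state, so the learned policy prefers a repair action there. A real system would replace the two-valued state with richer speech and dialog features.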
18 Claims
1. A computer-implemented dialog system training environment, comprising:
a processor to execute components of a dialog system;
a memory coupled to the processor;
a user simulator that during the dialog system training provides at least one text to speech training output associated with an utterance, the output having variable qualities; and
a dialog system that comprises;
a speech model having a plurality of modifiable speech model parameters, the speech model receives the at least one text to speech training output as a speech model input related to the utterance and produces related speech model output features;
a dialog action model having a plurality of modifiable dialog action model parameters, the dialog action model receives the related speech model output features from the speech model and produces related output actions, the plurality of modifiable speech model parameters, the plurality of modifiable dialog action model parameters, or a combination thereof, are based, at least in part, upon the utterance, the action taken by the dialog action model, or a combination thereof; and
the dialog system identifies the utterance that is in need of clarification by initiating a repair dialog, wherein the utterance associated with the repair dialog is identified includes;
determining what states of the repair dialog are reached from other states, the dialog system learns which states to go to when observing an appropriate speech and dialog features by trying all repair paths using the user simulator where a user's voice is generated using various text-to-speech (TTS) engines at adjustable levels; and
determining which states of the repair dialog are failures or successes.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method of training a speech recognition learning system either offline or online, the method comprising:
generating at least one text to speech output to an utterance using a user simulator during the training, the output having qualities comprising at least one of a voice, a pitch, a rate or a volume;
identifying speech features related to the utterance using a speech model or using a dialog action model;
identifying the utterance that is in need of clarification by initiating a repair dialog, wherein the utterance associated with the repair dialog is identified includes;
determining what states of the repair dialog are reached from other states, the dialog system learns which states to go to when observing an appropriate speech and dialog features by trying all repair paths using the user simulator where a user's voice is generated using various text-to-speech (TTS) engines at adjustable levels; and
determining which states of the repair dialog are failures or successes;
performing an action related to the speech features; and
updating the dialog action model, the speech model or both, based at least in part on the speech features, the action, or a combination thereof.
Dependent claims: 9, 10, 11, 12, 13
14. A computer-implemented dialog training system, comprising:
a processing unit to implement the dialog training system;
a user simulator generating a simulated training data set for the dialog training system, the simulating training data set comprising text to speech training output associated with one or more simulated utterances varying over at least one of a pitch, a rate, a volume, a noise, or combinations thereof;
a speech model having a plurality of modifiable speech model parameters relating to the one or more simulated utterances, the speech model receiving the simulated training data set as a speech model input and producing related speech model output features relating to the one or more simulated utterances;
a dialog action model having a plurality of modifiable dialog action model parameters relating to the one or more simulated utterances, the dialog action model receiving the related speech model output features and producing corresponding output actions relating to the one or more simulated utterances; and
the dialog training system further receives one or more simulated utterances that is in need of clarification by initiating a repair dialog, wherein the utterance associated with the repair dialog is identified includes;
determining what states of the repair dialog are reached from other states, the dialog system learns which states to go to when observing an appropriate speech and dialog features by trying all repair paths using the user simulator where a user's voice is generated using various text-to-speech (TTS) engines at adjustable levels; and
determining which states of the repair dialog are failures or successes.
Dependent claims: 15, 16, 17, 18
Specification