SYSTEM AND METHOD FOR UNIFIED NORMALIZATION IN TEXT-TO-SPEECH AND AUTOMATIC SPEECH RECOGNITION
First Claim
1. A method comprising:
receiving input;
normalizing the input, to yield normalized input;
generating, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output comprising one of phonemes corresponding to the input and text corresponding to the input;
when the output comprises the phonemes corresponding to the input, generating speech by performing prosody generation and unit selection synthesis using the phonemes; and
when the output comprises the text corresponding to the input, training both an acoustic model and a language model for use in future speech recognition.
Abstract
A system, method and computer-readable storage devices are disclosed for using a single set of normalization protocols and a single language lexicon (or dictionary) for both text-to-speech (TTS) and automatic speech recognition (ASR). The system receives input, which is either text to be converted to speech or ASR training text, and normalizes it. Using the normalized input and a dictionary configured for both ASR and TTS processing, the system produces output that is either phonemes corresponding to the input or text corresponding to the input for training the ASR system. When the output is phonemes, the system generates speech by performing prosody generation and unit selection synthesis using the phonemes. When the output is text, the system trains both an acoustic model and a language model for use in future speech recognition.
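The flow described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `LEXICON` entries (ARPAbet-style phonemes chosen for illustration), the `ABBREVIATIONS` and `DIGITS` rule tables, and the functions `normalize` and `process`. The sketch only shows one normalizer and one dictionary feeding both branches; prosody generation, unit selection synthesis, and acoustic and language model training are left out.

```python
import re
from typing import Dict, List, Union

# Hypothetical shared lexicon: a single dictionary consulted by both the
# TTS branch and the ASR-training branch. Phonemes are illustrative only.
LEXICON: Dict[str, List[str]] = {
    "doctor": ["D", "AA", "K", "T", "ER"],
    "smith": ["S", "M", "IH", "TH"],
    "has": ["HH", "AE", "Z"],
    "two": ["T", "UW"],
    "cats": ["K", "AE", "T", "S"],
}

# Hypothetical normalization rules, applied identically on both paths.
ABBREVIATIONS = {"dr.": "doctor"}
DIGITS = {"2": "two"}


def normalize(text: str) -> List[str]:
    """Lowercase, expand abbreviations and digits, and strip punctuation."""
    tokens = []
    for raw in text.lower().split():
        raw = ABBREVIATIONS.get(raw, raw)
        word = re.sub(r"[^a-z0-9]", "", raw)
        word = DIGITS.get(word, word)
        if word:
            tokens.append(word)
    return tokens


def process(text: str, mode: str) -> Union[List[str], str]:
    """Return phonemes (mode='tts') or normalized text (mode='asr')."""
    tokens = normalize(text)
    missing = [t for t in tokens if t not in LEXICON]
    if missing:
        raise KeyError(f"not in shared lexicon: {missing}")
    if mode == "tts":
        # Phoneme sequence that a synthesizer back end would consume.
        return [p for t in tokens for p in LEXICON[t]]
    if mode == "asr":
        # Normalized transcript that model training would consume.
        return " ".join(tokens)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(process("Dr. Smith has 2 cats.", "tts"))
    print(process("Dr. Smith has 2 cats.", "asr"))
```

Because both modes call the same `normalize` function and read the same `LEXICON`, a word is expanded and spelled the same way whether it is being synthesized or used as training text, which is the consistency the abstract is after.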
20 Claims
1. A method comprising:
receiving input;
normalizing the input, to yield normalized input;
generating, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output comprising one of phonemes corresponding to the input and text corresponding to the input;
when the output comprises the phonemes corresponding to the input, generating speech by performing prosody generation and unit selection synthesis using the phonemes; and
when the output comprises the text corresponding to the input, training both an acoustic model and a language model for use in future speech recognition.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
receiving input;
normalizing the input, to yield normalized input;
generating, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output comprising one of phonemes corresponding to the input and text corresponding to the input;
when the output comprises the phonemes corresponding to the input, generating speech by performing prosody generation and unit selection synthesis using the phonemes; and
when the output comprises the text corresponding to the input, training both an acoustic model and a language model for use in future speech recognition.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
receiving input;
normalizing the input, to yield normalized input;
generating, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output comprising one of phonemes corresponding to the input and text corresponding to the input;
when the output comprises the phonemes corresponding to the input, generating speech by performing prosody generation and unit selection synthesis using the phonemes; and
when the output comprises the text corresponding to the input, training both an acoustic model and a language model for use in future speech recognition.
Dependent claims: 16, 17, 18, 19, 20
Specification