Text-to-speech pre-processing
First Claim
1. A computer-implemented method for text-to-speech (TTS) pre-processing, the method comprising:
receiving, by a processing device, an automated speech recognition output comprising an n-best list and associated confidence scores;
receiving, by the processing device, a user model from a model database, the user model containing proper names, favorite places, and user-specified vocabulary;
receiving, by the processing device, a context model from the model database, the context model containing text or query history information, location context information, and date and time context information;
performing, by the processing device, a TTS pre-processing on the n-best list and associated confidence scores to generate a read back message, wherein the read back message comprises a read back instruction, wherein performing the TTS pre-processing further comprises aligning n-best list items on the n-best list, parsing the n-best list items, and identifying strong words and weak words using the associated confidence scores, the user model, the context model, parsing results from parsing the n-best list items, and an n-best list alignment including repetitions across the n-best list items; and
sending, by the processing device, the read back message to a TTS speech synthesizer for generating an audible signal based on the read back message to cause an audio device to present the read back message.
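As an illustration only, the strong-word and weak-word identification recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `Hypothesis` type, the `classify_words` function, the 0.6 threshold, and the 0.2 user-vocabulary boost are all hypothetical; alignment is simplified to matching words at the same position; and the parsing step and the context model are omitted.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass
class Hypothesis:
    """One n-best list item (hypothetical structure)."""
    words: List[str]
    confidence: float  # hypothesis-level confidence score


def classify_words(
    n_best: List[Hypothesis],
    user_vocab: FrozenSet[str] = frozenset(),
    threshold: float = 0.6,   # assumed cut-off between strong and weak
) -> List[Tuple[str, str]]:
    """Label each word of the top hypothesis as 'strong' or 'weak'."""
    top = n_best[0]
    total = sum(h.confidence for h in n_best) or 1.0
    labels = []
    for pos, word in enumerate(top.words):
        # Simplified alignment: a word is "repeated" when another n-best
        # item has the same word at the same position.
        support = sum(
            h.confidence
            for h in n_best
            if pos < len(h.words) and h.words[pos].lower() == word.lower()
        ) / total
        # Assumed use of the user model: known vocabulary gets a fixed boost.
        if word.lower() in user_vocab:
            support = min(1.0, support + 0.2)
        labels.append((word, "strong" if support >= threshold else "weak"))
    return labels


n_best = [
    Hypothesis("call john smith".split(), 0.5),
    Hypothesis("call joan smith".split(), 0.3),
    Hypothesis("call jon smith".split(), 0.2),
]
print(classify_words(n_best))
print(classify_words(n_best, user_vocab=frozenset({"john"})))
```

In this sketch, "call" and "smith" recur across all three items and are labelled strong, while "john" is labelled weak unless the user model lists it.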
Abstract
Examples of techniques for text-to-speech pre-processing for speech recognition and speech synthesis are disclosed. In one example implementation, a computer-implemented method includes receiving, by a processing device, an automated speech recognition output comprising an n-best list and associated confidence scores. The method further includes performing, by the processing device, a TTS pre-processing on the n-best list and associated confidence scores to generate a read back message, wherein the read back message comprises a read back instruction. The method further includes sending, by the processing device, the read back message to a TTS speech synthesizer for generating an audible signal based on the read back message to cause an audio device to present the read back message.
15 Claims
1. A computer-implemented method for text-to-speech (TTS) pre-processing, the method comprising:
receiving, by a processing device, an automated speech recognition output comprising an n-best list and associated confidence scores;
receiving, by the processing device, a user model from a model database, the user model containing proper names, favorite places, and user-specified vocabulary;
receiving, by the processing device, a context model from the model database, the context model containing text or query history information, location context information, and date and time context information;
performing, by the processing device, a TTS pre-processing on the n-best list and associated confidence scores to generate a read back message, wherein the read back message comprises a read back instruction, wherein performing the TTS pre-processing further comprises aligning n-best list items on the n-best list, parsing the n-best list items, and identifying strong words and weak words using the associated confidence scores, the user model, the context model, parsing results from parsing the n-best list items, and an n-best list alignment including repetitions across the n-best list items; and
sending, by the processing device, the read back message to a TTS speech synthesizer for generating an audible signal based on the read back message to cause an audio device to present the read back message.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for text-to-speech (TTS) pre-processing, the system comprising:
a memory comprising computer readable instructions; and
a processing device for executing the computer readable instructions for performing a method, the method comprising:
receiving, by the processing device, an automated speech recognition output comprising an n-best list and associated confidence scores;
receiving, by the processing device, a user model from a model database, the user model containing proper names, favorite places, and user-specified vocabulary;
receiving, by the processing device, a context model from the model database, the context model containing text or query history information, location context information, and date and time context information;
performing, by the processing device, the TTS pre-processing on the n-best list and associated confidence scores to generate a read back message, wherein the read back message comprises a read back instruction, wherein performing the TTS pre-processing further comprises aligning n-best list items on the n-best list, parsing the n-best list items, and identifying strong words and weak words using the associated confidence scores, the user model, the context model, parsing results from parsing the n-best list items, and an n-best list alignment including repetitions across the n-best list item; and
sending, by the processing device, the read back message to a TTS speech synthesizer for generating an audible signal based on the read back message to cause an audio device to present the read back message.
Dependent claims: 10, 11, 12, 13, 14
15. A computer program product for text-to-speech (TTS) pre-processing, the computer program product comprising:
a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processing device to cause the processing device to perform a method comprising:
receiving, by the processing device, an automated speech recognition output comprising an n-best list and associated confidence scores;
performing, by the processing device, the TTS pre-processing on the n-best list and associated confidence scores to generate a read back message, wherein the read back message comprises a read back instruction; and
sending, by the processing device, the read back message to a TTS speech synthesizer for generating an audible signal based on the read back message to cause an audio device to present the read back message,
wherein the read back instruction comprises a pause instruction indicating a length of time of a pause, an enunciation instruction presented as bold text that represents text to be read back with more enunciation than non-bold text, an intonation instruction as a visual indicator, the visual indicator being one of an up arrow or a down arrow, wherein the up arrow denotes an increase in intonation, and wherein the down arrow denotes a decrease in intonation, and a volume instruction, the volume instruction indicating a volume level based on a noise level detected via a microphone in a vehicle, and wherein the volume level is adjusted based on a signal-to-noise ratio.
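The read back instructions recited in claim 15 (pause, enunciation, intonation, volume) can be pictured with a small sketch. Everything concrete here is an assumption made for illustration: the `[pause 300ms]` marker, the `**bold**` notation, the choice to emphasize weak words, and the rule that maps signal-to-noise ratio to a volume level are hypothetical and are not taken from the patent.

```python
import math
from typing import List, Tuple


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels (standard definition)."""
    return 10.0 * math.log10(signal_power / noise_power)


def volume_for_snr(snr: float, base: int = 5, max_level: int = 10) -> int:
    """Assumed rule: one volume step per 5 dB that the SNR falls below 20 dB."""
    if snr >= 20.0:
        return base
    steps = math.ceil((20.0 - snr) / 5.0)
    return min(max_level, base + steps)


def render_read_back(labels: List[Tuple[str, str]], pause_ms: int = 300) -> str:
    """Annotate a labelled word sequence with read back instructions.

    Assumption: weak words get a pause before them, bold text for stronger
    enunciation, and an up arrow for rising intonation.
    """
    parts = []
    for word, label in labels:
        if label == "weak":
            parts.append(f"[pause {pause_ms}ms] **{word}** \u2191")
        else:
            parts.append(word)
    return " ".join(parts)


labels = [("call", "strong"), ("john", "weak"), ("smith", "strong")]
print(render_read_back(labels))
print(volume_for_snr(snr_db(signal_power=100.0, noise_power=10.0)))
```

A quieter cabin (higher SNR) leaves the volume at the base level in this sketch, while a noisier one raises it up to the assumed maximum.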
Specification