SYSTEM AND METHOD FOR AUTOMATIC DETECTION OF ABNORMAL STRESS PATTERNS IN UNIT SELECTION SYNTHESIS
Abstract
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, a support vector machine, or maximum entropy. In this way, a text-to-speech unit-selection speech synthesizer can produce more natural-sounding speech with suitable stress patterns regardless of the stress of the units in the unit selection database.
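The detect-then-correct flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Unit` structure, the syllable-sized units, the integer stress labels, and the function names do not come from the patent. The sketch compares stress labels directly, whereas the abstract contemplates a trained classifier, and a real system would presumably adjust prosodic parameters rather than overwrite a label.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Unit:
    """Hypothetical selected acoustic unit, syllable-sized for simplicity."""
    syllable: str
    stress: int  # assumed labels: 0 = unstressed, 1 = primary stress


def detect_incorrect(units: List[Unit], expected: List[int]) -> List[int]:
    """Return indices of units whose stress differs from the expected pattern."""
    if len(units) != len(expected):
        raise ValueError("units and expected pattern must have the same length")
    return [i for i, (u, e) in enumerate(zip(units, expected)) if u.stress != e]


def correct(units: List[Unit], expected: List[int]) -> List[Unit]:
    """Return a new unit sequence with mismatched stress labels replaced."""
    bad = set(detect_incorrect(units, expected))
    return [
        replace(u, stress=expected[i]) if i in bad else u
        for i, u in enumerate(units)
    ]


# The database units carry noun stress ("REcord"), but the verb "reCORD" is wanted.
selected = [Unit("re", 1), Unit("cord", 0)]
expected = [0, 1]
corrected = correct(selected, expected)
```

In this sketch the correction happens on the selected unit sequence, before any waveform is generated, which mirrors the ordering the abstract describes.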
20 Claims
1. A method comprising:
receiving a stress pattern for both a language and an accent in the language;
detecting, based on the stress pattern, incorrect stress patterns in selected acoustic units representing speech to be synthesized, wherein the selected acoustic units were selected by a separate unit-selection speech synthesizer;
performing an analysis of the incorrect stress patterns, wherein the analysis comprises a word level analysis, a phrase level analysis, and a sentence level analysis on the incorrect stress patterns; and
modifying, via a processor and prior to waveform synthesis, the incorrect stress patterns in the selected acoustic units based on the analysis, to yield corrected stress patterns, wherein the corrected stress patterns conform to the stress pattern for the language.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
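The claim recites word level, phrase level, and sentence level analyses but does not say here what each consists of. As a rough illustration only, the sketch below assumes each detected mismatch is tagged with the word and phrase it falls in and simply counts mismatches at each level. The `Mismatch` record, its fields, and the counting approach are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Mismatch(NamedTuple):
    """Hypothetical record of one unit with an incorrect stress pattern."""
    unit_index: int  # position of the unit in the selected sequence
    word: int        # index of the word containing the unit
    phrase: int      # index of the phrase containing the word


def analyze(mismatches: List[Mismatch]) -> Dict[str, object]:
    """Summarize mismatches per word, per phrase, and for the whole sentence."""
    per_word = Counter(m.word for m in mismatches)
    per_phrase = Counter(m.phrase for m in mismatches)
    return {
        "word": dict(per_word),        # word index -> mismatch count
        "phrase": dict(per_phrase),    # phrase index -> mismatch count
        "sentence": len(mismatches),   # total mismatches in the sentence
    }


mismatches = [Mismatch(0, 0, 0), Mismatch(1, 0, 0), Mismatch(5, 2, 1)]
report = analyze(mismatches)
```

A summary of this kind could, for example, let a later correction step treat a word with several mismatched units differently from an isolated mismatch, though that policy is likewise an assumption.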
10. A system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing operations comprising:
receiving a stress pattern for both a language and an accent in the language;
detecting, based on the stress pattern, incorrect stress patterns in selected acoustic units representing speech to be synthesized, wherein the selected acoustic units were selected by a separate unit-selection speech synthesizer;
performing an analysis of the incorrect stress patterns, wherein the analysis comprises a word level analysis, a phrase level analysis, and a sentence level analysis on the incorrect stress patterns; and
modifying, via a processor and prior to waveform synthesis, the incorrect stress patterns in the selected acoustic units based on the analysis, to yield corrected stress patterns, wherein the corrected stress patterns conform to the stress pattern for the language.
Dependent claims: 11, 12, 13, 14, 15, 16
17. A computer-readable storage device having instructions stored which, when executed by a computing device, result in the computing device performing operations comprising:
receiving a stress pattern for both a language and an accent in the language;
detecting, based on the stress pattern, incorrect stress patterns in selected acoustic units representing speech to be synthesized, wherein the selected acoustic units were selected by a separate unit-selection speech synthesizer;
performing an analysis of the incorrect stress patterns, wherein the analysis comprises a word level analysis, a phrase level analysis, and a sentence level analysis on the incorrect stress patterns; and
modifying, via a processor and prior to waveform synthesis, the incorrect stress patterns in the selected acoustic units based on the analysis, to yield corrected stress patterns, wherein the corrected stress patterns conform to the stress pattern for the language.
Dependent claims: 18, 19, 20
Specification