STATISTICAL ENHANCEMENT OF SPEECH OUTPUT FROM A STATISTICAL TEXT-TO-SPEECH SYNTHESIS SYSTEM
First Claim
1. A method for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of speech in a space of acoustic feature vectors, comprising:
- defining a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters;
defining a distortion indictor of a feature vector or a plurality of feature vectors;
receiving a feature vector output by the system;
generating an instance of the corrective transformation by;
calculating a reference value of the distortion indicator attributed to a statistical model of the phonetic unit emitting the feature vector;
calculating an actual value of the distortion indicator attributed to feature vectors emitted by the statistical model of the phonetic unit emitting the feature vector;
calculating the enhancing parameter values depending on the reference value of the distortion indicator, the actual value of the distortion indicator and the parametric corrective transformation;
deriving an instance of the corrective transformation corresponding to the enhancing parameter values from the parametric family of the corrective transformations; and
applying the instance of the corrective transformation to the feature vector to provide an enhanced feature vector.
1 Assignment
0 Petitions
Accused Products
Abstract
A method, system and computer program product are provided for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of speech in a space of acoustic feature vectors. The method includes: defining a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters; and defining a distortion indictor of a feature vector or a plurality of feature vectors. The method further includes: receiving a feature vector output by the system; and generating an instance of the corrective transformation by: calculating a reference value of the distortion indicator attributed to a statistical model of the phonetic unit emitting the feature vector; calculating an actual value of the distortion indicator attributed to feature vectors emitted by the statistical model of the phonetic unit emitting the feature vector; calculating the enhancing parameter values depending on the reference value of the distortion indicator, the actual value of the distortion indicator and the parametric corrective transformation; and deriving an instance of the corrective transformation corresponding to the enhancing parameter values from the parametric family of the corrective transformations. The instance of the corrective transformation may be applied to the feature vector to provide an enhanced feature vector.
207 Citations
25 Claims
-
1. A method for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of speech in a space of acoustic feature vectors, comprising:
-
defining a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters; defining a distortion indictor of a feature vector or a plurality of feature vectors; receiving a feature vector output by the system; generating an instance of the corrective transformation by; calculating a reference value of the distortion indicator attributed to a statistical model of the phonetic unit emitting the feature vector; calculating an actual value of the distortion indicator attributed to feature vectors emitted by the statistical model of the phonetic unit emitting the feature vector; calculating the enhancing parameter values depending on the reference value of the distortion indicator, the actual value of the distortion indicator and the parametric corrective transformation; deriving an instance of the corrective transformation corresponding to the enhancing parameter values from the parametric family of the corrective transformations; and applying the instance of the corrective transformation to the feature vector to provide an enhanced feature vector. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
-
-
15. A computer program product for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of speech in a space of acoustic feature vectors, the computer program product comprising:
-
a computer readable non-transitory storage medium having computer readable program code embodied therewith, the computer readable program code comprising; computer readable program code configured to; define a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters; define a distortion indictor of a feature vector or a plurality of feature vectors; receive a feature vector output by the system; generate an instance of the corrective transformation by; calculating a reference value of the distortion indicator attributed to a statistical model of the phonetic unit emitting the feature vector; calculating an actual value of the distortion indicator attributed to feature vectors emitted by the statistical model of the phonetic unit emitting the feature vector; calculating the enhancing parameter values depending on the reference value of the distortion indicator, the actual value of the distortion indicator and the parametric corrective transformation; deriving an instance of the corrective transformation corresponding to the enhancing parameter values from the parametric family of the corrective transformations; and applying the instance of the corrective transformation to the feature vector to provide an enhanced feature vector.
-
-
16. A system for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of speech in a space of acoustic feature vectors, comprising:
-
a processor; an acoustic feature vector input component for receiving an acoustic feature vector emitted by a phonetic unit; a corrective transformation defining component for defining a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters; an enhancing parametric set component including; a distortion indicator reference component for calculating a reference value of a distortion indicator attributed to a statistical model of the phonetic unit emitting the feature vector; a distortion indicator actual value component for calculating an actual value of the distortion indicator attributed to feature vectors emitted by the statistical model of the phonetic unit emitting the feature vector; and wherein the enhancing parameter set component calculating the enhancing parameter values depending on the reference value of the distortion indicator, the actual value of the distortion indicator and the parametric corrective transformation; a corrective transformation applying component for applying an instance of the corrective transformation to the feature vector to provide an enhanced feature vector. - View Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25)
-
Specification