Statistical enhancement of speech output from a statistical text-to-speech synthesis system
Abstract
A method, system and computer program product are provided for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of speech in a space of acoustic feature vectors. The method includes: defining a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters; and defining a distortion indicator of a feature vector or a plurality of feature vectors. The method further includes: receiving a feature vector output by the system; and generating an instance of the corrective transformation by: calculating a reference value of the distortion indicator attributed to a statistical model of the phonetic unit emitting the feature vector; calculating an actual value of the distortion indicator attributed to feature vectors emitted by the statistical model of the phonetic unit emitting the feature vector; calculating the enhancing parameter values depending on the reference value of the distortion indicator, the actual value of the distortion indicator and the parametric corrective transformation; and deriving an instance of the corrective transformation corresponding to the enhancing parameter values from the parametric family of the corrective transformations. The instance of the corrective transformation may be applied to the feature vector to provide an enhanced feature vector.
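The steps in the abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the patent text: the feature vectors are treated as cepstrum-like coefficient lists, the corrective family is a one-parameter lifter that scales coefficient n by alpha**n, the distortion indicator is the sum of n^2 * c[n]^2, the phonetic-unit model is a diagonal Gaussian, and the enhancing parameter is found by bisection. All function names are hypothetical.

```python
from typing import List, Sequence


def distortion_indicator(c: Sequence[float]) -> float:
    """Assumed indicator: sum of n^2 * c[n]^2 over n >= 1."""
    return sum((n * n) * (c[n] ** 2) for n in range(1, len(c)))


def reference_value(mean: Sequence[float], var: Sequence[float]) -> float:
    """Expected indicator under an assumed diagonal Gaussian unit model.

    Uses E[c_n^2] = mean_n^2 + var_n for each coefficient.
    """
    return sum((n * n) * (mean[n] ** 2 + var[n]) for n in range(1, len(mean)))


def actual_value(vectors: Sequence[Sequence[float]]) -> float:
    """Mean indicator over the vectors emitted by the unit model."""
    return sum(distortion_indicator(c) for c in vectors) / len(vectors)


def apply_transform(c: Sequence[float], alpha: float) -> List[float]:
    """Assumed one-parameter family: c[n] -> alpha**n * c[n] (c[0] unchanged)."""
    return [c[n] * alpha ** n for n in range(len(c))]


def solve_alpha(vectors: Sequence[Sequence[float]], d_ref: float,
                lo: float = 0.5, hi: float = 2.0, iters: int = 60) -> float:
    """Bisection for alpha so that the transformed vectors reach d_ref.

    The mean indicator grows monotonically with alpha > 0, so bisection
    applies; the result is clamped to the search interval [lo, hi].
    """
    def gap(alpha: float) -> float:
        moved = [apply_transform(c, alpha) for c in vectors]
        return actual_value(moved) - d_ref

    if gap(lo) > 0:
        return lo
    if gap(hi) < 0:
        return hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

In this sketch a single enhancing parameter (alpha) controls a transformation of the whole vector, which matches the claims' requirement that there be fewer enhancing parameters than feature dimensions. If the synthesizer emits the model mean unchanged, the actual indicator falls below the reference value by the variance terms, so the solved alpha comes out above 1.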
25 Claims
1. A method for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of short-time spectral envelope of speech in a space of acoustic feature vectors, comprising:
defining a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters, wherein number of the enhancing parameters in the set of enhancing parameters is less than a dimension of the space of the acoustic feature vectors;
defining a distortion indicator of a feature vector or a plurality of feature vectors, wherein the distortion indicator is not modelled directly by the statistical TTS system;
receiving a feature vector output by the system;
generating an instance of the corrective transformation by:
  calculating a reference value of the distortion indicator attributed to a statistical model of the phonetic unit emitting the feature vector;
  calculating an actual value of the distortion indicator attributed to feature vectors emitted by the statistical model of the phonetic unit emitting the feature vector;
  calculating the enhancing parameter values depending on the reference value of the distortion indicator, the actual value of the distortion indicator and the parametric corrective transformation;
  deriving an instance of the corrective transformation corresponding to the enhancing parameter values from the parametric family of the corrective transformations; and
applying the instance of the corrective transformation to the feature vector to provide an enhanced feature vector.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
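The steps of claim 1 can be written compactly in symbols. The notation below is introduced here purely as a reading aid and does not come from the patent; in particular, the matching condition in the last line is only one plausible way to make the enhancing parameters depend on the reference value, the actual value and the transformation family.

```latex
\begin{align*}
x &\in \mathbb{R}^{N}
  && \text{feature vector output by the TTS system} \\
T_{\theta} &: \mathbb{R}^{N} \to \mathbb{R}^{N},\quad \theta \in \mathbb{R}^{K},\ K < N
  && \text{parametric family of corrective transformations} \\
D &: \mathbb{R}^{N} \to \mathbb{R}
  && \text{distortion indicator, not modelled directly by the TTS system} \\
D_{\mathrm{ref}} &
  && \text{reference value attributed to the model of the phonetic unit} \\
D_{\mathrm{act}} &
  && \text{actual value over vectors emitted by that model} \\
\theta^{*} &= g\!\left(D_{\mathrm{ref}},\, D_{\mathrm{act}};\, T\right)
  && \text{enhancing parameter values} \\
\hat{x} &= T_{\theta^{*}}(x)
  && \text{enhanced feature vector} \\
\text{e.g.}\quad \theta^{*} &\ \text{chosen so that}\ D\!\left(T_{\theta^{*}}(x)\right) \approx D_{\mathrm{ref}}
  && \text{(assumed matching criterion)}
\end{align*}
```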
15. A computer program product for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of short-time spectral envelope of speech in a space of acoustic feature vectors, the computer program product comprising:
a computer readable non-transitory storage medium having computer readable program code embodied therewith, the computer readable program code comprising:
computer readable program code configured to:
define a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters, wherein number of the enhancing parameters in the set of enhancing parameters is less than a dimension of the space of the acoustic feature vectors;
define a distortion indicator of a feature vector or a plurality of feature vectors, wherein the distortion indicator is not modelled directly by the statistical TTS system;
receive a feature vector output by the system;
generate an instance of the corrective transformation by:
  calculating a reference value of the distortion indicator attributed to a statistical model of the phonetic unit emitting the feature vector;
  calculating an actual value of the distortion indicator attributed to feature vectors emitted by the statistical model of the phonetic unit emitting the feature vector;
  calculating the enhancing parameter values depending on the reference value of the distortion indicator, the actual value of the distortion indicator and the parametric corrective transformation;
  deriving an instance of the corrective transformation corresponding to the enhancing parameter values from the parametric family of the corrective transformations; and
applying the instance of the corrective transformation to the feature vector to provide an enhanced feature vector.
16. A system for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of short-time spectral envelope of speech in a space of acoustic feature vectors, comprising:
a processor;
an acoustic feature vector input component for receiving an acoustic feature vector emitted by a phonetic unit;
a corrective transformation defining component for defining a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters, wherein number of the enhancing parameters in the set of enhancing parameters is less than a dimension of the space of the acoustic feature vectors;
an enhancing parametric set component including:
  a distortion indicator reference component for calculating a reference value of a distortion indicator attributed to a statistical model of the phonetic unit emitting the feature vector;
  a distortion indicator actual value component for calculating an actual value of the distortion indicator attributed to feature vectors emitted by the statistical model of the phonetic unit emitting the feature vector, wherein the distortion indicator is not modelled directly by the statistical TTS system; and
  wherein the enhancing parameter set component calculating the enhancing parameter values depending on the reference value of the distortion indicator, the actual value of the distortion indicator and the parametric corrective transformation;
a corrective transformation applying component for applying an instance of the corrective transformation to the feature vector to provide an enhanced feature vector.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25
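The component breakdown of claim 16 could be mirrored in code roughly as follows. This is a hedged sketch under simplifying assumptions that are not in the patent: the corrective family is a single gain g applied to every coefficient above index 0, the distortion indicator is the energy of those coefficients, and the phonetic-unit model is a diagonal Gaussian. With these choices the indicator scales as g^2, so the enhancing parameter has a closed form. All class and method names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class UnitModel:
    """Hypothetical diagonal-Gaussian model of one phonetic unit."""
    mean: Vector
    var: Vector


def indicator(c: Sequence[float]) -> float:
    """Assumed distortion indicator: energy of coefficients above index 0."""
    return sum(x * x for x in c[1:])


class GainFamily:
    """Transformation defining component: a one-parameter gain family."""
    def instance(self, g: float) -> Callable[[Sequence[float]], Vector]:
        return lambda c: [c[0]] + [g * x for x in c[1:]]


class ReferenceComponent:
    """Reference value from the model, using E[c_n^2] = mean_n^2 + var_n."""
    def value(self, model: UnitModel) -> float:
        return sum(m * m + v for m, v in zip(model.mean[1:], model.var[1:]))


class ActualValueComponent:
    """Actual value averaged over the vectors emitted by the model."""
    def value(self, emitted: Sequence[Sequence[float]]) -> float:
        return sum(indicator(c) for c in emitted) / len(emitted)


class EnhancingParameterSetComponent:
    """Combines reference and actual values into the gain parameter."""
    def __init__(self) -> None:
        self.reference = ReferenceComponent()
        self.actual = ActualValueComponent()

    def gain(self, model: UnitModel, emitted: Sequence[Sequence[float]]) -> float:
        d_ref = self.reference.value(model)
        d_act = self.actual.value(emitted)
        # Under the gain family the indicator scales as g^2,
        # so matching d_ref gives g = sqrt(d_ref / d_act).
        return math.sqrt(d_ref / d_act) if d_act > 0 else 1.0


class ApplyingComponent:
    """Applies one instance of the corrective transformation to a vector."""
    def apply(self, transform: Callable[[Sequence[float]], Vector],
              c: Sequence[float]) -> Vector:
        return transform(c)
```

A caller would compute the gain once per phonetic unit with `EnhancingParameterSetComponent.gain`, obtain a transformation from `GainFamily.instance`, and pass each emitted vector through `ApplyingComponent.apply`. Keeping the reference and actual-value calculations in separate classes follows the claim's wording; a real system could equally merge them.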
Specification