Method and apparatus for improved duration modeling of phonemes
Abstract
A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model. An inverse of the non-exponential functional transformation is applied to duration observations, or training data. Coefficients are generated for use with the generalized additive model. The generalized additive model comprising the coefficients is applied to at least one phoneme of the received text resulting in the generation of at least one phoneme having a duration. An acoustic sequence is generated comprising speech signals that are representative of the received text.
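The training and prediction flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed formula: the square-root-of-sine form in `to_duration`, the two binary contextual factors (`stressed`, `phrase_final`), the least-squares fit, and the sample durations are all hypothetical choices made for the example. The sketch shows only the overall shape: the inverse transformation is applied to observed durations, coefficients of an additive model are fitted, and the transformation maps the additive score back to a duration bounded by the observed minimum and maximum.

```python
import math

def to_score(duration, d_min, d_max):
    """Inverse transformation (assumed form): duration in [d_min, d_max] -> score in [0, 1]."""
    u = (duration - d_min) / (d_max - d_min)
    u = min(1.0, max(0.0, u))
    return (2.0 / math.pi) * math.asin(u ** 2)

def to_duration(score, d_min, d_max):
    """Transformation (assumed root-sinusoidal form): score -> duration in [d_min, d_max]."""
    s = min(1.0, max(0.0, score))  # clamp so the output stays within the observed range
    return d_min + (d_max - d_min) * math.sqrt(math.sin(math.pi * s / 2.0))

def fit_additive_model(rows, targets):
    """Ordinary least squares via the normal equations and Gauss-Jordan elimination."""
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

# Hypothetical training data: (stressed, phrase_final) -> observed duration in ms.
observations = [
    ((0, 0), 50.0), ((0, 0), 55.0),
    ((1, 0), 80.0), ((0, 1), 90.0),
    ((1, 1), 120.0), ((1, 1), 115.0),
]
durations = [d for _, d in observations]
d_min, d_max = min(durations), max(durations)  # bounds observed in the training data

# Apply the inverse transformation to the observations, then fit the coefficients.
rows = [[1.0, float(s), float(f)] for (s, f), _ in observations]
targets = [to_score(d, d_min, d_max) for d in durations]
coefs = fit_additive_model(rows, targets)

def predict(stressed, phrase_final):
    """Additive score from the contextual factors, mapped back to a duration in ms."""
    score = coefs[0] + coefs[1] * stressed + coefs[2] * phrase_final
    return to_duration(score, d_min, d_max)

for context in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(context, round(predict(*context), 1))
```

Because the score is clamped before the sinusoid is applied, every predicted duration in this sketch falls between the shortest and longest durations seen in training.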
23 Claims
1. A method for producing synthetic speech comprising the steps of:
receiving text into a processor;
processing the text using a phoneme duration model, the phoneme duration model produced by developing a non-exponential functional transformation form for use with a generalized additive model; and
generating speech signals representative of the received text.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. An apparatus for speech synthesis comprising:
an input for receiving text signals into a processor;
a processor configured to synthesize an acoustic sequence using a phoneme duration model, the phoneme duration model produced by developing a non-exponential functional transformation form for use with a generalized additive model; and
an output for providing speech signals representative of the received text.
Dependent claims: 12, 13, 14, 15, 16, 18, 19.
17. A speech generation process comprising a phoneme duration model, the phoneme duration model produced by developing a non-exponential functional transformation form for use with a generalized additive model.
20. A computer readable medium containing executable instructions which, when executed in a processing system, causes the system to perform the steps for synthesizing speech comprising:
receiving text into a processor;
processing the text using a phoneme duration model, the phoneme duration model produced by developing a non-exponential functional transformation form for use with a generalized additive model; and
generating speech signals representative of the received text.
Dependent claims: 21, 22.
23. A method for generating a phoneme duration model for use in a speech synthesis system, the method comprising the step of developing a non-exponential functional transformation form for use with a generalized additive model, wherein the non-exponential functional transformation is expressed by
Specification