Feature-domain concatenative speech synthesis
Abstract
A method for speech synthesis includes receiving an input speech signal containing a set of speech segments, and estimating spectral envelopes of the input speech signal in a succession of time intervals during each of the speech segments. The spectral envelopes are integrated over a plurality of window functions in a frequency domain so as to determine elements of feature vectors corresponding to the speech segments. An output speech signal is reconstructed by concatenating the feature vectors corresponding to a sequence of the speech segments.
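As a rough illustration of the analysis step described in the abstract, the following Python sketch computes one feature vector per time interval. It is a sketch under stated assumptions, not the patented implementation: the magnitude spectrum stands in for the spectral envelope estimate, and the triangular, linearly spaced window functions, the frame length, the hop size, and the function names `triangular_windows` and `feature_vectors` are all hypothetical choices that do not come from the text.

```python
import numpy as np


def triangular_windows(n_windows, n_bins):
    """Overlapping triangular window functions on a linear frequency axis.

    The shape and spacing of the windows are assumptions for illustration.
    """
    centers = np.linspace(0, n_bins - 1, n_windows + 2)
    bins = np.arange(n_bins)
    windows = np.zeros((n_windows, n_bins))
    for i in range(n_windows):
        left, mid, right = centers[i], centers[i + 1], centers[i + 2]
        rising = (bins - left) / (mid - left)
        falling = (right - bins) / (right - mid)
        # Triangle: minimum of the two slopes, clipped at zero outside its support.
        windows[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return windows


def feature_vectors(signal, frame_len=256, hop=128, n_windows=12):
    """Return an array with one feature vector per frame of `signal`."""
    windows = triangular_windows(n_windows, frame_len // 2 + 1)
    taper = np.hanning(frame_len)
    vectors = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * taper
        # Stand-in envelope estimate: magnitude spectrum of the tapered frame.
        envelope = np.abs(np.fft.rfft(frame))
        # Discrete "integration" of the envelope over each window function.
        vectors.append(windows @ envelope)
    return np.array(vectors)
```

Each row of the result plays the role of one feature vector, so a speech segment is represented by the sequence of rows that fall inside it.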
75 Claims
1. A method for speech synthesis, comprising:
providing a segment inventory comprising, for a plurality of speech segments, respective sequences of feature vectors, by estimating spectral envelopes of input speech signals corresponding to the speech segments in a succession of time intervals during each of the speech segments, and integrating the spectral envelopes over a plurality of window functions in a frequency domain so as to determine vector elements of the feature vectors;
receiving phonetic and prosodic information indicative of an output speech signal to be generated;
selecting the sequences of feature vectors from the inventory responsive to the phonetic and prosodic information;
processing the selected sequences of feature vectors so as to generate a concatenated output series of feature vectors;
computing a series of complex line spectra of the output signal from the series of the feature vectors; and
transforming the complex line spectra to a time domain speech signal for output.
Dependent claims: 2-15, 48-50
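The synthesis steps of claim 1 can be sketched in Python as follows. This is only an illustration under several assumptions that the claim text does not specify: the inventory is a hypothetical dictionary mapping a phonetic label to candidate sequences tagged with a mean pitch, selection picks the candidate closest to the target pitch, each feature is treated as the envelope value at its window's center frequency, all phases are set to zero, and the time-domain signal is built by overlap-adding sums of harmonics. All function and parameter names are invented for the example.

```python
import numpy as np


def select_and_concatenate(inventory, targets):
    """Select one stored sequence per target and stack them into one series.

    inventory: {label: [(mean_f0, features[T, D]), ...]} (hypothetical layout)
    targets:   [(label, target_f0), ...] standing in for the phonetic and
               prosodic information.
    """
    chosen, pitch = [], []
    for label, target_f0 in targets:
        candidates = inventory[label]
        # Assumed selection rule: candidate with the closest pitch tag.
        _, features = min(candidates, key=lambda c: abs(c[0] - target_f0))
        chosen.append(features)
        pitch.append(np.full(len(features), float(target_f0)))
    return np.vstack(chosen), np.concatenate(pitch)


def line_spectrum(feature, centers_hz, f0, sample_rate):
    """Complex amplitudes at the harmonics of f0 below the Nyquist frequency."""
    nyquist = sample_rate / 2.0
    harmonics = f0 * np.arange(1, int(np.ceil(nyquist / f0)))
    # Assumption: interpolate the features between window center frequencies.
    magnitudes = np.interp(harmonics, centers_hz, feature)
    phases = np.zeros_like(magnitudes)  # assumption: zero phase
    return harmonics, magnitudes * np.exp(1j * phases)


def synthesize(features, f0s, centers_hz, sample_rate=8000,
               frame_len=256, hop=128):
    """Turn a series of feature vectors into a time-domain signal."""
    out = np.zeros(hop * (len(features) - 1) + frame_len)
    taper = np.hanning(frame_len)
    t = np.arange(frame_len) / sample_rate
    for i, (feature, f0) in enumerate(zip(features, f0s)):
        freqs, amps = line_spectrum(feature, centers_hz, f0, sample_rate)
        # Real part of sum_k c_k * exp(j * 2 * pi * f_k * t).
        frame = np.real(np.exp(2j * np.pi * np.outer(t, freqs)) @ amps)
        out[i * hop:i * hop + frame_len] += taper * frame
    return out
```

Concatenating in the feature domain, as in this sketch, means that pitch is applied only when the line spectra are computed, so the stored sequences themselves do not need to be resampled.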
16. A method for speech synthesis, comprising:
receiving an input speech signal containing a set of speech segments;
estimating spectral envelopes of the input speech signal in a succession of time intervals during each of the speech segments;
integrating the spectral envelopes over a plurality of window functions in a frequency domain so as to determine elements of feature vectors corresponding to the speech segments; and
reconstructing an output speech signal by concatenating the feature vectors corresponding to a sequence of the speech segments.
Dependent claims: 17-25
26. A device for speech synthesis, comprising:
a memory, arranged to hold a segment inventory comprising, for a plurality of speech segments, respective sequences of feature vectors having vector elements determined by estimating spectral envelopes of input speech signals corresponding to the speech segments in a succession of time intervals during each of the speech segments, and integrating the spectral envelopes over a plurality of window functions in a frequency domain; and
a speech processor, arranged to receive phonetic and prosodic information indicative of an output speech signal to be generated, to select the sequences of feature vectors from the inventory responsive to the phonetic and prosodic information, to process the selected sequences of feature vectors so as to generate a concatenated output series of feature vectors, and to compute a series of complex line spectra of the output signal from the series of the feature vectors and transform the complex line spectra to a time domain speech signal for output.
Dependent claims: 27-40
41. A device for speech synthesis, comprising:
a memory, arranged to hold a segment inventory determined by processing an input speech signal containing a set of speech segments so as to estimate spectral envelopes of the input speech signal in a succession of time intervals during each of the speech segments, and integrating the spectral envelopes over a plurality of window functions in a frequency domain so as to determine elements of feature vectors corresponding to the speech segments; and
a speech processor, arranged to reconstruct an output speech signal by concatenating the feature vectors corresponding to a sequence of the speech segments.
Dependent claims: 42-47
51. A computer software product, comprising a computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to access a segment inventory comprising, for a plurality of speech segments, respective sequences of feature vectors having vector elements determined by estimating spectral envelopes of input speech signals corresponding to the speech segments in a succession of time intervals during each of the speech segments, and integrating the spectral envelopes over a plurality of window functions in a frequency domain, and in response to phonetic and prosodic information indicative of an output speech signal to be generated, cause the computer to select the sequences of feature vectors from the inventory responsive to the phonetic and prosodic information, to process the selected sequences of feature vectors so as to generate a concatenated output series of feature vectors, and to compute a series of complex line spectra of the output signal from the series of the feature vectors and transform the complex line spectra to a time domain speech signal for output.
66. A computer software product, comprising a computer-readable medium in which a segment inventory is stored, the inventory having been determined by processing an input speech signal containing a set of speech segments so as to estimate spectral envelopes of the input speech signal in a succession of time intervals during each of the speech segments, and integrating the spectral envelopes over a plurality of window functions in a frequency domain so as to determine elements of feature vectors corresponding to the speech segments, so that a speech processor can reconstruct an output speech signal by concatenating the feature vectors corresponding to a sequence of the speech segments.
Specification