Speech synthesis apparatus and method
First Claim
1. A speech synthesis apparatus comprising:
a selecting unit configured to select speaker's parameters, of a plurality of speakers, one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms;
a mapping unit configured to use a cost function to assess a weighted sum of a difference between the formant frequencies and a difference between the formant powers, to determine formants of the plurality of speakers' parameters that correspond to each other;
a generating unit configured to generate an interpolated speaker's parameter by interpolating, in accordance with desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of the formants of the plurality of speakers' parameters that correspond to each other; and
a synthesizing unit configured to synthesize a pitch waveform corresponding to interpolated speaker's speech sounds based on the interpolation ratios using the interpolated speaker's parameter.
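The mapping step recited in the claim can be illustrated with a minimal sketch. The claim only states that the cost is a weighted sum of a formant-frequency difference and a formant-power difference; the use of absolute differences, the default weights, the greedy one-to-one matching, and all names below (`Formant`, `mapping_cost`, `map_formants`) are assumptions made for illustration, not the patented procedure.

```python
from dataclasses import dataclass


@dataclass
class Formant:
    freq_hz: float   # formant frequency
    power_db: float  # formant power


def mapping_cost(a, b, w_freq=1.0, w_power=10.0):
    """Weighted sum of the frequency and power differences.

    Absolute differences and these weights are illustrative assumptions.
    """
    return (w_freq * abs(a.freq_hz - b.freq_hz)
            + w_power * abs(a.power_db - b.power_db))


def map_formants(speaker_a, speaker_b, w_freq=1.0, w_power=10.0):
    """Greedy matching (assumed): each formant of speaker A takes the
    cheapest formant of speaker B that has not been used yet."""
    pairs = []
    unused = set(range(len(speaker_b)))
    for i, fa in enumerate(speaker_a):
        if not unused:
            break  # speaker B has no formants left to pair
        j = min(unused,
                key=lambda k: mapping_cost(fa, speaker_b[k], w_freq, w_power))
        unused.remove(j)
        pairs.append((i, j))
    return pairs
```

A greedy pass keeps the sketch short; an exhaustive or dynamic-programming search over pairings would be a reasonable alternative when formants are close together.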
Abstract
According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.
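The interpolation and synthesis steps described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give formulas, so linear blending of every parameter, interpolation ratios that sum to 1, windows of equal length, power treated as squared amplitude, and a pitch waveform modeled as a sum of windowed sinusoids are all assumptions, as are the names `interpolate_formant` and `synthesize_pitch_waveform` and the dictionary keys.

```python
import math


def interpolate_formant(matched, ratios):
    """Blend one matched formant per speaker.

    Assumes linear interpolation, ratios summing to 1, and windows of
    equal length. Linear phase blending ignores wrap-around at 2*pi.
    """
    def blend(key):
        return sum(r * f[key] for f, r in zip(matched, ratios))

    length = len(matched[0]["window"])
    window = [sum(r * f["window"][n] for f, r in zip(matched, ratios))
              for n in range(length)]
    return {"freq_hz": blend("freq_hz"), "phase": blend("phase"),
            "power": blend("power"), "window": window}


def synthesize_pitch_waveform(formants, sample_rate=16000):
    """Sum windowed sinusoids, one per interpolated formant (assumed model)."""
    length = len(formants[0]["window"])
    wave = [0.0] * length
    for f in formants:
        amp = math.sqrt(f["power"])  # assumption: power is squared amplitude
        for n in range(length):
            angle = 2.0 * math.pi * f["freq_hz"] * n / sample_rate + f["phase"]
            wave[n] += f["window"][n] * amp * math.cos(angle)
    return wave
```

With two speakers and ratios of 0.5 each, a 500 Hz formant matched to a 700 Hz formant yields an interpolated formant at 600 Hz, which is the behavior that distinguishes parameter interpolation from simply cross-fading the two waveforms.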
13 Claims
1. A speech synthesis apparatus comprising:
a selecting unit configured to select speaker's parameters, of a plurality of speakers, one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms;
a mapping unit configured to use a cost function to assess a weighted sum of a difference between the formant frequencies and a difference between the formant powers, to determine formants of the plurality of speakers' parameters that correspond to each other;
a generating unit configured to generate an interpolated speaker's parameter by interpolating, in accordance with desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of the formants of the plurality of speakers' parameters that correspond to each other; and
a synthesizing unit configured to synthesize a pitch waveform corresponding to interpolated speaker's speech sounds based on the interpolation ratios using the interpolated speaker's parameter.
(Dependent claims: 2, 3, 4, 5, 6, 7)
8. A non-transitory computer readable storage medium storing instructions of a computer program which when executed by a computer results in performance of steps comprising:
selecting speaker's parameters, of a plurality of speakers, one by one for respective speakers and obtaining a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms;
using a cost function to assess a weighted sum of a difference between the formant frequencies and a difference between the formant powers, to determine formants of the plurality of speakers' parameters that correspond to each other;
generating an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants of the plurality of speakers' parameters that correspond to each other; and
synthesizing a pitch waveform corresponding to interpolated speaker's speech sounds based on the interpolation ratios using the interpolated speaker's parameter.
(Dependent claims: 9)
10. A speech synthesis method comprising:
selecting speaker's parameters, of a plurality of speakers, one by one for respective speakers and obtaining a plurality of speakers' parameters, by a selecting unit, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms;
using a cost function to assess a weighted sum of a difference between the formant frequencies and a difference between the formant powers, to determine formants of the plurality of speakers' parameters that correspond to each other, by a mapping unit;
generating an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants of the plurality of speakers' parameters that correspond to each other, by a generating unit; and
synthesizing a pitch waveform corresponding to interpolated speaker's speech sounds based on the interpolation ratios using the interpolated speaker's parameter, by a synthesis unit.
(Dependent claims: 11)
12. A speech synthesis apparatus comprising:
a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms;
a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers;
a generating unit configured to generate an interpolated speaker's parameter by interpolating, in accordance with desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of the formants which are made to correspond to each other;
a synthesizing unit configured to synthesize a pitch waveform corresponding to interpolated speaker's speech sounds based on the interpolation ratios using the interpolated speaker's parameter;
a second selecting unit configured to select, one by one for respective speakers, pitch waveforms corresponding to aperiodic components of the speaker's speech sounds and obtain a plurality of pitch waveforms;
a second generating unit configured to generate a pitch waveform corresponding to an aperiodic component of the interpolated speaker's speech sound by interpolating the plurality of pitch waveforms at the interpolation ratios; and
a second synthesizing unit configured to synthesize the pitch waveform corresponding to the periodic component of the interpolated speaker's speech sound and the pitch waveform corresponding to the aperiodic component of the interpolated speaker's speech sound, and obtain the pitch waveform corresponding to the interpolated speaker's speech sound.
13. A speech synthesis apparatus comprising:
a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms;
a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers;
a generating unit configured to generate an interpolated speaker's parameter by interpolating, in accordance with desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of the formants which are made to correspond to each other;
a synthesizing unit configured to synthesize a pitch waveform corresponding to interpolated speaker's speech sounds based on the interpolation ratios using the interpolated speaker's parameter;
a second generating unit configured to generate a pitch waveform corresponding to a target speaker's speech sound; and
a calculating unit configured to calculate an optimum interpolation ratio for obtaining the target speaker's speech sound based on the plurality of speakers' parameters, by performing, for the interpolation ratios, feedback control of making the pitch waveform corresponding to the interpolated speaker's speech sound come close to the pitch waveform corresponding to the target speaker's speech sound.
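The feedback control recited in claim 13 can be pictured with a small sketch for the two-speaker case. The claim does not state a control law, so the shrinking-step search, the squared-error distance, the single scalar ratio in [0, 1], and the names `squared_error` and `fit_interpolation_ratio` are all assumptions. The `synthesize` argument is a hypothetical callable that returns the interpolated speaker's pitch waveform for a given ratio.

```python
def squared_error(x, y):
    """Sum of squared sample differences between two waveforms."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def fit_interpolation_ratio(synthesize, target, ratio=0.5, step=0.25,
                            iterations=30):
    """Adjust the ratio so synthesize(ratio) approaches the target waveform.

    Assumed control law: try one step down and one step up, keep whichever
    lowers the error, and halve the step when neither helps.
    """
    best_err = squared_error(synthesize(ratio), target)
    for _ in range(iterations):
        improved = False
        for candidate in (ratio - step, ratio + step):
            candidate = min(1.0, max(0.0, candidate))  # keep ratio in [0, 1]
            err = squared_error(synthesize(candidate), target)
            if err < best_err:
                ratio, best_err, improved = candidate, err, True
        if not improved:
            step /= 2.0
    return ratio
```

In the apparatus as claimed, `synthesize` would interpolate the matched formant parameters and then generate the pitch waveform; for a quick check, a plain cross-fade of two fixed waveforms can stand in for it.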
Specification