Speech synthesis apparatus and method

US 9,002,711 B2
Filed: 12/16/2010
Issued: 04/07/2015
Est. Priority Date: 03/25/2009
Status: Expired due to Fees

Abstract

According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.
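
The mapping and generating steps described in the abstract can be read as a pairwise matching followed by a linear blend. The sketch below is illustrative only: the `Formant` record, the weights `w_freq` and `w_power`, the `max_cost` threshold, and the greedy matching strategy are all assumptions and are not taken from the patent. Formant phases and window functions are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Formant:
    freq: float   # formant frequency in Hz
    power: float  # formant power (units assumed, e.g. dB)


def cost(a, b, w_freq=1.0, w_power=1.0):
    """Weighted sum of the frequency difference and the power difference."""
    return w_freq * abs(a.freq - b.freq) + w_power * abs(a.power - b.power)


def map_formants(xs, ys, max_cost=500.0):
    """Greedily pair each formant in xs with its cheapest unused partner in ys.

    Greedy matching and the max_cost cutoff are assumptions of this sketch.
    """
    pairs, used = [], set()
    for i, x in enumerate(xs):
        candidates = [(cost(x, y), j) for j, y in enumerate(ys) if j not in used]
        if not candidates:
            break
        c, j = min(candidates)
        if c <= max_cost:
            pairs.append((i, j))
            used.add(j)
    return pairs


def interpolate(xs, ys, pairs, ratio):
    """Blend matched formants; ratio is the weight given to speaker X."""
    return [
        Formant(
            ratio * xs[i].freq + (1 - ratio) * ys[j].freq,
            ratio * xs[i].power + (1 - ratio) * ys[j].power,
        )
        for i, j in pairs
    ]


speaker_x = [Formant(700.0, 60.0), Formant(1200.0, 50.0)]
speaker_y = [Formant(800.0, 58.0), Formant(1400.0, 52.0)]
pairs = map_formants(speaker_x, speaker_y)
blended = interpolate(speaker_x, speaker_y, pairs, ratio=0.5)
```

With these made-up values the first formants pair with each other, as do the second formants, and a ratio of 0.5 places each blended formant midway between its two sources.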

13 Claims

1. A speech synthesis apparatus comprising:
- a selecting unit configured to select speaker's parameters, of a plurality of speakers, one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms;
  
  a mapping unit configured to use a cost function to assess a weighted sum of a difference between the formant frequencies and a difference between the formant powers, to determine formants of the plurality of speakers' parameters that correspond to each other;
  
  a generating unit configured to generate an interpolated speaker's parameter by interpolating, in accordance with desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of the formants of the plurality of speakers' parameters that correspond to each other; and
  
  a synthesizing unit configured to synthesize a pitch waveform corresponding to interpolated speaker's speech sounds based on the interpolation ratios using the interpolated speaker's parameter.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The apparatus according to claim 1, wherein the generating unit inserts, into the interpolated speaker's parameter, a formant frequency, a formant phase, a formant power, and a window function concerning a formant which is not corresponded to other formants.
  - 3. The apparatus according to claim 1, wherein the speaker's parameters are prepared for respective pitch waveforms corresponding to periodic components of speaker's speech sounds, the synthesizing unit synthesizes a pitch waveform corresponding to a periodic component of the interpolated speaker's speech sound using the interpolated speaker's parameter, and the apparatus further comprises
    - a second selecting unit configured to select, one by one for respective speakers, pitch waveforms corresponding to aperiodic components of the speaker's speech sounds and obtain a plurality of pitch waveforms,
      
      a second generating unit configured to generate a pitch waveform corresponding to an aperiodic component of the interpolated speaker's speech sound by interpolating the plurality of pitch waveforms at the interpolation ratios, and
      
      a second synthesizing unit configured to synthesize the pitch waveform corresponding to the periodic component of the interpolated speaker's speech sound and the pitch waveform corresponding to the aperiodic component of the interpolated speaker's speech sound, and obtain the pitch waveform corresponding to the interpolated speaker's speech sound.
  - 4. The apparatus according to claim 1, wherein the mapping unit applies, to the formant frequencies, a function for compensating for a difference in vocal tract length between speakers, and then makes formants correspond to each other between the plurality of speakers' parameters using the cost function.
  - 5. The apparatus according to claim 1, wherein the mapping unit applies, to the formant powers, a function for compensating for a difference in power between speakers, and then makes formants correspond to each other between the plurality of speakers' parameters using the cost function.
  - 6. The apparatus according to claim 1, further comprising:
    - a second generating unit configured to generate a pitch waveform corresponding to a target speaker's speech sound; and
      
      a calculating unit configured to calculate an optimum interpolation ratio for obtaining the target speaker's speech sound based on the plurality of speakers' parameters, by performing, for the interpolation ratios, feedback control of making the pitch waveform corresponding to the interpolated speaker's speech sound come close to the pitch waveform corresponding to the target speaker's speech sound.
  - 7. The apparatus according to claim 1, wherein the interpolation ratio is a ratio assigned to the speaker's parameter.
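
The synthesizing unit of claim 1 builds a pitch waveform from formant frequencies, phases, powers, and window functions. A minimal sketch, assuming each formant contributes one windowed sinusoid and the contributions are summed: the claims do not state this formula, and the Hann window, the use of a linear amplitude in place of a formant power, the sample rate, and every function name here are assumptions made for illustration.

```python
import math


def hann(n, length):
    """Hann window sample; stands in for a per-formant window function."""
    return 0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))


def synthesize_pitch_waveform(formants, length=64, sample_rate=16000.0):
    """Sum one windowed sinusoid per formant.

    Each formant is a tuple (frequency_hz, phase_rad, amplitude).
    """
    wave = [0.0] * length
    for freq, phase, amp in formants:
        for n in range(length):
            t = n / sample_rate
            wave[n] += amp * hann(n, length) * math.cos(2 * math.pi * freq * t + phase)
    return wave


# Two hypothetical interpolated formants.
wave = synthesize_pitch_waveform([(750.0, 0.0, 1.0), (1300.0, math.pi / 2, 0.5)])
```

Because the window tapers to zero at both ends, the resulting waveform starts and ends at approximately zero, which is the property that lets successive pitch waveforms be overlapped and added.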

8. A non-transitory computer readable storage medium storing instructions of a computer program which when executed by a computer results in performance of steps comprising:
- selecting speaker's parameters, of a plurality of speakers, one by one for respective speakers and obtaining a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms;
  
  using a cost function to assess a weighted sum of a difference between the formant frequencies and a difference between the formant powers, to determine formants of the plurality of speakers' parameters that correspond to each other;
  
  generating an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants of the plurality of speakers' parameters that correspond to each other; and
  
  synthesizing a pitch waveform corresponding to interpolated speaker's speech sounds based on the interpolation ratios using the interpolated speaker's parameter.
- Dependent claims (9)
  - 9. The non-transitory computer readable storage medium according to claim 8, wherein the speaker's parameters being prepared for respective pitch waveforms correspond to periodic components of the speaker's speech sounds and correspond to aperiodic components of the speaker's speech sounds; and wherein the step of synthesizing the pitch waveform comprises synthesizing the pitch waveform to correspond to the periodic components and a pitch waveform corresponding to the aperiodic components of the interpolated speaker's speech sounds based on the interpolation ratios using the interpolated speaker's parameter.

10. A speech synthesis method comprising:
- selecting speaker's parameters, of a plurality of speakers, one by one for respective speakers and obtaining a plurality of speakers' parameters, by a selecting unit, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms;
  
  using a cost function to assess a weighted sum of a difference between the formant frequencies and a difference between the formant powers, to determine formants of the plurality of speakers' parameters that correspond to each other, by a mapping unit;
  
  generating an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants of the plurality of speakers' parameters that correspond to each other, by a generating unit; and
  
  synthesizing a pitch waveform corresponding to interpolated speaker's speech sounds based on the interpolation ratios using the interpolated speaker's parameter, by a synthesis unit.
- Dependent claims (11)
  - 11. The speech synthesis method according to claim 10, wherein the speaker's parameters being prepared for respective pitch waveforms correspond to periodic components of a speaker's speech sounds and aperiodic components of the speaker's speech sounds; and wherein the step of synthesizing the pitch waveform comprises synthesizing the pitch waveform corresponding to the periodic and aperiodic components of the interpolated speaker's speech sounds based on the interpolation ratios using the interpolated speaker's parameter, by a synthesis unit.

12. A speech synthesis apparatus comprising:
- a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms;
  
  a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers;
  
  a generating unit configured to generate an interpolated speaker's parameter by interpolating, in accordance with desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of the formants which are made to correspond to each other;
  
  a synthesizing unit configured to synthesize a pitch waveform corresponding to interpolated speaker's speech sounds based on the interpolation ratios using the interpolated speaker's parameter;
  
  a second selecting unit configured to select, one by one for respective speakers, pitch waveforms corresponding to aperiodic components of the speaker's speech sounds and obtain a plurality of pitch waveforms;
  
  a second generating unit configured to generate a pitch waveform corresponding to an aperiodic component of the interpolated speaker's speech sound by interpolating the plurality of pitch waveforms at the interpolation ratios; and
  
  a second synthesizing unit configured to synthesize the pitch waveform corresponding to the periodic component of the interpolated speaker's speech sound and the pitch waveform corresponding to the aperiodic component of the interpolated speaker's speech sound, and obtain the pitch waveform corresponding to the interpolated speaker's speech sound.

13. A speech synthesis apparatus comprising:
- a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms;
  
  a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers;
  
  a generating unit configured to generate an interpolated speaker's parameter by interpolating, in accordance with desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of the formants which are made to correspond to each other;
  
  a synthesizing unit configured to synthesize a pitch waveform corresponding to interpolated speaker's speech sounds based on the interpolation ratios using the interpolated speaker's parameter;
  
  a second generating unit configured to generate a pitch waveform corresponding to a target speaker's speech sound; and
  
  a calculating unit configured to calculate an optimum interpolation ratio for obtaining the target speaker's speech sound based on the plurality of speakers' parameters, by performing, for the interpolation ratios, feedback control of making the pitch waveform corresponding to the interpolated speaker's speech sound come close to the pitch waveform corresponding to the target speaker's speech sound.
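
The calculating unit of claim 13 adjusts the interpolation ratios by feedback until the interpolated waveform comes close to the target waveform. The claim names neither a search procedure nor an error measure, so the sketch below assumes a squared-error measure and a step-halving hill climb over a single ratio. It also uses a toy linear waveform blend, `toy_synthesize`, in place of a real formant-based synthesizer. All names are hypothetical.

```python
def squared_error(a, b):
    """Sum of squared sample differences between two equal-length waveforms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_ratio(synthesize, target, ratio=0.5, step=0.25, iterations=20):
    """Hill-climb the interpolation ratio in [0, 1], halving the step each round."""
    best = squared_error(synthesize(ratio), target)
    for _ in range(iterations):
        # Try one step down and one step up from the ratio held at round start.
        for candidate in (ratio - step, ratio + step):
            candidate = min(1.0, max(0.0, candidate))
            err = squared_error(synthesize(candidate), target)
            if err < best:
                ratio, best = candidate, err
        step /= 2
    return ratio


# Toy stand-in for the synthesizer: a plain linear blend of two waveforms.
wave_a = [0.0, 1.0, 0.0, -1.0]
wave_b = [0.0, 0.5, 1.0, 0.5]


def toy_synthesize(r):
    return [r * a + (1 - r) * b for a, b in zip(wave_a, wave_b)]


target = toy_synthesize(0.3)
estimated = find_ratio(toy_synthesize, target)
```

In this toy setup the error is a convex function of the ratio, so the search settles near the ratio of 0.3 that generated the target. A real system with several speakers would search over a vector of ratios and might use a different optimizer.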

Current Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Original Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors: Morinaka, Ryo; Kagoshima, Takehiko
Primary Examiner(s): Godbold, Douglas
Application Number: US12/970,162
Publication Number: US 20110087488A1
Time in Patent Office: 1,573 Days
Field of Search: 704/258-261, 704/265-269
US Class Current: 704/266
CPC Class Codes:
G10L 13/033   Voice editing, e.g. manipul...
G10L 13/06   Elementary speech units use...
G10L 19/097   using prototype waveform de...
G10L 2021/0135   Voice conversion or morphing
G10L 25/15   the extracted parameters be...
