System and method for speech synthesis using a smoothing filter

US 7,277,856 B2
Filed: 10/31/2002
Issued: 10/02/2007
Est. Priority Date: 10/31/2001
Status: Active Grant

Assignments: 1
Petitions: 0

Abstract

A speech synthesis system for controlling a discontinuous distortion that occurs at the transition portion between concatenated phonemes, which are speech units of synthesized speech, using a smoothing technique, comprising: a discontinuous distortion processing means adapted to predict a discontinuity at the transition portion between concatenated samples of phonemes used for speech synthesis through a predetermined learning process, and to control the discontinuity at the transition portion between the concatenated phonemes of the synthesized speech so that it is smoothed adaptively to correspond to the degree of the predicted discontinuity. The smoothing filter smooths the synthesized speech so that its discontinuity degree follows the predicted discontinuity degree, using a filter coefficient α that changes adaptively according to the ratio of the predicted discontinuity degree to the real discontinuity degree. That is, since a discontinuity at a transition portion between concatenated phonemes of the synthesized speech (IN) is adaptively smoothed to follow the discontinuity that occurs in actually spoken sound, the synthesized speech (IN) can be made to approximate a real human voice more closely.

18 Claims

1. A speech synthesis system for controlling a discontinuous distortion that occurs at a transition portion between concatenated phonemes, which are speech units of synthesized speech, using a smoothing technique, comprising:
- a discontinuous distortion processing means for predicting a discontinuity at a transition portion between concatenated samples of phonemes used for speech synthesis through a predetermined learning process, and for controlling speech synthesis so that a discontinuity at the transition portion between the concatenated phonemes of the synthesized speech is smoothed adaptively to correspond to a degree of the predicted discontinuity determined according to a result of the predetermined learning process.

2. The speech synthesis system as claimed in claim 1, wherein the predetermined learning process is performed by a CART (Classification and Regression Tree) scheme.
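
Claims 2, 4 and 11 name CART as the learning process, and claims 5 and 12 name quadraphones (two phonemes on each side of the join) as its input, but the claims do not say which questions the tree asks or how it is grown. The sketch below is a minimal, generic CART-style regression tree, not the patent's trained model. It assumes each training sample is a quadraphone labelled with a discontinuity degree measured in natural speech, and that each split asks "is the phoneme in slot i equal to p?". The function names, phoneme labels and target values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# (p1, p2, p3, p4): the join being predicted lies between p2 and p3.
Quadraphone = Tuple[str, str, str, str]


@dataclass
class Node:
    value: float                     # mean target of the training samples at this node
    position: Optional[int] = None   # quadraphone slot tested by the question (0..3)
    phoneme: Optional[str] = None    # question: "is slot `position` equal to `phoneme`?"
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def _sse(ys: Sequence[float]) -> float:
    """Sum of squared errors around the mean (the regression-tree impurity)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def fit_tree(xs: Sequence[Quadraphone], ys: Sequence[float],
             max_depth: int = 3, min_leaf: int = 1) -> Node:
    """Grow a binary regression tree with greedy, SSE-minimizing yes/no splits."""
    node = Node(value=sum(ys) / len(ys))
    if max_depth == 0 or len(ys) < 2 * min_leaf:
        return node
    best = None  # (cost, position, phoneme)
    for pos in range(4):
        for ph in sorted({x[pos] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[pos] == ph]
            right = [y for x, y in zip(xs, ys) if x[pos] != ph]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, pos, ph)
    if best is None or best[0] >= _sse(ys):
        return node  # no question reduces the error: keep this node as a leaf
    _, pos, ph = best
    yes_idx = [i for i, x in enumerate(xs) if x[pos] == ph]
    no_idx = [i for i, x in enumerate(xs) if x[pos] != ph]
    node.position, node.phoneme = pos, ph
    node.yes = fit_tree([xs[i] for i in yes_idx], [ys[i] for i in yes_idx],
                        max_depth - 1, min_leaf)
    node.no = fit_tree([xs[i] for i in no_idx], [ys[i] for i in no_idx],
                       max_depth - 1, min_leaf)
    return node


def predict(node: Node, x: Quadraphone) -> float:
    """Follow the questions down to a leaf; its mean is the predicted degree D_p."""
    while node.position is not None:
        node = node.yes if x[node.position] == node.phoneme else node.no
    return node.value


# Toy training data (hypothetical labels and discontinuity degrees).
train_x = [("a", "n", "t", "a"), ("o", "n", "t", "o"),
           ("a", "s", "a", "n"), ("i", "s", "i", "n")]
train_y = [0.8, 0.9, 0.2, 0.1]
tree = fit_tree(train_x, train_y, max_depth=3, min_leaf=2)
print(predict(tree, ("e", "n", "t", "e")))  # close to 0.85
```

A greedy yes/no question tree of this kind is the usual shape of CART for categorical phonetic context; a production system would use richer questions (for example phonetic classes rather than single phonemes) and pruning.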

3. A speech synthesis system comprising:
- a smoothing filter for smoothing a discontinuity that occurs at a transition portion between concatenated phonemes of synthesized speech employing a filter coefficient α;
  
  a filter characteristics controller for comparing a degree of a real discontinuity at the transition portion between the concatenated phonemes of the synthesized speech with a degree of a discontinuity predicted according to a result obtained from a predetermined learning process using phoneme samples employed for speech synthesis, and outputting the comparison result as a coefficient selecting signal R; and
  
  filter coefficient determining means for determining the filter coefficient α in response to the coefficient selecting signal R so as to allow the smoothing filter to smooth discontinuous distortion at the transition portion between the concatenated phonemes of the synthesized speech according to the degree of the predicted discontinuity.

4. The speech synthesis system as claimed in claim 3, wherein the predetermined learning process is performed by a CART (Classification and Regression Tree) scheme.

5. The speech synthesis system as claimed in claim 4, wherein the phoneme samples used for the prediction of the discontinuity comprise quadraphones (four phonemes) consisting of two phonemes before a transition portion between concatenated phonemes and two phonemes after the transition portion.

6. The speech synthesis system as claimed in claim 3, wherein the coefficient selecting signal R is obtained by the following formula:

$R = \frac{D_{p}}{D_{r}}$

where $D_{p}$ is a degree of the predicted discontinuity, and $D_{r}$ is a degree of the real discontinuity of the synthesized speech.

7. The speech synthesis system as claimed in claim 3, wherein the filter coefficient determining means determines the filter coefficient α by the following formula in response to the coefficient selecting signal R:

$\alpha = \frac{1}{2} \left( \sqrt{R} + 1 \right).$
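
The formula α = ½(√R + 1) can be motivated as follows, under an assumption the claims do not state: that the smoothing filter mixes the last pitch cycle $W_{p}$ before the join with the first pitch cycle $W_{n}$ after it symmetrically, and that the real discontinuity is the squared distance $D_{r} = \lVert W_{p} - W_{n} \rVert^{2}$ used for the synthesized speech in claim 13. With that filter, this choice of α makes the smoothed discontinuity equal to the predicted one (taking the non-negative root):

$$
\begin{aligned}
\tilde{W}_{p} &= \alpha W_{p} + (1 - \alpha) W_{n}, \qquad \tilde{W}_{n} = \alpha W_{n} + (1 - \alpha) W_{p} \\
\tilde{W}_{p} - \tilde{W}_{n} &= (2\alpha - 1)(W_{p} - W_{n}) \\
\lVert \tilde{W}_{p} - \tilde{W}_{n} \rVert^{2} &= (2\alpha - 1)^{2} D_{r} = D_{p}
\;\Rightarrow\; 2\alpha - 1 = \sqrt{D_{p} / D_{r}} = \sqrt{R}
\;\Rightarrow\; \alpha = \tfrac{1}{2}\left(\sqrt{R} + 1\right)
\end{aligned}
$$

So $R = 1$ gives $\alpha = 1$ (the join is left unchanged) and $R = 0$ gives $\alpha = \frac{1}{2}$ (both cycles become their average). Under this assumed filter, $R > 1$ gives $\alpha > 1$, which raises the discontinuity toward the predicted degree rather than smoothing it.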

8. A speech synthesis method for controlling a discontinuous distortion that occurs at a transition portion between concatenated phonemes of synthesized speech using a smoothing technique, comprising the steps of:
- (a) comparing a degree of a real discontinuity at the transition portion between the concatenated phonemes of the synthesized speech with a degree of a discontinuity predicted according to a result obtained from a predetermined learning process using concatenated samples of phonemes employed for speech synthesis;
  
  (b) determining a filter coefficient corresponding to the compared result from the step (a) so as to smooth the discontinuity at the transition portion between the concatenated phonemes of the synthesized speech according to the degree of the predicted discontinuity; and
  
  (c) smoothing a discontinuity at the transition portion between the concatenated phonemes of the synthesized speech to correspond to the determined filter coefficient.

9. A computer readable memory media encoded with executable instructions representing a computer program that can cause a computer to carry out the speech synthesis method as claimed in claim 8.
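
The three steps of claim 8 can be sketched for a single join as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it reduces the join to the last pitch cycle before it (`w_p`) and the first pitch cycle after it (`w_n`), measures the real discontinuity as their squared distance in the spirit of claim 13, takes the predicted degree `d_pred` as a given number standing in for the learned predictor, and assumes a symmetric mixing filter (W̃_p = αW_p + (1 − α)W_n, W̃_n = αW_n + (1 − α)W_p), which the claims do not specify. The helper names are hypothetical.

```python
import math
from typing import List, Tuple

Cycle = List[float]  # samples of one pitch cycle


def discontinuity(left: Cycle, right: Cycle) -> float:
    """Squared Euclidean distance between two equal-length pitch cycles."""
    if len(left) != len(right):
        raise ValueError("pitch cycles must have the same length in this sketch")
    return sum((a - b) ** 2 for a, b in zip(left, right))


def smooth_join(w_p: Cycle, w_n: Cycle, d_pred: float) -> Tuple[Cycle, Cycle, float]:
    """Steps (a)-(c) for one join; returns the smoothed cycles and alpha."""
    # (a) compare the real discontinuity with the predicted one
    d_real = discontinuity(w_p, w_n)
    if d_real == 0.0:
        return list(w_p), list(w_n), 1.0  # already continuous: alpha = 1 changes nothing
    ratio = d_pred / d_real
    # (b) filter coefficient from the compared result: alpha = (sqrt(R) + 1) / 2
    alpha = 0.5 * (math.sqrt(ratio) + 1.0)
    # (c) ASSUMED symmetric smoothing filter: mix each boundary cycle with its neighbour
    new_p = [alpha * a + (1.0 - alpha) * b for a, b in zip(w_p, w_n)]
    new_n = [alpha * b + (1.0 - alpha) * a for a, b in zip(w_p, w_n)]
    return new_p, new_n, alpha


# Hypothetical join: real discontinuity 1.0, predicted discontinuity 0.25.
w_p = [0.0, 1.0, 0.0, -1.0]
w_n = [0.5, 1.5, 0.5, -0.5]
new_p, new_n, alpha = smooth_join(w_p, w_n, d_pred=0.25)
print(alpha, discontinuity(new_p, new_n))  # 0.75 0.25
```

Here R = 0.25, so α = 0.75, and the smoothed join ends up with exactly the predicted discontinuity. A real system would apply the filter to the waveform around every concatenation point rather than to two isolated cycles.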

10. A smoothing filter characteristics control device for adaptively changing, according to the characteristics of a transition portion between concatenated phonemes, which are speech units of synthesized speech, the characteristics of a smoothing filter used in a speech synthesis system for controlling a discontinuous distortion that occurs at the transition portion, the device comprising:
- discontinuity measuring means which obtains a degree of a discontinuity at the transition portion between the concatenated phonemes of the synthesized speech as a real discontinuity degree and outputs the obtained real discontinuity degree;
  
  discontinuity predicting means which stores a result of a learning process predicting discontinuity at a transition portion between concatenated phonemes in actually spoken sounds using samples of phonemes, predicts a degree of a discontinuity at a transition portion between input concatenated samples of phonemes employed for speech synthesis of the synthesized speech according to the result of the learning, and outputs the degree of the predicted discontinuity; and
  
  a comparator which compares the predicted discontinuity degree Dp applied thereto from the discontinuity predicting means with the real discontinuity degree Dr applied thereto from the discontinuity measuring means, and generates the compared result as a coefficient selecting signal for determining a filter coefficient of the smoothing filter.

11. The smoothing filter characteristics control device as claimed in claim 10, wherein the learning in the discontinuity predicting means is performed by a CART (Classification and Regression Tree) scheme.

12. The smoothing filter characteristics control device as claimed in claim 11, wherein the phoneme samples used for the prediction of the discontinuity comprise quadraphones (four phonemes) consisting of two phonemes before a transition portion between concatenated phonemes in which to predict a discontinuity and two phonemes after the transition portion.

13. The smoothing filter characteristics control device as claimed in claim 12, wherein the predicted discontinuity degree $D_{p}$ and the real discontinuity degree $D_{r}$ are obtained by the following formulas:

$D_{r} = \lVert W_{p} - W_{n} \rVert^{2}$

$D_{p} = \lVert W'_{p} - W'_{n} \rVert^{2}$

wherein $W_{p}$ is a speech waveform of a last pitch cycle of speech units arranged on a left side with respect to a transition portion between concatenated speech units in which to measure a degree of a discontinuity in the synthesized speech, $W_{n}$ is a speech waveform of a first pitch cycle of speech units arranged on a right side with respect to the transition portion in which to measure the discontinuity degree, $W'_{p}$ is a speech waveform of the last pitch cycle of speech units arranged on the left side with respect to a transition portion between concatenated speech units in which to predict a degree of a discontinuity in the actually spoken sounds, and $W'_{n}$ is a speech waveform of the first pitch cycle of speech units arranged on the right side with respect to the transition portion in which to predict the discontinuity degree.

14. The smoothing filter characteristics control device as claimed in claim 10, wherein the comparator generates a coefficient selecting signal R obtained by the following formula:

$R = \frac{D_{p}}{D_{r}}.$

15. The smoothing filter characteristics control device as claimed in claim 10, wherein the filter coefficient α is determined by the following formula in response to the coefficient selecting signal R:

$\alpha = \frac{1}{2} \left( \sqrt{R} + 1 \right).$
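
Claim 13 measures both degrees as the squared Euclidean norm of the difference between the last pitch cycle left of a join and the first pitch cycle right of it. The sketch below shows that computation. It assumes pitch cycles are located by pitch-mark sample indices and that cycles of unequal length are compared over their common length; neither detail is given in the claims, and the function names and sample values are hypothetical. The natural-speech cycles W'_p and W'_n are simply supplied directly here.

```python
from typing import List, Sequence


def last_pitch_cycle(samples: Sequence[float], marks: Sequence[int]) -> List[float]:
    """W_p: samples between the final two pitch marks of the left-hand unit."""
    return list(samples[marks[-2]:marks[-1]])


def first_pitch_cycle(samples: Sequence[float], marks: Sequence[int]) -> List[float]:
    """W_n: samples between the first two pitch marks of the right-hand unit."""
    return list(samples[marks[0]:marks[1]])


def discontinuity_degree(w_left: Sequence[float], w_right: Sequence[float]) -> float:
    """Squared Euclidean norm ||w_left - w_right||^2 over the common length."""
    n = min(len(w_left), len(w_right))  # assumption: truncate the longer cycle
    return float(sum((w_left[i] - w_right[i]) ** 2 for i in range(n)))


# Synthesized join (hypothetical samples and pitch marks): gives D_r.
left_unit, left_marks = [0, 1, 2, 3, 4, 5, 6, 7], [0, 4, 8]
right_unit, right_marks = [5, 5, 5, 5, 9, 9], [0, 4, 6]
d_r = discontinuity_degree(last_pitch_cycle(left_unit, left_marks),
                           first_pitch_cycle(right_unit, right_marks))

# The corresponding cycles in natural speech (hypothetical W'_p, W'_n): gives D_p.
d_p = discontinuity_degree([4.0, 5.0, 6.0, 7.0], [4.5, 5.5, 6.0, 6.0])

R = d_p / d_r  # coefficient selecting signal of claim 14
print(d_r, d_p, R)  # 6.0 1.5 0.25
```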

16. A smoothing filter characteristics control method for adaptively changing, according to characteristics of a transition portion between concatenated phonemes, which are speech units of synthesized speech, characteristics of a smoothing filter used in a speech synthesis system for controlling a discontinuous distortion that occurs at the transition portion, the method comprising the steps of:
- (a) storing a result of a learning process predicting a discontinuity at a transition portion between concatenated phonemes in actually spoken sounds using samples of phonemes;
  
  (b) obtaining a real degree of the discontinuity at the transition portion between the concatenated phonemes of the synthesized speech and outputting the obtained real discontinuity degree;
  
  (c) predicting a degree of a discontinuity at a transition portion between input concatenated samples of phonemes employed for speech synthesis of the synthesized speech according to the result of the learning and outputting the predicted discontinuity degree; and
  
  (d) determining a filter coefficient of the smoothing filter according to the predicted discontinuity degree and the real discontinuity degree.

17. A smoothing filter characteristics control method as claimed in claim 16, wherein the step (d) further comprises the steps of:
- (d1) obtaining a ratio R of the predicted discontinuity degree to the real discontinuity degree; and
  
  (d2) determining the filter coefficient α by the following formula:
  
  $\alpha = \frac{1}{2} \left( \sqrt{R} + 1 \right).$

18. A computer readable memory media encoded with executable instructions representing a computer program that can cause a computer to carry out the smoothing filter characteristics control method as claimed in claim 16.

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Samsung Electronics Co. Ltd.
Inventors: Lee, Jae-won; Kim, Jeong-su; Lee, Ki-seung
Primary Examiner(s): Azad, Abul K.
Application Number: US10/284,189
Publication Number: US 20030083878A1
Time in Patent Office: 1,797 Days
Field of Search: None
US Class Current: 704/266
CPC Class Codes: G10L 13/07 Concatenation rules
