Rule based speech synthesis method and apparatus

US 20050119889A1
Filed: 06/09/2004
Published: 06/02/2005
Est. Priority Date: 06/13/2003
Status: Active Grant

Abstract

A rule based speech synthesis apparatus by which concatenation distortion may be kept below a preset value regardless of the utterance. A parameter correction unit reads out a target parameter for a vowel from a target parameter storage, responsive to the phonemes at a leading end and at a trailing end of a speech element and to the acoustic feature parameters output from a speech element selector, and corrects the acoustic feature parameters of the speech element accordingly. The parameter correction unit corrects the parameters so that the parameters at the leading and trailing ends of the speech element are equal to the target parameter for the vowel of the corresponding phoneme, and outputs the corrected parameters.
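
The correction described in the abstract can be illustrated with a minimal sketch. It assumes each speech element is a list of per-frame feature vectors and that the offset needed at each end is spread linearly across the element; the abstract only requires that the end frames equal the vowel targets, so the interpolation, the function name `correct_element`, and the sample values are all hypothetical.

```python
from typing import List

Frame = List[float]  # one acoustic feature vector per analysis frame


def correct_element(frames: List[Frame], lead_target: Frame,
                    trail_target: Frame) -> List[Frame]:
    """Shift frames so the first equals lead_target and the last equals trail_target.

    The per-frame shift is a linear blend of the two end offsets (an assumption;
    the source does not say how frames between the ends are treated).
    """
    n = len(frames)
    if n == 0:
        return []
    lead_off = [t - f for t, f in zip(lead_target, frames[0])]
    trail_off = [t - f for t, f in zip(trail_target, frames[-1])]
    corrected = []
    for i, frame in enumerate(frames):
        # w runs from 0 at the leading end to 1 at the trailing end.
        # A single-frame element can only match the leading target.
        w = i / (n - 1) if n > 1 else 0.0
        corrected.append([x + (1 - w) * a + w * b
                          for x, a, b in zip(frame, lead_off, trail_off)])
    return corrected


# Hypothetical vowel targets and two elements that meet at the vowel "i".
target = {"a": [1.0, 2.0], "i": [3.0, 0.0]}
aki = correct_element([[0.5, 2.5], [1.5, 1.0], [3.5, 0.25]],
                      target["a"], target["i"])
ita = correct_element([[2.5, 0.5], [2.0, 1.0], [1.5, 1.5]],
                      target["i"], target["a"])
```

Because both elements are pulled to the same target for "i", the last frame of `aki` and the first frame of `ita` coincide, so the join introduces no parameter discontinuity in this toy case.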

19 Citations

16 Claims

1. A rule based speech synthesis apparatus comprising speech element set storage means for storing a plurality of phoneme strings, each having a vowel phoneme on a boundary thereof, as a speech element, along with feature parameters, as a speech element set;
- speech element selection means for reading out acoustic feature parameters of a corresponding speech element from said speech element set storage means, based on an input phoneme string;
  
  target parameter storage means having stored therein representative acoustic feature parameters from one vowel to another;
  
  parameter correction means for reading out a target parameter for a vowel from said target parameter storage in response to the acoustic feature parameter of the speech elements output from said speech element selection means and for correcting the acoustic feature parameter of said speech element based on said target parameters;
  
  time-series data generating means for concatenating plural acoustic feature parameters output from said parameter correction means to generate time series data of the acoustic feature parameters; and
  
  speech synthesizing means for uttering and outputting speech signals of the synthesized speech corresponding to the input phoneme strings in accordance with time-series data of the acoustic feature parameters, corresponding to the input phoneme strings, generated by said time-series data generating means.
- Dependent Claims (2, 3, 4)
- - 2. The rule based speech synthesis apparatus according to claim 1 wherein said parameter correction means corrects the acoustic feature parameters of the speech element from a leading end to a trailing end of the speech element as a subject of correction.
  - 3. The rule based speech synthesis apparatus according to claim 1 wherein said parameter correction means determines a temporal boundary of a leading end and a trailing end of the speech elements as said plurality of phoneme strings, as a fixed length.
  - 4. The rule based speech synthesis apparatus according to claim 1 wherein said parameter correction means determines a temporal boundary of a leading end and a trailing end of the speech element as said phoneme strings in accordance with a boundary of the vowel and the consonant.
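
Claims 3 and 4 describe two ways of deciding how far the leading and trailing ends of a speech element extend. The sketch below is one possible reading, assuming each frame of an element carries a phoneme label; the vowel inventory `VOWELS`, both function names, and the sample labels are hypothetical and do not come from the claims.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # half-open frame range [start, stop)

VOWELS = {"a", "i", "u", "e", "o"}  # assumed vowel inventory for illustration


def fixed_length_regions(n_frames: int, length: int) -> Tuple[Region, Region]:
    """Claim 3 reading: each end region spans a fixed number of frames."""
    k = min(length, n_frames)
    return (0, k), (n_frames - k, n_frames)


def vowel_boundary_regions(labels: List[str]) -> Tuple[Region, Region]:
    """Claim 4 reading: each end region stops where the vowel meets a consonant."""
    n = len(labels)
    lead_stop = 0
    while lead_stop < n and labels[lead_stop] in VOWELS:
        lead_stop += 1
    trail_start = n
    while trail_start > 0 and labels[trail_start - 1] in VOWELS:
        trail_start -= 1
    return (0, lead_stop), (trail_start, n)


# Hypothetical per-frame labels for an element spanning "a", "k", "i".
labels = ["a", "a", "a", "k", "k", "i", "i"]
fixed = fixed_length_regions(len(labels), 2)   # ((0, 2), (5, 7))
by_vowel = vowel_boundary_regions(labels)      # ((0, 3), (5, 7))
```

The fixed-length variant needs no labels at all, while the vowel/consonant variant adapts the region to how long the boundary vowel actually lasts in the stored element.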

5. A rule based speech synthesis method comprising a speech element selecting step of reading out an acoustic feature parameter corresponding to a speech element, based on input phoneme strings, from a speech element set storage storing a plurality of phoneme strings, each having a vowel phoneme on the boundary, as a speech element, along with feature parameters, as a speech element set;
- a parameter correction step of reading out a target parameter for a vowel, in response to the acoustic feature parameters of the speech element output in said speech element selecting step from the target parameter storage means having stored therein the representative acoustic feature parameters from one vowel to another for correcting the acoustic feature parameters of said speech element based on said target parameter;
  
  a time series data generating step of generating time series data of the acoustic feature parameters by concatenating the acoustic feature parameters output from said parameter correction step; and
  
  a speech synthesis step of uttering and outputting a speech signal of the synthesized speech, corresponding to said input phoneme strings, in accordance with the time series data of the acoustic feature parameters, corresponding to said input phoneme strings, generated in said time series data generating step.

6. A rule based speech synthesis apparatus comprising speech element set storage means for storing a plurality of phoneme strings, each having a vowel phoneme on a boundary thereof, as a speech element, along with feature parameters of each speech element, as a speech element set;
- speech element selection means for reading out acoustic feature parameters of a corresponding speech element, from said speech element set storage means based on an input phoneme string;
  
  target parameter storage means having stored therein a plurality of acoustic feature parameters from one vowel to another;
  
  parameter correction means for selecting a specified acoustic feature parameter in response to an acoustic feature parameter of said speech element selection means, from plural acoustic feature parameters stored in said target parameter storage means, and for correcting the acoustic feature parameter of the speech element responsive to the selected specified acoustic feature parameter;
  
  time-series data generating means for concatenating plural acoustic feature parameters output from said parameter correction means to generate time series data of the acoustic feature parameters; and
  
  speech synthesizing means for uttering and outputting speech signals of the synthesized speech corresponding to the input phoneme strings, based on time-series data of the acoustic feature parameters, corresponding to the input phoneme strings, generated by said time-series data generating means.
- Dependent Claims (7, 8, 9, 10)
- - 7. The rule based speech synthesis apparatus according to claim 6 wherein said parameter correction means selects such a target parameter having a smallest error between a parameter at a trailing end of the speech element output from said speech element set storage means and a plurality of acoustic feature parameters stored in said target parameter storage means, as a specified acoustic feature parameter.
  - 8. The rule based speech synthesis apparatus according to claim 6 wherein said parameter correction means selects a target parameter from the acoustic feature parameters stored in said target parameter storage means based on an error between a parameter at a trailing end of the speech element output from said speech element set storage means and the acoustic feature parameters stored in said target parameter storage means, as a specified acoustic feature parameter.
  - 9. The rule based speech synthesis apparatus according to claim 8 wherein said parameter correction means selects such an acoustic feature parameter having a smallest value of a sum of an error between a parameter at a trailing end of the speech element and said plural acoustic feature parameters and an error between a parameter at a leading end of the speech element and said plural acoustic feature parameters, as a specified acoustic feature parameter.
  - 10. The rule based speech synthesis apparatus according to claim 8 wherein said parameter correction means selects such an acoustic feature parameter from said plural acoustic feature parameters which has an error between the parameter at a trailing end of said speech element and the respective acoustic feature parameters or such an acoustic feature parameter from said plural acoustic feature parameters that has an error between the parameter at a leading end of said speech element and the respective acoustic feature parameters, whichever has the smaller error.
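
The selection rules in claims 7, 9 and 10 can be sketched as follows. The claims do not define the error measure, so squared Euclidean distance is assumed here; the function names and the candidate values are hypothetical, and the claim 10 function is only one reading of "whichever has the smaller error".

```python
from typing import List

Frame = List[float]


def sq_error(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance (assumed error measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_by_trailing(candidates: List[Frame], trailing: Frame) -> Frame:
    """Claim 7 reading: candidate with the smallest error to the trailing-end parameter."""
    return min(candidates, key=lambda c: sq_error(c, trailing))


def select_by_sum(candidates: List[Frame], trailing: Frame,
                  leading: Frame) -> Frame:
    """Claim 9 reading: candidate minimising trailing error plus leading error."""
    return min(candidates,
               key=lambda c: sq_error(c, trailing) + sq_error(c, leading))


def select_by_smaller(candidates: List[Frame], trailing: Frame,
                      leading: Frame) -> Frame:
    """Claim 10 reading: candidate whose better-matching end has the smallest error."""
    return min(candidates,
               key=lambda c: min(sq_error(c, trailing), sq_error(c, leading)))


# Hypothetical stored targets for one vowel and two end-frame parameters.
candidates = [[1.0, 2.0], [2.0, 2.0], [4.0, 0.0]]
trailing = [1.0, 2.5]
leading = [3.0, 1.0]

picked_7 = select_by_trailing(candidates, trailing)         # [1.0, 2.0]
picked_9 = select_by_sum(candidates, trailing, leading)     # [2.0, 2.0]
picked_10 = select_by_smaller(candidates, trailing, leading)  # [1.0, 2.0]
```

With these values the sum criterion picks a compromise candidate that neither end matches best, which shows how the rules can disagree on the same stored targets.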

11. A rule based speech synthesis method comprising a speech element set selecting step of reading out and outputting an acoustic feature parameter of a corresponding speech element, based on input phoneme strings, from a speech element set storage, adapted for storing plural phoneme strings each having a vowel phoneme on the boundary, as a speech element, as a set of the speech element with the acoustic feature parameter;
- a parameter correcting step of selecting, from plural acoustic feature parameters stored in a target parameter storage having stored therein plural acoustic feature parameters from vowel to vowel, a specified acoustic feature parameter, responsive to the acoustic feature parameter of the speech element output from the speech element selecting step, and for correcting the acoustic feature parameter of the speech element based on the selected specified acoustic feature parameter;
  
  a time-series data generating step of concatenating plural acoustic feature parameters output from said parameter correction step to generate time series data of the acoustic feature parameters; and
  
  speech synthesizing means for uttering and outputting speech signals of the synthesized speech, corresponding to the input phoneme strings, in accordance with time-series data of the acoustic feature parameters, corresponding to the input phoneme strings, generated by said time-series data generating step.

12. A rule based speech synthesizing apparatus comprising speech element correction means for correcting a speech element set having phoneme strings and data of acoustic feature parameters;
- and speech synthesizing means for synthesizing the speech corresponding to input phoneme strings, using an as-corrected speech element set, obtained by said speech element correction means, based on an input phoneme string.
- Dependent Claims (13)
- - 13. The rule based speech synthesizing apparatus according to claim 12 wherein said speech element correction means includes:
    - parameter correction means for correcting a speech element set having phoneme strings and data of acoustic feature parameters; and
      
      as-corrected speech element set storage means for storing said as-corrected speech element set corrected by said parameter correction means;
      
      said speech synthesizing means including;
      
      said speech element set storage means;
      
      speech element selection means for reading out an acoustic feature parameter corresponding to a phoneme string from said as-corrected speech element set storage means, to output the read-out acoustic feature parameter;
      
      parameter time series generating means for concatenating plural acoustic feature parameters output from said speech element selection means to generate time-series data of acoustic feature parameters; and
      
      speech synthesizing means for uttering and outputting speech signals of synthesized speech corresponding to said input phoneme string based on time-series data of acoustic feature parameters corresponding to said input phoneme strings generated by said parameter time series generating means.
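
Claims 12 and 13 move the correction ahead of synthesis: the whole speech element set is corrected once and stored, and synthesis then only looks up and concatenates. A minimal sketch of that split, assuming a dictionary keyed by element name; `precorrect_set`, `parameter_time_series`, the toy `snap_ends` correction, and the sample data are hypothetical stand-ins, since the claims do not fix the correction rule.

```python
from typing import Callable, Dict, List

Frame = List[float]
Element = List[Frame]


def precorrect_set(element_set: Dict[str, Element],
                   correct: Callable[[str, Element], Element]
                   ) -> Dict[str, Element]:
    """Offline stage: correct every element once and keep the result."""
    return {name: correct(name, frames) for name, frames in element_set.items()}


def parameter_time_series(units: List[str],
                          corrected_set: Dict[str, Element]) -> List[Frame]:
    """Online stage: look up corrected elements and concatenate their frames."""
    series: List[Frame] = []
    for unit in units:
        series.extend(corrected_set[unit])
    return series


# Toy correction (assumption): overwrite the end frames with the vowel target
# named by the first and last character of the element name.
targets = {"a": [1.0], "i": [3.0]}


def snap_ends(name: str, frames: Element) -> Element:
    out = [list(f) for f in frames]
    out[0] = list(targets[name[0]])
    out[-1] = list(targets[name[-1]])
    return out


raw_set = {"aki": [[0.5], [2.0], [3.5]], "ita": [[2.5], [2.0], [1.5]]}
corrected_set = precorrect_set(raw_set, snap_ends)
series = parameter_time_series(["aki", "ita"], corrected_set)
# series == [[1.0], [2.0], [3.0], [3.0], [2.0], [1.0]]
```

The design trade-off is that the per-utterance path does no correction work, at the cost of storing a second, corrected copy of the element set.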

14. A rule based speech synthesizing method comprising a parameter correction step of correcting a speech element set having phoneme strings and data of acoustic feature parameters;
- and an as-corrected speech element set storage step of storing said as-corrected speech element set corrected by said parameter correction means;
  
  a speech element selecting step of reading out and outputting the acoustic feature parameter corresponding to a phoneme string from said as-corrected speech element set storage step based on input phoneme strings;
  
  a parameter time series generating step of concatenating acoustic feature parameters output from said speech element selecting step to generate time-series data of acoustic feature parameters; and
  
  a speech synthesizing step of uttering and outputting speech signals of synthesized speech corresponding to said input phoneme string based on time-series data of acoustic feature parameters corresponding to said input phoneme strings generated by said parameter time series generating step.

15. A rule based speech synthesis apparatus comprising speech element set storage means for storing a plurality of phoneme strings, each having a consonant phoneme on a boundary thereof, as a speech element, along with feature parameters, as a speech element set;
- speech element selection means for reading out acoustic feature parameters of a corresponding speech element, from said speech element set storage means, based on input phoneme strings;
  
  target parameter storage means having stored therein a representative acoustic feature parameter from one consonant to another;
  
  parameter correction means for reading out a target parameter for a consonant from said target parameter storage means, responsive to the acoustic feature parameters of the speech element, output from said speech element selection means, and for correcting the acoustic feature parameters of said speech element based on said target parameters;
  
  time-series data generating means for concatenating plural acoustic feature parameters output from said parameter correction means to generate time series data of the acoustic feature parameters; and
  
  speech synthesizing means for uttering and outputting speech signals of the synthesized speech corresponding to the input phoneme strings in accordance with time-series data of the acoustic feature parameters, corresponding to the input phoneme strings, generated by said time-series data generating means.

16. A rule based speech synthesis method comprising a speech element selecting step of reading out acoustic feature parameters of a corresponding speech element, based on an input phoneme string, from a speech element set storage adapted for storing a plurality of phoneme strings, each having a consonant phoneme on the boundary, as a speech element, along with feature parameters, as a speech element set;
- a parameter correction step of reading out a target parameter for a consonant, responsive to the acoustic feature parameters of the speech element output in said speech element selecting step from the target parameter storage having stored therein the representative acoustic feature parameters from one consonant to another, and for correcting the acoustic feature parameters of said speech element based on said target parameter;
  
  a time series data generating step of generating time series data of the acoustic feature parameters by concatenating the acoustic feature parameters output from said parameter correction step; and
  
  a speech synthesis step of uttering and outputting a speech signal of synthesized speech, corresponding to said input phoneme strings, in accordance with the time series data of the acoustic feature parameters, corresponding to said input phoneme strings, generated in said time series data generating step.

Current Assignee
Sony Corporation (Sony Group Corp.)
Original Assignee
Sony Corporation (Sony Group Corp.)
Inventors
Yamazaki, Nobuhide

Granted Patent

US 7,765,103 B2
US Class Current

704/259
CPC Class Codes

G10L 13/07 Concatenation rules
