Speech synthesizer, speech synthesizing method, and program

US 7,454,343 B2
Filed: 04/12/2007
Issued: 11/18/2008
Est. Priority Date: 06/16/2005
Status: Expired due to Fees

Abstract

A speech synthesizer that provides high-quality sound along with stable sound quality, including: a target parameter generation unit; a speech element DB; an element selection unit; a mixed parameter judgment unit which determines an optimum parameter combination of target parameters and speech elements; a parameter integration unit which integrates the parameters; and a waveform generation unit which generates synthetic speech. High-quality and stable synthetic speech is generated by combining, per parameter dimension, the parameters with stable sound quality generated by the target parameter generation unit with speech elements with high sound quality and a sense of true speech selected by the element selection unit.
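The per-dimension combination described in the abstract can be illustrated with a minimal Python sketch. It assumes that both parameter groups are plain lists of floats and that "similar" means an absolute difference within a fixed tolerance; the function name `mix_per_dimension`, the `tolerance` parameter, and the similarity test are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence


def mix_per_dimension(
    target: Sequence[float],
    element: Sequence[float],
    tolerance: float = 0.1,
) -> List[float]:
    """For each dimension, keep the speech-element value when it is close
    to the target value; otherwise fall back to the target value."""
    if len(target) != len(element):
        raise ValueError("target and element must use the same parameter format")
    mixed = []
    for t, e in zip(target, element):
        # Assumed similarity test: absolute difference within a tolerance.
        similar = abs(t - e) <= tolerance
        mixed.append(e if similar else t)
    return mixed


# Toy values: dimensions 0 and 2 are judged similar, dimension 1 is not.
target = [1.0, 2.0, 3.0]
element = [1.05, 2.5, 2.95]
mixed = mix_per_dimension(target, element)
```

In this toy run the recorded element supplies dimensions 0 and 2, while the generated target supplies dimension 1, where the two disagree.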

10 Claims

1. A speech synthesizer comprising:
- a target parameter generation unit operable to generate target parameters on an element-by-element basis from information containing at least phonetic symbols, the target parameters being a parameter group through which speech can be synthesized;
  
  a speech element database which stores, on an element-by-element basis, pre-recorded speech as speech elements that are made up of a parameter group in the same format as the target parameters;
  
  an element selection unit operable to select, from said speech element database, a speech element that corresponds to the target parameters;
  
  a parameter group synthesis unit operable to synthesize the parameter group of the target parameters and the parameter group of the speech element by finding the similarity per dimension of the target parameters and the speech element, selecting, based on the similarity per dimension, the speech element in the case where the target parameters and the speech element are judged as being similar and select, based on the similarity per dimension, the target parameters in the case where the target parameters and the speech element are judged as not being similar, and integrating the parameter groups on an element-by-element basis; and
  
  a waveform generation unit operable to generate a synthetic speech waveform based on the synthesized parameter groups.
  - 2. The speech synthesizer according to claim 1, wherein said parameter group synthesis unit includes:
    - a cost calculation unit operable to calculate, based on a subset of speech elements selected by said speech element selection unit and a subset of target parameters corresponding to the subset of speech elements, a cost indicating dissimilarity between the target parameters and the speech element;
      
      a mixed parameter determination unit operable to determine, on a speech element-by-speech element basis, an optimal parameter combination of the target parameters and the speech element by selecting, based on the cost calculated by said cost calculation unit, the speech element in the case where the target parameters and the speech element are judged as being similar, and the target parameters in the case where the target parameters and the speech element are judged as not being similar; and
      
      a parameter integration unit operable to synthesize the parameter group by integrating the target parameters and the speech element based on the combination determined by said mixed parameter determination unit.
  - 3. The speech synthesizer according to claim 2, wherein said cost calculation unit includes a target cost determination unit operable to calculate a cost indicating non-resemblance between the subset of speech elements selected by said element selection unit and the subset of target parameters corresponding to the subset of speech elements.
  - 4. The speech synthesizer according to claim 3, wherein said cost calculation unit further includes a continuity determination unit operable to calculate a cost indicating discontinuity between temporally sequential speech elements based on a speech element in which the subset of speech elements selected by said element selection unit is replaced with the subset of target parameters corresponding to the subset of speech elements.
  - 5. The speech synthesizer according to claim 1, wherein said speech element database includes:
    - a standard speech database which stores speech elements that have standard emotional qualities; and
      
      an emotional speech database which stores speech elements that have special emotional qualities, and said speech synthesizer further comprises a statistical model creation unit operable to create a statistical model of speech having special emotional qualities, based on the speech elements that have standard emotional qualities and the speech elements that have special emotional qualities, wherein said target parameter generation unit is operable to generate the target parameters based on the statistical model of speech having special emotional qualities, on an element-by-element basis, and said element selection unit is operable to select speech elements that correspond to the target parameters from said emotional speech database.
  - 6. The speech synthesizer according to claim 1, wherein said parameter group synthesis unit includes:
    - a target parameter pattern generation unit operable to generate at least one parameter pattern obtained by dividing the target parameters generated by said target parameter generation unit into at least one subset;
      
      an element selection unit operable to select, per subset of target parameters generated by said target parameter pattern generation unit, speech elements that correspond to the subset, from said speech element database;
      
      a cost calculation unit operable to calculate, based on the subset of speech elements selected by said element selection unit and a subset of the target parameters corresponding to the subset of speech elements, a cost indicating dissimilarity between the target parameters and the speech element;
      
      a combination determination unit operable to determine, per element, the optimum combination of subsets of target parameters by selecting, based on the cost value calculated by said cost calculation unit, the speech element in the case where the target parameters and the speech element are judged as being similar, and the target parameters in the case where the target parameters and the speech element are judged as not being similar; and
      
      a parameter integration unit operable to synthesize the parameter group by integrating the subsets of speech elements selected by said element selection unit based on the combination determined by said combination determination unit.
  - 7. The speech synthesizer according to claim 6, wherein, in the case where overlapping occurs between subsets when subsets of speech elements are combined, said combination determination unit is operable to determine the optimum combination with the average value of the overlapping parameters used as the value of the parameters.
  - 8. The speech synthesizer according to claim 6, wherein, in the case where parameter dropout occurs when subsets of speech elements are combined, said combination determination unit is operable to determine the optimum combination with the missing parameters being substituted by the target parameters.
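
The overlap and dropout handling of claims 7 and 8 can be sketched as follows. This is only an illustration under assumptions: each subset is modelled as a dictionary from dimension index to the value taken from a selected speech element, and the name `integrate_subsets` is hypothetical. The cost-based choice of which subsets to combine (claim 6) is outside the scope of this sketch.

```python
from typing import Dict, List, Sequence


def integrate_subsets(
    target: Sequence[float],
    subsets: Sequence[Dict[int, float]],
) -> List[float]:
    """Combine subsets of speech-element parameters into one parameter group.

    Overlapping dimensions take the average of the overlapping values
    (claim 7); dimensions covered by no subset take the target value
    (claim 8).
    """
    combined = []
    for dim, target_value in enumerate(target):
        candidates = [subset[dim] for subset in subsets if dim in subset]
        if candidates:
            # One value: used as-is. Several values: averaged.
            combined.append(sum(candidates) / len(candidates))
        else:
            # Parameter dropout: substitute the target parameter.
            combined.append(target_value)
    return combined


# Toy values: dimension 1 overlaps, dimension 3 is missing from both subsets.
target = [1.0, 2.0, 3.0, 4.0]
subset_a = {0: 1.5, 1: 2.5}
subset_b = {1: 3.5, 2: 3.0}
result = integrate_subsets(target, [subset_a, subset_b])
```

Here dimension 1 becomes the average of 2.5 and 3.5, and dimension 3 falls back to the target value 4.0.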

9. A speech synthesizing method comprising:
- a step of generating target parameters on an element-by-element basis from information containing at least phonetic symbols, the target parameters being a parameter group through which speech can be synthesized;
  
  a step of selecting a speech element that corresponds to the target parameters, from a speech element database which stores, on an element-by-element basis, pre-recorded speech as speech elements that are made up of a parameter group in the same format as the target parameters;
  
  a step of synthesizing the parameter group of the target parameters and the parameter group of the speech element by finding the similarity per dimension of the target parameters and the speech element, selecting, based on the similarity per dimension, the speech element in the case where the target parameters and the speech element are judged as being similar and select, based on the similarity per dimension, the target parameters in the case where the target parameters and the speech element are judged as not being similar, and integrating the parameter groups on an element-by-element basis; and
  
  a step of generating a synthetic speech waveform based on the synthesized parameter groups.
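
The four steps of the method can be laid out as a small Python skeleton. It is a sketch under stated assumptions, not the patented implementation: target generation and waveform generation are passed in as callables because the claim does not fix how they work, element selection is assumed to be a nearest-neighbour lookup by squared Euclidean distance, and similarity is assumed to be an absolute difference within `tolerance`. All names (`select_element`, `synthesize`, `tolerance`) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def select_element(target: Vector, candidates: Sequence[Vector]) -> Vector:
    """Assumed selection rule: the candidate nearest to the target."""
    return min(
        candidates,
        key=lambda c: sum((t - x) ** 2 for t, x in zip(target, c)),
    )


def synthesize(
    symbols: Sequence[str],
    generate_target: Callable[[str], Vector],
    database: Dict[str, Sequence[Vector]],
    generate_waveform: Callable[[List[Vector]], List[float]],
    tolerance: float = 0.1,
) -> List[float]:
    mixed_groups: List[Vector] = []
    for symbol in symbols:
        # Step 1: generate target parameters for this element.
        target = generate_target(symbol)
        # Step 2: select a stored speech element in the same format.
        element = select_element(target, database[symbol])
        # Step 3: per dimension, keep the element value if similar,
        # otherwise keep the target value.
        mixed = [e if abs(t - e) <= tolerance else t
                 for t, e in zip(target, element)]
        mixed_groups.append(mixed)
    # Step 4: generate the waveform from the integrated parameter groups.
    return generate_waveform(mixed_groups)


# Toy stand-ins for the target generator, database, and waveform generator.
toy_targets = {"a": [1.0, 2.0], "k": [0.0, 5.0]}
toy_db = {"a": [[1.05, 2.6], [3.0, 3.0]], "k": [[0.5, 5.0]]}
flatten = lambda groups: [v for g in groups for v in g]
out = synthesize(["k", "a"], toy_targets.__getitem__, toy_db, flatten)
```

The `flatten` stand-in only concatenates the mixed parameter groups so the data flow is visible; a real waveform generator would turn them into audio samples.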

10. A program stored on computer storage memory which causes a computer to execute steps for speech synthesizing, the steps comprising:
- a step of generating target parameters on an element-by-element basis from information containing at least phonetic symbols, the target parameters being a parameter group through which speech can be synthesized;
  
  a step of selecting a speech element that corresponds to the target parameters, from a speech element database which stores, on an element-by-element basis, pre-recorded speech as speech elements that are made up of a parameter group in the same format as the target parameters;
  
  a step of synthesizing the parameter group of the target parameters and the parameter group of the speech element by finding the similarity per dimension of the target parameters and the speech element, selecting, based on the similarity per dimension, the speech element in the case where the target parameters and the speech element are judged as being similar and select, based on the similarity per dimension, the target parameters in the case where the target parameters and the speech element are judged as not being similar, and integrating the parameter groups on an element-by-element basis; and
  
  a step of generating a synthetic speech waveform based on the synthesized parameter groups.

Current Assignee: Sovereign Peak Ventures, LLC (Dominion Harbor Enterprises, LLC)
Original Assignee: Panasonic Corporation (Panasonic Holdings Corporation)
Inventors: Hirose, Yoshifumi; Kamai, Takahiro; Saito, Natsuki; Kato, Yumiko
Primary Examiner(s): Vo, Huyen X.
Application Number: US11/783,855
Publication Number: US 20070203702A1
Time in Patent Office: 586 Days
Field of Search: 704/260, 704/258, 704/261, 704/263, 704/266, 704/267, 704/270, 704/271, 704/256, 704/268, 704/269
US Class Current: 704/256
CPC Class Codes:
G10L 13/04 Details of speech synthesis...
G10L 13/06 Elementary speech units use...
