Speech Synthesis Device and Method

US 20070233489A1
Filed: 04/01/2005
Published: 10/04/2007
Est. Priority Date: 05/11/2004
Status: Active Grant


3 Assignments

0 Petitions

Abstract

A speech synthesis device that generates synthesized speech without significantly degrading its sound quality includes a target element information generation unit (102), an element database (103), an element selection unit (104), a voice characteristics designation unit (105), a voice characteristics transformation unit (106), a distortion determination unit (108), and a target element information correction unit (109). When the distortion determination unit (108) determines that the speech element sequence transformed by the voice characteristics transformation unit (106) is distorted, the target element information correction unit (109) corrects the speech element information generated by the target element information generation unit (102) to speech element information of the transformed voice characteristic, and the element selection unit (104) reselects a speech element sequence. As a result, synthesized speech having the voice characteristic designated by the voice characteristics designation unit (105) is generated without degrading its sound quality.
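The select, transform, check, correct, and reselect loop described in the abstract and in claim 1 can be sketched as below. This is a toy illustration under assumptions that do not come from the patent text: each speech element is reduced to a phoneme label and a short cepstrum-like vector; the voice characteristics transformation is modeled as a partial morph toward a per-phoneme target vector (`VOICE` and `alpha` are hypothetical); the distortion test uses the degree of deformation of claim 8 with an arbitrary threshold; and element selection is a nearest-neighbor search standing in for a real target/concatenation cost.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Toy sketch only: feature layout, transformation, and threshold are hypothetical.


@dataclass(frozen=True)
class Element:
    """A stored speech element: identifier, phoneme label, cepstrum-like vector."""
    ident: int
    phoneme: str
    cepstrum: tuple[float, ...]


# Hypothetical element database holding two voice characteristics per phoneme.
DB = [
    Element(1, "a", (0.0, 0.0)), Element(2, "a", (1.0, 1.0)),
    Element(3, "i", (0.0, 0.5)), Element(4, "i", (1.0, 1.5)),
]
# Hypothetical designated voice: a target cepstrum per phoneme.
VOICE = {"a": (1.0, 1.0), "i": (1.0, 1.5)}


def select(db, targets):
    """Element selection. Each target is (phoneme, cepstrum or None); with no
    acoustic target, the first matching element is taken (placeholder cost)."""
    seq = []
    for phoneme, cep in targets:
        candidates = [e for e in db if e.phoneme == phoneme]
        if cep is None:
            seq.append(candidates[0])
        else:
            seq.append(min(candidates, key=lambda e: math.dist(e.cepstrum, cep)))
    return seq


def transform(seq, voice, alpha=0.8):
    """Voice transformation: move each cepstrum a fraction alpha toward the voice."""
    return [
        tuple(x + alpha * (t - x) for x, t in zip(e.cepstrum, voice[e.phoneme]))
        for e in seq
    ]


def is_distorted(seq, converted, threshold):
    """Distortion determination by degree of deformation (cf. claim 8)."""
    return max(math.dist(e.cepstrum, c) for e, c in zip(seq, converted)) > threshold


def synthesize(db, phonemes, voice, threshold):
    targets = [(p, None) for p in phonemes]      # target element information
    seq = select(db, targets)                     # element selection
    converted = transform(seq, voice)             # voice characteristics transformation
    if is_distorted(seq, converted, threshold):   # distortion determination
        # Target correction: carry the transformed features as new targets (cf. claim 3).
        targets = list(zip(phonemes, converted))
        seq = select(db, targets)                 # reselection
        converted = transform(seq, voice)         # transform the new sequence (cf. claim 2)
    return [e.ident for e in seq], converted


if __name__ == "__main__":
    print(synthesize(DB, ["a", "i"], VOICE, threshold=0.5))
```

With `threshold=0.5`, the first pass is flagged as distorted, so the targets are corrected to the transformed features and the elements closer to the designated voice (identifiers 2 and 4) are reselected; with a looser threshold such as 2.0, the original selection (1 and 3) is kept.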

16 Claims

1. A speech synthesis device which synthesizes a speech having a desired voice characteristic, said device comprising:
- a speech element storage unit operable to store speech elements of plural voice characteristics;
- a target element information generation unit operable to generate speech element information corresponding to language information, based on the language information including phoneme information;
- an element selection unit operable to select, from said speech element storage unit, a speech element sequence corresponding to the speech element information;
- a voice characteristics designation unit operable to accept a designation regarding a voice characteristic of a synthesized speech;
- a voice characteristics transformation unit operable to transform the speech element sequence selected by said element selection unit into a speech element sequence of the voice characteristic accepted by said voice characteristics designation unit;
- a distortion determination unit operable to determine a distortion between the speech element sequence transformed by said voice characteristics transformation unit and the speech element sequence before the transformation; and
- a target element information correction unit operable to correct the speech element information generated by said target element information generation unit to speech element information corresponding to the speech element sequence transformed by said voice characteristics transformation unit, in the case where said distortion determination unit determines that the transformed speech element sequence is distorted, wherein said element selection unit is operable to select, from said speech element storage unit, a speech element sequence corresponding to the corrected speech element information, in the case where said target element information correction unit has corrected the speech element information.

2. The speech synthesis device according to claim 1, wherein said voice characteristics transformation unit is further operable to transform the speech element sequence corresponding to the corrected speech element information into the speech element sequence of the voice characteristic accepted by said voice characteristics designation unit.

3. The speech synthesis device according to claim 1, wherein said target element information correction unit is further operable to add a vocal tract feature of the speech element sequence transformed by said voice characteristics transformation unit, to the corrected speech element information, when correcting the speech element information generated by said target element information generation unit.

4. The speech synthesis device according to claim 3, wherein the vocal tract feature is one of a cepstrum coefficient of the speech element sequence transformed by said voice characteristics transformation unit and a time pattern of the cepstrum coefficient.

5. The speech synthesis device according to claim 3, wherein the vocal tract feature is one of a formant frequency of the speech element sequence transformed by said voice characteristics transformation unit and a time pattern of the formant frequency.

6. The speech synthesis device according to claim 1, wherein said distortion determination unit is operable to determine a distortion based on a connectivity between adjacent speech elements.

7. The speech synthesis device according to claim 6, wherein said distortion determination unit is operable to determine a distortion based on one of the following:
- a cepstrum distance between the adjacent speech elements;
- a formant frequency distance between the adjacent speech elements;
- a fundamental frequency difference between the adjacent speech elements; and
- a power distance between the adjacent speech elements.

8. The speech synthesis device according to claim 1, wherein said distortion determination unit is operable to determine a distortion based on a degree of deformation between the speech element sequence selected by said element selection unit and the speech element sequence transformed by said voice characteristics transformation unit.

9. The speech synthesis device according to claim 8, wherein said distortion determination unit is operable to determine a distortion based on one of the following:
- a cepstrum distance between the speech element sequence selected by said element selection unit and the transformed speech element sequence;
- a formant frequency distance between the speech element sequence selected by said element selection unit and the transformed speech element sequence;
- a fundamental frequency difference between the speech element sequence selected by said element selection unit and the transformed speech element sequence; and
- a power difference between the speech element sequence selected by said element selection unit and the transformed speech element sequence.

10. The speech synthesis device according to claim 1, wherein said distortion determination unit is operable to determine a distortion by a unit of phoneme, syllable, mora, morpheme, word, clause, accent phrase, phrase, breath group, or whole sentence.

11. The speech synthesis device according to claim 1, wherein said element selection unit is operable to select, from said speech element storage unit, the speech element sequence corresponding to the corrected speech element information, only with respect to a range in which the distortion is detected by said distortion determination unit, in the case where said target element information correction unit has corrected the speech element information.

12. The speech synthesis device according to claim 11 further comprising an element holding unit operable to hold an identifier of the speech element sequence selected by said element selection unit, wherein said element selection unit is operable to select the speech element sequence based on the identifier held by said element holding unit, with respect to the speech element sequence in a range in which the distortion is not detected by said distortion determination unit.

13. The speech synthesis device according to claim 1, wherein said speech element storage unit includes:
- a basic speech element storage unit operable to store a speech element of a standard voice characteristic; and
- a voice characteristics speech element storage unit operable to store speech elements of plural voice characteristics, the speech elements being different from the speech element of the standard voice characteristic;

and said element selection unit includes:
- a basic element selection unit operable to select, from said basic speech element storage unit, a speech element sequence corresponding to the speech element information generated by said target element information generation unit; and
- a voice characteristics element selection unit operable to select, from said voice characteristics speech element storage unit, the speech element sequence corresponding to the speech element information corrected by said target element information correction unit.
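The distortion measures of claims 6 to 9 and the partial reselection of claims 11 and 12 can be sketched as follows. Assumptions not taken from the patent: each element is summarized by a single hypothetical feature record (cepstral coefficients, F0 in Hz, power in dB) instead of frame sequences, so a join compares whole-element summaries rather than boundary frames; the formant frequency distance is omitted because it needs a formant tracker; the thresholds and the `reselect_one` callback are illustrative placeholders.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Feat:
    """Hypothetical per-element summary features."""
    cep: tuple[float, ...]  # cepstral coefficients
    f0: float               # fundamental frequency (Hz)
    power: float            # power (dB)


def cepstral_distance(a: Feat, b: Feat) -> float:
    """Euclidean distance between cepstral vectors."""
    return math.dist(a.cep, b.cep)


def f0_difference(a: Feat, b: Feat) -> float:
    return abs(a.f0 - b.f0)


def power_distance(a: Feat, b: Feat) -> float:
    return abs(a.power - b.power)


Measure = Callable[[Feat, Feat], float]


def join_distortions(seq: list[Feat], measure: Measure) -> list[float]:
    """Connectivity (claims 6-7): distortion at each join of adjacent elements."""
    return [measure(seq[i], seq[i + 1]) for i in range(len(seq) - 1)]


def deformation(before: list[Feat], after: list[Feat], measure: Measure) -> list[float]:
    """Degree of deformation (claims 8-9): selected vs. transformed, per element."""
    return [measure(b, a) for b, a in zip(before, after)]


def distorted_positions(before: list[Feat], after: list[Feat],
                        measure: Measure, threshold: float) -> list[int]:
    """Indices whose deformation exceeds a (hypothetical) threshold."""
    return [i for i, d in enumerate(deformation(before, after, measure)) if d > threshold]


def partial_reselect(held_ids: list[int], positions: list[int],
                     reselect_one: Callable[[int], int]) -> list[int]:
    """Claims 11-12: keep held identifiers outside the distorted range and
    reselect only at the flagged positions."""
    flagged = set(positions)
    return [reselect_one(i) if i in flagged else ident
            for i, ident in enumerate(held_ids)]
```

For example, `partial_reselect([7, 8, 9], [1], lambda i: 100 + i)` keeps the held identifiers 7 and 9 and replaces only position 1, mirroring the element holding unit of claim 12.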

14. A speech synthesis method for use in a speech synthesis device including a speech element storage unit for storing speech elements of plural voice characteristics, said method comprising:
- a target element information generation step of generating speech element information corresponding to language information, based on the language information including phoneme information;
- an element selection step of selecting, from the speech element storage unit, a speech element sequence corresponding to the speech element information;
- a voice characteristics designation step of accepting a designation regarding a voice characteristic of a synthesized speech;
- a voice characteristics transformation step of transforming the speech element sequence selected in said element selection step into a speech element sequence of the voice characteristic accepted in said voice characteristics designation step;
- a distortion determination step of determining a distortion between the speech element sequence transformed in said voice characteristics transformation step and the speech element sequence before the transformation; and
- a target element information correction step of correcting the speech element information generated in said target element information generation step to speech element information corresponding to the speech element sequence transformed in said voice characteristics transformation step, in the case where it is determined that the transformed speech element sequence is distorted in said distortion determination step, wherein in said element selection step, a speech element sequence corresponding to the corrected speech element information is selected from the speech element storage unit in the case where the speech element information has been corrected in said target element information correction step.

15. A program for causing a computer to function as a speech synthesis device, wherein the computer includes a speech element storage unit for storing speech elements of plural voice characteristics, and said program causing a computer to function as:
- a target element information generation unit operable to generate speech element information corresponding to language information, based on the language information including phoneme information;
- an element selection unit operable to select, from said speech element storage unit, a speech element sequence corresponding to the speech element information;
- a voice characteristics designation unit operable to accept a designation regarding a voice characteristic of a synthesized speech;
- a voice characteristics transformation unit operable to transform the speech element sequence selected by said element selection unit into a speech element sequence of the voice characteristic accepted by said voice characteristics designation unit;
- a distortion determination unit operable to determine a distortion between the speech element sequence transformed by said voice characteristics transformation unit and the speech element sequence before the transformation; and
- a target element information correction unit operable to correct the speech element information generated by said target element information generation unit to speech element information corresponding to the speech element sequence transformed by said voice characteristics transformation unit, in the case where said distortion determination unit determines that the transformed speech element sequence is distorted, wherein said element selection unit is operable to select, from said speech element storage unit, a speech element sequence corresponding to the corrected speech element information, in the case where said target element information correction unit has corrected the speech element information.

16. A computer-readable recording medium on which a program executed by a computer is recorded, wherein the computer includes a speech element storage unit for storing speech elements of plural voice characteristics, and the program causing a computer to function as:
- a target element information generation unit operable to generate speech element information corresponding to language information, based on the language information including phoneme information;
- an element selection unit operable to select, from said speech element storage unit, a speech element sequence corresponding to the speech element information;
- a voice characteristics designation unit operable to accept a designation regarding a voice characteristic of a synthesized speech;
- a voice characteristics transformation unit operable to transform the speech element sequence selected by said element selection unit into a speech element sequence of the voice characteristic accepted by said voice characteristics designation unit;
- a distortion determination unit operable to determine a distortion between the speech element sequence transformed by said voice characteristics transformation unit and the speech element sequence before the transformation; and
- a target element information correction unit operable to correct the speech element information generated by said target element information generation unit to speech element information corresponding to the speech element sequence transformed by said voice characteristics transformation unit, in the case where said distortion determination unit determines that the transformed speech element sequence is distorted, wherein said element selection unit is operable to select, from said speech element storage unit, a speech element sequence corresponding to the corrected speech element information, in the case where said target element information correction unit has corrected the speech element information.

Current Assignee
Panasonic Intellectual Property Corporation of America (Panasonic Holdings Corporation)
Original Assignee
Panasonic Corporation (Panasonic Holdings Corporation)
Inventors
Hirose, Yoshifumi; Nii, Hiromori

Granted Patent

US 7,912,719 B2
US Class Current

704/258
CPC Class Codes

G10L 13/033   Voice editing, e.g. manipul...

G10L 13/06   Elementary speech units use...

G10L 2021/0135   Voice conversion or morphing
