Speech synthesis method, speech synthesis system, and speech synthesis program

US 20050137870A1
Filed: 11/26/2004
Published: 06/23/2005
Est. Priority Date: 11/28/2003
Status: Active Grant

Abstract

A speech synthesis system stores a group of speech units in a memory. For each segment obtained by segmenting a phoneme string of the target speech, the system selects a plurality of speech units from the group, based on prosodic information of the target speech, so as to minimize the distortion of the synthetic speech generated from the selected speech units relative to the target speech. It then fuses the speech units selected for each segment into a new speech unit, obtaining one new speech unit per segment, and generates synthetic speech by concatenating the new speech units.

19 Claims

1. A speech synthesis method comprising:
- selecting a plurality of speech units from a group of speech units, based on prosodic information of target speech, the speech units selected corresponding to each of segments which are obtained by segmenting a phoneme string of the target speech;
  
  generating a new speech unit corresponding to the each of segments, by fusing speech units selected, to obtain a plurality of new speech units corresponding to the segments respectively; and
  
  generating synthetic speech by concatenating the new speech units.
- - 2. The method according to claim 1, wherein the speech units selected minimize distortion of synthetic speech generated from the speech units selected, to the target speech.
  - 3. The method according to claim 2, wherein the selecting includes selecting an optimal speech unit sequence minimizing distortion of synthetic speech generated from the optimal speech unit sequence, to the target speech;
    - and selecting the speech units corresponding to the each of the segments based on corresponding one speech unit of the optimal speech unit sequence.
  - 5. The method according to claim 1, wherein the prosodic information includes at least one of fundamental frequency, duration, and power of the target speech.
  - 7. The method according to claim 2, wherein the selecting includes calculating a first cost for each of the group, the first cost representing difference between the each one of the group and the target speech;
    - calculating a second cost for each of the group, the second cost representing a degree of distortion produced when the each one of the group is concatenated to another of the group; and
      
      selecting the speech units corresponding to the each of segments based on the first cost and the second cost of the each one of the group.
  - 8. The method according to claim 7, wherein the first cost is calculated using at least one of a fundamental frequency, duration, power, phonetic environment, and spectrum of the each one of the group and the target speech.
  - 9. The method according to claim 7, wherein the second cost is calculated using at least one of a spectrum, fundamental frequency, and power of the each one of the group and another of the group.
  - 10. The method according to claim 1, wherein the generating the new speech unit includes generating a plurality of pitch-cycle waveform sequences each including the same numbers of pitch-cycle waveforms, from a plurality of pitch-cycle waveform sequences corresponding to the speech units selected respectively;
    - and generating the new speech unit by fusing the pitch-cycle waveform sequences generated.
  - 11. The method according to claim 10, wherein the new speech units are generated by calculating a centroid of each pitch-cycle waveform of the new speech unit.
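
Claims 10 and 11 can be illustrated with a minimal sketch. It assumes each speech unit is already decomposed into a list of pitch-cycle waveforms (lists of samples). The function names, the nearest-index rule used to equalize the number of waveforms, the zero-padding of shorter waveforms, and the plain arithmetic mean used as the centroid are all assumptions made for illustration; the claims do not fix any of these details.

```python
def equalize_count(cycles, target_count):
    """Return target_count pitch-cycle waveforms by repeating or skipping
    entries of `cycles` (nearest-index mapping, an assumed rule)."""
    n = len(cycles)
    return [cycles[(i * n) // target_count] for i in range(target_count)]


def centroid(waveforms):
    """Element-wise mean of several waveforms.
    Shorter waveforms are treated as zero-padded (an assumption)."""
    length = max(len(w) for w in waveforms)
    total = [0.0] * length
    for w in waveforms:
        for k, sample in enumerate(w):
            total[k] += sample
    return [s / len(waveforms) for s in total]


def fuse_units(units, target_count=None):
    """Fuse several speech units into one new unit.

    units: list of speech units, each a list of pitch-cycle waveforms.
    Step 1 gives every unit the same number of pitch-cycle waveforms.
    Step 2 takes the centroid of the waveforms at each position.
    """
    if target_count is None:
        target_count = max(len(u) for u in units)
    aligned = [equalize_count(u, target_count) for u in units]
    return [centroid([seq[i] for seq in aligned]) for i in range(target_count)]
```

For example, fusing a unit with two pitch cycles and a unit with one pitch cycle yields a new unit with two pitch cycles, each the average of the aligned waveforms.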

4. The speech synthesis method for generating synthetic speech by concatenating speech units selected from a first group of speech units based on a phoneme string and prosodic information of target speech, the method comprising:
- storing a second group of speech units and environmental information items corresponding to the second group respectively in a memory;
  
  selecting a plurality of speech units from the second group based on each of training environmental information items corresponding to training speech units respectively, the speech units selected having environmental information items similar to the each of the training environmental information items; and
  
  generating each of speech units of the first group, by fusing the speech units selected.
- - 6. The method according to claim 4, wherein each of the environmental information items and the training environmental information items includes at least one of fundamental frequency, duration, and power.
  - 12. The method according to claim 4, wherein the generating the each of speech units of the first group includes generating a plurality of pitch-cycle waveform sequences each including the same numbers of pitch-cycle waveforms, from a plurality of pitch-cycle waveform sequences corresponding to the speech units selected respectively;
    - and generating the each of speech units of the first group by fusing the pitch-cycle waveform sequences generated.
  - 13. The method according to claim 12, wherein the each of speech units of the first group is generated by calculating a centroid of each pitch-cycle wave of the each of speech units of the first group.
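
The selection step of claims 4 and 6 can be sketched as a nearest-neighbour search over environmental information. The sketch below is hypothetical: the `Environment` record, the weighted absolute-difference distance, and the choice of the `n` closest units are assumptions, since the claims only require that the selected units have environmental information similar to the training item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Environmental information of a speech unit (fields per claim 6)."""
    f0: float        # fundamental frequency
    duration: float
    power: float


def env_distance(a, b, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of absolute differences (an assumed similarity measure)."""
    wf, wd, wp = weights
    return (wf * abs(a.f0 - b.f0)
            + wd * abs(a.duration - b.duration)
            + wp * abs(a.power - b.power))


def select_similar(candidates, training_env, n):
    """Return the ids of the n units of the second group whose environment
    is closest to the training environment.

    candidates: list of (unit_id, Environment) pairs.
    """
    ranked = sorted(candidates, key=lambda c: env_distance(c[1], training_env))
    return [unit_id for unit_id, _ in ranked[:n]]
```

The units returned for each training environmental information item would then be fused to produce one speech unit of the first group.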

14. A speech synthesis system comprising:
- a memory to store a group of speech units;
  
  a selecting unit configured to select a plurality of speech units from the group of speech units, based on prosodic information of target speech, the speech units selected corresponding to each of segments which are obtained by segmenting a phoneme string of the target speech;
  
  a first generating unit configured to generate a new speech unit corresponding to the each of segments, by fusing the speech units selected, to obtain a plurality of new speech units corresponding to the segments respectively; and
  
  a second generating unit configured to generate synthetic speech by concatenating the new speech units.

15. A speech synthesis system comprising:
- a memory to store a first group of speech units, each of the speech units of the first group being generated by fusing a plurality of speech units which have environmental information items similar to one of training environmental information items and which are selected from a second group of speech units; and
  
  a generating unit configured to generate synthetic speech by concatenating a plurality of speech units selected from the first group based on a phoneme string and prosodic information of target speech.

16. A computer program stored on a computer readable medium, the computer program comprising:
- first program instruction means for instructing a computer processor to select a plurality of speech units from a first group of speech units stored in a first memory based on prosodic information of target speech, the speech units selected corresponding to each of segments which are obtained by segmenting a phoneme string of the target speech;
  
  second program instruction means for instructing a computer processor to generate a new speech unit corresponding to the each of segments, by fusing the speech units selected, to obtain a plurality of new speech units corresponding to the segments respectively; and
  
  third program instruction means for instructing a computer processor to generate synthetic speech by concatenating the new speech units.
- - 17. The computer program of claim 16, further comprising fourth program instruction means for instructing a computer processor to generate each speech unit of the first group by fusing a plurality of speech units which have environmental information items similar to a training environmental information item and which are selected from a second group of speech units stored in a second memory.

18. A speech synthesis system comprising:
- a memory to store a group of speech units;
  
  a selecting unit configured to select a plurality of speech units from the group based on prosodic information of target speech, the speech units selected corresponding to each of segments which are obtained by segmenting a phoneme string of the target speech and minimizing distortion of synthetic speech generated from the speech units selected to the target speech;
  
  a first generating unit configured to generate a new speech unit corresponding to the each of the segments, by fusing the speech units selected, to obtain a plurality of new speech units corresponding to the segments respectively; and
  
  a second generating unit configured to generate synthetic speech by concatenating the new speech units.
- - 19. The system according to claim 18, wherein the selecting unit selects an optimal speech unit sequence minimizing distortion of synthetic speech generated from the optimal speech unit sequence, and selects the speech units corresponding to the each of the segments based on corresponding one speech unit of the optimal speech unit sequence.
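
The two-stage selection of claims 3, 7, and 19 can be sketched as follows. A dynamic-programming search first finds the unit sequence with the lowest total of first (target) and second (concatenation) costs; then, for each segment, several units are ranked against the neighbours in that optimal sequence. The function names, the callback signatures `target_cost(i, u)` and `concat_cost(a, b)`, and the ranking rule in the second stage are assumptions for illustration, not the method fixed by the claims.

```python
def best_sequence(candidates, target_cost, concat_cost):
    """Return the minimum-cost unit sequence, one unit per segment.

    candidates: list (one entry per segment) of lists of candidate units.
    target_cost(i, u): cost of unit u against the target for segment i.
    concat_cost(a, b): cost of concatenating unit a with following unit b.
    """
    # best[i][j] = (cost of the best path ending at candidates[i][j], backpointer)
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            prev_cost, prev_j = min(
                (best[i - 1][j][0] + concat_cost(p, u), j)
                for j, p in enumerate(candidates[i - 1])
            )
            row.append((prev_cost + target_cost(i, u), prev_j))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


def select_for_fusion(candidates, optimal, target_cost, concat_cost, n):
    """For each segment, pick the n units with the lowest cost when placed
    between the neighbouring units of the optimal sequence (assumed rule)."""
    selected = []
    for i, units in enumerate(candidates):
        def cost(u):
            c = target_cost(i, u)
            if i > 0:
                c += concat_cost(optimal[i - 1], u)
            if i + 1 < len(optimal):
                c += concat_cost(u, optimal[i + 1])
            return c
        selected.append(sorted(units, key=cost)[:n])
    return selected
```

The lists returned by `select_for_fusion` correspond to the "plurality of speech units" per segment that are subsequently fused into one new speech unit.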

Current Assignee
Kabushiki Kaisha Toshiba (Toshiba Corporation)
Original Assignee
Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors
Mizutani, Tatsuya; Kagoshima, Takehiko

Granted Patent

US 7,668,717 B2
US Class Current

704/264
CPC Class Codes

G10L 13/04 Details of speech synthesis...

G10L 13/06 Elementary speech units use...
