Speech synthesis method, speech synthesis system, and speech synthesis program

US 7,856,357 B2
Filed: 08/18/2008
Issued: 12/21/2010
Est. Priority Date: 11/28/2003
Status: Expired due to Fees

First Claim

1. A speech synthesis method comprising:

storing a group of speech units in a memory;

segmenting a phoneme string of a target speech, to obtain a plurality of segments;

selecting, from the group in the memory, a speech unit for each of the segments based on prosodic information of the target speech, to obtain an optimal speech unit sequence including speech units selected for the respective segments;

selecting M (M represents a positive integer greater than one) speech units for each of the segments from the group in the memory, based on the optimal speech unit sequence;

generating a new speech unit corresponding to each of the segments, by fusing the M speech units selected for said each of the segments, to obtain a plurality of new speech units corresponding to the segments respectively; and

generating synthetic speech by concatenating the new speech units.
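
The claim does not say how the "optimal speech unit sequence" is found. One common way to pick a single unit per segment is a dynamic-programming (Viterbi-style) search over a per-unit cost and a join cost. The sketch below is a minimal illustration under that assumption; the `Unit` fields, `target_cost`, `concat_cost`, and the weighting are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phoneme: str
    f0: float        # mean fundamental frequency in Hz
    duration: float  # seconds


def target_cost(unit, target):
    # Hypothetical prosodic mismatch between a stored unit and the target segment.
    return abs(unit.f0 - target.f0) / 100.0 + abs(unit.duration - target.duration)


def concat_cost(left, right):
    # Hypothetical join distortion: size of the F0 jump across the boundary.
    return abs(left.f0 - right.f0) / 100.0


def optimal_sequence(targets, inventory):
    """Return one stored unit per target segment, minimizing the summed costs."""
    if not targets:
        return []
    # Candidates for a segment are the stored units with the same phoneme label.
    cands = [[u for u in inventory if u.phoneme == t.phoneme] for t in targets]
    if any(not c for c in cands):
        raise ValueError("no stored unit for at least one segment")
    # best[i][j]: lowest cost of any path that ends in candidate j of segment i.
    best = [[target_cost(u, targets[0]) for u in cands[0]]]
    back = []  # back[i - 1][j]: predecessor index of candidate j in segment i
    for i in range(1, len(targets)):
        prev = cands[i - 1]
        row, ptr = [], []
        for u in cands[i]:
            scores = [best[-1][k] + concat_cost(p, u) for k, p in enumerate(prev)]
            k = min(range(len(prev)), key=scores.__getitem__)
            row.append(scores[k] + target_cost(u, targets[i]))
            ptr.append(k)
        best.append(row)
        back.append(ptr)
    # Trace the cheapest path back from the last segment to the first.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [cands[i][j] for i, j in enumerate(path)]
```

Here `targets` carries the phoneme label and prosodic information of each segment of the target speech, and `inventory` stands in for the group of speech units in the memory.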

Abstract

A speech synthesis system stores a group of speech units in a memory, selects a plurality of speech units from the group based on prosodic information of target speech, the speech units selected corresponding to each of segments which are obtained by segmenting a phoneme string of the target speech and minimizing distortion of synthetic speech generated from the speech units selected to the target speech, generates a new speech unit corresponding to the each of the segments, by fusing the speech units selected, to obtain a plurality of new speech units corresponding to the segments respectively, and generates synthetic speech by concatenating the new speech units.

10 Claims

1. A speech synthesis method comprising:
- storing a group of speech units in a memory;
  
  segmenting a phoneme string of a target speech, to obtain a plurality of segments;
  
  selecting, from the group in the memory, a speech unit for each of the segments based on prosodic information of the target speech, to obtain an optimal speech unit sequence including speech units selected for the respective segments;
  
  selecting M (M represents a positive integer greater than one) speech units for each of the segments from the group in the memory, based on the optimal speech unit sequence;
  
  generating a new speech unit corresponding to each of the segments, by fusing the M speech units selected for said each of the segments, to obtain a plurality of new speech units corresponding to the segments respectively; and
  
  generating synthetic speech by concatenating the new speech units.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. A method according to claim 1, wherein the prosodic information includes at least one of fundamental frequency, duration, and power of the target speech.
  - 3. A method according to claim 1, wherein selecting the M speech units for each of the segments includes:
    - setting each segment of the segments as a target segment;
      
      calculating a first cost for each speech unit of the group in the memory, the first cost representing difference between the target segment in the target speech and the speech unit of the group;
      
      calculating a second cost for each speech unit of the group in the memory, the second cost representing a degree of distortion produced when the speech unit of the group is concatenated with speech units around the target segment in the optimal speech unit sequence; and
      
      selecting the M speech units for the target segment based on the first cost and the second cost of the each speech unit of the group.
  - 4. A method according to claim 3, wherein the first cost is calculated using at least one of a fundamental frequency, duration, power, phonetic environment, and spectrum of the each one of the group and the target speech.
  - 5. A method according to claim 3, wherein the second cost is calculated using at least one of a spectrum, fundamental frequency, and power of the each one of the group and another of the group.
  - 6. A method according to claim 1, wherein the generating the new speech unit includes generating a plurality of pitch-cycle waveform sequences each including the same numbers of pitch-cycle waveforms, from M pitch-cycle waveform sequences corresponding to the M speech units selected respectively;
    - and generating the new speech unit by fusing the M pitch-cycle waveform sequences generated.
  - 7. A method according to claim 6, wherein the new speech unit is generated by calculating a centroid of each pitch-cycle waveform of the new speech unit.
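
The selection described in claim 3 can be sketched as follows: for each target segment, every stored candidate receives a first cost (mismatch with the target) and a second cost (distortion against its neighbors in the optimal sequence), and the M candidates with the lowest combined cost are kept. This is an illustrative sketch only; the `Unit` fields, the two cost formulas, and the unweighted sum are assumptions, not the patent's definitions.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phoneme: str
    f0: float        # mean fundamental frequency in Hz
    duration: float  # seconds


def first_cost(unit, target):
    # Hypothetical difference between the stored unit and the target segment.
    return abs(unit.f0 - target.f0) / 100.0 + abs(unit.duration - target.duration)


def join_cost(left, right):
    # Hypothetical distortion at the boundary between two adjacent units.
    return abs(left.f0 - right.f0) / 100.0


def second_cost(unit, optimal, i):
    """Distortion when `unit` is joined to the optimal-sequence units around segment i."""
    cost = 0.0
    if i > 0:
        cost += join_cost(optimal[i - 1], unit)
    if i + 1 < len(optimal):
        cost += join_cost(unit, optimal[i + 1])
    return cost


def select_m_units(targets, optimal, inventory, m):
    """Return, for each segment, the m stored units with the lowest combined cost."""
    selected = []
    for i, target in enumerate(targets):
        cands = [u for u in inventory if u.phoneme == target.phoneme]

        def total(u, target=target, i=i):
            return first_cost(u, target) + second_cost(u, optimal, i)

        selected.append(heapq.nsmallest(m, cands, key=total))
    return selected
```

If fewer than `m` candidates share the segment's phoneme label, `heapq.nsmallest` simply returns all of them, so a caller that needs exactly M units per segment would have to check the list lengths.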

8. A speech synthesis system comprising:
- a memory to store a group of speech units;
  
  a first selecting unit configured to select, from the group in the memory, a speech unit for each of segments which are obtained by segmenting a phoneme string of a target speech, based on prosodic information of the target speech, to obtain an optimal speech unit sequence including speech units selected for the respective segments;
  
  a second selecting unit configured to select, based on the optimal speech unit sequence, M (M represents a positive integer greater than one) speech units for each segment of the segments from the group in the memory;
  
  a first generating unit configured to generate a new speech unit corresponding to each segment of the segments, by fusing the M speech units selected for the segment, to obtain a plurality of new speech units corresponding to the segments respectively; and
  
  a second generating unit configured to generate synthetic speech by concatenating the new speech units.
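
The fusing and concatenating steps performed by the two generating units, read together with claims 6 and 7, can be sketched roughly as below. Each speech unit is modeled as a list of pitch-cycle waveforms, each waveform being a list of samples. The sketch assumes that the common waveform count is the largest count among the M sequences, that counts are equalized by repeating waveforms, that waveforms at the same position have equal length, and that concatenation is a plain join of samples. None of these choices is stated in the claims, and the function names are hypothetical.

```python
def equalize(sequence, n):
    """Map a pitch-cycle waveform sequence onto exactly n waveforms by
    repeating (or skipping) entries with a nearest-lower-index rule."""
    size = len(sequence)
    return [sequence[(k * size) // n] for k in range(n)]


def fuse(sequences):
    """Fuse M pitch-cycle waveform sequences into one new speech unit."""
    # Assumption: bring every sequence up to the largest waveform count.
    n = max(len(s) for s in sequences)
    aligned = [equalize(s, n) for s in sequences]
    fused = []
    for k in range(n):
        cycles = [s[k] for s in aligned]
        # Centroid of the k-th pitch-cycle waveforms: sample-wise mean.
        # zip() stops at the shortest waveform, so equal lengths are assumed.
        fused.append([sum(samples) / len(cycles) for samples in zip(*cycles)])
    return fused


def concatenate(units):
    """Naive concatenation: flatten every pitch-cycle waveform of every unit."""
    return [x for unit in units for cycle in unit for x in cycle]
```

For example, fusing a two-waveform sequence with a three-waveform sequence yields a new unit of three waveforms, each the mean of the aligned pair. A production system would more likely overlap-add waveforms at pitch marks than join raw samples.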

9. A non-transitory computer readable medium storing program instructions which when executed by a computer results in performance of steps comprising:
- selecting from a first group of speech units in a first memory, a speech unit per each of segments which are obtained by segmenting a phoneme string of a target speech, based on prosodic information of the target speech, to obtain an optimal speech unit sequence including speech units selected for the respective segments;
  
  selecting M (M represents a positive integer greater than one) speech units for each of the segments from the first group in the first memory, based on the optimal speech unit sequence;
  
  generating a new speech unit corresponding to each segment of the segments, by fusing the M speech units selected for the segment, to obtain a plurality of new speech units corresponding to the segments respectively; and
  
  generating synthetic speech by concatenating the new speech units.
- Dependent claims (10):
  - 10. The non-transitory computer readable medium of claim 9, further storing a program instruction to generate a speech unit of the first group in the first memory by fusing a plurality of speech units whose environmental information items being similar to a desired environmental information item and are selected from a second group of speech units stored in a second memory.

Current Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Original Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors: Kagoshima, Takehiko; Mizutani, Tatsuya
Primary Examiner(s): Vo; Huyen X.
Application Number: US12/193,530
Publication Number: US 20080312931A1
Time in Patent Office: 855 Days
Field of Search: 704/220, 704/260, 704/258, 704/266, 704/261, 704/268, 704/270, 704/270.1, 704/243, 704/236
US Class Current: 704/261
CPC Class Codes:
G10L 13/04 Details of speech synthesis...
G10L 13/06 Elementary speech units use...
