Speech synthesis method, speech synthesis system, and speech synthesis program

US 7,668,717 B2
Filed: 11/26/2004
Issued: 02/23/2010
Est. Priority Date: 11/28/2003
Status: Expired due to Fees


Abstract

A speech synthesis system stores a group of speech units in a memory, selects a plurality of speech units from the group based on prosodic information of target speech, the speech units selected corresponding to each of the segments which are obtained by segmenting a phoneme string of the target speech and minimizing distortion of synthetic speech generated from the speech units selected to the target speech, generates a new speech unit corresponding to each of the segments, by fusing the speech units selected, to obtain a plurality of new speech units corresponding to the segments respectively, and generates synthetic speech by concatenating the new speech units.
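One common way to find a minimum-distortion unit sequence of the kind the abstract describes is dynamic programming over per-segment candidates. The sketch below is an illustration under stated assumptions, not the patented procedure: the text here does not specify the search algorithm or the cost functions, so `best_unit_sequence`, `target_cost`, and `join_cost` are hypothetical names, and summing the costs along a path is an assumed way of combining them.

```python
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")  # a speech unit; its representation is left open here


def best_unit_sequence(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> List[U]:
    """Pick one unit per segment, minimizing summed target and join costs.

    candidates[i] holds the candidate units for segment i.
    target_cost(i, u) scores the mismatch between unit u and segment i.
    join_cost(a, b) scores the distortion of concatenating a before b.
    Both cost functions are assumptions supplied by the caller.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every segment needs at least one candidate unit")

    # best[i][j]: lowest total cost of a path ending at candidate j of segment i
    best = [[target_cost(0, u) for u in candidates[0]]]
    # back[i][j]: index of the predecessor candidate on that cheapest path
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(candidates)):
        row, ptr = [], []
        for u in candidates[i]:
            costs = [
                best[i - 1][k] + join_cost(p, u)
                for k, p in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min] + target_cost(i, u))
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path: List[U] = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path
```

In a toy setting each unit could be represented by a single fundamental-frequency value, with `target_cost` as the absolute difference from the target value for that segment and `join_cost` as a scaled difference between neighbouring units.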

5 Claims

1. A speech synthesis method, comprising:
- storing a group of speech units and prosodic information corresponding to each of the speech units of the group in a memory;
  
  segmenting a phoneme string of a target speech to obtain a plurality of segments;
  
  selecting, from the group in the memory, a speech unit for each of the segments based on prosodic information of the target speech to obtain an optimal speech unit sequence including speech units selected for the respective segments;
  
  selecting M (M represents a positive integer greater than one) speech units for each of the segments from the group in the memory, based on the optimal speech unit sequence; and
  
  generating a new speech unit corresponding to each of the segments, by fusing the M speech units selected for each of the segments, to obtain a plurality of new speech units corresponding to the segments respectively;
  
  wherein the selecting the M speech units for each of the segments includes:
  
  setting each of the segments as a target segment;
  
  calculating a first cost for each speech unit of the group in the memory, the first cost representing a difference between the target segment in the target speech and the speech unit of the group;
  
  calculating a second cost for each speech unit of the group in the memory, the second cost representing a degree of distortion produced when the speech unit of the group is concatenated with speech units before and after the target segment in the optimal speech unit sequence; and
  
  selecting the M speech units for the target segment based on the first cost and the second cost of each speech unit of the group.

2. A method according to claim 1, wherein the prosodic information includes at least one of fundamental frequency, duration, and power.

3. A method according to claim 1, wherein generating the new speech unit includes generating M pitch-cycle waveform sequences each including the same numbers of pitch-cycle waveforms, from M pitch-cycle waveform sequences corresponding to the M speech units selected respectively; and generating the new speech unit by fusing the M pitch-cycle waveform sequences generated.

4. A method according to claim 3, wherein the new speech unit is generated by calculating a centroid of each pitch-cycle waveform of the new speech unit.
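The fusion described in claims 3 and 4 can be illustrated with a small sketch. It assumes details the claims leave open: each speech unit is represented as a list of pitch-cycle waveforms, the waveform count is equalized by repeating or dropping waveforms at evenly spaced positions, shorter waveforms are zero-padded, and the centroid is taken as the sample-wise mean. The names `equalize_count`, `centroid`, and `fuse_units` and the parameter `n` (the common waveform count) are hypothetical.

```python
from typing import List

Waveform = List[float]  # samples of one pitch-cycle waveform


def equalize_count(seq: List[Waveform], n: int) -> List[Waveform]:
    """Repeat or drop pitch-cycle waveforms so the sequence has exactly n.

    Assumption: waveforms are picked at evenly spaced positions.
    """
    if not seq or n < 1:
        raise ValueError("need a non-empty sequence and n >= 1")
    return [seq[(i * len(seq)) // n] for i in range(n)]


def centroid(waves: List[Waveform]) -> Waveform:
    """Sample-wise mean of several waveforms, zero-padding shorter ones."""
    length = max(len(w) for w in waves)
    padded = [w + [0.0] * (length - len(w)) for w in waves]
    return [sum(col) / len(waves) for col in zip(*padded)]


def fuse_units(units: List[List[Waveform]], n: int) -> List[Waveform]:
    """Fuse M units into one new unit made of n pitch-cycle waveforms."""
    if len(units) < 2:
        raise ValueError("M must be greater than one")
    aligned = [equalize_count(u, n) for u in units]
    # The i-th waveform of the new unit is the centroid of the i-th
    # waveforms of all M aligned sequences.
    return [centroid([seq[i] for seq in aligned]) for i in range(n)]
```

Fusing a two-waveform unit with a one-waveform unit at `n = 2`, for example, first repeats the single waveform so both sequences have two entries, then averages them position by position.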

5. A speech synthesis system comprising:
- a memory to store a group of speech units and prosodic information corresponding to each of the speech units of the group;
  
  a first selecting unit configured to select, from the group in the memory, a speech unit for each of segments which are obtained by segmenting a phoneme string of a target speech, based on prosodic information of the target speech, to obtain an optimal speech unit sequence including speech units selected for the respective segments;
  
  a second selecting unit configured to select, based on the optimal speech unit sequence, M (M represents a positive integer greater than one) speech units for each segment of the segments from the group in the memory; and
  
  a generating unit configured to generate a new speech unit corresponding to each of the segments, by fusing the M speech units selected for the segment, to obtain a plurality of new speech units corresponding to the segments respectively;
  
  wherein the second selecting unit is configured to:
  
  set each segment of the segments as a target segment;
  
  calculate a first cost for each speech unit of the group in the memory, the first cost representing a difference between the target segment in the target speech and the speech unit of the group;
  
  calculate a second cost for each speech unit of the group in the memory, the second cost representing a degree of distortion produced when the speech unit of the group is concatenated with speech units before and after the target segment in the optimal speech unit sequence; and
  
  select the M speech units for the target segment based on the first cost and the second cost of each speech unit of the group.
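The second selection step that both independent claims recite (a first cost against the target segment, a second cost against the neighbours in the optimal speech unit sequence, then a choice of M units) might be sketched as follows. This is a minimal illustration with assumptions: the claims only say the choice is based on the two costs, so adding them and keeping the M lowest totals is an assumed combination rule, and `select_m_units`, `first_cost`, and `join_cost` are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

U = TypeVar("U")  # a speech unit; its representation is left open here


def select_m_units(
    group: Sequence[U],
    prev_unit: Optional[U],
    next_unit: Optional[U],
    first_cost: Callable[[U], float],
    join_cost: Callable[[U, U], float],
    m: int,
) -> List[U]:
    """Return the m units of the group with the lowest combined cost.

    prev_unit and next_unit are the units before and after the target
    segment in the optimal sequence (None at the sequence boundaries).
    first_cost(u) scores the difference between u and the target segment.
    join_cost(a, b) scores the distortion of concatenating a before b.
    """
    if m < 2:
        raise ValueError("M must be greater than one")

    def total(u: U) -> float:
        cost = first_cost(u)
        # Second cost: distortion against both neighbours, where present.
        if prev_unit is not None:
            cost += join_cost(prev_unit, u)
        if next_unit is not None:
            cost += join_cost(u, next_unit)
        return cost

    # sorted() is stable, so ties keep their order in the group.
    return sorted(group, key=total)[:m]
```

Calling this once per target segment, with the neighbours taken from the optimal sequence, yields the M units per segment that the generating step then fuses.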

Current Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Original Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors: Kagoshima, Takehiko; Mizutani, Tatsuya
Primary Examiner(s): Vo, Huyen X.
Application Number: US10/996,401
Publication Number: US 20050137870A1
Time in Patent Office: 1,915 Days
Field of Search: 704/200, 704/258-269, 704/270, 704/270.1, 704/243, 704/236
US Class Current: 704/261
CPC Class Codes:
G10L 13/04 Details of speech synthesis...
G10L 13/06 Elementary speech units use...
