Method of speaking rate conversion in text-to-speech system

US 20060136215A1
Filed: 11/30/2005
Published: 06/22/2006
Est. Priority Date: 12/21/2004
Status: Abandoned Application

Abstract

A method of speaking rate conversion in a text-to-speech system is provided. The method includes: a first step of extracting a vocal list from a synthesis DB (database), voicing the extracted vocal list in each of three speaking styles (fast, normal, and slow), and building a probability distribution of synthesis unit-based duration; a second step of searching for an optimal synthesis unit candidate row using a Viterbi search, in response to a synthesis request, and creating a target duration parameter of a synthesis unit; and a third step of obtaining an optimal synthesis unit candidate row again, using the duration parameter of the optimal synthesis unit candidate row, and generating a synthesized sound.
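The second and third steps both depend on a Viterbi search over rows of candidate synthesis units. The following is a minimal sketch of such a search, not the application's implementation: the function name `viterbi_select`, the two cost callables, and the choice to model each candidate as a bare duration in milliseconds are all assumptions made for illustration. The application does not state which cost functions it uses.

```python
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")  # hypothetical candidate-unit type


def viterbi_select(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> List[U]:
    """Pick one candidate per position so the summed cost is minimal."""
    if not candidates or any(len(row) == 0 for row in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[j]: cheapest total cost of any path ending in candidate j
    # of the position processed so far.
    best = [target_cost(0, u) for u in candidates[0]]
    back: List[List[int]] = []  # back[i-1][j]: best predecessor index

    for i in range(1, len(candidates)):
        prev_row = candidates[i - 1]
        new_best: List[float] = []
        pointers: List[int] = []
        for u in candidates[i]:
            costs = [best[k] + join_cost(p, u) for k, p in enumerate(prev_row)]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + target_cost(i, u))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)]


# Illustrative data: each candidate is just its duration in ms (assumption).
rows = [[80.0, 120.0], [60.0, 100.0], [90.0, 150.0]]
targets = [110.0, 95.0, 100.0]  # hypothetical target durations
chosen = viterbi_select(
    rows,
    target_cost=lambda i, u: abs(u - targets[i]),
    join_cost=lambda a, b: 0.1 * abs(a - b),
)
print(chosen)  # [120.0, 100.0, 90.0]
```

Under this reading, the third step would rerun the same search after the target durations have been replaced by rate-converted ones, so that the selected candidates already sit close to the requested speaking rate.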

6 Claims

1. A method of a speaking rate conversion in a text-to-speech system, the method comprising:

a first step of extracting a vocal list from a synthesis DB (database), voicing the extracted vocal list in each speaking style constituted of fast speaking, normal speaking, and slow speaking, and building a probability distribution of a synthesis unit-based duration;

a second step of searching for an optimal synthesis unit candidate row using a viterbi search, correspondingly to a requested synthesis, and creating a target duration parameter of a synthesis unit; and

a third step of again obtaining an optimal synthesis unit candidate row using the duration parameter of the optimal synthesis unit candidate row, and generating a synthesized sound.

2. The method of claim 1, wherein the first step comprises the steps of:

extracting an optimal training list for creating a synthesis unit-based duration training model, from the synthesis DB;

recording the extracted training list at the fast speaking and the slow speaking; and

obtaining a continuous probability distribution of a synthesis unit-based duration depending on a speaking rate, from a fast speaking training DB and a slowly speaking training DB.

3. The method of claim 1 or 2, wherein in the first step, the continuous probability distributions (PDP_slow(T_i), PDP_normal(T_i), and PDP_fast(T_i)) of the duration of the speaking style-based synthesis unit (T_i) are expressed using the following Equation 5:

$$\begin{aligned} PDP_{slow}(T_i) &= \{\mu_{slow}(T_i),\ \sigma_{slow}(T_i)\} \\ PDP_{normal}(T_i) &= \{\mu_{normal}(T_i),\ \sigma_{normal}(T_i)\} \\ PDP_{fast}(T_i) &= \{\mu_{fast}(T_i),\ \sigma_{fast}(T_i)\} \end{aligned} \qquad \text{[Equation 5]}$$

4. The method of claim 3, wherein the normal speaking is obtained from an original synthesis DB.

5. The method of claim 1, wherein the third step comprises the steps of:

producing a new target duration parameter of a final synthesis unit finally influenced from the speaking rate, using a target duration and a continuous probability distribution of a duration of a synthesis unit candidate at the normal speaking, in duration models of synthesis units at the selected slow speaking or fast speaking;

again obtaining an optimal synthesis unit candidate row dependent on a duration, using a viterbi search of the produced new target duration parameter; and

generating a synthesized sound using the again obtained duration-dependent optimal synthesis unit candidate row.

6. The method of claim 5, wherein a process of converting to the new target duration parameter d′(T′_si) is expressed using the following Equation 8:

$$d'(t) = \mu_{target}(t) + \frac{(d(t) - \mu_{normal}(t)) \cdot \sigma_{target}(t) \cdot U_{SR}}{\sigma_{normal}(t)} \qquad \text{[Equation 8]}$$

where U_SR is the rate of speaking rate conversion requested by the user, and t denotes T′_si.
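The duration models of claim 3 and the conversion of claim 6 can be sketched together. This is an illustrative reading under stated assumptions, not the applicant's code: `DurationModel`, `fit_duration_model`, and `convert_duration` are hypothetical names, the population standard deviation is an arbitrary choice (the claims do not say which estimator is used), and all numbers are made up.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Sequence


@dataclass(frozen=True)
class DurationModel:
    """Mean and standard deviation of one unit's duration in one style."""
    mu: float
    sigma: float


def fit_duration_model(durations: Sequence[float]) -> DurationModel:
    # Assumption: population standard deviation; the source does not
    # specify the estimator.
    return DurationModel(mu=mean(durations), sigma=pstdev(durations))


def convert_duration(
    d: float,
    normal: DurationModel,
    target: DurationModel,
    u_sr: float,
) -> float:
    """Apply Equation 8: shift the normal-style deviation of d into the
    target style, scaled by the user-requested rate factor u_sr."""
    if normal.sigma <= 0:
        raise ValueError("normal-style sigma must be positive")
    return target.mu + (d - normal.mu) * target.sigma * u_sr / normal.sigma


# Hypothetical models for one synthesis unit (durations in ms).
normal = DurationModel(mu=100.0, sigma=20.0)
slow = DurationModel(mu=150.0, sigma=30.0)

# A unit that is 10 ms longer than the normal mean (half a sigma)
# lands half a slow-style sigma above the slow mean when u_sr is 1.
print(convert_duration(110.0, normal, slow, u_sr=1.0))  # 165.0
```

With `u_sr` equal to 1, the formula maps a duration to the same z-score in the target style; other values of `u_sr` stretch or shrink the deviation from the target mean while leaving the mean itself unchanged.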


Current Assignee: Electronics and Telecommunications Research Institute
Original Assignee: Electronics and Telecommunications Research Institute
Inventors: Kim, Jong Jin
Application Number: US11/290,908
Publication Number: US 20060136215A1
US Class Current: 704/265
CPC Class Codes: G10L 13/033 Voice editing, e.g. manipul...
