Method of speech segment selection for concatenative synthesis based on prosody-aligned distance measure

US 20030195743A1
Filed: 07/29/2002
Published: 10/16/2003
Est. Priority Date: 04/10/2002
Status: Active Grant

Abstract

A method of speech segment selection for concatenative synthesis based on a prosody-aligned distance measure is disclosed. The method compares speech segments segmented from a speech corpus, with the segments fully prosody-aligned to each other before distortion is measured. Because prosody alignment is embedded in the selection process, distortion resulting from possible prosody modification during synthesis can be taken into account objectively in the selection phase. To this end, automatic segmentation, pitch marking, and the PSOLA method work together to perform prosody alignment. Two distortion measures, MFCC and PSQM, are used to compare two prosody-aligned speech segments because both reflect human perceptual considerations.
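The prosody alignment described in the abstract can be illustrated with a minimal Python sketch, assuming pitch marks are already available as sample indices. This is a simplified time-domain PSOLA and not the patent's implementation: the names `hann` and `align_prosody`, the linear time warp between mark sequences, and the window-sum normalization are all assumptions made for illustration.

```python
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 1)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def align_prosody(source, source_marks, target_len, target_marks):
    """Re-synthesize `source` on the target's pitch-mark grid.

    Hypothetical helper: a simplified TD-PSOLA. `source_marks` and
    `target_marks` are ascending sample indices of pitch marks.
    The result has `target_len` samples, with one grain per target mark.
    """
    out = [0.0] * target_len
    norm = [0.0] * target_len
    s_first, s_last = source_marks[0], source_marks[-1]
    t_first, t_last = target_marks[0], target_marks[-1]

    for k, t in enumerate(target_marks):
        # Linear time warp (an assumption): where this target mark
        # falls on the source time axis.
        frac = (t - t_first) / (t_last - t_first) if t_last > t_first else 0.0
        warped = s_first + frac * (s_last - s_first)

        # The nearest source pitch mark supplies the grain.
        m = min(range(len(source_marks)),
                key=lambda i: abs(source_marks[i] - warped))

        # Grain half-width is the local target pitch period.
        if k + 1 < len(target_marks):
            period = target_marks[k + 1] - t
        else:
            period = t - target_marks[k - 1]
        half = max(1, period)
        win = hann(2 * half + 1)

        # Overlap-add the windowed grain at the target mark.
        for offset in range(-half, half + 1):
            si = source_marks[m] + offset
            ti = t + offset
            if 0 <= si < len(source) and 0 <= ti < target_len:
                w = win[offset + half]
                out[ti] += w * source[si]
                norm[ti] += w

    # Divide by the summed window weight so overlapping grains do not
    # change the overall level.
    return [o / n if n > 1e-9 else 0.0 for o, n in zip(out, norm)]
```

Called with a source segment, its pitch marks, and the length and pitch marks of a target segment, `align_prosody` returns a waveform whose duration and pitch marks follow the target, so that a frame-wise distortion measure can then compare the two segments directly.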

11 Claims

1. A method of speech segment selection for concatenative synthesis based on prosody-aligned distance measure, comprising the steps of:

(A) segmenting speech stored in a speech corpus into at least one speech segment according to a unit type, wherein each speech segment has its prosody information;

(B) locating pitch marks for each speech segment;

(C) selecting one of the speech segments according to the unit type as a source segment and other speech segments as target segments, and performing a prosody alignment between the source segment and each target segment to obtain a prosody-aligned source segment, wherein the pitch marks of the prosody-aligned source segment are aligned with the pitch marks of the target segment;

(D) measuring distortion between the prosody-aligned source segment and each target segment to obtain a distance between the prosody-aligned source segment and each target segment, and to obtain an average distance between the prosody-aligned source segment and each target segment; and

(E) selecting at least one speech segment with a relatively small average distance.

2. The method as claimed in claim 1, wherein in step (A), the unit type is a syllable.

3. The method as claimed in claim 1, wherein in step (A), the speech corpus is automatically segmented into at least one speech segment according to a unit type by a computer.

4. The method as claimed in claim 3, wherein the speech is segmented by using a Markov model.

5. The method as claimed in claim 1, wherein in step (C), the prosody alignment is performed between the source segment and each target segment by using a pitch synchronous overlap-and-add (PSOLA) algorithm.

6. The method as claimed in claim 1, wherein in step (D), the distance is D_ij = dist(Ŝ_i<S_j>, S_j), where S_i is the source segment, S_j is the target segment, and Ŝ_i<S_j> is the waveform of the prosody-aligned source segment.

7. The method as claimed in claim 6, wherein step (D) measures the distortion between the prosody-aligned source segment and each target segment by using a Mel-frequency cepstrum coefficients (MFCC) algorithm.

8. The method as claimed in claim 6, wherein step (D) measures the distortion between the prosody-aligned source segment and each target segment by using a perceptual speech quality measure (PSQM) method.

9. The method as claimed in claim 6, wherein the average distance of one speech segment S_i among other speech segments is

10. The method as claimed in claim 9, wherein the value i of the speech segment S_i can be calculated according to an inverse function of the average distance, where the inverse function is i = arg{D_i}.

11. The method as claimed in claim 10, wherein the value of i of the speech segment S_i with the smallest average distance can be calculated according to the inverse function i_opt
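
The distance and selection steps of claims 6 and 9 through 11 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed formulas: the claim 9 expression is not reproduced in this record, so the plain arithmetic mean over all other segments is an assumption, and `resample_to` and `mean_squared_difference` are hypothetical stand-ins for PSOLA alignment and for an MFCC or PSQM distortion measure.

```python
from typing import Callable, List, Sequence

Segment = Sequence[float]


def resample_to(source: Segment, target: Segment) -> List[float]:
    """Stand-in for prosody alignment: linear resampling to the target length.

    Hypothetical placeholder only; it matches duration, not pitch marks.
    """
    n = len(target)
    if n == 1 or len(source) == 1:
        return [float(source[0])] * n
    out = []
    for k in range(n):
        pos = k * (len(source) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(source) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * source[lo] + frac * source[hi])
    return out


def mean_squared_difference(a: Segment, b: Segment) -> float:
    """Stand-in for dist(.,.): mean squared sample difference (not MFCC/PSQM)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(b)


def average_distances(
    segments: Sequence[Segment],
    align: Callable[[Segment, Segment], Segment],
    dist: Callable[[Segment, Segment], float],
) -> List[float]:
    """For each source i, average D_ij = dist(align(S_i, S_j), S_j) over j != i.

    The arithmetic mean is an assumption about how the average is formed.
    """
    n = len(segments)
    averages = []
    for i, source in enumerate(segments):
        total = 0.0
        for j, target in enumerate(segments):
            if i == j:
                continue
            total += dist(align(source, target), target)
        averages.append(total / (n - 1) if n > 1 else 0.0)
    return averages


def select_best(averages: Sequence[float], count: int = 1) -> List[int]:
    """Indices of the `count` segments with the smallest average distance."""
    order = sorted(range(len(averages)), key=lambda i: averages[i])
    return order[:count]
```

Passing `align` and `dist` as callables keeps the selection logic independent of the particular alignment and distortion measure, so a PSOLA-based aligner and an MFCC or PSQM distance could be substituted without changing `average_distances` or `select_best`.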

Current Assignee: Industrial Technology Research Institute
Original Assignee: Industrial Technology Research Institute
Inventors: Kuo, Chi-Shiang; Kuo, Chih-Chung
Granted Patent: US 7,315,813 B2
US Class Current: 704/207
CPC Class Codes:
G10L 13/04 Details of speech synthesis...
G10L 13/07 Concatenation rules
