Method and apparatus for synthesizing speech from text

US 7,369,995 B2
Filed: 02/25/2004
Issued: 05/06/2008
Est. Priority Date: 02/25/2003
Status: Active Grant

1 Assignment

0 Petitions

Abstract

In a speech synthesis method in which speech units are concatenated using a speech database (DB), the speech units to be concatenated are determined and divided into a left speech unit and a right speech unit. The length of an interpolation region of each of the left and right speech units is variably determined. An extension is attached to a right boundary of the left speech unit and to a left boundary of the right speech unit. The locations of pitch marks included in the extension of each of the left and right speech units are aligned so that the pitch marks fit in a predetermined interpolation region. The left and right speech units are superimposed after fading out the left speech unit and fading in the right speech unit. A determination is made of whether extra-segmental data exists, and smoothing concatenation is performed by interpolating either existing data or extrapolated data, depending on the result of the determination.
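
The fade-out, fade-in, and superimposing step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the linear fade shape, the plain lists of samples, and the function names `crossfade` and `concatenate` are all assumptions made for illustration.

```python
def crossfade(left_tail, right_head):
    """Superimpose two equally long overlapping segments.

    The left segment is faded out and the right segment is faded in
    with complementary linear weights (the linear shape is an assumption).
    """
    if len(left_tail) != len(right_head):
        raise ValueError("overlapping segments must have equal length")
    n = len(left_tail)
    if n == 0:
        return []
    if n == 1:
        return [0.5 * left_tail[0] + 0.5 * right_head[0]]
    out = []
    for i in range(n):
        w_in = i / (n - 1)   # fade-in weight for the right unit, 0 -> 1
        w_out = 1.0 - w_in   # fade-out weight for the left unit, 1 -> 0
        out.append(w_out * left_tail[i] + w_in * right_head[i])
    return out


def concatenate(left_unit, right_unit, overlap):
    """Join two units whose last/first `overlap` samples cover the same region."""
    if overlap < 0 or overlap > min(len(left_unit), len(right_unit)):
        raise ValueError("overlap must fit inside both units")
    if overlap == 0:
        return list(left_unit) + list(right_unit)
    mixed = crossfade(left_unit[-overlap:], right_unit[:overlap])
    return list(left_unit[:-overlap]) + mixed + list(right_unit[overlap:])
```

With this weighting the output starts at the left unit's value and ends at the right unit's value inside the overlap, so the joined signal has no jump at either edge of the region.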

11 Citations

16 Claims

1. A speech synthesis method in which speech units are concatenated using a Corpus-based speech database (DB), the method comprising:
- determining the speech units to be concatenated and dividing the speech units into a left speech unit and a right speech unit;
  
  variably determining a length of a first interpolation region of the left speech unit and variably determining a length of a second interpolation region of the right speech unit;
  
  attaching an extension to a right boundary of the left speech unit and an extension to a left boundary of the right speech unit;
  
  aligning locations of pitch marks included in the extension of each of the left and right speech units so that the pitch marks can fit in a third interpolation region; and
  
  superimposing the left and right speech units, wherein the attaching comprises:
  
  determining whether extra-segmental data of the left and/or right speech units exists in the speech database;
  
  extending the right boundary of the left speech unit and the left boundary of the right speech unit by using existing data if the extra-segmental data exists in the speech database; and
  
  extending the right boundary of the left speech unit and the left boundary of the right speech unit by using an extrapolation if no extra-segmental data exists in the speech database.
- Dependent Claims (2, 3, 4, 5)
- - 2. The speech synthesis method of claim 1, wherein the speech units to be concatenated are voiced phonemes.
  - 3. The speech synthesis method of claim 2, wherein the lengths of the first and second interpolation regions are less than 40% of an overall length of the voiced phonemes.
  - 4. The speech synthesis method of claim 1, wherein in the superimposing of the speech units, the left and right speech units are superimposed after the left speech unit fades out and the right speech unit fades in.
  - 5. The speech synthesis method of claim 1, further comprising equi-proportionately interpolating pitch periods included in the third interpolation region, between the aligning of the pitch marks and the superimposing of the speech units.
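
Claim 5 does not define "equi-proportionately interpolating" in this excerpt. One possible reading, sketched here in Python, is that the pitch period changes in equal steps from the left unit's period to the right unit's period across the interpolation region. The function names, the use of sample counts as the unit, and the linear stepping are assumptions, not details taken from the patent.

```python
def interpolate_pitch_periods(left_period, right_period, count):
    """Return `count` pitch periods stepping evenly from left to right.

    The two boundary values are excluded, so the sequence lies strictly
    between the left unit's period and the right unit's period.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    step = (right_period - left_period) / (count + 1)
    return [left_period + step * (k + 1) for k in range(count)]


def pitch_marks_from_periods(start, periods):
    """Place pitch marks by accumulating periods from a starting position."""
    marks = []
    position = start
    for period in periods:
        position += period
        marks.append(position)
    return marks
```

For example, with a left period of 100 samples, a right period of 80 samples, and four periods in the region, the sketch yields periods of 96, 92, 88, and 84 samples, so the pitch glides from one unit to the other instead of jumping at the join.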

6. A speech synthesis apparatus in which speech units are concatenated using a speech database, the apparatus comprising:
- a concatenation region determination unit determining the speech units to be concatenated, dividing the speech units into a left speech unit and a right speech unit, and variably determining the length of an interpolation region of each of the left and right speech units;
  
  a boundary extension unit attaching an extension to a right boundary of the left speech unit and an extension to a left boundary of the right speech unit;
  
  a pitch mark alignment unit aligning locations of pitch marks included in the extension of each of the left and right speech units so that the pitch marks fit in a predetermined interpolation region; and
  
  a speech unit superimposing unit superimposing the left and right speech units, wherein the boundary extension unit determines whether extra-segmental data of the left and/or right speech units exists in the speech database, extends the right boundary of the left speech unit and the left boundary of the right speech unit either by using existing data if the extra-segmental data exists in the speech database, and extends the right boundary of the left speech unit and the left boundary of the right speech unit either by using an extrapolation if no extra-segmental data exists in the speech database.
- Dependent Claims (7, 8, 9, 10)
- - 7. The speech synthesis apparatus of claim 6, wherein the speech units to be concatenated are voiced phonemes.
  - 8. The speech synthesis apparatus of claim 7, wherein the lengths of the interpolation regions are less than 40% of an overall length of the voiced phonemes.
  - 9. The speech synthesis apparatus of claim 6, wherein the speech unit superimposing unit superimposes the left and right speech units after making the left speech unit fade out and the right speech unit fade in.
  - 10. The speech synthesis apparatus of claim 6, further comprising a pitch track interpolation unit which receives a pitch waveform from the pitch mark alignment unit, equi-proportionately interpolates the periods of the pitches included in the interpolation region, and outputs the result of equi-proportionate interpolation to the speech unit superimposing unit.
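
Claims 3 and 8 cap the interpolation region at less than 40% of the overall length of the voiced phoneme. The Python sketch below shows one way such a cap could be enforced. How the length is "variably determined" is not stated in this excerpt, so the `requested` argument, the sample-count units, and both function names are hypothetical.

```python
def max_interpolation_length(phoneme_length, limit_percent=40):
    """Largest whole-sample length strictly below limit_percent of the phoneme.

    Integer arithmetic avoids floating-point rounding at the exact boundary.
    """
    if phoneme_length <= 0:
        return 0
    return (limit_percent * phoneme_length - 1) // 100


def choose_interpolation_length(requested, phoneme_length):
    """Clamp a requested region length to the range allowed by the 40% cap."""
    return max(0, min(requested, max_interpolation_length(phoneme_length)))
```

A 100-sample phoneme therefore allows at most 39 samples of interpolation region, because 40 samples would equal 40% rather than stay below it.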

11. A computer readable medium encoded with processing instructions performing a method of speech synthesis in which speech units are concatenated using a speech database, the method comprising:
- determining the speech units to be concatenated and dividing the speech units into a left speech unit and a right speech unit;
  
  variably determining a length of a first interpolation region of the left speech unit and variably determining a length of a second interpolation region of the right speech unit;
  
  attaching an extension to a right boundary of the left speech unit and an extension to a left boundary of the right speech unit;
  
  aligning locations of pitch marks included in the extension of each of the left and right speech units so that the pitch marks can fit in a third interpolation region; and
  
  superimposing the left and right speech units, wherein the attaching of the boundary extensions comprises:
  
  determining whether extra-segmental data of the left and/or right speech units exists in the speech database;
  
  extending the right boundary of the left speech unit and the left boundary of the right speech unit by using existing data if the extra-segmental data exists in the speech database; and
  
  extending the right boundary of the left speech unit and the left boundary of the right speech unit by using an extrapolation if no extra-segmental data exists in the speech database.
- Dependent Claims (12, 13, 14, 15)
- - 12. The computer readable medium of claim 11, wherein the speech units to be concatenated are voiced phonemes.
  - 13. The speech synthesis method of claim 12, wherein the lengths of the first and second interpolation regions are less than 40% of an overall length of the voiced phonemes.
  - 14. The computer readable medium of claim 11, wherein in the superimposing of the left and right speech units, the left and right speech units are superimposed after the left speech unit fades out and the right speech unit fades in.
  - 15. The computer readable medium of claim 11, wherein between the aligning of the locations of the pitch marks and the superimposing of the left and right speech units, the method further comprises, equi-proportionately interpolating the pitch periods included in the predetermined interpolation region.
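
The claims require that the pitch marks in both extensions be aligned so that they fit in a common interpolation region, but this excerpt does not say how. As a rough illustration only, the Python sketch below linearly rescales each extension's pitch marks so that the first and last marks land on the region boundaries. The rescaling rule and the name `fit_pitch_marks` are assumptions.

```python
def fit_pitch_marks(marks, region_start, region_end):
    """Linearly map pitch mark positions onto [region_start, region_end].

    The first mark is moved to region_start and the last mark to
    region_end; marks in between keep their relative spacing.
    """
    if len(marks) < 2:
        raise ValueError("need at least two pitch marks")
    span = marks[-1] - marks[0]
    if span <= 0:
        raise ValueError("pitch marks must be strictly increasing")
    scale = (region_end - region_start) / span
    return [region_start + (m - marks[0]) * scale for m in marks]
```

If the left extension has marks at 10, 20, and 40 and the right extension has marks at 5, 15, and 35, fitting both to the region from 0 to 60 places the marks of both units at 0, 20, and 60, so corresponding pitch periods overlap when the units are superimposed.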

16. A speech synthesis apparatus comprising a boundary extension unit determining whether extra-segmental data of a left and/or right speech units exists in a speech database, and extending a right boundary of the left speech unit and the left boundary of the right speech unit either by using existing data if the extra-segmental data exists in the speech database or by using an extrapolation if no extra-segmental data exists in the speech database.
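
The decision made by the boundary extension unit can be sketched in Python as follows. The sketch assumes the speech database holds each unit as a `[start, end)` span inside a longer recording, so that "extra-segmental data" means samples of that recording lying outside the span. The claims do not specify the extrapolation method; repeating the nearest pitch period is used here purely as a stand-in, and all names are hypothetical.

```python
def extend_right(recording, start, end, n, period):
    """Extend the right boundary of the unit recording[start:end] by n samples."""
    unit = list(recording[start:end])
    if end + n <= len(recording):
        # Extra-segmental data exists: reuse the samples that follow the unit.
        return unit + list(recording[end:end + n]), "existing"
    # No usable data: extrapolate by cycling the unit's last pitch period.
    last_period = unit[-period:]
    extension = [last_period[i % len(last_period)] for i in range(n)]
    return unit + extension, "extrapolated"


def extend_left(recording, start, end, n, period):
    """Extend the left boundary of the unit recording[start:end] by n samples."""
    unit = list(recording[start:end])
    if start - n >= 0:
        # Extra-segmental data exists: reuse the samples that precede the unit.
        return list(recording[start - n:start]) + unit, "existing"
    # No usable data: cycle the first pitch period backwards in time.
    first_period = unit[:period]
    extension = [first_period[(i - n) % len(first_period)] for i in range(n)]
    return extension + unit, "extrapolated"
```

The sketch treats the choice as all or nothing: if fewer than `n` neighbouring samples are available, it extrapolates the whole extension rather than mixing real and extrapolated samples.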

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Samsung Electronics Co. Ltd.
Inventors: Kim, Jeong-su; Ferencz, Attila; Lee, Jao-won
Primary Examiner(s): ABEBE, DANIEL DEMELASH
Application Number: US10/785,113
Publication Number: US 20040167780A1
Time in Patent Office: 1,532 Days
Field of Search: 704/260, 704/258
US Class Current: 704/260
CPC Class Codes: G10L 13/07 Concatenation rules
