Method and apparatus for synthesizing speech from text
Abstract
A speech synthesis method, in which speech units are concatenated using a DB, wherein the speech units to be concatenated are determined and divided into a left speech unit and a right speech unit. The length of an interpolation region of each of the left and right speech units is variably determined. An extension is attached to a right boundary of the left speech unit and an extension to a left boundary of the right speech unit. The locations of pitch marks included in the extension of each of the left and right speech units are aligned so that the pitch marks can fit in the predetermined interpolation region. The left and right speech units are superimposed after fading out the left speech unit and fading in the right speech unit. Accordingly, a determination of whether extra-segmental data exists or not is made, and smoothing concatenation is performed using either an interpolation of existing data or an interpolation of extrapolated data depending on the result of the determination.
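The superimposing step described above (fade out the left unit, fade in the right unit, then sum) can be illustrated with a minimal sketch. The linear fade weights, the function name `crossfade`, and the representation of speech units as lists of samples are assumptions made for illustration; the abstract does not specify the shape of the fade.

```python
def crossfade(left, right, overlap):
    """Superimpose the last `overlap` samples of `left` onto the first
    `overlap` samples of `right`, fading `left` out and `right` in.

    Assumption: linear fade weights; the source does not fix the fade shape.
    """
    if overlap < 1 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both units")
    out = list(left[:-overlap])            # part of left outside the overlap
    start = len(left) - overlap            # index where the overlap begins in left
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)        # fade-in weight, strictly between 0 and 1
        out.append((1 - w) * left[start + i] + w * right[i])
    out.extend(right[overlap:])            # part of right outside the overlap
    return out


if __name__ == "__main__":
    print(crossfade([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], 3))
```

Because the two weights always sum to one, two units with equal amplitude in the overlap keep that amplitude after superimposition.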
16 Claims
1. A speech synthesis method in which speech units are concatenated using a Corpus-based speech database (DB), the method comprising:
determining the speech units to be concatenated and dividing the speech units into a left speech unit and a right speech unit;
variably determining a length of a first interpolation region of the left speech unit and variably determining a length of a second interpolation region of the right speech unit;
attaching an extension to a right boundary of the left speech unit and an extension to a left boundary of the right speech unit;
aligning locations of pitch marks included in the extension of each of the left and right speech units so that the pitch marks can fit in a third interpolation region; and
superimposing the left and right speech units, wherein the attaching comprises:
determining whether extra-segmental data of the left and/or right speech units exists in the speech database;
extending the right boundary of the left speech unit and the left boundary of the right speech unit by using existing data if the extra-segmental data exists in the speech database; and
extending the right boundary of the left speech unit and the left boundary of the right speech unit by using an extrapolation if no extra-segmental data exists in the speech database.
(Dependent claims: 2, 3, 4, 5)
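As an illustration of the pitch mark aligning step recited in claim 1, the following is a minimal sketch. It assumes pitch marks are given as sample positions on a shared time axis covering the overlap, and it uses a simple rule (shift the right unit so that its first pitch mark lands on the nearest pitch mark of the left unit). The rule and the names `alignment_shift` and `align_right_marks` are hypothetical; the claim does not prescribe a particular alignment procedure.

```python
def alignment_shift(left_marks, right_marks):
    """Return the shift, in samples, to add to every right-unit pitch mark
    so that the first right mark coincides with the nearest left mark.

    Assumption: both mark lists use the same time axis (sample indices).
    """
    if not left_marks or not right_marks:
        raise ValueError("both units need at least one pitch mark")
    anchor = right_marks[0]
    nearest = min(left_marks, key=lambda m: abs(m - anchor))
    return nearest - anchor


def align_right_marks(left_marks, right_marks):
    """Apply the shift from alignment_shift to all right-unit pitch marks."""
    shift = alignment_shift(left_marks, right_marks)
    return [m + shift for m in right_marks]


if __name__ == "__main__":
    print(align_right_marks([100, 180, 260], [130, 212, 290]))
```

A single rigid shift keeps the pitch periods of the right unit intact; only its position relative to the left unit changes.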
6. A speech synthesis apparatus in which speech units are concatenated using a speech database, the apparatus comprising:
a concatenation region determination unit determining the speech units to be concatenated, dividing the speech units into a left speech unit and a right speech unit, and variably determining the length of an interpolation region of each of the left and right speech units;
a boundary extension unit attaching an extension to a right boundary of the left speech unit and an extension to a left boundary of the right speech unit;
a pitch mark alignment unit aligning locations of pitch marks included in the extension of each of the left and right speech units so that the pitch marks fit in a predetermined interpolation region; and
a speech unit superimposing unit superimposing the left and right speech units,
wherein the boundary extension unit determines whether extra-segmental data of the left and/or right speech units exists in the speech database, extends the right boundary of the left speech unit and the left boundary of the right speech unit either by using existing data if the extra-segmental data exists in the speech database, and extends the right boundary of the left speech unit and the left boundary of the right speech unit either by using an extrapolation if no extra-segmental data exists in the speech database.
(Dependent claims: 7, 8, 9, 10)
11. A computer readable medium encoded with processing instructions performing a method of speech synthesis in which speech units are concatenated using a speech database, the method comprising:
determining the speech units to be concatenated and dividing the speech units into a left speech unit and a right speech unit;
variably determining a length of a first interpolation region of the left speech unit and variably determining a length of a second interpolation region of the right speech unit;
attaching an extension to a right boundary of the left speech unit and an extension to a left boundary of the right speech unit;
aligning locations of pitch marks included in the extension of each of the left and right speech units so that the pitch marks can fit in a third interpolation region; and
superimposing the left and right speech units, wherein the attaching of the boundary extensions comprises:
determining whether extra-segmental data of the left and/or right speech units exists in the speech database;
extending the right boundary of the left speech unit and the left boundary of the right speech unit by using existing data if the extra-segmental data exists in the speech database; and
extending the right boundary of the left speech unit and the left boundary of the right speech unit by using an extrapolation if no extra-segmental data exists in the speech database.
(Dependent claims: 12, 13, 14, 15)
16. A speech synthesis apparatus comprising a boundary extension unit determining whether extra-segmental data of a left and/or right speech units exists in a speech database, and extending a right boundary of the left speech unit and the left boundary of the right speech unit either by using existing data if the extra-segmental data exists in the speech database or by using an extrapolation if no extra-segmental data exists in the speech database.
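The decision made by the boundary extension unit (use existing extra-segmental data when the database has it, otherwise extrapolate) can be sketched as follows. The extrapolation shown here, repeating the pitch cycle nearest the boundary, is an assumption chosen for illustration; the claims do not state which extrapolation is used. The names `extend_right` and `extend_left`, the `context` argument (samples adjacent to the unit in the original recording, or `None` if absent), and the choice to extrapolate when the context is shorter than the requested extension are likewise hypothetical.

```python
def extend_right(unit, context, n, period):
    """Extend `unit` past its right boundary by `n` samples.

    context: samples that followed the unit in the database recording,
             or None when no extra-segmental data exists.
    period:  assumed pitch period in samples, used only for extrapolation.
    Returns (extended_samples, source) with source "existing" or "extrapolated".
    """
    if context is not None and len(context) >= n:
        # Extra-segmental data exists: reuse the recorded samples.
        return list(unit) + list(context[:n]), "existing"
    if period < 1 or period > len(unit):
        raise ValueError("period must be between 1 and len(unit)")
    # No usable data: repeat the last pitch cycle (illustrative assumption).
    last_cycle = unit[-period:]
    extension = [last_cycle[i % period] for i in range(n)]
    return list(unit) + extension, "extrapolated"


def extend_left(unit, context, n, period):
    """Extend `unit` before its left boundary by `n` samples.

    context: samples that preceded the unit in the recording, in time order,
             or None. Implemented by mirroring extend_right in time.
    """
    rev_context = None if context is None else list(reversed(context))
    extended, source = extend_right(list(reversed(unit)), rev_context, n, period)
    return list(reversed(extended)), source


if __name__ == "__main__":
    print(extend_right([1, 2, 3, 4], [5, 6, 7], 2, 2))
    print(extend_right([1, 2, 3, 4], None, 3, 2))
    print(extend_left([1, 2, 3, 4], [7, 8, 9], 2, 2))
```

Returning the data source alongside the samples lets a caller record which branch was taken for each concatenation point.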
Specification