Synthesizing phoneme string of predetermined duration by adjusting initial phoneme duration on values from multiple regression by adding values based on their standard deviations
First Claim
1. A speech synthesizing apparatus for performing speech synthesis according to an inputted phoneme string, comprising:
storage means for storing statistical data, which comprises at least standard deviation data and multiple regression analysis data, related to a phoneme duration of each phoneme;
determining means for determining the speech production time for the inputted phoneme string;
first initial value obtaining means for obtaining an estimated duration with respect to each phoneme by a multiple regression analysis using the multiple regression analysis data stored in said storing means;
setting means for setting an initial phoneme duration for each phoneme constructing the phoneme string based on the estimated duration;
calculating means for calculating a phoneme production time for each phoneme by adding a value calculated based on the standard deviation data of the phoneme which is obtained from said storage and the initial phoneme duration set for the phoneme, wherein the individual phoneme production times are determined so as to add up to the speech production time determined by said determination means; and
generating means for generating a speech waveform by connecting phonemes having the calculated phoneme production time.
Abstract
Statistical data including an average value, a standard deviation, and a minimum value of a phoneme duration of each phoneme is stored in a memory. When speech production time is determined for a phoneme string in a predetermined expiratory paragraph, the total phoneme duration of the phoneme string is set so as to become equal to the speech production time. Based on the set phoneme duration, phonemes are connected and a speech waveform is generated. To set a phoneme duration for each phoneme, a phoneme duration initial value is first set based on an average value, obtained by equally dividing the speech production time by phonemes of the phoneme string, and a phoneme duration range, set based on statistical data of each phoneme. Then, the phoneme duration initial value is adjusted based on the statistical data and the speech production time.
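The initial-value step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the formula for the phoneme duration range, so the range used here, from max(minimum, mean - k*std) to mean + k*std, the factor k, and the names PhonemeStats and initial_durations are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PhonemeStats:
    """Per-phoneme duration statistics (hypothetical container, values in ms)."""
    mean: float
    std: float
    minimum: float


def initial_durations(stats: Sequence[PhonemeStats],
                      total_time: float,
                      k: float = 3.0) -> List[float]:
    """Divide total_time equally, then clamp each share into the phoneme's range.

    The range [max(minimum, mean - k*std), mean + k*std] is an assumption;
    the source only says the range is set from the statistical data.
    """
    if not stats:
        raise ValueError("phoneme string must not be empty")
    average = total_time / len(stats)
    result = []
    for s in stats:
        low = max(s.minimum, s.mean - k * s.std)
        high = s.mean + k * s.std
        result.append(min(max(average, low), high))
    return result


stats = [PhonemeStats(70.0, 10.0, 40.0),
         PhonemeStats(150.0, 20.0, 80.0),
         PhonemeStats(60.0, 5.0, 30.0)]
initial = initial_durations(stats, total_time=300.0)
```

After clamping, the initial values no longer need to add up to the speech production time, which is why the abstract describes a subsequent adjustment step.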
19 Claims
-
1. A speech synthesizing apparatus for performing speech synthesis according to an inputted phoneme string, comprising:
-
storage means for storing statistical data, which comprises at least standard deviation data and multiple regression analysis data, related to a phoneme duration of each phoneme;
determining means for determining the speech production time for the inputted phoneme string;
first initial value obtaining means for obtaining an estimated duration with respect to each phoneme by a multiple regression analysis using the multiple regression analysis data stored in said storing means;
setting means for setting an initial phoneme duration for each phoneme constructing the phoneme string based on the estimated duration;
calculating means for calculating a phoneme production time for each phoneme by adding a value calculated based on the standard deviation data of the phoneme which is obtained from said storage and the initial phoneme duration set for the phoneme, wherein the individual phoneme production times are determined so as to add up to the speech production time determined by said determination means; and
generating means for generating a speech waveform by connecting phonemes having the calculated phoneme production time.
said setting means sets the initial duration to fall within a predetermined time range determined based on the average value, the standard deviation, and the minimum value of the phoneme duration, with respect to each phoneme. -
4. The speech synthesizing apparatus according to claim 3, wherein said storage means stores a threshold value indicating the minimum phoneme production period of each phoneme, and wherein said apparatus further comprises means for replacing the phoneme production time calculated by said calculation means by the threshold value, for each phoneme, when the calculated phoneme production time is smaller than the threshold value.
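The threshold replacement recited in claim 4 amounts to a per-phoneme lower bound. A minimal sketch, with the hypothetical function name apply_minimum and durations in milliseconds assumed for illustration:

```python
from typing import List, Sequence


def apply_minimum(durations: Sequence[float],
                  thresholds: Sequence[float]) -> List[float]:
    """Replace any calculated duration below its phoneme's threshold by the threshold."""
    if len(durations) != len(thresholds):
        raise ValueError("one threshold is required per phoneme")
    return [max(d, t) for d, t in zip(durations, thresholds)]


floored = apply_minimum([30.0, 90.0, 55.0], [40.0, 50.0, 55.0])
```

Raising a duration to its threshold changes the total, so the result may no longer match the speech production time exactly; the claim text shown here does not say how that difference is handled.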
-
5. The speech synthesizing apparatus according to claim 1, wherein said calculating means employs, as a coefficient, a value obtained by subtracting a total initial phoneme duration from the speech production time and dividing the subtracted value by a sum of squares of the standard deviation corresponding to each phoneme, and sets, as the phoneme duration, a value obtained by adding a product of the coefficient and a square of the standard deviation of the phoneme to the initial phoneme duration.
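The calculation recited in claim 5 can be written as a coefficient rho = (T - sum of initial durations) / (sum of squared standard deviations), with each phoneme receiving its initial duration plus rho times its own squared standard deviation. The sketch below follows that wording; the function name distribute_durations and the sample values are assumptions for illustration, not taken from the patent.

```python
from typing import List, Sequence


def distribute_durations(initial: Sequence[float],
                         sigma: Sequence[float],
                         total_time: float) -> List[float]:
    """Spread the difference between total_time and the summed initial
    durations over the phonemes in proportion to their variance."""
    if len(initial) != len(sigma):
        raise ValueError("one standard deviation is required per phoneme")
    sum_sq = sum(s * s for s in sigma)
    if sum_sq == 0:
        raise ValueError("at least one standard deviation must be nonzero")
    # Coefficient: remaining time divided by the sum of squared deviations.
    rho = (total_time - sum(initial)) / sum_sq
    return [d + rho * s * s for d, s in zip(initial, sigma)]


durations = distribute_durations([80.0, 120.0, 60.0], [10.0, 20.0, 5.0], 300.0)
```

Because the corrections sum to rho times the sum of squared deviations, the resulting durations add up to the speech production time, and phonemes whose durations vary more absorb a larger share of the difference.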
-
6. The speech synthesizing apparatus according to claim 1, wherein
if the estimated duration falls within a predetermined time range, said first initial value setting means sets the estimated duration as the initial phoneme duration, while if the estimated duration exceeds the predetermined time range, said first initial value setting means sets the initial phoneme duration to fall within the predetermined time range. -
7. The speech synthesizing apparatus according to claim 1, further comprising a second initial value obtaining means for obtaining an estimated duration based on an average time, obtained by dividing the speech production time by the number of phonemes constructing the phoneme string, to each phoneme, and wherein
said setting means selectively utilizes said first initial value obtaining means or said second initial value obtaining means in accordance with the type of phoneme. -
8. The speech synthesizing apparatus according to claim 1, wherein said storage means stores statistical data related to a phoneme duration of each phoneme for each category based on a speech production speed, and
said calculating means determines a category of speech production speed based on the speech production time and the phoneme string, and calculates the phoneme production time of each phoneme based on statistical data belonging to the determined category as well as the estimated duration. -
9. The speech synthesizing apparatus according to claim 1, wherein said calculating means calculates a subtracted value obtained by subtracting a total initial phoneme duration from the speech production time, and calculates a phoneme production time for each phoneme by adding a value calculated based on the standard deviation data of the phoneme and the subtracted value.
-
-
10. A speech synthesizing method of performing speech synthesis according to an inputted phoneme string, comprising the steps of:
-
determining the speech production time of the inputted phoneme string in a predetermined section;
obtaining an estimated duration with respect to each phoneme by a multiple regression analysis using multiple regression analysis data stored in storing means;
setting an initial phoneme duration for each phoneme constructing the phoneme string based on the estimated duration;
calculating a phoneme production time for each phoneme by adding a value calculated based on a standard deviation data of the phoneme which is obtained from storage means for storing statistical data, which comprises at least standard deviation data and the multiple regression analysis data related to the phoneme duration of each phoneme and the initial phoneme duration set for the phoneme, wherein the individual phoneme production times are determined so as to add up to the speech production time determined by said determining step; and
generating a speech waveform by connecting phonemes having the calculated phoneme production time.
a setting step of setting the initial phoneme duration within a predetermined time range determined based on the statistical data stored in said storage unit, with respect to each phoneme constructing the phoneme string.
-
-
12. The speech synthesizing method according to claim 10, wherein the statistical data stored in said storage unit includes an average value, a standard deviation, and a minimum value of the phoneme duration of each phoneme, and said setting step sets the initial duration to fall within a predetermined time range determined based on the average value, the standard deviation, and the minimum value of the phoneme duration, with respect to each phoneme.
-
13. The speech synthesizing method according to claim 12, wherein the storage means stores a threshold value indicating the minimum phoneme production period of each phoneme, and wherein said method further comprises a step for replacing the phoneme production time calculated by said calculation step by the threshold value, for each phoneme, when the calculated phoneme production time is smaller than the threshold value.
-
14. The speech synthesizing method according to claim 10, wherein said calculating step employs, as a coefficient, a value obtained by subtracting a total initial phoneme duration from the speech production time and dividing the subtracted value by a sum of squares of the standard deviation corresponding to each phoneme, and a value obtained by adding a product of the coefficient and a square of the standard deviation of the phoneme to the initial phoneme duration is set as the phoneme duration.
-
15. The speech synthesizing method according to claim 10, wherein,
if the estimated duration falls within a predetermined time range, said setting step sets the estimated duration as the initial phoneme duration, while if the estimated duration exceeds the predetermined time range, said setting step sets the initial phoneme duration to fall within the predetermined time range. -
16. The speech synthesizing method according to claim 10, further comprising a second initial value obtaining step of obtaining an estimated duration based on an average time, obtained by dividing the speech production time by the number of phonemes constructing the phoneme string, to each phoneme, and wherein
said setting step selectively utilizes the first initial value obtaining step or the second initial value obtaining step in accordance with the type of phoneme. -
17. The speech synthesizing method according to claim 10, wherein said storage unit stores statistical data related to a phoneme duration of each phoneme for each category based on a speech production speed, and
in said calculating step, a category of speech production speed is determined based on the speech production time and the phoneme string, and the phoneme production time of each phoneme is calculated based on statistical data belonging to the determined category as well as the estimated duration. -
18. The speech synthesizing method according to claim 10, wherein the calculating step calculates a subtracted value by subtracting a total initial phoneme duration from the speech production time, and calculates a phoneme production time for each phoneme by adding a value calculated based on the standard deviation data of the phoneme and the subtracted value.
-
19. A storage medium storing a control program for instructing a computer to perform a speech synthesizing process for performing speech synthesis according to an inputted phoneme string, said control program comprising:
-
codes for instructing the computer to determine the speech production time for the inputted phoneme string;
codes for obtaining an estimated duration with respect to each phoneme by a multiple regression analysis using multiple regression analysis data stored in storing means;
codes for instructing the computer to set an initial phoneme duration for each phoneme constructing the phoneme string based on the estimated duration;
calculating the phoneme production time for each phoneme by adding a value calculated based on the standard deviation data of the phoneme which is obtained from the storage means for storing statistical data, which comprises at least standard deviation data and the multiple regression analysis data, related to the phoneme duration of each phoneme and the initial phoneme duration set for the phoneme, wherein the individual phoneme production times are determined so as to add up to the speech production time determined by said computer in response to the codes for instructing the computer to determine the speech production time for the inputted phoneme string; and
codes for instructing the computer to generate a speech waveform by connecting phonemes having the calculated phoneme production time.
-
Specification