Speech synthesis with prosodic model data and accent type
Abstract
A speech synthesis method includes determining the accent type of an input character string; selecting prosodic model data, based on the input character string and the accent type, from a prosody dictionary that stores typical prosodic models representing the prosodic information for the character strings in a word dictionary; transforming the prosodic information of the prosodic model when the character string of the selected model does not coincide with the input character string; selecting the waveform data corresponding to each character of the input character string from a waveform dictionary, based on the transformed prosodic model data; and connecting the selected waveform data together. Differences between the input character string and the character strings stored in the dictionary are thereby absorbed, making it possible to synthesize natural-sounding speech.
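The overall flow summarized in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration only: the toy dictionaries `WORD_DICT`, `PROSODY_DICT` and `WAVEFORM_DICT`, the `ProsodicModel` fields and units, the default accent type of 0 for unknown words, treating one kana character as one mora, and the placeholder transformation that gives each non-coincident character a fixed average length.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProsodicModel:
    text: str         # character string the model was taken from
    accent_type: int  # accent type of that string
    durations: tuple  # per-character syllable length in ms (assumed unit)
    pitches: tuple    # per-character pitch in Hz (assumed unit)


# Hypothetical toy dictionaries; the claims do not specify any data format.
WORD_DICT = {"さかな": 0, "さくら": 0}                   # string -> accent type
PROSODY_DICT = [ProsodicModel("さかな", 0, (110, 120, 130), (150, 180, 175))]
WAVEFORM_DICT = {"さ": [0.1], "か": [0.2], "な": [0.3],  # character -> samples
                 "く": [0.4], "ら": [0.5]}
AVG_DURATION = 120  # assumed average syllable length (ms), placeholder only


def synthesize(text: str):
    # 1. Determine the accent type (0 assumed for words not in the dictionary).
    accent = WORD_DICT.get(text, 0)
    # 2. Select a model with the same number of characters (one kana = one
    #    mora here) and the same accent type; raises StopIteration if none.
    model = next(m for m in PROSODY_DICT
                 if len(m.text) == len(text) and m.accent_type == accent)
    # 3. Transform only when the model's string differs from the input.
    #    Placeholder rule: non-coincident characters get AVG_DURATION,
    #    while the pitch contour is reused unchanged.
    if model.text != text:
        durations = tuple(d if a == b else AVG_DURATION
                          for a, b, d in zip(model.text, text, model.durations))
        model = replace(model, text=text, durations=durations)
    # 4. Select one waveform per character (prosody-aware unit choice is not
    #    modeled here) and 5. concatenate the selected waveforms.
    samples = []
    for ch in model.text:
        samples.extend(WAVEFORM_DICT[ch])
    return model, samples
```

Calling `synthesize("さくら")` reuses the pitch contour stored for さかな, keeps the length of the shared first character, and substitutes the placeholder length for the other two characters.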
19 Claims
1. A speech synthesis method of creating voice message data corresponding to an input character string, comprising the steps of:
using (a) a word dictionary that stores a large number of character strings having at least one character with its accent type, (b) a prosody dictionary that stores typical prosodic model data among prosodic model data representing the prosodic information for the character strings stored in said word dictionary, and (c) a waveform dictionary that stores voice waveform data of a composition unit with a recorded voice;
determining the accent type of the input character string;
selecting the prosodic model data from said prosody dictionary, based on the input character string and the accent type;
transforming the prosodic information of said prosodic model data in accordance with the input character string in response to the character string of the selected prosodic model data not being coincident with the input character string;
selecting the waveform data corresponding to each character of the input character string from the waveform dictionary, based on the prosodic model data;
connecting the selected waveform data with each other;
storing the prosodic model data including the character string, a mora number, the accent type, and syllabic information in said prosody dictionary;
creating the syllabic information of the input character string;
providing a prosodic model candidate by extracting the prosodic model data having the mora number and accent type coincident with those of the input character string from said prosody dictionary;
creating prosodic reconstructed information by comparing the syllabic information of each prosodic model data candidate and the syllabic information of the input character string; and
selecting optimal prosodic model data based on the character string of each prosodic model data candidate and the prosodic reconstructed information thereof.
if any of the prosodic model data candidates has all of its phonemes coincident with those of the input character string, making that prosodic model data candidate the optimal prosodic model data;
if there is no candidate having all its phonemes coincident with those of the input character string, making the candidate having the greatest number of phonemes coincident with those of the input character string among the prosodic model data candidates the optimal prosodic model data; and
if there are plural candidates having the greatest number of phonemes coincident, making the candidate having the greatest number of phonemes consecutively coincident the optimal prosodic model data.
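The candidate extraction and three-step ranking described above can be sketched as follows. The representation is an assumption, not taken from the claims: syllabic information is a tuple of romanized morae, each mora is split into a (consonant, vowel) phoneme pair, and the "prosodic reconstructed information" is modeled as a list of per-phoneme match flags from a position-by-position comparison. `ModelEntry`, `reconstruction_info` and `select_optimal` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    text: str         # character string of the prosodic model
    mora_count: int   # mora number
    accent_type: int  # accent type
    syllables: tuple  # romanized morae, e.g. ("sa", "ka", "na"); assumed format


def phonemes(mora: str) -> tuple:
    """Split a romanized mora into (consonant, vowel); consonant may be ''."""
    return (mora[:-1], mora[-1])


def reconstruction_info(model_syl, input_syl) -> list:
    """Per-phoneme match flags between a candidate and the input (an assumed
    stand-in for the 'prosodic reconstructed information')."""
    flags = []
    for m, i in zip(model_syl, input_syl):
        flags.extend(a == b for a, b in zip(phonemes(m), phonemes(i)))
    return flags


def longest_run(flags) -> int:
    """Length of the longest run of consecutive matching phonemes."""
    best = run = 0
    for f in flags:
        run = run + 1 if f else 0
        best = max(best, run)
    return best


def select_optimal(entries, input_syl, accent_type):
    # Candidates: same mora number and same accent type as the input.
    candidates = [e for e in entries
                  if e.mora_count == len(input_syl) and e.accent_type == accent_type]
    if not candidates:
        return None
    scored = [(e, reconstruction_info(e.syllables, input_syl)) for e in candidates]
    # Rule 1: a candidate whose phonemes all coincide is chosen outright.
    for e, flags in scored:
        if all(flags):
            return e
    # Rule 2: most coincident phonemes; rule 3: ties broken by the longest
    # consecutive run. Any remaining tie goes to the first such candidate.
    return max(scored, key=lambda s: (sum(s[1]), longest_run(s[1])))[0]
```

For the input ("sa", "ku", "ra"), a candidate ("ta", "ku", "ra") beats ("sa", "ku", "ma"): both match five of six phonemes, but the first has the longer consecutive run.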
3. Apparatus for performing the method of claim 2.
4. The speech synthesis method according to claim 1, further including, in response to the character string of said selected prosodic model data not being coincident with the input character string, obtaining the syllable length after transformation, for every non-coincident character in the prosodic model data, from the average syllable length calculated in advance for all the characters used in the speech synthesis and the syllable length in said prosodic model data.
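Claim 4 states that the transformed syllable length is derived from a precomputed average syllable length and the model's own syllable length for each non-coincident character, but it does not give the formula. The sketch below assumes one possible rule: scale the model's length by the ratio of the average lengths of the input character and the model character it replaces, so the model's local tempo is kept while the intrinsic length of the new character is respected. `transform_lengths` and the `avg_length` table are hypothetical.

```python
def transform_lengths(model_text, model_lengths, input_text, avg_length):
    """Return per-character syllable lengths for the input string.

    avg_length maps each character to its average syllable length (ms),
    precomputed over all characters used in synthesis. The ratio-scaling
    rule used for non-coincident characters is an illustrative assumption.
    """
    out = []
    for m_ch, i_ch, length in zip(model_text, input_text, model_lengths):
        if m_ch == i_ch:
            # Coincident character: keep the model's syllable length.
            out.append(length)
        else:
            # Non-coincident character: rescale by the ratio of averages.
            out.append(length * avg_length[i_ch] / avg_length[m_ch])
    return out
```

With averages of 120 ms for か and 90 ms for く, a 132 ms か in the model becomes a 99 ms く in the output.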
5. Apparatus for performing the method of claim 4.
6. Apparatus for performing the method of claim 1.
7. A speech synthesis method of creating voice message data corresponding to an input character string, comprising the steps of:
using (a) a word dictionary that stores a large number of character strings having at least one character with its accent type, (b) a prosody dictionary that stores typical prosodic model data among prosodic model data representing the prosodic information for the character strings stored in said word dictionary, and (c) a waveform dictionary that stores voice waveform data of a composition unit with a recorded voice;
determining the accent type of the input character string;
selecting the prosodic model data from said prosody dictionary, based on the input character string and the accent type;
transforming the prosodic information of said prosodic model data in accordance with the input character string in response to the character string of the selected prosodic model data not being coincident with the input character string;
selecting the waveform data corresponding to each character of the input character string from the waveform dictionary, based on the prosodic model data;
selecting, for each phoneme making up the input character string, the waveform data of a pertinent phoneme in the prosodic model data from the waveform dictionary, the pertinent phoneme having a position and phoneme coincident with those of the prosodic model data; and
selecting the waveform data of a corresponding phoneme having the frequency closest to that of the prosodic model data from said waveform dictionary for the other phonemes.
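One reading of the waveform selection in claim 7 can be sketched as follows: a recorded unit is reused when it carries the same phoneme at the same position of the same word as the prosodic model; otherwise the unit of that phoneme whose pitch is nearest the model's target pitch at that position is taken. The `Unit` fields (`source`, `position`, `pitch`), the one-kana-per-unit granularity, and `select_units` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phoneme: str    # one kana character per unit (assumed granularity)
    source: str     # word the unit was cut from (assumed field)
    position: int   # index of the unit within that word
    pitch: float    # representative frequency in Hz (assumed field)


def select_units(input_text, model_text, model_pitches, units):
    chosen = []
    for k, ph in enumerate(input_text):
        same = [u for u in units
                if u.phoneme == ph and u.source == model_text and u.position == k]
        if same:
            # Position and phoneme coincide with the prosodic model: reuse it.
            chosen.append(same[0])
        else:
            # Otherwise pick the unit of this phoneme whose pitch is closest
            # to the model's target pitch here (ValueError if none exists).
            pool = [u for u in units if u.phoneme == ph]
            chosen.append(min(pool, key=lambda u: abs(u.pitch - model_pitches[k])))
    return chosen
```

For the input さくら against a model recorded from さかな, only さ can be reused from the model word; く and ら fall back to the nearest-pitch rule.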
10. A speech synthesis apparatus for creating voice message data corresponding to an input character string, comprising:
a word dictionary storing a large number of character strings including at least one character with its accent type;
a prosody dictionary storing typical prosodic model data among prosodic model data representing prosodic information for the character strings stored in said word dictionary, said prosody dictionary including the character string, mora number, accent type, and syllabic information;
a waveform dictionary storing voice waveform data of a composition unit with a recorded voice;
accent type determining means for determining the accent type of the input character string;
prosodic model selecting means for selecting the prosodic model data from said prosody dictionary, based on the input character string and the accent type;
prosodic transforming means for transforming the prosodic information of the prosodic model data in accordance with the input character string in response to the character string of said selected prosodic model data not being coincident with the input character string;
waveform selecting means for selecting the waveform data corresponding to each character of the input character string from said waveform dictionary, based on the prosodic model data;
waveform connecting means for connecting the selected waveform data with each other; and
prosodic model selecting means for:
creating the syllabic information of the input character string, extracting the prosodic model data having the mora number and accent type coincident with those of the input character string from said prosody dictionary to provide a prosodic model candidate, creating prosodic reconstructed information by comparing the syllabic information of each prosodic model data candidate and the syllabic information of the input character string, and selecting optimal prosodic model data based on the character string of each prosodic model data candidate and the prosodic reconstructed information thereof.
(a) if any of the prosodic model data candidates has all of its phonemes coincident with those of the input character string, that prosodic model data candidate is made the optimal prosodic model data by the prosodic model selecting means;
(b) if there is no candidate having all its phonemes coincident with those of the input character string, the candidate having the greatest number of phonemes coincident with the phonemes of the input character string among the prosodic model data candidates is made the optimal prosodic model data; and
(c) if there are plural candidates having the greatest number of phonemes coincident, the candidate having the greatest number of phonemes consecutively coincident is made the optimal prosodic model data.
12. The speech synthesis apparatus according to claim 10, further comprising prosody transforming means, responsive to the character string of said selected prosodic model data not being coincident with the input character string, for obtaining the syllable length after transformation, for each non-coincident character in the prosodic model data, from the average syllable length calculated in advance for all the characters used in the speech synthesis and the syllable length in said prosodic model data.
13. A speech synthesis apparatus for creating voice message data corresponding to an input character string, comprising:
a word dictionary storing a large number of character strings including at least one character having an accent type;
a prosody dictionary storing typical prosodic model data among prosodic model data representing prosodic information for the character strings stored in said word dictionary;
a waveform dictionary storing voice waveform data of a composition unit with a recorded voice;
accent type determining means for determining the accent type of the input character string;
prosodic model selecting means for selecting the prosodic model data from said prosody dictionary, based on the input character string and the accent type;
prosodic transforming means for transforming the prosodic information of the prosodic model data in accordance with the input character string in response to the character string of said selected prosodic model data not being coincident with the input character string;
waveform selecting means for:
selecting the waveform data corresponding to each character of the input character string from said waveform dictionary, based on the prosodic model data; selecting, for each phoneme making up the input character string, the waveform data of a pertinent phoneme in the prosodic model data from said waveform dictionary, the pertinent phoneme having a position and phoneme coincident with those of the prosodic model data; and selecting the waveform data of a phoneme having the frequency closest to that of the prosodic model data from said waveform dictionary for the other phonemes; and
waveform connecting means for connecting the selected waveform data with each other.
15. A computer-readable medium having stored thereon a speech synthesis program, wherein said program, when read by a computer, enables the computer to operate as:
a word dictionary for storing a large number of character strings including at least one character with its accent type;
a prosody dictionary for storing typical prosodic model data among prosodic model data representing prosodic information for the character strings stored in said word dictionary, said prosody dictionary including the character string, a mora number, accent type, and syllabic information;
a waveform dictionary for storing the voice waveform data of a composition unit with a recorded voice;
accent type determining means for determining the accent type of an input character string;
prosodic model selecting means for:
selecting the prosodic model data from said prosody dictionary, based on the input character string and the accent type, creating the syllabic information of the input character string, extracting the prosodic model data having the mora number and accent type coincident with those of the input character string from said prosody dictionary to provide a prosodic model candidate, creating prosodic reconstructed information by comparing the syllabic information of each prosodic model data candidate and the syllabic information of the input character string, and selecting optimal prosodic model data based on the character string of each prosodic model data candidate and the prosodic reconstructed information thereof;
prosodic transforming means for transforming the prosodic information of said prosodic model data in accordance with the input character string in response to the character string of said selected prosodic model data not being coincident with the input character string;
waveform selecting means for selecting the waveform data corresponding to each character of the input character string from said waveform dictionary, based on the prosodic model data; and
waveform connecting means for connecting said selected waveform data with each other.
if any of the prosodic model data candidates has all of its phonemes coincident with those of the input character string, making that prosodic model data candidate the optimal prosodic model data;
if there is no candidate having all its phonemes coincident with those of the input character string, making the candidate having the greatest number of phonemes coincident with the phonemes of the input character string among the prosodic model data candidates the optimal prosodic model data; and
if there are plural candidates having the greatest number of phonemes coincident, making the candidate having the greatest number of phonemes consecutively coincident the optimal prosodic model data.
17. The computer-readable medium according to claim 15, wherein said speech synthesis program further enables the computer to operate as prosody transforming means for obtaining, in response to the character string of said selected prosodic model data not being coincident with the input character string, the syllable length after transformation, for each non-coincident character in the prosodic model data, from the average syllable length calculated in advance for all the characters used in the speech synthesis and the syllable length in said prosodic model data.
18. A computer-readable medium having recorded thereon a speech synthesis program, wherein said program, when read by a computer, enables the computer to operate as:
a word dictionary for storing a large number of character strings including at least one character with its accent type, a prosody dictionary for storing typical prosodic model data among prosodic model data representing the prosodic information for the character strings stored in said word dictionary, and a waveform dictionary for storing the voice waveform data of a composition unit with a recorded voice;
accent type determining means for determining the accent type of an input character string;
prosodic model selecting means for selecting the prosodic model data from said prosody dictionary, based on the input character string and the accent type;
prosodic transforming means for transforming the prosodic information of said prosodic model data in accordance with the input character string in response to the character string of said selected prosodic model data not being coincident with the input character string;
waveform selecting means for selecting the waveform data corresponding to each character of the input character string from said waveform dictionary, based on the prosodic model data, for selecting, for every phoneme making up the input character string, the waveform data of a pertinent phoneme in the prosodic model data from said waveform dictionary, the pertinent phoneme having the position and phoneme coincident with those of the prosodic model data, and for selecting the waveform data of a phoneme having the frequency closest to that of the prosodic model data from said waveform dictionary for the other phonemes; and
waveform connecting means for connecting said selected waveform data with each other.
Specification