Voice synthesizer, voice synthesizing method, and computer program
First Claim
1. A voice synthesizer that uses recorded voices that are pre-recorded to generate a synthesized voice that reads out a text, comprising:
- a recorded voice storage portion that stores the recorded voices that are pre-recorded;
a voice input portion that is input with a reading voice that is a natural voice reading out a text that is to be generated by the synthesized voice;
an attribute information input portion that is input with a label string and label information, the label string being a string of labels that are respectively assigned to each phoneme included in the reading voice and that are placed in a time series, and the label information indicating the border position of each phoneme corresponding to each label;
a parameter extraction portion that extracts a characteristic parameter that indicates a characteristic of the reading voice based on the label string, the label information, and the reading voice; and
a voice synthesis portion that selects at least one of the recorded voices from the recorded voice storage portion in accordance with the characteristic parameter, synthesizes the selected at least one recorded voice, and generates the synthesized voice that reads out the text.
1 Assignment
0 Petitions
Accused Products
Abstract
A voice synthesizer includes a recorded voice storage portion (124) that stores recorded voices that are pre-recorded; a voice input portion (110) that is input with a reading voice reading out a text that is to be generated by the synthesized voice; an attribute information input portion (112) that is input with a label string, which is a string of labels assigned to each phoneme included in the reading voice, and label information, which indicates the border position of each phoneme corresponding to each label; a parameter extraction portion (116) that extracts characteristic parameters of the reading voice based on the label string, the label information, and the reading voice; and a voice synthesis portion (122) that selects the recorded voices from the recorded voice storage portion in accordance with the characteristic parameters, synthesizes the recorded voices, and generates the synthesized voice that reads out the text.
42 Citations
10 Claims
-
1. A voice synthesizer that uses recorded voices that are pre-recorded to generate a synthesized voice that reads out a text, comprising:
-
a recorded voice storage portion that stores the recorded voices that are pre-recorded;
a voice input portion that is input with a reading voice that is a natural voice reading out a text that is to be generated by the synthesized voice;
an attribute information input portion that is input with a label string and label information, the label string being a string of labels that are respectively assigned to each phoneme included in the reading voice and that are placed in a time series, and the label information indicating the border position of each phoneme corresponding to each label;
a parameter extraction portion that extracts a characteristic parameter that indicates a characteristic of the reading voice based on the label string, the label information, and the reading voice; and
a voice synthesis portion that selects at least one of the recorded voices from the recorded voice storage portion in accordance with the characteristic parameter, synthesizes the selected at least one recorded voice, and generates the synthesized voice that reads out the text. - View Dependent Claims (2, 3, 4)
-
-
5. A computer program wherein a computer is directed to function as a voice synthesizer that uses recorded voices that are pre-recorded to generate a synthesized voice that reads out a text, comprising:
-
a voice input process in which a reading voice is input that is a natural voice reading out a text that is to be generated by the synthesized voice;
an attribute information input process in which a label string and label information are input, the label string being a string of labels that are respectively assigned to each phoneme included in the reading voice and that are placed in a time series, and the label information indicating the border position of each phoneme corresponding to each label;
a parameter extraction process that extracts a characteristic parameter that indicates a characteristic of the reading voice based on the label string, the label information, and the reading voice;
a selection process that selects at least one of the recorded voices in accordance with the characteristic parameter from a recorded voice storage portion that stores the recorded voices that are pre-recorded; and
a voice synthesis process that synthesizes the at least one recorded voice selected by the selection process, and generates the synthesized voice that reads out the text.
-
-
6. A voice synthesizing method that uses recorded voices that are pre-recorded to generate a synthesized voice that reads out a text, comprising the steps of:
-
inputting a reading voice that is a natural voice reading out a text that is to be generated by the synthesized voice;
inputting attribute information that includes a label string and label information, the label string being a string of labels that are respectively assigned to each phoneme included in the reading voice and that are placed in a time series, and the label information indicating the border position of each phoneme corresponding to each label;
extracting a characteristic parameter that indicates a characteristic of the reading voice based on the label string, the label information, and the reading voice;
selecting at least one of the recorded voices in accordance with the characteristic parameter from a recorded voice storage portion that stores the recorded voices that are pre-recorded; and
generating the synthesized voice that reads out the text by synthesizing the at least one recorded voice selected in the selection step.
-
-
7. A voice synthesizer that uses recorded voices that are pre-recorded to generate a synthesized voice that reads out a text, comprising:
-
a recorded voice storage portion that stores the recorded voices that are pre-recorded;
a phoneme model storage portion that stores an acoustic model and a prosody model that are generated in advance based on the recorded voices stored in the recorded voice storage portion, the acoustic model modeling an acoustic characteristic of each phoneme included in the recorded voices, and the prosody model modeling a prosody characteristic of each phoneme included in the recorded voices;
a text input portion that is input with a text that is to be generated by the synthesized voice;
an attribute information input portion that is input with a label string and label information, the label string being a string of labels that are respectively assigned to each phoneme included in the reading voice and that are placed in a time series, and the label information indicating the border position of each phoneme corresponding to each label;
a label information adjustment portion that sets, in accordance with a plurality of metrically and/or acoustically different states of each phoneme, the border position of each state;
a text analysis portion that analyses the text and obtains language prosody information;
a characteristic estimation portion that estimates a characteristic of the natural voice reading out the text based on the label string, the label information adjusted by the label information adjustment portion, the language prosody information, and the acoustic model and the prosody model stored in the phoneme model storage portion, and derives a characteristic parameter that indicates the characteristic; and
a voice synthesis portion that selects at least one of the recorded voices from the recorded voice storage portion in accordance with the characteristic parameter, synthesizes the selected at least one recorded voice, and generates the synthesized voice that reads out the text. - View Dependent Claims (8)
-
-
9. A computer program wherein a computer is directed to function as a voice synthesizer that uses recorded voices that are pre-recorded to generate a synthesized voice that reads out a text, the computer program using:
-
a recorded voice storage portion that stores the recorded voices that are pre-recorded; and
a phoneme model storage portion that stores an acoustic model and a prosody model that are generated in advance based on the recorded voices stored in the recorded voice storage portion, the acoustic model modeling an acoustic characteristic of each phoneme included in the recorded voices, and the prosody model modeling a prosody characteristic of each phoneme included in the recorded voices, and comprising;
a text input process in which a text is input that is to be generated by the synthesized voice;
an attribute information input process in which a label string and label information are input, the label string being a string of labels that are respectively assigned to each phoneme included in the text and that are placed in a time series, and the label information indicating the border position of each phoneme corresponding to each label;
a label information adjustment process that sets, in accordance with a plurality of metrically and/or acoustically different states of each phoneme, the border position of each state;
a text analysis process that analyses the text and obtains language prosody information;
a characteristic estimation process that estimates a characteristic of the natural voice reading out the text based on the label string, the label information adjusted by the label information adjustment process, the language prosody information, and the acoustic model and the prosody model stored in the phoneme model storage portion, and derives a characteristic parameter that indicates the characteristic; and
a voice synthesis process that selects at least one of the recorded voices from the recorded voice storage portion in accordance with the characteristic parameter, synthesizes the at least one selected recorded voice, and generates the synthesized voice that reads out the text.
-
-
10. A voice synthesizing method that uses recorded voices that are pre-recorded to generate a synthesized voice that reads out a text, the method using:
-
a recorded voice storage portion that stores the recorded voices that are pre-recorded; and
a phoneme model storage portion that stores an acoustic model and a prosody model that are generated in advance based on the recorded voices stored in the recorded voice storage portion, the acoustic model modeling an acoustic characteristic of each phoneme included in the recorded voices, and the prosody model modeling a prosody characteristic of each phoneme included in the recorded voices, and comprising the steps of;
inputting a text that is to be generated by the synthesized voice;
inputting attribute information that includes a label string and label information, the label string being a string of labels that are respectively assigned to each phoneme included in the text and that are placed in a time series, and the label information indicating the border position of each phoneme corresponding to each label;
adjusting the label information by setting, in accordance with a plurality of metrically and/or acoustically different states of each phoneme, the border position of each state;
analyzing the text and obtaining language prosody information;
estimating a characteristic of the natural voice reading out the text based on the label string, the label information adjusted by the label information adjustment step, the language prosody information, and the acoustic model and the prosody model stored in the phoneme model storage portion, and deriving a characteristic parameter that indicates the characteristic; and
generating the synthesized voice that reads out the text by selecting at least one of the recorded voices from the recorded voice storage portion in accordance with the characteristic parameter and synthesizing the selected at least one recorded voice.
-
Specification