Text structure for voice synthesis, voice synthesis method, voice synthesis apparatus, and computer program thereof
Abstract
In a voice synthesis apparatus, a desired range of the input text to be output is bounded by a start tag, e.g., <morphing type="emotion" start="happy" end="angry">, and an end tag </morphing>. When the synthetic voice for that range is output, a feature of the synthetic voice is changed continuously, in this example by gradually shifting from a happy voice to an angry voice.
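The tag form quoted in the abstract can be illustrated with a minimal recognition sketch in Python. It is not the patented implementation: the names `MORPHING_RE`, `MorphingRange`, and `recognize_morphing` are hypothetical, and the sketch assumes the attributes always appear in the order `type`, `start`, `end` and that morphing ranges are not nested.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for the tag shown in the abstract.
# Assumes attribute order type/start/end and no nested <morphing> ranges.
MORPHING_RE = re.compile(
    r'<morphing\s+type="(?P<type>[^"]+)"'
    r'\s+start="(?P<start>[^"]+)"'
    r'\s+end="(?P<end>[^"]+)"\s*>'
    r'(?P<body>.*?)</morphing>',
    re.DOTALL,
)


@dataclass
class MorphingRange:
    change_type: str  # kind of change, e.g. "emotion"
    start: str        # feature state at the start of the range
    end: str          # feature state at the end of the range
    text: str         # text enclosed by the start and end tags


def recognize_morphing(tagged_text):
    """Return every morphing range found in the tagged input text."""
    return [
        MorphingRange(m["type"], m["start"], m["end"], m["body"])
        for m in MORPHING_RE.finditer(tagged_text)
    ]


sample = (
    'Good morning. '
    '<morphing type="emotion" start="happy" end="angry">'
    'Why are you late again'
    '</morphing> Sit down.'
)
ranges = recognize_morphing(sample)
```

Here `ranges` holds one entry whose `start` and `end` fields carry the attribute information that a later synthesis stage would use to decide how the voice changes across the enclosed text.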
4 Claims
1. A voice synthesis method for synthesizing a voice waveform to continuously change a feature of synthetic voice of a range assigned a predetermined identifier included in input text upon outputting synthetic voice corresponding to the text, the method comprising:
a setting step, via a setting module, of setting a desired range of text to be output, in which the feature of synthetic voice is to be continuously changed, using a predetermined identifier including attribute information that represents a change mode of the feature of synthetic voice both at a start position and at an end position of the range set by the identifier;
a recognition step of recognizing the predetermined identifier and a type of attribute information contained in the predetermined identifier from the text with the identifier, which is set in said setting step; and
a voice synthesis step of synthesizing a voice waveform, whose feature of synthetic voice continuously changes, in accordance with the attribute information contained in the predetermined identifier, by interpolating synthetic voice corresponding to text within the desired range of the text with the identifier in accordance with a recognition result in said recognition step,
wherein the change mode of the feature of synthetic voice includes at least one of a change in output device, a change in a number of speakers and a change in emotion.
2. A voice synthesis apparatus for synthesizing a voice waveform to continuously change a feature of synthetic voice of a range assigned a predetermined identifier included in input text upon outputting synthetic voice corresponding to the text, the apparatus comprising:
recognition means for recognizing, from text with an identifier, in which a predetermined identifier that represents a desired range, in which the feature of synthetic voice is to be continuously changed, and which contains attribute information representing a change mode of the feature of synthetic voice both at a start position and at an end position of the range set by the identifier, the predetermined identifier and a type of attribute information contained in the predetermined identifier from the text with the identifier; and
voice synthesis means for synthesizing a voice waveform, whose feature of synthetic voice continuously changes, in accordance with the attribute information contained in the predetermined identifier, by interpolating synthetic voice corresponding to text within the desired range of the text with the identifier in accordance with a recognition result of said recognition means,
wherein the change mode of the feature of synthetic voice includes at least one of a change in output device, a change in a number of speakers and a change in emotion.
4. A computer-readable storage medium storing a computer program comprising program code for causing a computer to serve as a voice synthesis apparatus for synthesizing a voice waveform to change a feature of synthetic voice of a range assigned a predetermined identifier included in input text upon outputting synthetic voice corresponding to the text, the program code comprising:
program code for a recognition function of recognizing, from text with an identifier, in which a predetermined identifier that represents a desired range, in which the feature of synthetic voice is to be continuously changed, and which contains attribute information representing a change mode of the feature of synthetic voice both at a start position and at an end position of the range set by the identifier, the predetermined identifier and a type of attribute information contained in the predetermined identifier from the text with the identifier; and
program code for a voice synthesis function of synthesizing a voice waveform, whose feature of synthetic voice continuously changes, in accordance with the attribute information contained in the predetermined identifier, by interpolating synthetic voice corresponding to text within the desired range of the text with the identifier in accordance with a recognition result of the recognition function,
wherein the change mode of the feature of synthetic voice includes at least one of a change in output device, a change in a number of speakers and a change in emotion.
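The claims recite synthesizing by interpolation between the feature at the start position and the feature at the end position, without fixing an interpolation rule at this point in the text. As an illustration only, the following Python sketch assumes a linear blend of two feature vectors over the synthesis units in the range. The function names, the idea of a per-unit feature vector, and the example values are all hypothetical and are not taken from the patent.

```python
def morph_weights(n_units):
    """Weight of the end-state voice for each unit, rising linearly from 0.0 to 1.0.

    Linear spacing is an assumption of this sketch, not something the claims require.
    """
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    if n_units == 1:
        return [0.0]
    return [i / (n_units - 1) for i in range(n_units)]


def interpolate_features(start_vec, end_vec, weight):
    """Blend two equal-length feature vectors; weight 0.0 gives start, 1.0 gives end."""
    if len(start_vec) != len(end_vec):
        raise ValueError("feature vectors must have the same length")
    return [(1.0 - weight) * s + weight * e for s, e in zip(start_vec, end_vec)]


def morph_track(start_vec, end_vec, n_units):
    """Feature vector for each unit in the range, moving from start state to end state."""
    return [
        interpolate_features(start_vec, end_vec, w)
        for w in morph_weights(n_units)
    ]


# Hypothetical two-element feature vectors, e.g. (pitch in Hz, relative speaking rate).
happy = [100.0, 1.0]
angry = [200.0, 2.0]
track = morph_track(happy, angry, 3)
```

With three units, `track` starts at the `happy` vector, passes through the midpoint, and ends at the `angry` vector, which mirrors the gradual change described in the abstract.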
Specification