Speech synthesis method and apparatus, program, recording medium and robot apparatus
Abstract
Sentences or singing are to be synthesized as natural speech close to the human voice. To this end, singing metrical data are formed in a tag processing unit 211 in a singing synthesis unit 212 in a speech synthesis apparatus 200, based on singing data and an analyzed text portion. A language analysis unit 213 performs language processing on the text portions other than the singing data. For a text portion that this language processing finds registered in a natural metrical dictionary, the corresponding natural metrical data are selected and their parameters are adjusted in a metrical data adjustment unit 222 based on phonemic segment data of a phonemic segment storage unit 223 in the metrical data adjustment unit 222. For a text portion not registered in the natural metrical dictionary, a phonemic symbol string is generated in a natural metrical dictionary storage unit 214, after which metrical data are generated in a metrical generating unit 221. A waveform generating unit 224 concatenates the necessary phonemic segment data, based on the natural metrical data, the metrical data and the singing metrical data, to generate speech waveform data.
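The separating step, which sends tagged singing data toward singing synthesis and the remaining text toward language analysis, can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: the `<sing>...</sing>` tag syntax and the `separate` function are hypothetical, because the claims say only that the singing data portion is "specified by a singing tag" without giving its form.

```python
import re
from typing import List, Tuple

# Hypothetical tag syntax: the claims only refer to "a singing tag";
# the actual tag format is not given in this excerpt.
SINGING_TAG = re.compile(r"<sing>(.*?)</sing>", re.DOTALL)


def separate(input_text: str) -> List[Tuple[str, str]]:
    """Split input text into ("singing", ...) and ("text", ...) portions, in order."""
    portions: List[Tuple[str, str]] = []
    pos = 0
    for match in SINGING_TAG.finditer(input_text):
        # Any untagged text before this tag is an ordinary text portion.
        if match.start() > pos:
            portions.append(("text", input_text[pos:match.start()]))
        # The tag body is the singing data portion.
        portions.append(("singing", match.group(1)))
        pos = match.end()
    # Trailing untagged text, if any.
    if pos < len(input_text):
        portions.append(("text", input_text[pos:]))
    return portions


if __name__ == "__main__":
    for kind, body in separate("Hello. <sing>la la la</sing> Goodbye."):
        print(kind, repr(body))
```

Each `("singing", ...)` portion would then feed singing metrical data formation and each `("text", ...)` portion would feed language analysis. The non-greedy pattern keeps adjacent tagged passages separate; nested tags are not handled by this sketch.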
176 Citations
24 Claims
1. A speech synthesis method comprising:
a separating step of separating, from an input text, a singing data portion specified by a singing tag and a text portion;
a singing metrical data forming step of forming singing metrical data from said singing data, said singing metrical data expresses parameters of a lyric;
a speech symbol sequence forming step of forming a speech symbol sequence for said text portion;
a metrical data forming step of forming metrical data from said speech symbol sequence, said metrical data expresses parameters of a speech signal sequence;
a natural metrical data selecting step of analyzing said text portion and selecting, if preset words or sentences exist in said text portion, natural metrical data associated with said preset words or sentences, extracted in advance from the uttered speech of a human being, from a storage means; and
a speech synthesis step of synthesizing the speech based on said singing metrical data, said natural metrical data or said metrical data;
wherein said speech symbol sequence is formed for the section of the text portion other than said preset words or sentences, wherein the natural metrical data includes data about a pitch period, a pitch duration, and a pitch volume registered in a natural metrical dictionary stored in the storage means.
Dependent claims: 2, 3, 4, 5, 6
7. A speech synthesis apparatus comprising:
separating means for separating, from an input text, a singing data portion specified by a singing tag and a text portion;
singing metrical data forming means for forming singing metrical data from said singing data, said singing metrical data expresses parameters of a lyric;
speech symbol sequence forming means for forming a speech symbol sequence for said text portion;
metrical data forming means for forming metrical data from said speech symbol sequence, said metrical data expresses parameters of a speech signal sequence;
storage means having pre-stored therein preset words or sentences and natural metrical data corresponding to said preset words or sentences extracted from the utterance of a human being;
natural metrical data selecting means for analyzing said text portion and selecting, if preset words or sentences exist in said text portion, natural metrical data associated with said preset words or sentences, extracted in advance from the uttered speech of the human being, from said storage means; and
speech synthesis means for synthesizing the speech based on said singing metrical data, said natural metrical data or said metrical data;
wherein said speech symbol sequence is formed for the section of the text portion other than said preset words or sentences, wherein the natural metrical data includes data about a pitch period, a pitch duration, and a pitch volume registered in a natural metrical dictionary stored in the storage means.
Dependent claims: 8, 9, 10, 11, 12
13. A computer-readable recording medium having recorded thereon a program for having a computer execute preset processing, said program comprising:
a separating step of separating, from an input text, a singing data portion specified by a singing tag and a text portion;
a singing metrical data forming step of forming singing metrical data from said singing data, said singing metrical data expresses parameters of a lyric;
a speech symbol sequence forming step of forming a speech symbol sequence for said text portion;
a metrical data forming step of forming metrical data from said speech symbol sequence, said metrical data expresses parameters of a speech signal sequence;
a natural metrical data selecting step of analyzing said text portion and selecting, if preset words or sentences exist in said text portion, natural metrical data associated with said preset words or sentences, extracted in advance from the uttered speech of a human being, from storage means; and
a speech synthesis step of synthesizing the speech based on said singing metrical data, said natural metrical data or said metrical data;
wherein said speech symbol sequence is formed for the section of the text portion other than said preset words or sentences, wherein the natural metrical data includes data about a pitch period, a pitch duration, and a pitch volume registered in a natural metrical dictionary stored in the storage means.
Dependent claims: 14, 15, 16, 17, 18
19. An autonomous robot apparatus for performing a behavior based on the input information supplied thereto, comprising:
separating means for separating, from an input text, a singing data portion specified by a singing tag, and a text portion;
singing metrical data forming means for forming singing metrical data from said singing data, said singing metrical data expresses parameters of a lyric;
speech symbol sequence forming means for forming a speech symbol sequence for said text portion;
metrical data forming means for forming metrical data from said speech symbol sequence, said metrical data expresses parameters of a speech signal sequence;
storage means for storing preset words or sentences and natural metrical data corresponding to said preset words or sentences extracted in advance from the utterance of a human being;
natural metrical data selecting means for analyzing said text portion and selecting, if preset words or sentences exist in said text portion, natural metrical data associated with said preset words or sentences extracted in advance from the uttered speech of the human being, from storage means; and
speech synthesis means for synthesizing the speech based on said singing metrical data, said natural metrical data or said metrical data;
wherein said speech symbol sequence is formed for the section of the text portion other than said preset words or sentences, wherein the natural metrical data includes data about a pitch period, a pitch duration, and a pitch volume registered in a natural metrical dictionary stored in the storage means.
Dependent claims: 20, 21, 22, 23, 24
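The natural metrical data selecting step shared by claims 1, 7, 13 and 19 can be illustrated with a short sketch. Preset words or sentences found in the text portion reuse prosody stored in the dictionary, while the remaining sections go on to speech symbol sequence and metrical data forming. This is a hypothetical Python illustration: the `NaturalProsody` record, its per-segment tuples and millisecond units, word-level tokenization, and greedy longest-match lookup are all assumptions. The claims state only that each entry holds a pitch period, a pitch duration and a pitch volume.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class NaturalProsody:
    """Hypothetical natural metrical dictionary entry.

    Per the claims it holds pitch period, pitch duration and pitch volume;
    the per-segment tuples and the units are assumptions.
    """
    pitch_period_ms: Tuple[float, ...]
    duration_ms: Tuple[float, ...]
    volume: Tuple[float, ...]


Section = Tuple[str, Union[NaturalProsody, str]]


def select_natural_prosody(
    words: List[str], dictionary: Dict[Tuple[str, ...], NaturalProsody]
) -> List[Section]:
    """Split a tokenized text portion into dictionary hits and rule sections.

    ("natural", entry) marks a preset phrase whose stored prosody is reused;
    ("rule", text) marks words left for speech symbol and metrical data forming.
    """
    max_len = max((len(key) for key in dictionary), default=0)
    sections: List[Section] = []
    pending: List[str] = []  # consecutive words with no dictionary match
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first (greedy longest match).
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in dictionary:
                if pending:
                    sections.append(("rule", " ".join(pending)))
                    pending = []
                sections.append(("natural", dictionary[key]))
                i += n
                break
        else:
            # No preset phrase starts here; defer the word to rule-based synthesis.
            pending.append(words[i])
            i += 1
    if pending:
        sections.append(("rule", " ".join(pending)))
    return sections


if __name__ == "__main__":
    greeting = NaturalProsody((5.0, 4.5), (180.0, 220.0), (0.8, 0.6))
    lexicon = {("good", "morning"): greeting}
    for kind, payload in select_natural_prosody("well good morning my friend".split(), lexicon):
        print(kind, payload)
```

Longest match is chosen here so that a registered sentence takes precedence over a registered word it contains; the claims do not say how overlapping dictionary entries are resolved.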
Specification