Speech synthesis apparatus
Abstract
A speech synthesis apparatus, which can embed unchangeable additional information into synthesized speech without causing deterioration of speech quality or restriction by bands, includes a language processing unit which generates synthesized speech generation information necessary for generating synthesized speech in accordance with a language string, a prosody generating unit which generates prosody information of speech based on the synthesized speech generation information, and a waveform generating unit which synthesizes speech based on the prosody information, in which the prosody generating unit embeds code information as watermark information in the prosody information of a segment having a predetermined time duration within a phoneme length including a phoneme boundary.
18 Claims
1. A speech synthesis apparatus which synthesizes speech, said apparatus comprising:
a prosody generating unit for generating prosody information of the speech based on synthesized speech generation information; and
a synthesis unit for synthesizing the speech based on the prosody information,
wherein said prosody generating unit is for:
specifying a time position including a phoneme boundary in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information;
extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including the phoneme boundary; and
embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech.
Dependent claims: 2, 3, 4, 14, 15
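The three steps recited for the prosody generating unit (specify a position at a phoneme boundary, fetch a stored pattern, embed it) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the claim: the fundamental frequency (F0) contour is a list of per-frame values in Hz, phoneme durations are given in frames, the stored pattern is a list of F0 offsets in Hz, and embedding is done by adding those offsets to the frames centered on the boundary. The names `PATTERN_STORE`, `boundary_frames`, and `embed_micro_prosody` are hypothetical.

```python
from itertools import accumulate
from typing import Dict, List

# Hypothetical pattern store: pattern name -> per-frame F0 offsets in Hz.
# The values are invented for illustration only.
PATTERN_STORE: Dict[str, List[float]] = {
    "synthetic_marker": [-4.0, -8.0, -12.0, 10.0, 6.0, 2.0],
}


def boundary_frames(durations: List[int]) -> List[int]:
    """Return the frame index at which each phoneme ends and the next begins."""
    return list(accumulate(durations))[:-1]


def embed_micro_prosody(f0: List[float], boundary: int,
                        pattern: List[float]) -> List[float]:
    """Return a copy of f0 with the pattern added around the boundary frame."""
    start = boundary - len(pattern) // 2
    if start < 0 or start + len(pattern) > len(f0):
        raise ValueError("pattern does not fit around the boundary")
    marked = list(f0)  # leave the input contour untouched
    for i, offset in enumerate(pattern):
        marked[start + i] = f0[start + i] + offset
    return marked


# Three phonemes lasting 6, 4 and 10 frames on a flat 120 Hz contour.
durations = [6, 4, 10]
f0 = [120.0] * sum(durations)
boundaries = boundary_frames(durations)
marked = embed_micro_prosody(f0, boundaries[1],
                             PATTERN_STORE["synthetic_marker"])
```

In this sketch the time position is simply the second phoneme boundary. The claim only requires that the position be specified from the synthesized speech generation information, so a real selection rule could differ.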
5. A synthesis speech identifying apparatus which identifies whether or not inputted speech is synthesized speech, said apparatus comprising:
a fundamental frequency calculating unit for calculating a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration;
a storage unit in which a micro-prosody pattern is stored, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary, and being used to identify the inputted speech as synthesized speech; and
an identifying unit for:
extracting, in a segment having a duration including a phoneme boundary within which a micro-prosody pattern of the inputted speech exists as watermark information, the fundamental frequency of the speech calculated by said fundamental frequency calculating unit;
matching a pattern of the extracted fundamental frequency with the micro-prosody pattern stored in said storage unit; and
identifying whether or not the inputted speech is synthesized speech.
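The identifying unit's extract, match, and identify steps can be sketched as follows. The claim does not say how the matching is performed; this sketch assumes a normalized correlation against the stored pattern with a threshold of 0.9, both chosen only for illustration. It also assumes that the per-frame F0 values and the phoneme boundary frame indices are already available. The names `normalized_correlation` and `is_synthesized` are hypothetical.

```python
import math
from typing import List, Sequence


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(dev_a, dev_b))
    den = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    return num / den if den > 0 else 0.0


def is_synthesized(f0: List[float], boundaries: List[int],
                   pattern: List[float], threshold: float = 0.9) -> bool:
    """Return True if any boundary segment matches the stored pattern."""
    half = len(pattern) // 2
    for boundary in boundaries:
        start = boundary - half
        if start < 0 or start + len(pattern) > len(f0):
            continue  # segment would run past the contour
        segment = f0[start:start + len(pattern)]
        if normalized_correlation(segment, pattern) >= threshold:
            return True
    return False


# Stored pattern (invented values) and two test contours.
stored = [-4.0, -8.0, -12.0, 10.0, 6.0, 2.0]
plain = [120.0] * 20
watermarked = list(plain)
for i, offset in enumerate(stored):
    watermarked[7 + i] += offset  # pattern centered on boundary frame 10

found_in_watermarked = is_synthesized(watermarked, [6, 10], stored)
found_in_plain = is_synthesized(plain, [6, 10], stored)
```

Correlation ignores the absolute pitch level, so the same stored pattern can be detected on contours with different baseline F0. That is a property of this illustrative matcher, not something the claim states.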
6. An additional information reading apparatus which decodes additional information embedded in inputted speech, said apparatus comprising:
a fundamental frequency calculating unit for calculating a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration;
a storage unit in which a micro-prosody pattern associated with the additional information is stored, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary; and
an additional information extracting unit for:
extracting, in a segment having a duration including a phoneme boundary within which a micro-prosody pattern of the inputted speech exists as watermark information, a micro-prosody pattern from the speech fundamental frequency calculated by said fundamental frequency calculating unit;
comparing the extracted micro-prosody pattern with the micro-prosody pattern associated with the additional information; and
extracting predetermined additional information included in the extracted micro-prosody pattern.
Dependent claims: 7
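Reading additional information amounts to comparing each extracted pattern with the stored patterns and returning the information associated with the closest one. The sketch below assumes a two-entry codebook that maps the symbols "0" and "1" to zero-mean F0 offset patterns, and uses squared distance after removing the segment mean as the comparison. The codebook contents, the distance measure, and the names `CODEBOOK`, `decode_symbol`, and `read_additional_info` are illustrative assumptions, not details from the claim.

```python
from typing import Dict, List

# Hypothetical codebook: symbol -> zero-mean F0 offsets in Hz (invented values).
CODEBOOK: Dict[str, List[float]] = {
    "0": [-6.0, -3.0, 3.0, 6.0],
    "1": [6.0, 3.0, -3.0, -6.0],
}


def decode_symbol(segment: List[float], codebook: Dict[str, List[float]]) -> str:
    """Return the symbol whose stored pattern is closest to the segment."""
    mean = sum(segment) / len(segment)
    centered = [value - mean for value in segment]

    def distance(pattern: List[float]) -> float:
        return sum((c - p) ** 2 for c, p in zip(centered, pattern))

    return min(codebook, key=lambda symbol: distance(codebook[symbol]))


def read_additional_info(f0: List[float], boundaries: List[int],
                         codebook: Dict[str, List[float]]) -> str:
    """Decode one symbol per phoneme boundary and join them into a string."""
    length = len(next(iter(codebook.values())))
    symbols = []
    for boundary in boundaries:
        start = boundary - length // 2
        if start < 0 or start + length > len(f0):
            continue  # segment would run past the contour
        symbols.append(decode_symbol(f0[start:start + length], codebook))
    return "".join(symbols)


# Embed "1" at boundary frame 8 and "0" at boundary frame 20, then read back.
f0 = [100.0] * 30
for boundary, symbol in ((8, "1"), (20, "0")):
    for i, offset in enumerate(CODEBOOK[symbol]):
        f0[boundary - 2 + i] += offset

decoded = read_additional_info(f0, [8, 20], CODEBOOK)
```

This sketch always returns the nearest symbol, even for a contour that carries no watermark. A practical reader would likely also reject segments whose distance to every stored pattern is too large.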
8. A speech synthesis method of synthesizing speech, comprising:
generating prosody information of the speech based on synthesized speech generation information, wherein said generating includes:
specifying a time position including a phoneme boundary in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information;
extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including the phoneme boundary; and
embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech.
Dependent claims: 9
10. A program embodied on a computer readable recording medium, for making a computer function as a speech synthesis apparatus, said program making the computer function as the following:
a prosody generating unit for generating prosody information of speech based on synthesized speech generation information; and
a synthesis unit for synthesizing the speech based on the prosody information,
wherein the prosody generating unit is for:
specifying a time position including a phoneme boundary in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information;
extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including the phoneme boundary; and
embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech.
Dependent claims: 11
12. A computer readable recording medium on which a program for making a computer function as a speech synthesis apparatus is recorded, wherein said program makes a computer function as the following:
a prosody generating unit for generating prosody information of speech based on synthesized speech generation information; and
a synthesis unit for synthesizing the speech based on the prosody information,
wherein the prosody generating unit is for:
specifying a time position including a phoneme boundary in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information;
extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including the phoneme boundary; and
embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech.
Dependent claims: 13
16. A speech synthesis apparatus which synthesizes speech, said apparatus comprising:
a prosody generating unit for generating prosody information of the speech based on synthesized speech generation information; and
a synthesis unit for synthesizing the speech based on the prosody information,
wherein said prosody generating unit is for:
specifying a time position in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information;
extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary; and
embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech, and the embedded micro-prosody pattern being used to identify a manufacturer of said speech synthesis apparatus.
17. A synthesis speech identifying apparatus which identifies whether or not inputted speech is synthesized speech, said apparatus comprising:
a fundamental frequency calculating unit for calculating a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration;
a storage unit in which a micro-prosody pattern is stored, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary, and the micro-prosody pattern being used to identify the inputted speech as synthesized speech and to identify a manufacturer of said speech synthesis apparatus that has generated the synthesized speech; and
an identifying unit for:
extracting, in a segment having a duration within which a micro-prosody pattern of the inputted speech exists as watermark information, the fundamental frequency of the speech calculated by said fundamental frequency calculating unit;
matching a pattern of the extracted fundamental frequency with the micro-prosody pattern stored in said storage unit; and
identifying whether or not the inputted speech is synthesized speech and, in the case where the inputted speech is synthesized speech, identifying a manufacturer of said speech synthesis apparatus that has generated the synthesized speech.
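Identifying the manufacturer in addition to detecting synthesized speech can be sketched by storing one pattern per manufacturer and reporting which one, if any, appears in the input. The sketch assumes a mean absolute error test with a 1 Hz tolerance after removing the segment mean, and it returns `None` when no stored pattern is found (treated here as "not identified as synthesized"). The manufacturer names, pattern values, tolerance, and function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical per-manufacturer patterns: zero-mean F0 offsets in Hz.
MANUFACTURER_PATTERNS: Dict[str, List[float]] = {
    "maker_a": [-5.0, -10.0, 10.0, 5.0],
    "maker_b": [8.0, -8.0, -8.0, 8.0],
}


def mean_abs_error(segment: List[float], pattern: List[float]) -> float:
    """Average absolute difference between the mean-removed segment and the pattern."""
    mean = sum(segment) / len(segment)
    return sum(abs((v - mean) - p) for v, p in zip(segment, pattern)) / len(pattern)


def identify_manufacturer(f0: List[float], boundaries: List[int],
                          patterns: Dict[str, List[float]],
                          tolerance_hz: float = 1.0) -> Optional[str]:
    """Return the manufacturer whose pattern is found, or None if none is found."""
    for boundary in boundaries:
        for name, pattern in patterns.items():
            start = boundary - len(pattern) // 2
            if start < 0 or start + len(pattern) > len(f0):
                continue  # segment would run past the contour
            segment = f0[start:start + len(pattern)]
            if mean_abs_error(segment, pattern) <= tolerance_hz:
                return name
    return None


# A flat contour, and the same contour carrying the "maker_b" pattern at frame 8.
natural = [110.0] * 16
marked = list(natural)
for i, offset in enumerate(MANUFACTURER_PATTERNS["maker_b"]):
    marked[6 + i] += offset

maker = identify_manufacturer(marked, [8], MANUFACTURER_PATTERNS)
no_maker = identify_manufacturer(natural, [8], MANUFACTURER_PATTERNS)
```

A single lookup answers both questions in the claim: any non-`None` result means the input is treated as synthesized speech, and the returned key names the manufacturer.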
18. An additional information reading apparatus which decodes additional information embedded in inputted speech, said apparatus comprising:
a fundamental frequency calculating unit for calculating a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration;
a storage unit in which a micro-prosody pattern associated with the additional information is stored, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary, and the micro-prosody pattern being used to identify a manufacturer of said speech synthesis apparatus; and
an additional information extracting unit for:
extracting, in a segment having a duration including a phoneme boundary within which a micro-prosody pattern of the inputted speech exists as watermark information, a micro-prosody pattern from the speech fundamental frequency calculated by said fundamental frequency calculating unit;
comparing the extracted micro-prosody pattern with the micro-prosody pattern associated with the additional information;
extracting predetermined additional information included in the extracted micro-prosody pattern; and
identifying a manufacturer of said speech synthesis apparatus that has generated the synthesized speech.
Specification