Speech synthesis apparatus
Abstract
A speech synthesis apparatus, which can embed unchangeable additional information into synthesized speech without causing a deterioration of speech quality and restriction by bands, includes a language processing unit which generates synthesized speech generation information necessary for generating synthesized speech in accordance with a language string, a prosody generating unit which generates prosody information of speech based on the synthesized speech generation information, and a waveform generating unit which synthesizes speech based on the prosody information, in which the prosody generating unit embeds code information as watermark information in the prosody information of a segment having a predetermined time duration within a phoneme length including a phoneme boundary.
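As an illustration of embedding code information in prosody, the sketch below writes bits into the fundamental frequency (F0) contour of one segment. The abstract does not say how the code is encoded, so the parity scheme used here (snap each frame's F0 to an even or odd multiple of `step_hz`) is an assumption chosen for the example, and the names `embed_code` and `step_hz` are hypothetical.

```python
from typing import List, Sequence, Tuple


def embed_code(f0_hz: Sequence[float], segment: Tuple[int, int],
               bits: Sequence[int], step_hz: float = 1.0) -> List[float]:
    """Return a copy of the F0 contour with `bits` written into frames [start, end).

    Assumed encoding (not taken from the patent): bit 0 snaps a frame's F0 to
    the nearest even multiple of step_hz, bit 1 to the nearest odd multiple.
    Each frame moves by at most step_hz.
    """
    start, end = segment
    if not (0 <= start <= end <= len(f0_hz)):
        raise ValueError("segment lies outside the contour")
    if end - start < len(bits):
        raise ValueError("segment is too short for the code")
    out = list(f0_hz)
    for i, bit in enumerate(bits):
        v = out[start + i]
        # Nearest value of the form (2k + bit) * step_hz.
        k = round((v - bit * step_hz) / (2 * step_hz))
        out[start + i] = (2 * k + bit) * step_hz
    return out


# Per-frame F0 values in Hz (made-up numbers for illustration).
contour = [118.7, 120.3, 123.9, 125.2, 126.0, 124.4]
marked = embed_code(contour, segment=(1, 5), bits=[1, 0, 1, 1])
```

Frames outside the segment are left untouched, so the change is confined to the short span next to the phoneme boundary.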
-
25 Claims
-
1-13. (canceled)
-
14. A speech synthesis apparatus which synthesizes speech, comprising:
-
a prosody generating unit operable to generate prosody information of the speech based on synthesized speech generation information; and
a synthesis unit operable to synthesize the speech based on the prosody information, wherein said prosody generating unit is operable to embed code information as watermark information into the prosody information of a segment having a predetermined duration within a phoneme length including a phoneme boundary, wherein the segment is one of the following:
a segment having the predetermined duration from a start point of voiced sound immediately preceded by voiceless sound;
a segment having the predetermined duration until an end of voiced sound immediately followed by voiceless sound;
a segment having the predetermined duration from a start point of voiced sound immediately preceded by silence;
a segment having the predetermined duration until an end of voiced sound immediately followed by silence;
a segment having the predetermined duration from a start point of a vowel immediately preceded by a consonant;
a segment having the predetermined duration until an end of a vowel immediately followed by a consonant;
a segment having the predetermined duration from a start point of a vowel immediately preceded by silence; and
a segment having the predetermined duration until an end of a vowel immediately followed by silence.
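The eight segment types listed in claim 14 can be sketched as a search over adjacent phoneme pairs. This is a minimal sketch under stated assumptions: the `Phoneme` record, the four `kind` labels, and the function `candidate_segments` are hypothetical, and "within a phoneme length" is read here as "the segment must fit inside the phoneme it belongs to".

```python
from dataclasses import dataclass
from typing import List, Tuple

VOWEL = "vowel"
VOICED_CONS = "voiced_consonant"
VOICELESS_CONS = "voiceless_consonant"
SILENCE = "silence"


@dataclass
class Phoneme:
    label: str
    kind: str        # one of the four constants above
    start_ms: float
    end_ms: float


def is_voiced(p: Phoneme) -> bool:
    return p.kind in (VOWEL, VOICED_CONS)


def is_consonant(p: Phoneme) -> bool:
    return p.kind in (VOICED_CONS, VOICELESS_CONS)


def candidate_segments(phonemes: List[Phoneme],
                       duration_ms: float) -> List[Tuple[float, float]]:
    """List (start_ms, end_ms) spans that touch a qualifying phoneme boundary."""
    segments = []
    for prev, cur in zip(phonemes, phonemes[1:]):
        # Onset cases: voiced sound after voiceless sound or silence,
        # or a vowel after a consonant or silence.
        onset = ((is_voiced(cur) and prev.kind in (VOICELESS_CONS, SILENCE))
                 or (cur.kind == VOWEL
                     and (is_consonant(prev) or prev.kind == SILENCE)))
        if onset and cur.end_ms - cur.start_ms >= duration_ms:
            segments.append((cur.start_ms, cur.start_ms + duration_ms))
        # Offset cases: voiced sound before voiceless sound or silence,
        # or a vowel before a consonant or silence.
        offset = ((is_voiced(prev) and cur.kind in (VOICELESS_CONS, SILENCE))
                  or (prev.kind == VOWEL
                      and (is_consonant(cur) or cur.kind == SILENCE)))
        if offset and prev.end_ms - prev.start_ms >= duration_ms:
            segments.append((prev.end_ms - duration_ms, prev.end_ms))
    return segments


# The syllable "ka" surrounded by silence (timings are made up).
utterance = [
    Phoneme("sil", SILENCE, 0.0, 100.0),
    Phoneme("k", VOICELESS_CONS, 100.0, 160.0),
    Phoneme("a", VOWEL, 160.0, 300.0),
    Phoneme("sil", SILENCE, 300.0, 400.0),
]
found = candidate_segments(utterance, duration_ms=20.0)
# found == [(160.0, 180.0), (280.0, 300.0)]
```

Here the vowel "a" yields one segment at its onset (preceded by the voiceless "k") and one at its end (followed by silence).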
-
-
20. A synthesis speech identifying apparatus which identifies whether or not inputted speech is synthesized speech, said apparatus comprising:
-
a fundamental frequency calculating unit operable to calculate a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration; and
an identifying unit operable to identify, in a segment having a predetermined duration within a phoneme length including a phoneme boundary, whether or not the inputted speech is the synthesized speech by identifying whether or not identification information is included in the speech fundamental frequencies calculated by said fundamental frequency calculating unit, the identification information being for identifying whether or not the inputted speech is the synthesized speech.
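The two units of claim 20 can be illustrated as follows. This is only a sketch: the claim does not name an F0 estimation method or the form of the identification information, so the autocorrelation peak picker and the parity-based marker check below are assumptions, and `frame_f0`, `f0_track`, and `is_synthesized` are hypothetical names. Integer-lag autocorrelation is also coarse, so a practical detector would likely need a finer F0 estimate.

```python
import math
from typing import List, Sequence, Tuple


def frame_f0(frame: Sequence[float], sample_rate: int,
             f_min: float = 60.0, f_max: float = 400.0) -> float:
    """Estimate F0 of one frame from its autocorrelation peak (0.0 if none)."""
    lo = int(sample_rate / f_max)
    hi = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lo, hi + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0


def f0_track(samples: Sequence[float], sample_rate: int,
             frame_len: int) -> List[float]:
    """F0 per non-overlapping frame of frame_len samples."""
    return [frame_f0(samples[i:i + frame_len], sample_rate)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def is_synthesized(f0s: Sequence[float], segment: Tuple[int, int],
                   marker: Sequence[int], step_hz: float = 1.0) -> bool:
    """Check whether the F0 parities in the segment match an assumed marker."""
    start, end = segment
    observed = [int(round(v / step_hz)) % 2 for v in f0s[start:end]]
    return observed[:len(marker)] == list(marker)


# A 200 Hz sine at 8 kHz, analysed in 30 ms frames.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(720)]
track = f0_track(tone, sr, frame_len=240)

# Marker check on made-up F0 values (Hz) for frames 1..4.
marker = [1, 0, 1, 1]
flag_marked = is_synthesized([118.7, 121.0, 124.0, 125.0, 125.0, 124.4], (1, 5), marker)
flag_plain = is_synthesized([118.7, 120.3, 123.9, 125.2, 126.0, 124.4], (1, 5), marker)
```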
-
-
21. An additional information reading apparatus which decodes additional information embedded in inputted speech, comprising:
-
a fundamental frequency calculating unit operable to calculate a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration; and
an additional information extracting unit operable to extract, in a segment having a predetermined duration within a phoneme length including a phoneme boundary, predetermined additional information indicated by a frequency string from the speech fundamental frequencies calculated by said fundamental frequency calculating unit.
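To make the extraction step of claim 21 concrete, the sketch below reads bits back from per-frame F0 values in a set of segments and packs them into bytes. The claim only says that the additional information is "indicated by a frequency string"; the parity decoding used here (even multiple of `step_hz` means 0, odd means 1) is an assumed encoding, and `extract_bits` and `bits_to_bytes` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def extract_bits(f0s: Sequence[float], segments: Sequence[Tuple[int, int]],
                 bits_per_segment: int, step_hz: float = 1.0) -> List[int]:
    """Read one bit per frame from each segment [start, end) of the F0 track."""
    bits: List[int] = []
    for start, end in segments:
        frames = f0s[start:end][:bits_per_segment]
        bits.extend(int(round(v / step_hz)) % 2 for v in frames)
    return bits


def bits_to_bytes(bits: Sequence[int]) -> bytes:
    """Pack bits into bytes, most significant bit first; drop leftover bits."""
    usable = len(bits) - len(bits) % 8
    out = bytearray()
    for i in range(0, usable, 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


# Made-up F0 track (Hz); two four-frame segments carry 0100 and 0001.
f0_values = [110.2, 120.0, 121.0, 122.1, 123.9, 130.5,
             140.0, 141.9, 144.2, 145.1, 150.3]
payload_bits = extract_bits(f0_values, [(1, 5), (6, 10)], bits_per_segment=4)
payload = bits_to_bytes(payload_bits)
```

Because each bit is recovered by rounding, this assumed scheme tolerates F0 measurement error of less than half of `step_hz` per frame.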
-
-
23. A speech synthesis method of synthesizing speech, comprising:
generating speech prosody information based on synthesized speech generation information, wherein in said generating, code information is embedded as watermark information in the prosody information of a segment having a predetermined duration within a phoneme length including a phoneme boundary, wherein the segment is one of the following:
a segment having the predetermined duration from a start point of voiced sound immediately preceded by voiceless sound;
a segment having the predetermined duration until an end of voiced sound immediately followed by voiceless sound;
a segment having the predetermined duration from a start point of voiced sound immediately preceded by silence;
a segment having the predetermined duration until an end of voiced sound immediately followed by silence;
a segment having the predetermined duration from a start point of a vowel immediately preceded by a consonant;
a segment having the predetermined duration until an end of a vowel immediately followed by a consonant;
a segment having the predetermined duration from a start point of a vowel immediately preceded by silence; and
a segment having the predetermined duration until an end of a vowel immediately followed by silence.
-
24. A program for making a computer function as a speech synthesis apparatus, said program making the computer function as the following:
-
a prosody generating unit operable to generate prosody information of speech based on synthesized speech generation information; and
a synthesis unit operable to synthesize the speech based on the prosody information, wherein the prosody generating unit is operable to embed code information as watermark information in the prosody information of a segment having a predetermined duration within a phoneme length including a phoneme boundary, wherein the segment is one of the following:
a segment having the predetermined duration from a start point of voiced sound immediately preceded by voiceless sound;
a segment having the predetermined duration until an end of voiced sound immediately followed by voiceless sound;
a segment having the predetermined duration from a start point of voiced sound immediately preceded by silence;
a segment having the predetermined duration until an end of voiced sound immediately followed by silence;
a segment having the predetermined duration from a start point of a vowel immediately preceded by a consonant;
a segment having the predetermined duration until an end of a vowel immediately followed by a consonant;
a segment having the predetermined duration from a start point of a vowel immediately preceded by silence; and
a segment having the predetermined duration until an end of a vowel immediately followed by silence.
-
-
25. A computer readable recording medium on which a program for making a computer function as a speech synthesis apparatus is recorded,
wherein said program makes a computer function as the following:
a prosody generating unit operable to generate prosody information of speech based on synthesized speech generation information; and
a synthesis unit operable to synthesize the speech based on the prosody information, wherein the prosody generating unit is operable to embed code information as watermark information in the prosody information of a segment having a predetermined duration within a phoneme length including a phoneme boundary, wherein the segment is one of the following:
a segment having the predetermined duration from a start point of voiced sound immediately preceded by voiceless sound;
a segment having the predetermined duration until an end of voiced sound immediately followed by voiceless sound;
a segment having the predetermined duration from a start point of voiced sound immediately preceded by silence;
a segment having the predetermined duration until an end of voiced sound immediately followed by silence;
a segment having the predetermined duration from a start point of a vowel immediately preceded by a consonant;
a segment having the predetermined duration until an end of a vowel immediately followed by a consonant;
a segment having the predetermined duration from a start point of a vowel immediately preceded by silence; and
a segment having the predetermined duration until an end of a vowel immediately followed by silence.
-
Specification