Speech synthesis apparatus

US 20060009977A1
Filed: 09/15/2005
Published: 01/12/2006
Est. Priority Date: 06/04/2004
Status: Active Grant

Abstract

A speech synthesis apparatus, which can embed unchangeable additional information into synthesized speech without causing a deterioration of speech quality and restriction by bands, includes a language processing unit which generates synthesized speech generation information necessary for generating synthesized speech in accordance with a language string, a prosody generating unit which generates prosody information of speech based on the synthesized speech generation information, and a waveform generating unit which synthesizes speech based on the prosody information, in which the prosody generating unit embeds code information as watermark information in the prosody information of a segment having a predetermined time duration within a phoneme length including a phoneme boundary.

25 Claims

1-13. (canceled)

14. A speech synthesis apparatus which synthesizes speech, comprising:
- a prosody generating unit operable to generate prosody information of the speech based on synthesized speech generation information; and
  
  a synthesis unit operable to synthesize the speech based on the prosody information, wherein said prosody generating unit is operable to embed code information as watermark information into the prosody information of a segment having a predetermined duration within a phoneme length including a phoneme boundary, wherein the segment is one of the following:
  
  a segment having the predetermined duration from a start point of voiced sound immediately preceded by voiceless sound;
  
  a segment having the predetermined duration until an end of voiced sound immediately followed by voiceless sound;
  
  a segment having the predetermined duration from a start point of voiced sound immediately preceded by silence;
  
  a segment having the predetermined duration until an end of voiced sound immediately followed by silence;
  
  a segment having the predetermined duration from a start point of a vowel immediately preceded by a consonant;
  
  a segment having the predetermined duration until an end of a vowel immediately followed by a consonant;
  
  a segment having the predetermined duration from a start point of a vowel immediately preceded by silence; and
  
  a segment having the predetermined duration until an end of a vowel immediately followed by silence.
  - 15. The speech synthesis apparatus according to claim 14, wherein the predetermined duration is a duration in a range from 10 milliseconds or more to 50 milliseconds or less.
  - 16. The speech synthesis apparatus according to claim 14, wherein the code information is identification information for identifying whether or not inputted speech is synthesized speech.
  - 17. The speech synthesis apparatus according to claim 14, further comprising an encoding unit operable to encode predetermined information, wherein the code information is encoded information, and the code information is decoded using key information.
  - 18. The speech synthesis apparatus according to claim 17, wherein said encoding unit is further operable to generate the key information.
  - 19. The speech synthesis apparatus according to claim 14, wherein the code information is indicated by micro-prosody.
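
The segment selection and embedding recited in claims 14, 15 and 19 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the specification: the `Phoneme` record and its `kind` labels, the helper names `boundary_segments` and `embed_bits`, the 5 ms frame step, and the rule that a bit is written as a +2 Hz or -2 Hz shift of the fundamental frequency. Only the four vowel-based segment types are covered, and vowels too short to hold two separate windows are skipped.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Phoneme:
    kind: str      # hypothetical labels: "vowel", "consonant", "silence"
    start_ms: int
    end_ms: int


def boundary_segments(phonemes: Sequence[Phoneme],
                      duration_ms: int = 30) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) windows inside vowels that touch a boundary."""
    if not 10 <= duration_ms <= 50:  # range recited in claim 15
        raise ValueError("duration_ms must be between 10 and 50")
    segments = []
    for i, p in enumerate(phonemes):
        # Skip non-vowels and vowels too short to hold two separate windows.
        if p.kind != "vowel" or p.end_ms - p.start_ms < 2 * duration_ms:
            continue
        prev_kind = phonemes[i - 1].kind if i > 0 else None
        next_kind = phonemes[i + 1].kind if i + 1 < len(phonemes) else None
        if prev_kind in ("consonant", "silence"):
            segments.append((p.start_ms, p.start_ms + duration_ms))
        if next_kind in ("consonant", "silence"):
            segments.append((p.end_ms - duration_ms, p.end_ms))
    return segments


def embed_bits(f0_hz: Sequence[float], bits: Sequence[int],
               segments: Sequence[Tuple[int, int]],
               frame_ms: int = 5, delta_hz: float = 2.0) -> List[float]:
    """Write one bit per frame inside the segments as a small F0 shift."""
    out = list(f0_hz)
    remaining = iter(bits)
    for start_ms, end_ms in segments:
        last = min(end_ms // frame_ms, len(out))
        for frame in range(start_ms // frame_ms, last):
            bit = next(remaining, None)
            if bit is None:  # all bits written
                return out
            out[frame] += delta_hz if bit else -delta_hz
    return out


phonemes = [Phoneme("silence", 0, 100), Phoneme("consonant", 100, 150),
            Phoneme("vowel", 150, 300), Phoneme("consonant", 300, 350)]
segments = boundary_segments(phonemes)
marked = embed_bits([120.0] * 70, [1, 0, 1], segments)
```

Under these assumptions a 30 ms window holds six 5 ms frames, so each boundary segment carries at most six bits; frames outside the selected windows are left untouched.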

20. A synthesis speech identifying apparatus which identifies whether or not inputted speech is synthesized speech, said apparatus comprising:
- a fundamental frequency calculating unit operable to calculate a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration; and
  
  an identifying unit operable to identify, in a segment having a predetermined duration within a phoneme length including a phoneme boundary, whether or not the inputted speech is the synthesized speech by identifying whether or not identification information is included in the speech fundamental frequencies calculated by said fundamental frequency calculating unit, the identification information being for identifying whether or not the inputted speech is the synthesized speech.
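
The two units of claim 20 can be sketched as follows. This is only an illustration under stated assumptions: the claim does not say how the fundamental frequency is calculated, so a plain autocorrelation peak search is swapped in, and the identification information is assumed to be a fixed sign pattern of frame-to-mean F0 deviations. The names `frame_f0` and `carries_mark`, the 50 to 400 Hz search range, and the 1 Hz threshold are hypothetical.

```python
import math
from typing import Sequence


def frame_f0(samples: Sequence[float], sample_rate: int,
             fmin: float = 50.0, fmax: float = 400.0) -> float:
    """Estimate one frame's F0 in Hz from its autocorrelation peak."""
    lo = int(sample_rate / fmax)
    hi = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(lo, hi + 1):
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # 0.0 marks a frame with no positive correlation peak (treated as unvoiced).
    return sample_rate / best_lag if best_lag else 0.0


def carries_mark(f0_segment: Sequence[float], pattern: Sequence[int],
                 min_swing_hz: float = 1.0) -> bool:
    """True if each frame deviates from the segment mean in the expected direction."""
    if not f0_segment or len(f0_segment) != len(pattern):
        return False
    mean = sum(f0_segment) / len(f0_segment)
    return all((f0 - mean) * sign >= min_swing_hz
               for f0, sign in zip(f0_segment, pattern))


# A 40 ms frame of a 200 Hz sine sampled at 8 kHz.
sine = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(320)]
estimate = frame_f0(sine, 8000)
is_synthetic = carries_mark([122.0, 118.0, 122.0, 118.0], [1, -1, 1, -1])
is_natural = carries_mark([120.0, 120.2, 119.9, 120.0], [1, -1, 1, -1])
```

Comparing against the segment mean keeps the check independent of the speaker's absolute pitch, but it assumes the unmarked contour is roughly flat across the short segment.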

21. An additional information reading apparatus which decodes additional information embedded in inputted speech, comprising:
- a fundamental frequency calculating unit operable to calculate a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration; and
  
  an additional information extracting unit operable to extract, in a segment having a predetermined duration within a phoneme length including a phoneme boundary, predetermined additional information indicated by a frequency string from the speech fundamental frequencies calculated by said fundamental frequency calculating unit.
  - 22. The additional information reading apparatus according to claim 21, wherein the additional information is encoded, and said additional information reading apparatus further comprises a decoding unit operable to decode the encoded additional information using key information for decoding.
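
Extraction and keyed decoding as recited in claims 21 and 22 might look like the sketch below. The assumptions are loud ones: bits are taken to be encoded as F0 frames lying above or below a baseline, the baseline is estimated from a few unmarked frames that follow the marked ones, and a repeating-key XOR stands in for whatever encoding the key information actually controls. The names `extract_bits` and `xor_with_key` are hypothetical.

```python
from typing import List, Sequence


def extract_bits(f0_hz: Sequence[float], first_frame: int,
                 n_bits: int, n_ref: int = 4) -> List[int]:
    """Read one bit per frame relative to a baseline from the following frames."""
    ref = f0_hz[first_frame + n_bits:first_frame + n_bits + n_ref]
    if not ref:
        raise ValueError("no reference frames after the marked frames")
    baseline = sum(ref) / len(ref)
    return [1 if f0_hz[first_frame + i] > baseline else 0
            for i in range(n_bits)]


def xor_with_key(bits: Sequence[int], key: Sequence[int]) -> List[int]:
    """Apply a repeating-key XOR; the same call encodes and decodes."""
    return [b ^ key[i % len(key)] for i, b in enumerate(bits)]


f0_track = [120.0] * 10
f0_track[2:5] = [122.0, 118.0, 122.0]      # three marked frames
raw_bits = extract_bits(f0_track, first_frame=2, n_bits=3)
decoded = xor_with_key(raw_bits, key=[1, 1])
```

Because XOR is its own inverse, a synthesizer-side encoding unit could apply `xor_with_key` with the same key before embedding; a real system would use a proper cipher rather than this stand-in.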

23. A speech synthesis method of synthesizing speech, comprising generating speech prosody information based on synthesized speech generation information, wherein in said generating, code information is embedded as watermark information in the prosody information of a segment having a predetermined duration within a phoneme length including a phoneme boundary, wherein the segment is one of the following:
- a segment having the predetermined duration from a start point of voiced sound immediately preceded by voiceless sound;
  
  a segment having the predetermined duration until an end of voiced sound immediately followed by voiceless sound;
  
  a segment having the predetermined duration from a start point of voiced sound immediately preceded by silence;
  
  a segment having the predetermined duration until an end of voiced sound immediately followed by silence;
  
  a segment having the predetermined duration from a start point of a vowel immediately preceded by a consonant;
  
  a segment having the predetermined duration until an end of a vowel immediately followed by a consonant;
  
  a segment having the predetermined duration from a start point of a vowel immediately preceded by silence; and
  
  a segment having the predetermined duration until an end of a vowel immediately followed by silence.

24. A program for making a computer function as a speech synthesis apparatus, said program making the computer function as the following:
- a prosody generating unit operable to generate prosody information of speech based on synthesized speech generation information; and
  
  a synthesis unit operable to synthesize the speech based on the prosody information, wherein the prosody generating unit is operable to embed code information as watermark information in the prosody information of a segment having a predetermined duration within a phoneme length including a phoneme boundary, wherein the segment is one of the following:
  
  a segment having the predetermined duration from a start point of voiced sound immediately preceded by voiceless sound;
  
  a segment having the predetermined duration until an end of voiced sound immediately followed by voiceless sound;
  
  a segment having the predetermined duration from a start point of voiced sound immediately preceded by silence;
  
  a segment having the predetermined duration until an end of voiced sound immediately followed by silence;
  
  a segment having the predetermined duration from a start point of a vowel immediately preceded by a consonant;
  
  a segment having the predetermined duration until an end of a vowel immediately followed by a consonant;
  
  a segment having the predetermined duration from a start point of a vowel immediately preceded by silence; and
  
  a segment having the predetermined duration until an end of a vowel immediately followed by silence.

25. A computer readable recording medium on which a program for making a computer function as a speech synthesis apparatus is recorded, wherein said program makes a computer function as the following:
- a prosody generating unit operable to generate prosody information of speech based on synthesized speech generation information; and
  
  a synthesis unit operable to synthesize the speech based on the prosody information, wherein the prosody generating unit is operable to embed code information as watermark information in the prosody information of a segment having a predetermined duration within a phoneme length including a phoneme boundary, wherein the segment is one of the following:
  
  a segment having the predetermined duration from a start point of voiced sound immediately preceded by voiceless sound;
  
  a segment having the predetermined duration until an end of voiced sound immediately followed by voiceless sound;
  
  a segment having the predetermined duration from a start point of voiced sound immediately preceded by silence;
  
  a segment having the predetermined duration until an end of voiced sound immediately followed by silence;
  
  a segment having the predetermined duration from a start point of a vowel immediately preceded by a consonant;
  
  a segment having the predetermined duration until an end of a vowel immediately followed by a consonant;
  
  a segment having the predetermined duration from a start point of a vowel immediately preceded by silence; and
  
  a segment having the predetermined duration until an end of a vowel immediately followed by silence.

Current Assignee: Panasonic Intellectual Property Corporation of America (Panasonic Holdings Corporation)
Original Assignee: Panasonic Corporation (Panasonic Holdings Corporation)
Inventors: Kamai, Takahiro; Kato, Yumiko
Granted Patent: US 7,526,430 B2
US Class Current: 704/260
CPC Class Codes: G10L 13/10 Prosody rules derived from ...
