Speech synthesis apparatus

US 7,526,430 B2
Filed: 09/15/2005
Issued: 04/28/2009
Est. Priority Date: 06/04/2004
Status: Active Grant


Abstract

A speech synthesis apparatus, which can embed unchangeable additional information into synthesized speech without causing a deterioration of speech quality and restriction by bands, includes a language processing unit which generates synthesized speech generation information necessary for generating synthesized speech in accordance with a language string, a prosody generating unit which generates prosody information of speech based on the synthesized speech generation information, and a waveform generating unit which synthesizes speech based on the prosody information, in which the prosody generating unit embeds code information as watermark information in the prosody information of a segment having a predetermined time duration within a phoneme length including a phoneme boundary.
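
The embedding step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the prosody information is a per-frame fundamental frequency (F0) contour in Hz and that a stored pattern is a short list of F0 offsets added at a phoneme boundary. The names `embed_pattern` and `FRAME_MS`, the 5 ms frame step, and all sample values are hypothetical.

```python
from typing import List

FRAME_MS = 5  # assumed frame step in milliseconds (not stated in the source)


def embed_pattern(f0: List[float], boundary_frame: int,
                  pattern: List[float]) -> List[float]:
    """Return a copy of `f0` with `pattern` (F0 offsets in Hz) added to the
    frames that start at `boundary_frame`. Additive embedding is an
    assumption made for this sketch."""
    if boundary_frame < 0 or boundary_frame + len(pattern) > len(f0):
        raise ValueError("pattern does not fit inside the contour")
    out = list(f0)  # leave the caller's contour untouched
    for i, delta in enumerate(pattern):
        out[boundary_frame + i] += delta
    return out


# Hypothetical data: a flat 120 Hz contour and a 4-frame (20 ms) pattern.
contour = [120.0] * 10
watermark = [4.0, 2.0, 1.0, 0.5]
marked = embed_pattern(contour, 3, watermark)
print(marked)
```

Only the four frames starting at the boundary change; the rest of the generated contour is passed through unmodified.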

18 Claims

1. A speech synthesis apparatus which synthesizes speech, said apparatus comprising:
- a prosody generating unit for generating prosody information of the speech based on synthesized speech generation information; and
  
  a synthesis unit for synthesizing the speech based on the prosody information, wherein said prosody generating unit is for:
  
  specifying a time position including a phoneme boundary in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information;
  
  extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including the phoneme boundary; and
  
  embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech.
- Dependent claims (2, 3, 4, 14, 15)
- - 2. The speech synthesis apparatus according to claim 1, wherein a duration for embedding the extracted micro-prosody pattern is a duration in a range from 10 milliseconds to 50 milliseconds.
  - 3. The speech synthesis apparatus according to claim 1, further comprising an encoding unit for encoding additional information, wherein said encoding unit is for encoding information for associating the micro-prosody pattern stored in said storage unit with the additional information, and wherein said prosody generating unit is for selecting from the storage unit, based on the encoded information, the micro-prosody pattern associated with the additional information, and embedding the selected micro-prosody pattern into the specified time position including the phoneme boundary.
  - 4. The speech synthesis apparatus according to claim 3, wherein said encoding unit is further for generating key information which corresponds to the encoded information for decoding the additional information.
  - 14. The speech synthesis apparatus according to claim 1, wherein said prosody generating unit is for identifying, as the time position including the phoneme boundary in the speech to be synthesized, a portion of at least one vowel of:
    - a vowel which follows immediately after silence;
      
      a vowel which follows immediately after a consonant other than a semivowel;
      
      a vowel which immediately precedes silence; and
      
      a vowel which immediately precedes a consonant other than a semivowel.
  - 15. The speech synthesis apparatus according to claim 1, wherein said prosody generating unit is for identifying, as the time position including the phoneme boundary in the speech to be synthesized, at least one of:
    - a portion, including a starting point of a phoneme, of a vowel which follows immediately after silence;
      
      a portion, including the starting point of the phoneme, of a vowel which follows immediately after a consonant other than a semivowel;
      
      a portion, including an ending point of the phoneme, of a vowel which immediately precedes silence; and
      
      a portion, including the ending point of the phoneme, of a vowel which immediately precedes a consonant other than a semivowel.
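
The position rules in claims 14 and 15 can be sketched as a scan over a phoneme sequence. This is an illustrative reading, not the patent's implementation: the `Phoneme` type, the label sets (`VOWELS`, `SEMIVOWELS`, `SILENCE`), the function names, and the 30 ms default span (chosen inside the 10 to 50 ms range of claim 2) are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical label inventory; a real system would use its own phoneme set.
VOWELS = {"a", "i", "u", "e", "o"}
SEMIVOWELS = {"j", "w"}
SILENCE = "sil"


@dataclass
class Phoneme:
    label: str
    start_ms: int
    end_ms: int


def is_boundary_context(p: Phoneme) -> bool:
    """True for silence or for a consonant other than a semivowel."""
    if p.label == SILENCE:
        return True
    return p.label not in VOWELS and p.label not in SEMIVOWELS


def candidate_positions(phonemes: List[Phoneme],
                        span_ms: int = 30) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) spans at vowel onsets and offsets that sit
    next to silence or a non-semivowel consonant."""
    spans = []
    for i, p in enumerate(phonemes):
        if p.label not in VOWELS:
            continue
        # Portion including the starting point of the vowel.
        if i > 0 and is_boundary_context(phonemes[i - 1]):
            spans.append((p.start_ms, min(p.start_ms + span_ms, p.end_ms)))
        # Portion including the ending point of the vowel.
        if i + 1 < len(phonemes) and is_boundary_context(phonemes[i + 1]):
            spans.append((max(p.end_ms - span_ms, p.start_ms), p.end_ms))
    return spans


# Hypothetical utterance: sil k a w a sil
utterance = [
    Phoneme("sil", 0, 100), Phoneme("k", 100, 150), Phoneme("a", 150, 250),
    Phoneme("w", 250, 300), Phoneme("a", 300, 400), Phoneme("sil", 400, 500),
]
positions = candidate_positions(utterance)
print(positions)  # [(150, 180), (370, 400)]
```

The vowel edges adjacent to the semivowel `w` are skipped, which mirrors the "other than a semivowel" condition in the claim language.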

5. A synthesis speech identifying apparatus which identifies whether or not inputted speech is synthesized speech, said apparatus comprising:
- a fundamental frequency calculating unit for calculating a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration;
  
  a storage unit in which a micro-prosody pattern is stored, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary, and being used to identify the inputted speech as synthesized speech; and
  
  an identifying unit for:
  
  extracting, in a segment having a duration including a phoneme boundary within which a micro-prosody pattern of the inputted speech exists as watermark information, the fundamental frequency of the speech calculated by said fundamental frequency calculating unit;
  
  matching a pattern of the extracted fundamental frequency with the micro-prosody pattern stored in said storage unit; and
  
  identifying whether or not the inputted speech is synthesized speech.
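
The matching step in claim 5 can be sketched as follows. The sketch assumes the per-frame F0 values and the candidate boundary frames are already available, and it compares mean-removed F0 shapes against one stored pattern within a tolerance. The mean removal, the tolerance `tol_hz`, and the function names are assumptions for illustration; the claim does not specify a distance measure.

```python
from typing import List, Sequence


def shape(values: Sequence[float]) -> List[float]:
    """Remove the mean so only the local F0 movement is compared."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def matches(segment: Sequence[float], pattern: Sequence[float],
            tol_hz: float = 1.0) -> bool:
    """True if every frame of the mean-removed segment lies within
    `tol_hz` of the mean-removed stored pattern."""
    if len(segment) != len(pattern) or not pattern:
        return False
    a, b = shape(segment), shape(pattern)
    return max(abs(x - y) for x, y in zip(a, b)) <= tol_hz


def is_synthesized(f0: Sequence[float], boundary_frames: Sequence[int],
                   pattern: Sequence[float], tol_hz: float = 1.0) -> bool:
    """Check each candidate boundary for the stored micro-prosody pattern."""
    n = len(pattern)
    return any(matches(f0[b:b + n], pattern, tol_hz) for b in boundary_frames)


# Hypothetical data: a stored pattern and two F0 contours.
stored = [4.0, 2.0, 1.0, 0.5]
watermarked = [120.0, 120.0, 124.0, 122.0, 121.0, 120.5, 120.0, 120.0]
plain = [120.0] * 8
print(is_synthesized(watermarked, [2], stored))  # True
print(is_synthesized(plain, [2], stored))        # False
```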

6. An additional information reading apparatus which decodes additional information embedded in inputted speech, said apparatus comprising:
- a fundamental frequency calculating unit for calculating a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration;
  
  a storage unit in which a micro-prosody pattern associated with the additional information is stored, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary; and
  
  an additional information extracting unit for:
  
  extracting, in a segment having a duration including a phoneme boundary within which a micro-prosody pattern of the inputted speech exists as watermark information, a micro-prosody pattern from the speech fundamental frequency calculated by said fundamental frequency calculating unit;
  
  comparing the extracted micro-prosody pattern with the micro-prosody pattern associated with the additional information; and
  
  extracting predetermined additional information included in the extracted micro-prosody pattern.
- Dependent claims (7)
- - 7. The additional information reading apparatus according to claim 6, wherein the additional information is encoded, and said additional information reading apparatus further comprises a decoding unit for decoding the encoded additional information using key information for decoding.
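
Claims 6 and 7 describe reading symbols from extracted patterns and then decoding them with key information. A minimal sketch of that flow is shown below. The four-entry `CODEBOOK`, the squared-distance comparison, and the XOR-based key are assumptions chosen to keep the example small; the claims do not name an encoding scheme.

```python
from typing import Dict, List, Sequence

# Hypothetical codebook: symbol -> stored micro-prosody pattern (Hz offsets).
CODEBOOK: Dict[int, List[float]] = {
    0: [2.0, 1.0, 0.0, -1.0],
    1: [-2.0, -1.0, 0.0, 1.0],
    2: [1.0, -1.0, 1.0, -1.0],
    3: [0.0, 2.0, 2.0, 0.0],
}


def nearest_symbol(segment: Sequence[float]) -> int:
    """Compare an extracted pattern with every stored pattern and return
    the symbol whose pattern has the smallest squared distance."""
    def dist(sym: int) -> float:
        return sum((x - y) ** 2 for x, y in zip(segment, CODEBOOK[sym]))
    return min(CODEBOOK, key=dist)


def read_symbols(segments: Sequence[Sequence[float]]) -> List[int]:
    """Extract one encoded symbol per watermark segment."""
    return [nearest_symbol(s) for s in segments]


def decode(symbols: Sequence[int], key: Sequence[int]) -> List[int]:
    """Undo an assumed XOR encoding using per-symbol key information."""
    return [s ^ k for s, k in zip(symbols, key)]


# Hypothetical noisy patterns extracted at three phoneme boundaries.
extracted = [
    [1.1, -0.9, 0.8, -1.2],
    [1.9, 1.1, 0.1, -0.8],
    [0.1, 1.8, 2.2, 0.0],
]
encoded = read_symbols(extracted)
print(encoded)                      # [2, 0, 3]
print(decode(encoded, [1, 3, 2]))   # [3, 3, 1]
```

Without the key, a reader recovers only the encoded symbols, which is the separation claim 7 relies on.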

8. A speech synthesis method of synthesizing speech, comprising generating prosody information of the speech based on synthesized speech generation information, wherein said generating includes:
- specifying a time position including a phoneme boundary in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information;
  
  extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including the phoneme boundary; and
  
  embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech.
- Dependent claims (9)
- - 9. The speech synthesis method according to claim 8, wherein a duration for embedding the extracted micro-prosody pattern is a duration in a range from 10 milliseconds to 50 milliseconds.
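
As a small worked example of the 10 to 50 millisecond range, the sketch below converts an embedding duration into a frame count. The 5 ms default frame step and the function name `frames_for_duration` are assumptions; the claims only say that frames have a predetermined duration.

```python
def frames_for_duration(duration_ms: int, frame_ms: int = 5) -> int:
    """Number of F0 frames covered by an embedding duration.

    Rejects durations outside the claimed 10-50 ms range and durations
    that are not a whole number of frames."""
    if not 10 <= duration_ms <= 50:
        raise ValueError("duration must be between 10 and 50 ms")
    if frame_ms <= 0 or duration_ms % frame_ms != 0:
        raise ValueError("duration must be a whole number of frames")
    return duration_ms // frame_ms


print(frames_for_duration(10))      # 2
print(frames_for_duration(50))      # 10
print(frames_for_duration(30, 10))  # 3
```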

10. A program embodied on a computer readable recording medium, for making a computer function as a speech synthesis apparatus, said program making the computer function as the following:
- a prosody generating unit for generating prosody information of speech based on synthesized speech generation information; and
  
  a synthesis unit for synthesizing the speech based on the prosody information, wherein the prosody generating unit is for:
  
  specifying a time position including a phoneme boundary in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information;
  
  extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including the phoneme boundary; and
  
  embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech.
- Dependent claims (11)
- - 11. The program embodied on a computer readable recording medium according to claim 10, wherein a duration for embedding the extracted micro-prosody pattern is a duration in a range from 10 milliseconds to 50 milliseconds.

12. A computer readable recording medium on which a program for making a computer function as a speech synthesis apparatus is recorded, wherein said program makes a computer function as the following:
- a prosody generating unit for generating prosody information of speech based on synthesized speech generation information; and
  
  a synthesis unit for synthesizing the speech based on the prosody information, wherein the prosody generating unit is for:
  
  specifying a time position including a phoneme boundary in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information;
  
  extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including the phoneme boundary; and
  
  embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech.
- Dependent claims (13)
- - 13. The computer readable recording medium according to claim 12, wherein a duration for embedding the extracted micro-prosody pattern is a duration in a range from 10 milliseconds to 50 milliseconds.

16. A speech synthesis apparatus which synthesizes speech, said apparatus comprising:
- a prosody generating unit for generating prosody information of the speech based on synthesized speech generation information; and
  
  a synthesis unit for synthesizing the speech based on the prosody information, wherein said prosody generating unit is for:
  
  specifying a time position in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information;
  
  extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary; and
  
  embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech, and the embedded micro-prosody pattern being used to identify a manufacturer of said speech synthesis apparatus.

17. A synthesis speech identifying apparatus which identifies whether or not inputted speech is synthesized speech, said apparatus comprising:
- a fundamental frequency calculating unit for calculating a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration;
  
  a storage unit in which a micro-prosody pattern is stored, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary, and the micro-prosody pattern being used to identify the inputted speech as synthesized speech and to identify a manufacturer of said speech synthesis apparatus that has generated the synthesized speech; and
  
  an identifying unit for:
  
  extracting, in a segment having a duration within which a micro-prosody pattern of the inputted speech exists as watermark information, the fundamental frequency of the speech calculated by said fundamental frequency calculating unit;
  
  matching a pattern of the extracted fundamental frequency with the micro-prosody pattern stored in said storage unit; and
  
  identifying whether or not the inputted speech is synthesized speech and, in the case where the inputted speech is synthesized speech, identifying a manufacturer of said speech synthesis apparatus that has generated the synthesized speech.

18. An additional information reading apparatus which decodes additional information embedded in inputted speech, said apparatus comprising:
- a fundamental frequency calculating unit for calculating a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration;
  
  a storage unit in which a micro-prosody pattern associated with the additional information is stored, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary, and the micro-prosody pattern being used to identify a manufacturer of said speech synthesis apparatus; and
  
  an additional information extracting unit for:
  
  extracting, in a segment having a duration including a phoneme boundary within which a micro-prosody pattern of the inputted speech exists as watermark information, a micro-prosody pattern from the speech fundamental frequency calculated by said fundamental frequency calculating unit;
  
  comparing the extracted micro-prosody pattern with the micro-prosody pattern associated with the additional information;
  
  extracting predetermined additional information included in the extracted micro-prosody pattern; and
  
  identifying a manufacturer of said speech synthesis apparatus that has generated the synthesized speech.
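
Claims 16 to 18 tie the embedded pattern to a manufacturer. One way to picture this is a registry lookup, sketched below under stated assumptions: the vendor names, the pattern values, the per-frame tolerance `tol_hz`, and the function name `identify_manufacturer` are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical registry: manufacturer -> its assigned micro-prosody pattern.
REGISTRY: Dict[str, List[float]] = {
    "vendor_a": [3.0, 1.0, 0.0, -1.0],
    "vendor_b": [-1.0, 0.0, 1.0, 3.0],
}


def identify_manufacturer(segment: Sequence[float],
                          registry: Dict[str, List[float]],
                          tol_hz: float = 0.5) -> Optional[str]:
    """Return the first manufacturer whose stored pattern matches the
    extracted segment frame by frame, or None if nothing matches
    (treated here as 'not identified as synthesized speech')."""
    for name, pattern in registry.items():
        if len(segment) != len(pattern):
            continue
        if all(abs(x - y) <= tol_hz for x, y in zip(segment, pattern)):
            return name
    return None


print(identify_manufacturer([2.8, 1.2, 0.1, -0.9], REGISTRY))  # vendor_a
print(identify_manufacturer([0.0, 0.0, 0.0, 0.0], REGISTRY))   # None
```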

Current Assignee: Panasonic Intellectual Property Corporation of America (Panasonic Holdings Corporation)
Original Assignee: Panasonic Corporation (Panasonic Holdings Corporation)
Inventors: Kamai, Takahiro; Kato, Yumiko
Primary Examiner(s): Chawan; Vijay B
Assistant Examiner(s): Colucci; Michael C
Application Number: US11/226,331
Publication Number: US 20060009977A1
Time in Patent Office: 1,321 Days
Field of Search: 704/260, 704/275, 704/268, 704/273, 704/258, 700/83, 706/21, 715/716, 381/13
US Class Current: 704/267
CPC Class Codes: G10L 13/10 Prosody rules derived from ...
