Speech synthesis method and apparatus, program, recording medium and robot apparatus

US 7,062,438 B2
Filed: 03/13/2003
Issued: 06/13/2006
Est. Priority Date: 03/15/2002
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

A sentence or a singing is to be synthesized with a natural speech close to the human voice. To this end, singing metrical data are formed in a tag processing unit 211 in a singing synthesis unit 212 in a speech synthesis apparatus 200 based on singing data and an analyzed text portion. A language analysis unit 213 performs language processing on text portions other than the singing data. As for a text portion registered in a natural metrical dictionary, as determined by this language processing, corresponding natural metrical data is selected and its parameters are adjusted in a metrical data adjustment unit 222 based on phonemic segment data of a phonemic segment storage unit 223 in the metrical data adjustment unit 222. As for a text portion not registered in the natural metrical dictionary, a phonemic symbol string is generated in a natural metrical dictionary storage unit 214, after which metrical data are generated in a metrical generating unit 221. A waveform generating unit 224 concatenates necessary phonemic segment data, based on the natural metrical data, metrical data and the singing metrical data to generate speech waveform data.
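
The routing described in the abstract (registered phrases use stored natural metrical data, everything else goes to rule-based generation) can be illustrated with a minimal Python sketch. The dictionary contents, the names `NATURAL_METRICAL_DICT` and `route_text`, and the longest-match-first scan are all assumptions made for illustration; the patent does not prescribe them.

```python
from typing import List, Tuple

# Hypothetical natural metrical dictionary: phrase -> list of
# (pitch_period, duration, volume) triples. Values are invented placeholders.
NATURAL_METRICAL_DICT = {
    "good morning": [(80, 120, 0.9), (76, 200, 0.8)],
    "hello": [(82, 150, 1.0)],
}

def route_text(text: str) -> List[Tuple[str, str]]:
    """Split text into ('natural', phrase) and ('rule', remainder) segments.

    Matching is a plain case-insensitive substring scan, so it ignores
    word boundaries; a real system would rely on language analysis.
    """
    phrases = sorted(NATURAL_METRICAL_DICT, key=len, reverse=True)
    lowered = text.lower()
    segments: List[Tuple[str, str]] = []
    pending = ""  # unregistered text accumulated so far
    i = 0
    while i < len(text):
        match = next((p for p in phrases if lowered.startswith(p, i)), None)
        if match is None:
            pending += text[i]
            i += 1
            continue
        if pending.strip():
            segments.append(("rule", pending.strip()))
        pending = ""
        segments.append(("natural", match))
        i += len(match)
    if pending.strip():
        segments.append(("rule", pending.strip()))
    return segments
```

Segments labelled `natural` would be looked up in the dictionary, while `rule` segments would be passed on to phonemic symbol generation and rule-based metrical data generation.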

176 Citations

24 Claims

1. A speech synthesis method comprising:
- a separating step of separating, from an input text, a singing data portion specified by a singing tag and a text portion;
  
  a singing metrical data forming step of forming singing metrical data from said singing data, said singing metrical data expresses parameters of a lyric;
  
  a speech symbol sequence forming step of forming a speech symbol sequence for said text portion;
  
  a metrical data forming step of forming metrical data from said speech symbol sequence, said metrical data expresses parameters of a speech signal sequence;
  
  a natural metrical data selecting step of analyzing said text portion and selecting, if preset words or sentences exist in said text portion, natural metrical data associated with said preset words or sentences, extracted in advance from the uttered speech of a human being, from a storage means; and
  
  a speech synthesis step of synthesizing the speech based on said singing metrical data, said natural metrical data or said metrical data;
  
  wherein said speech symbol sequence is formed for the section of the text portion other than said preset words or sentences, wherein the natural metrical data includes data about a pitch period, a pitch duration, and a pitch volume registered in a natural metrical dictionary stored in the storage means.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The speech synthesis method according to claim 1 wherein at least the pitch and the duration of each sound note, the lyric accorded to each sound note, rest, tempo and loudness of said singing data are specified by tags.
  - 3. The speech synthesis method according to claim 1 wherein, in said singing metrical data forming step, the vibrato is applied by changing the pitch period and the duration of each phoneme in said singing metrical data.
  - 4. The speech synthesis method according to claim 3 wherein, in said singing metrical data forming step, the vibrato is applied to a phoneme longer than a preset duration.
  - 5. The speech synthesis method according to claim 3 wherein, in said singing metrical data forming step, the vibrato is applied to the phonemes of the portion of the singing data specified by a tag.
  - 6. The speech synthesis method according to claim 1 further comprising:
    - a parameter adjusting step of adjusting the pitch of respective phonemes in said singing metrical data.
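
The separating step of claim 1 can be sketched in Python as follows. The `<song>...</song>` syntax is a hypothetical stand-in: the claim only requires that the singing data portion be "specified by a singing tag" and does not fix a concrete tag format. The function name `separate` is likewise an assumption.

```python
import re
from typing import List, Tuple

# Hypothetical tag syntax, chosen only for illustration.
SONG_TAG = re.compile(r"<song>(.*?)</song>", re.DOTALL)

def separate(input_text: str) -> Tuple[List[str], List[str]]:
    """Return (singing_portions, text_portions), each in order of appearance."""
    singing: List[str] = []
    text: List[str] = []
    pos = 0
    for m in SONG_TAG.finditer(input_text):
        before = input_text[pos:m.start()].strip()
        if before:
            text.append(before)          # ordinary text before the tag
        singing.append(m.group(1).strip())  # content inside the singing tag
        pos = m.end()
    tail = input_text[pos:].strip()
    if tail:
        text.append(tail)                # ordinary text after the last tag
    return singing, text
```

The singing portions would then feed the singing metrical data forming step, and the text portions would go to language analysis.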

7. A speech synthesis apparatus comprising:
- separating means for separating, from an input text, a singing data portion specified by a singing tag and a text portion;
  
  singing metrical data forming means for forming singing metrical data from said singing data, said singing metrical data expresses parameters of a lyric;
  
  speech symbol sequence forming means for forming a speech symbol sequence for said text portion;
  
  metrical data forming means for forming metrical data from said speech symbol sequence, said metrical data expresses parameters of a speech signal sequence;
  
  storage means having pre-stored therein preset words or sentences and natural metrical data corresponding to said preset words or sentences extracted from the utterance of a human being;
  
  natural metrical data selecting means for analyzing said text portion and selecting, if preset words or sentences exist in said text portion, natural metrical data associated with said preset words or sentences, extracted in advance from the uttered speech of the human being, from said storage means; and
  
  speech synthesis means for synthesizing the speech based on said singing metrical data, said natural metrical data or said metrical data;
  
  wherein said speech symbol sequence is formed for the section of the text portion other than said preset words or sentences, wherein the natural metrical data includes data about a pitch period, a pitch duration, and a pitch volume registered in a natural metrical dictionary stored in the storage means.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The speech synthesis apparatus according to claim 7 wherein at least the pitch and the duration of each sound note, the lyric accorded to each sound note, rest, tempo and loudness of said singing data are specified by tags.
  - 9. The speech synthesis apparatus according to claim 7 wherein, in said singing metrical data forming means, the vibrato is applied by changing the pitch period and the duration of each phoneme in said singing metrical data.
  - 10. The speech synthesis apparatus according to claim 9 wherein, in said singing metrical data forming means, the vibrato is applied to the phoneme longer than a preset duration.
  - 11. The speech synthesis apparatus according to claim 10 wherein, in said singing metrical data forming means, the vibrato is applied to a phoneme of the portion of the singing data specified by a tag.
  - 12. The speech synthesis apparatus according to claim 7 further comprising:
    - parameter adjusting means for adjusting the pitch of the respective phonemes in said singing metrical data.
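
Claims 9 and 10 describe vibrato as a change to the pitch period and duration of a phoneme, applied when the phoneme is longer than a preset duration. One possible reading is sketched below in Python. The sinusoidal modulation, the function name `apply_vibrato`, and the default threshold, depth and rate are assumptions; the claims do not state a modulation shape or any numeric values.

```python
import math
from typing import List

def apply_vibrato(base_period: float, duration: float,
                  min_duration: float = 300.0,
                  depth: float = 0.03, rate_hz: float = 6.0) -> List[float]:
    """Return per-cycle pitch periods (in ms) for one phoneme.

    base_period and duration are in milliseconds. Vibrato is applied only
    when the phoneme is longer than min_duration. All defaults are
    illustrative assumptions.
    """
    periods: List[float] = []
    elapsed = 0.0
    use_vibrato = duration > min_duration
    while elapsed < duration:
        if use_vibrato:
            # Sinusoidal modulation of the period around its base value.
            phase = 2.0 * math.pi * rate_hz * elapsed / 1000.0
            factor = 1.0 + depth * math.sin(phase)
        else:
            factor = 1.0
        period = base_period * factor
        periods.append(period)
        elapsed += period
    return periods
```

Because each pitch cycle is lengthened or shortened, the summed length of the cycles (the realized duration) shifts slightly as well, which is one way to interpret the claim language about changing both the pitch period and the duration.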

13. A computer-readable recording medium having recorded thereon a program for having a computer execute preset processing, said program comprising:
- a separating step of separating, from an input text, a singing data portion specified by a singing tag and a text portion;
  
  a singing metrical data forming step of forming singing metrical data from said singing data, said singing metrical data expresses parameters of a lyric;
  
  a speech symbol sequence forming step of forming a speech symbol sequence for said text portion;
  
  a metrical data forming step of forming metrical data from said speech symbol sequence, said metrical data expresses parameters of a speech signal sequence;
  
  a natural metrical data selecting step of analyzing said text portion and selecting, if preset words or sentences exist in said text portion, natural metrical data associated with said preset words or sentences, extracted in advance from the uttered speech of a human being, from storage means; and
  
  a speech synthesis step of synthesizing the speech based on said singing metrical data, said natural metrical data or said metrical data;
  
  wherein said speech symbol sequence is formed for the section of the text portion other than said preset words or sentences, wherein the natural metrical data includes data about a pitch period, a pitch duration, and a pitch volume registered in a natural metrical dictionary stored in the storage means.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The recording medium according to claim 13 wherein at least the pitch and the duration of each sound note, the lyric accorded to each sound note, rest, tempo and loudness of said singing data are specified by tags.
  - 15. The recording medium according to claim 13 wherein, in said singing metrical data forming step, the vibrato is applied by changing the pitch period and the duration of each phoneme in said singing metrical data.
  - 16. The recording medium according to claim 15 wherein, in said singing metrical data forming step, the vibrato is applied to a phoneme longer than a preset duration.
  - 17. The recording medium according to claim 15 wherein, in said singing metrical data forming step, the vibrato is applied to a phoneme of the portion of the singing data specified by a tag.
  - 18. The recording medium according to claim 13 wherein said program further comprising:
    - a parameter adjusting step of adjusting the pitch of the respective phonemes in said singing metrical data.
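
Claim 14 lists the quantities carried by the singing tags (pitch, note duration, lyric, rest, tempo, loudness). A minimal Python sketch of turning one tagged note into singing metrical data follows. The `Note` class, the use of MIDI note numbers, the 16 kHz sample rate and the output field names are assumptions for illustration; only the equal-temperament formula (A4 = 440 Hz) and the beats-to-milliseconds conversion are standard facts.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Note:
    midi: Optional[int]  # MIDI note number; None marks a rest
    beats: float         # length in quarter-note beats
    lyric: str = ""

def note_to_metrical(note: Note, tempo_bpm: float,
                     sample_rate: int = 16000) -> dict:
    """Convert one note into a pitch period (samples) and duration (ms)."""
    duration_ms = note.beats * 60000.0 / tempo_bpm
    if note.midi is None:
        # A rest has a duration but no pitch and no lyric.
        return {"lyric": "", "pitch_period": None, "duration_ms": duration_ms}
    # Equal temperament: MIDI note 69 is A4 at 440 Hz.
    freq_hz = 440.0 * 2.0 ** ((note.midi - 69) / 12.0)
    return {
        "lyric": note.lyric,
        "pitch_period": sample_rate / freq_hz,
        "duration_ms": duration_ms,
    }
```

For example, `note_to_metrical(Note(69, 1.0, "la"), tempo_bpm=120)` describes the syllable "la" sung on A4 for half a second.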

19. An autonomous robot apparatus for performing a behavior based on the input information supplied thereto, comprising:
- separating means for separating, from an input text, a singing data portion specified by a singing tag, and a text portion;
  
  singing metrical data forming means for forming singing metrical data from said singing data, said singing metrical data expresses parameters of a lyric;
  
  speech symbol sequence forming means for forming a speech symbol sequence for said text portion;
  
  metrical data forming means for forming metrical data from said speech symbol sequence, said metrical data expresses parameters of a speech signal sequence;
  
  storage means for storing preset words or sentences and natural metrical data corresponding to said preset words or sentences extracted in advance from the utterance of a human being;
  
  natural metrical data selecting means for analyzing said text portion and selecting, if preset words or sentences exist in said text portion, natural metrical data associated with said preset words or sentences extracted in advance from the uttered speech of the human being, from storage means; and
  
  speech synthesis means for synthesizing the speech based on said singing metrical data, said natural metrical data or said metrical data;
  
  wherein said speech symbol sequence is formed for the section of the text portion other than said preset words or sentences, wherein the natural metrical data includes data about a pitch period, a pitch duration, and a pitch volume registered in a natural metrical dictionary stored in the storage means.
- Dependent claims (20, 21, 22, 23, 24):
  - 20. The robot apparatus according to claim 19 wherein at least the pitch and the duration of each sound note, the lyric accorded to each sound note, rest, tempo and loudness of said singing data are specified by tags.
  - 21. The robot apparatus according to claim 19 wherein, in said singing metrical data forming means, the vibrato is applied by changing the pitch period and the duration of each phoneme in said singing metrical data.
  - 22. The robot apparatus according to claim 21 wherein, in said singing metrical data forming means, the vibrato is applied to a phoneme longer than a preset duration.
  - 23. The robot apparatus according to claim 22 wherein, in said singing metrical data forming means, the vibrato is applied to a phoneme of the portion of the singing data specified by a tag.
  - 24. The robot apparatus according to claim 19 further comprising:
    - a parameter adjusting means of adjusting the pitch of the respective phonemes in said singing metrical data.
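
The parameter adjusting element of claims 6, 12, 18 and 24 adjusts the pitch of the phonemes in the singing metrical data. The claims do not say how; one simple possibility, shown here purely as an assumption, is transposition by a number of semitones, which scales every pitch period by a constant factor. The function name `transpose_periods` is hypothetical.

```python
from typing import List

def transpose_periods(periods: List[float], semitones: float) -> List[float]:
    """Shift pitch by `semitones`; raising pitch shortens the pitch period.

    Frequency scales by 2**(semitones / 12), so the period scales by the
    reciprocal of that factor.
    """
    factor = 2.0 ** (semitones / 12.0)
    return [p / factor for p in periods]
```

Transposing up by 12 semitones (one octave) halves each pitch period, and transposing by 0 leaves the data unchanged.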

Current Assignee: Sony Corporation (Sony Group Corp.)
Original Assignee: Sony Corporation (Sony Group Corp.)
Inventors: Akabane, Makoto; Kobayashi, Kenichiro; Yamazaki, Nobuhide
Primary Examiner(s): Young, Wayne
Assistant Examiner(s): Vo, Huyen X.
Application Number: US10/388,107
Publication Number: US 20040019485A1
Time in Patent Office: 1,188 Days
Field of Search: 704/270, 704/260, 704/258, 704/267, 704/205, 84/609-610
US Class Current: 704/260
CPC Class Codes: G10L 13/033 Voice editing, e.g. manipul...
