Method of controlling high-speed reading in a text-to-speech conversion system

US 7,240,005 B2
Filed: 01/29/2002
Issued: 07/03/2007
Est. Priority Date: 06/26/2001
Status: Active Grant

Abstract

A method of high-speed reading in a text-to-speech conversion system including a text analysis module (101) for generating a phoneme and prosody character string from an input text; a prosody generation module (102) for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string; and a speech generation module (103) for generating a synthetic waveform by waveform superimposition by referring to a voice segment dictionary (105). The prosody generation module is provided with both a duration rule table containing empirically found phoneme durations and a duration prediction table containing phoneme durations predicted by statistical analysis. When the user-designated utterance speed exceeds a threshold, the module uses the duration rule table to determine the phoneme duration; when the threshold is not exceeded, it uses the duration prediction table.
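The table-switching rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the table contents, the unit of utterance speed, the threshold value, and the names `select_duration_table` and `phoneme_duration` are all hypothetical.

```python
from typing import Dict

# Hypothetical tables mapping a phoneme to a duration in milliseconds.
# The values are invented for illustration only.
DURATION_RULE_TABLE: Dict[str, float] = {"a": 80.0, "k": 40.0}
DURATION_PREDICTION_TABLE: Dict[str, float] = {"a": 95.0, "k": 55.0}


def select_duration_table(utterance_speed: float, threshold: float) -> Dict[str, float]:
    """Pick the rule table above the threshold, the prediction table otherwise."""
    if utterance_speed > threshold:
        return DURATION_RULE_TABLE
    return DURATION_PREDICTION_TABLE


def phoneme_duration(phoneme: str, utterance_speed: float, threshold: float) -> float:
    """Look up a phoneme duration in whichever table the speed selects."""
    table = select_duration_table(utterance_speed, threshold)
    return table[phoneme]
```

A speed exactly equal to the threshold does not exceed it, so in this sketch it still selects the prediction table.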

16 Claims

1. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;

  a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string;

  a voice segment dictionary in which voice segments as a source of voice are registered; and

  a speech generation module for generating a synthetic waveform by waveform superimposition by referring to said voice segment dictionary, said method comprising the step of providing said prosody generation module with a sound quality coefficient determination unit that has a sound quality conversion coefficient table for changing said voice segment to switch sound quality and selects from said sound quality conversion coefficient table such a coefficient that sound quality does not change when a user-designated utterance speed exceeds a threshold.

  - 2. The method according to claim 1, wherein said threshold is a predetermined maximum utterance speed.
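One possible reading of the coefficient selection in claim 1 is sketched below. The claim does not say how the table is organized, so everything here is an assumption: the table keyed by a user-selected quality level, the convention that a coefficient of 1.0 leaves the voice segment unchanged, and the name `select_quality_coefficient`.

```python
from typing import Dict

# Hypothetical table: user-selected sound quality level -> conversion coefficient.
# Assumption: a coefficient of 1.0 leaves the voice segment unchanged.
SOUND_QUALITY_COEFFICIENTS: Dict[int, float] = {0: 0.8, 1: 0.9, 2: 1.0, 3: 1.1, 4: 1.2}
NEUTRAL_LEVEL = 2  # hypothetical level whose coefficient is the identity


def select_quality_coefficient(level: int, utterance_speed: float, threshold: float) -> float:
    """Return the neutral coefficient above the threshold, else the user's choice."""
    if utterance_speed > threshold:
        # High-speed reading: ignore the requested level so sound quality does not change.
        return SOUND_QUALITY_COEFFICIENTS[NEUTRAL_LEVEL]
    return SOUND_QUALITY_COEFFICIENTS[level]
```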

3. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;

  a prosody generation module for generating a synthesis parameter of at least a voice segment, phoneme duration, and fundamental frequency for the phoneme and prosody character string;

  a voice segment dictionary in which voice segments as a source of voice are registered; and

  a speech generation module for generating a synthetic waveform by waveform superimposition by referring to said voice segment dictionary, said method comprising the step of providing said prosody generation module with both a pitch contour correction unit for outputting a pitch contour corrected according to an intonation level designated by the user and a switch for determining whether a base pitch is added to said pitch contour corrected according to said user-designated utterance speed, said switch being controlled not to change the base pitch when the utterance speed exceeds a threshold.

  - 4. The method according to claim 3, wherein said threshold is a predetermined maximum utterance speed.
  - 5. The method according to claim 3, wherein said pitch contour correction unit performs a pitch contour generation process that includes a phrase component calculation process in which all phrases of an input sentence are processed by calculating a phrase component by statistical analysis according to said user-designated utterance speed or making said phrase component zero and a process in which all words in said input sentence are processed by calculating an accent component by statistical analysis according to said user-designated utterance speed and either correcting said accent component according to said user-designated intonation level or making said accent component zero.
  - 7. The method according to claim 3, wherein said threshold is a predetermined maximum utterance speed.

6. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;

  a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for said phoneme and prosody character string;

  a voice segment dictionary in which voice segments as a source of voice are registered; and

  a speech generation module for generating a synthetic waveform by waveform superimposition while referring to said voice segment dictionary, said method comprising the step of providing said speech generation module with signal sound generation means for inserting a signal sound between sentences to indicate an end of a sentence when a user-designated utterance speed exceeds a threshold.
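The signal sound insertion of claim 6 might look like the sketch below. It assumes each sentence has already been synthesized into a list of samples; the function name `join_sentences`, the sample representation, and the shape of the signal sound are hypothetical and not taken from the patent.

```python
from typing import List


def join_sentences(
    sentence_waveforms: List[List[float]],
    signal_sound: List[float],
    utterance_speed: float,
    threshold: float,
) -> List[float]:
    """Concatenate sentence waveforms, adding a signal sound between them at high speed."""
    insert_signal = utterance_speed > threshold
    output: List[float] = []
    for index, waveform in enumerate(sentence_waveforms):
        # The signal goes between sentences, so nothing precedes the first one.
        if index > 0 and insert_signal:
            output.extend(signal_sound)
        output.extend(waveform)
    return output
```

At or below the threshold the sentences are simply concatenated, so normal-speed output is unaffected in this sketch.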

8. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;

  a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string;

  a voice segment dictionary in which voice segments as a source of voice are registered; and

  a speech generation module for generating a synthetic waveform by waveform superimposition by referring to said voice segment dictionary, said method comprising the step of providing said prosody generation module with a phoneme duration determination unit for performing a process in which when a user-designated utterance speed exceeds a threshold, an utterance speed of at least a leading word in a sentence is returned to a normal utterance speed.

  - 9. The method according to claim 8, wherein said threshold is a predetermined maximum utterance speed.
  - 10. The method according to claim 8, wherein said phoneme duration determination unit performs a process in which when a word under process is a leading word in a sentence and said user-designated utterance speed exceeds said threshold, a phoneme duration is not corrected and, when said word under process is not a leading word of a sentence or said user-designated utterance speed does not exceed said threshold, a first process by which a phoneme duration correction coefficient is changed according to said user-designated utterance speed and a second process in which all syllables of said word are processed by correcting a length of a vowel or vowels of said word, and carrying out said first and second processes for all words contained in the sentence.
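The per-word branching in claim 10 can be illustrated with a short sketch. The syllable representation, the inverse-speed correction coefficient, and the names `correction_coefficient` and `correct_sentence` are assumptions made for the example; the claim does not specify how the coefficient is derived from the utterance speed.

```python
from typing import List, Tuple

# Hypothetical syllable representation: (consonant_ms, vowel_ms).
Syllable = Tuple[float, float]


def correction_coefficient(utterance_speed: float) -> float:
    """Hypothetical mapping: speed 1.0 is normal; faster speeds shorten vowels."""
    return 1.0 / utterance_speed


def correct_sentence(
    words: List[List[Syllable]], utterance_speed: float, threshold: float
) -> List[List[Syllable]]:
    """Apply vowel-length correction to every word except a leading word at high speed."""
    corrected: List[List[Syllable]] = []
    for index, word in enumerate(words):
        is_leading = index == 0
        if is_leading and utterance_speed > threshold:
            # Leading word at high speed: durations are left uncorrected (normal speed).
            corrected.append(list(word))
            continue
        # First process: derive the coefficient from the designated speed.
        coeff = correction_coefficient(utterance_speed)
        # Second process: correct the vowel length of every syllable in the word.
        corrected.append([(consonant, vowel * coeff) for consonant, vowel in word])
    return corrected
```

Keeping the leading word at normal speed gives the listener an intelligible cue for each sentence while the rest is read quickly.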

11. A method of controlling high-speed reading in a text-to-speech conversion system, comprising:
  inputting a text into the text-to-speech conversion system;
  
  generating a phoneme and prosody character string of the text with a text analysis module;
  
  creating a duration rule table containing a first phoneme duration obtained empirically;
  
  creating a duration prediction table containing a second phoneme duration obtained through statistical analysis;
  
  designating an utterance speed;
  
  determining a threshold value;
  
  comparing the utterance speed with the threshold value;
  
  selecting one of the duration rule table and the duration prediction table according to the utterance speed;
  
  determining a third phoneme duration with a phoneme duration determination unit according to the one of the duration rule table and the duration prediction table;
  
  generating a synthesis parameter of at least a voice segment, the third phoneme duration, and a fundamental frequency of the phoneme and prosody character string with a prosody generation module; and
  
  generating a synthetic waveform through waveform superimposition with a speech generation module according to the synthesis parameter and a voice segment dictionary containing a voice segment as a basic source of voice.

  - 12. The method according to claim 11, in the step of selecting the one of the duration rule table and the duration prediction table according to the utterance speed, said duration rule table is selected when the utterance speed exceeds the threshold value, and said duration prediction table is selected when the utterance speed does not exceed the threshold value.
  - 13. The method according to claim 11, in the step of determining the threshold value, said threshold value is determined to be a predetermined maximum utterance speed.

14. A method of controlling high-speed reading in a text-to-speech conversion system, comprising:
  inputting a text into the text-to-speech conversion system;
  
  generating a phoneme and prosody character string of the text with a text analysis module;
  
  creating a rule table containing first data of accent and phrase components obtained empirically;
  
  creating a prediction table containing second data of accent and phrase components obtained through statistical analysis;
  
  designating an utterance speed;
  
  determining a threshold value;
  
  comparing the utterance speed with the threshold value;
  
  selecting one of the rule table and the prediction table according to the utterance speed;
  
  determining a pitch contour with a pitch contour determination unit according to the one of the rule table and the prediction table;
  
  generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency of the phoneme and prosody character string with a prosody generation module; and
  
  generating a synthetic waveform through waveform superimposition with a speech generation module according to the synthesis parameter and a voice segment dictionary containing a voice segment as a basic source of voice.

  - 15. The method according to claim 14, in the step of selecting the one of the rule table and the prediction table according to the utterance speed, said rule table is selected when the utterance speed exceeds the threshold value, and said prediction table is selected when the utterance speed does not exceed the threshold value.
  - 16. The method according to claim 14, in the step of determining the threshold value, said threshold value is determined to be a predetermined maximum utterance speed.

Current Assignee: LAPIS Semiconductor Co., Ltd. (ROHM Co., Ltd.)
Original Assignee: OKI Electric Industry Company Limited
Inventors: Chihara, Keiichi
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Albertalli, Brian Louis
Application Number: US10/058,104
Publication Number: US 20030004723A1
Time in Patent Office: 1,981 Days
Field of Search: 704/258, 704/267, 704/266
US Class Current: 704/267
CPC Class Codes:
G10L 13/04 Details of speech synthesis...
G10L 13/08 Text analysis or generation...
