Method and apparatus for text-to-voice audio output with accent control and improved phrase control

US 5,758,320 A
Filed: 06/12/1995
Issued: 05/26/1998
Est. Priority Date: 06/15/1994
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

A text-to-voice audio output unit includes a storage section for storing analyzed information pertaining to words, boundaries between articulations, and accents obtained by analyzing an input character list, a voice synthesis rule section for changing a reduction or damping characteristic of a phrase component of a fundamental frequency of an output voice, and a voice synthesizing section for generating a composite tone based on the analyzed information from the storage section. The reduction or damping characteristic, calculated for each phrase component, is overdamped, critically damped, or underdamped and is based on speech rate, syntactic information, number of articulations, and positional information. When a prosodic phrase is short, the reduction or damping characteristic causes a decrease in the fundamental frequency for a meaningfully-delimited portion, and when a prosodic phrase is long, the reduction or damping characteristic is controlled over the entire prosodic phrase.
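
To make the three damping regimes concrete, here is a minimal Python sketch of the impulse response of a unit-gain second-order linear system, H(s) = ω² / (s² + 2ζωs + ω²), the kind of system the claims use for the phrase component. The damping ratio ζ selects the regime named in the abstract: ζ < 1 is underdamped, ζ = 1 is critically damped, and ζ > 1 is overdamped. The function name and the parameterization by ω and ζ are illustrative assumptions; the text reproduced here does not give the patent's own formulas.

```python
import math


def phrase_impulse_response(t: float, omega: float, zeta: float) -> float:
    """Impulse response of H(s) = omega^2 / (s^2 + 2*zeta*omega*s + omega^2).

    Assumed parameterization: omega is the natural angular frequency (1/s),
    zeta the damping ratio. Every regime integrates to 1 (unit DC gain).
    """
    if t < 0:
        return 0.0  # causal system: no response before the phrase command
    if math.isclose(zeta, 1.0):
        # Critically damped: the conventional Fujisaki phrase-control shape.
        return omega * omega * t * math.exp(-omega * t)
    if zeta < 1.0:
        # Underdamped: a decaying oscillation, so the component can undershoot
        # below the baseline before settling.
        wd = omega * math.sqrt(1.0 - zeta * zeta)  # damped natural frequency
        return (omega * omega / wd) * math.exp(-zeta * omega * t) * math.sin(wd * t)
    # Overdamped: difference of two real exponentials (two real poles).
    s = math.sqrt(zeta * zeta - 1.0)
    return (omega / (2.0 * s)) * (
        math.exp(-(zeta - s) * omega * t) - math.exp(-(zeta + s) * omega * t)
    )
```

With ω fixed, lowering ζ below 1 gives a higher peak followed by an undershoot below the baseline, while raising ζ above 1 gives a lower peak and a longer tail. Because every regime integrates to 1, the phrase command magnitude keeps the same overall weight whichever regime is chosen.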

64 Citations

4 Claims

1. An audio output unit for expressing a temporal change pattern of a fundamental frequency of an output voice using a sum of a phrase component corresponding to an intonation of the output voice and an accent component corresponding to a basic accent of the output voice, wherein the temporal change pattern of the fundamental frequency includes linguistic information such as basic accent, emphasis, intonation, and syntax, the phrase component is approximated by a response characteristic of a first secondary linear system to an impulsive phrase command, the accent component is approximated by a response characteristic of a second secondary linear system to a step accent command, and the temporal change pattern of the fundamental frequency is expressed on a logarithmic scale, the audio output unit comprising:
- a storage section for storing analyzed information pertaining to an input character list, the analyzed information including a word, a boundary between articulations, and a basic accent;
- a voice synthesis rule section including a phrase component characteristic control section for controlling a reduction or damping characteristic of a phrase component of a fundamental frequency in order to control a response characteristic of a first secondary linear system to a phrase command used in calculating the phrase component, the reduction or damping characteristic being any of an underdamped characteristic, a critically-damped characteristic, and an overdamped characteristic, and for generating a temporal change pattern of the fundamental frequency in accordance with the calculated phrase component; and
- a voice synthesizing section for generating a composite tone using synthesized waveform data generated in accordance with predetermined phonemic rules from the voice synthesis rule section and the temporal change pattern of the fundamental frequency from the voice synthesis rule section based on the analyzed information from the storage section.
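
The model recited in the preamble of claim 1 (a phrase component as the response of a second-order linear system to impulse-like phrase commands, an accent component as the response of a second-order linear system to step-like accent commands, summed on a logarithmic F0 scale) matches the widely used Fujisaki command-response model. For reference, that model is sketched below in its customary notation. The symbols come from the Fujisaki literature, not from the patent text, and writing the phrase-control function with a damping ratio ζ is an assumed way of expressing the underdamped, critically damped, and overdamped variants the claim recites.

```latex
\ln F_0(t) = \ln F_b
  + \sum_{i=1}^{I} A_{p,i}\, G_p(t - T_{0,i})
  + \sum_{j=1}^{J} A_{a,j} \left[ G_a(t - T_{1,j}) - G_a(t - T_{2,j}) \right]

% Accent control: step response of a critically damped second-order system, capped at gamma
G_a(t) = \min\left[ 1 - (1 + \beta t)\, e^{-\beta t},\ \gamma \right], \qquad t \ge 0

% Phrase control: impulse response of \alpha^2 / (s^2 + 2\zeta\alpha s + \alpha^2)
G_p(t) =
\begin{cases}
  \dfrac{\alpha}{\sqrt{1-\zeta^2}}\, e^{-\zeta\alpha t} \sin\!\left(\alpha\sqrt{1-\zeta^2}\, t\right), & 0 < \zeta < 1 \ \text{(underdamped)} \\[1ex]
  \alpha^2\, t\, e^{-\alpha t}, & \zeta = 1 \ \text{(critically damped, conventional model)} \\[1ex]
  \dfrac{\alpha}{2\sqrt{\zeta^2-1}} \left[ e^{-(\zeta-\sqrt{\zeta^2-1})\alpha t} - e^{-(\zeta+\sqrt{\zeta^2-1})\alpha t} \right], & \zeta > 1 \ \text{(overdamped)}
\end{cases}
\qquad t \ge 0
```

Here F_b is the baseline frequency, A_{p,i} is the magnitude of the i-th phrase command at time T_{0,i}, A_{a,j} is the amplitude of the j-th accent command active from T_{1,j} to T_{2,j}, α and β are the natural angular frequencies of the phrase and accent control mechanisms, γ is the ceiling of the accent component, and both G_p and G_a are zero for t < 0.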

2. The audio output unit according to claim 1, wherein the voice synthesis rule section further includes:
- a speech rate extracting section for detecting a speech rate of the output voice;
- a syntactic information extracting section for detecting syntactic information relating to the output voice;
- an articulation number extracting section for detecting a number of articulations, wherein the number of articulations is used in calculating the phrase component; and
- a positional information extracting section for detecting positional information of a phrase command in an output sentence, wherein the phrase component is calculated in accordance with the speech rate, the syntactic information, the number of articulations, and the positional information corresponding to the phrase command.
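
Claim 2 names four inputs to the phrase-component calculation but, in the text reproduced here, not the rule that combines them. The sketch below only illustrates the data flow: a hypothetical `PhraseFeatures` record holds the four detected quantities, and a hypothetical `choose_damping_ratio` maps them to a damping ratio ζ. Every unit, threshold, and weight is an invented placeholder that loosely follows the abstract's short-phrase/long-phrase distinction; none of it should be read as the patented rule.

```python
import math
from dataclasses import dataclass


@dataclass
class PhraseFeatures:
    """Hypothetical container for the four quantities named in claim 2."""
    speech_rate: float       # assumed unit: morae per second
    boundary_strength: int   # syntactic information on an assumed 0 (weak) .. 2 (clause) scale
    num_articulations: int   # number of articulations in the prosodic phrase
    position: int            # index of the phrase command in the output sentence (0 = first)


def choose_damping_ratio(f: PhraseFeatures) -> float:
    """Map features to a damping ratio zeta. All weights and thresholds are assumptions."""
    zeta = 1.0  # start from the conventional critically damped setting
    if f.num_articulations >= 4:
        # Assumption: long phrases get an overdamped decline spread over the whole phrase.
        zeta += 0.25 * (f.num_articulations - 3)
    elif f.num_articulations <= 2:
        # Assumption: short phrases get an underdamped response confined to a shorter portion.
        zeta -= 0.3
    # Assumption: speech faster than 8 morae/s nudges the response toward underdamping.
    zeta -= 0.05 * max(0.0, f.speech_rate - 8.0)
    # Assumption: strong syntactic boundaries and sentence-initial phrases favour overdamping.
    zeta += 0.1 * f.boundary_strength
    if f.position == 0:
        zeta += 0.1
    return min(max(zeta, 0.4), 2.5)  # clamp to an assumed plausible range


def damping_regime(zeta: float) -> str:
    """Name the regime for a given damping ratio."""
    if math.isclose(zeta, 1.0):
        return "critically damped"
    return "underdamped" if zeta < 1.0 else "overdamped"
```

Keeping feature extraction separate from the damping rule mirrors the claim's division into extracting sections and a control section, so a different rule could be swapped in without touching the extractors.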

3. A method for outputting a composite tone by expressing a temporal change pattern of a fundamental frequency of an output voice using a sum of a phrase component corresponding to an intonation of the output voice and an accent component corresponding to a basic accent of the output voice, wherein the temporal change pattern of the fundamental frequency includes linguistic information such as basic accent, emphasis, intonation, and syntax, the phrase component is approximated by a response characteristic of a first secondary linear system to an impulsive phrase command, the accent component is approximated by a response characteristic of a second secondary linear system to a step accent command, and the temporal change pattern of the fundamental frequency is expressed on a logarithmic scale, the method comprising the steps of:
- storing analyzed information including a word, a boundary between articulations, and a basic accent, wherein the analyzed information is obtained by analyzing an input character list;
- changing a reduction or damping characteristic of a phrase component of a fundamental frequency in order to control a response characteristic of a first secondary linear system to a phrase command used in calculating the phrase component, the reduction or damping characteristic being any of an underdamped characteristic, a critically-damped characteristic, and an overdamped characteristic;
- generating a temporal change pattern of the fundamental frequency in accordance with the calculated phrase components; and
- generating a composite tone using synthesized waveform data generated in accordance with predetermined phonemic rules and the temporal change pattern of the fundamental frequency based on the analyzed information.
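
The contour-generation step of claim 3 can be sketched by evaluating the log-scale sum directly: each phrase command contributes an impulse response with its own damping ratio, each accent command contributes the difference of two capped step responses, and the total is exponentiated back to Hz. This minimal Python sketch assumes the conventional Fujisaki accent shape; the defaults α = 3.0 s⁻¹, β = 20.0 s⁻¹, and γ = 0.9 are values commonly quoted in the Fujisaki-model literature and are used here as assumptions, and the function names and command tuples are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def phrase_response(t: float, alpha: float, zeta: float) -> float:
    """Impulse response of alpha^2 / (s^2 + 2*zeta*alpha*s + alpha^2); zero for t < 0."""
    if t < 0:
        return 0.0
    if math.isclose(zeta, 1.0):
        return alpha * alpha * t * math.exp(-alpha * t)  # critically damped
    if zeta < 1.0:
        wd = alpha * math.sqrt(1.0 - zeta * zeta)  # underdamped: damped oscillation
        return (alpha * alpha / wd) * math.exp(-zeta * alpha * t) * math.sin(wd * t)
    s = math.sqrt(zeta * zeta - 1.0)  # overdamped: two real poles
    return (alpha / (2.0 * s)) * (
        math.exp(-(zeta - s) * alpha * t) - math.exp(-(zeta + s) * alpha * t)
    )


def accent_response(t: float, beta: float, gamma: float) -> float:
    """Step response of a critically damped second-order system, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(
    times: Sequence[float],
    fb: float,
    phrase_cmds: Sequence[Tuple[float, float, float]],  # (T0, Ap, zeta) per phrase command
    accent_cmds: Sequence[Tuple[float, float, float]],  # (T1, T2, Aa) per accent command
    alpha: float = 3.0,
    beta: float = 20.0,
    gamma: float = 0.9,
) -> List[float]:
    """Return F0 in Hz at each time by summing the components on a log scale."""
    contour = []
    for t in times:
        ln_f0 = math.log(fb)  # baseline frequency on the log scale
        for t0, ap, zeta in phrase_cmds:
            ln_f0 += ap * phrase_response(t - t0, alpha, zeta)
        for t1, t2, aa in accent_cmds:
            ln_f0 += aa * (accent_response(t - t1, beta, gamma) - accent_response(t - t2, beta, gamma))
        contour.append(math.exp(ln_f0))  # back from the log scale to Hz
    return contour
```

Waveform generation from phonemic rules, the final step of claim 3, is outside this sketch; the returned values would serve as the pitch target for a synthesizer.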

4. The method for outputting a composite tone according to claim 3, wherein the step of generating a temporal change pattern of the fundamental frequency comprises:
- detecting a speech rate of the output voice;
- detecting syntactic information related to the output voice;
- detecting a number of articulations, wherein the number of articulations is used in calculating the phrase component;
- detecting positional information for a phrase command in an output sentence;
- controlling the reduction or damping characteristic of the phrase component in accordance with the speech rate, the syntactic information, the number of articulations, and the positional information for the phrase command, the reduction or damping characteristic being any of an underdamped characteristic, a critically-damped characteristic, and an overdamped characteristic; and
- calculating the phrase component.

Current Assignee: Sony Corporation (Sony Group Corp.)
Original Assignee: Sony Corporation (Sony Group Corp.)
Inventors: Asano, Yasuharu
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Opsasnick, Michael N.
Application Number: US08/489,316
Time in Patent Office: 1,079 Days
Field of Search: 395/2.09, 395/2.1, 395/2.14, 395/2.16, 395/2.67, 395/2.69, 395/2.76, 395/2.77, 704/200, 704/201, 704/205, 704/207, 704/258, 704/260, 704/267, 704/268
US Class Current: 704/258
CPC Class Codes: G10L 13/08 Text analysis or generation...
