Text-to-speech synthesizer having formant-rule and speech-parameter synthesis modes

US 5,204,905 A
Filed: 05/29/1990
Issued: 04/20/1993
Est. Priority Date: 05/29/1989
Status: Expired due to Fees

Abstract

A text-to-speech synthesizer comprises an analyzer that decomposes a sequence of input characters into phoneme components and classifies them as a first group of phoneme components or a second group if they are to be synthesized by a speech parameter or by a formant rule, respectively. Speech parameters derived from natural human speech are stored in first memory locations corresponding to the phoneme components of the first group and the stored speech parameters are recalled from the first memory in response to each of the phoneme components of the first group. Formant rules capable of generating formant transition patterns are stored in second memory locations corresponding to the phoneme components of the second group, the formant rules being recalled from the second memory in response to each of the phoneme components of the second group. Formant transition patterns are derived from the formant rule recalled from the second memory, and formants of the derived transition patterns are converted into corresponding speech parameters. Spoken words are digitally synthesized from the speech parameters recalled from the first memory as well as from those supplied from the converted speech parameters.
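
The classification step described in the abstract can be pictured as a table lookup that routes each phoneme component to one of the two synthesis modes. The following is a minimal sketch under stated assumptions, not the patented analyzer: the table entries, the names `MODE_TABLE` and `classify`, and the fallback to the formant-rule mode when no match is found are all hypothetical. Decomposing input characters into phoneme components is outside the sketch, which takes already-decomposed components as input.

```python
from typing import Dict, List, Tuple

# Hypothetical classification table: phoneme component -> synthesis mode.
# The entries are invented for illustration; the source does not list them.
MODE_TABLE: Dict[str, str] = {
    "ka": "parameter",     # first group: stored speech parameters
    "sa": "parameter",
    "a": "formant_rule",   # second group: formant rules
    "i": "formant_rule",
}


def classify(components: List[str],
             default: str = "formant_rule") -> List[Tuple[str, str]]:
    """Pair each phoneme component with its synthesis mode.

    Components missing from the table fall back to `default`; that
    fallback is an assumption of this sketch.
    """
    return [(c, MODE_TABLE.get(c, default)) for c in components]
```

Calling `classify(["ka", "a"])` pairs "ka" with the speech-parameter mode and "a" with the formant-rule mode, after which each component would be looked up in the corresponding memory.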

6 Claims

1. A text-to-speech synthesizer comprising:
- analyzer means for decomposing a sequence of input characters into phoneme components and classifying the decomposed phoneme components as a first group of phoneme components if each phoneme component is to be synthesized by a speech parameter and classifying said phoneme components as a second group of phoneme components if each phoneme component is to be synthesized by a formant rule;
  
  first memory means for storing speech parameters derived from natural human speech, said speech parameters corresponding to the phoneme components of said first group and being retrievable from said first memory means in response to each of the phoneme components of the first group;
  
  second memory means for storing formant rules for generating formant transition patterns, said formant rules corresponding to the phoneme components of said second group and being retrievable from said second memory means in response to each of the phoneme components of the second group;
  
  means for retrieving a speech parameter from said first memory means in response to one of the phoneme components of the first group;
  
  means for retrieving a formant rule from said second memory means in response to one of said phoneme components of the second group and deriving a formant transition pattern from the retrieved formant rule;
  
  parameter converter means for converting a formant of said derived formant transition pattern into a corresponding speech parameter; and
  
  speech synthesizer means for synthesizing a human speech utterance from the speech parameter retrieved from said first memory means and synthesizing a human speech utterance from the speech parameter converted by said parameter converter means, wherein said speech parameters stored in said first memory means are represented by auto-regressive (AR) parameters, and said formant of said derived formant transition patterns are represented by frequency and bandwidth values, wherein said parameter converter means comprises:
  
  means for converting the frequency value of said formant into a value equal to C = cos(2πF/f_s), where F is said frequency value and f_s represents a sampling frequency, and converting the bandwidth value of said formant into a value equal to R = exp(-πB/f_s), where B is the bandwidth value;
  
  means for generating a first signal representative of a value 2×C×R and a second signal representative of a value R²;
  
  unit impulse generator for generating a unit impulse; and
  
  a series of second-order transversal filters connected in series from said unit impulse generator to said speech synthesizer means, each of said second-order transversal filters including a tapped delay line, first and second tap-weight multipliers connected respectively to successive taps of said tapped delay line, and an adder for summing the outputs of said multipliers with said unit impulse, said first and second multipliers multiplying signals at said successive taps with said first and second signals, respectively.
  - 2. A text-to-speech synthesizer as claimed in claim 1, wherein said analyzer means comprises a table for mapping relationships between a plurality of phoneme component strings and corresponding indications classifying said phoneme component strings as falling into one of said first and second groups, and means for detecting a match between a decomposed phoneme component and a phoneme component in said phoneme component strings and classifying the decomposed phoneme component as one of said first and second groups according to the corresponding indication if said match is detected.
  - 3. A text-to-speech synthesizer as claimed in claim 1, wherein said speech synthesizer means comprises:
    - source wave generator means for generating a source wave;
      
      input and output adders connected in series from said source wave generator means to an output terminal of said text-to-speech synthesizer;
      
      a tapped delay line connected to the output of said input adder;
      
      a plurality of first tap-weight multipliers having input terminals respectively connected to successive taps of said tapped-delay line and output terminals connected to input terminals of said input adder, said first tap-weight multipliers respectively multiplying signals at said successive taps with signals supplied from said first memory means and said parameter converter means; and
      
      a plurality of second tap-weight multipliers having input terminals respectively connected to successive taps of said tapped-delay line and output terminals connected to input terminals of said output adder, said second tap-weight multipliers respectively multiplying signals at said successive taps with signals supplied from said first memory means and said parameter converter means.
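
The parameter conversion recited in claim 1 can be sketched in a few lines. This is an illustrative sketch only, not the patented circuit: the function names and the sign convention are assumptions. The claim gives the tap weights 2×C×R and R² but not their signs, so the sketch uses the standard two-pole resonator polynomial 1 - 2CR z⁻¹ + R² z⁻². Passing a unit impulse through the sections in series yields the coefficients of the product polynomial, which the sketch treats as the AR parameters handed to the synthesizer.

```python
import math
from typing import List, Sequence, Tuple


def formant_to_section(freq_hz: float, bandwidth_hz: float,
                       fs_hz: float) -> Tuple[float, float]:
    """Return the two tap-weight signals (2*C*R, R**2) for one formant."""
    c = math.cos(2.0 * math.pi * freq_hz / fs_hz)   # C = cos(2*pi*F/f_s)
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)   # R = exp(-pi*B/f_s)
    return 2.0 * c * r, r * r


def cascade_impulse_response(formants: Sequence[Tuple[float, float]],
                             fs_hz: float) -> List[float]:
    """Run a unit impulse through second-order transversal sections in series.

    Each section computes y[n] = x[n] - first*x[n-1] + second*x[n-2].
    The minus sign on the first tap is an assumption (see the lead-in).
    """
    signal = [1.0]  # unit impulse
    for freq_hz, bandwidth_hz in formants:
        first, second = formant_to_section(freq_hz, bandwidth_hz, fs_hz)
        section = (1.0, -first, second)
        out = [0.0] * (len(signal) + 2)
        for i, x in enumerate(signal):
            for j, w in enumerate(section):
                out[i + j] += x * w
        signal = out
    return signal
```

For example, `cascade_impulse_response([(500.0, 60.0), (1500.0, 90.0)], 10000.0)` returns five coefficients for two formants; the frequency, bandwidth, and sampling-rate values here are made up for illustration.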

4. A text-to-speech synthesizer comprising:
- analyzer means for decomposing a sequence of input characters into phoneme components and classifying the decomposed phoneme components as a first group of phoneme components if each phoneme component is to be synthesized by a speech parameter and classifying said phoneme components as a second group of phoneme components if each phoneme component is to be synthesized by a formant rule;
  
  first memory means for storing speech parameters derived from natural human speech, said speech parameters corresponding to the phoneme components of said first group and being retrievable from said first memory means in response to each of the phoneme components of the first group;
  
  second memory means for storing formant rules for generating formant transition patterns, said formant rules corresponding to the phoneme components of said second group and being retrievable from said second memory means in response to each of the phoneme components of the second group;
  
  means for retrieving a speech parameter from said first memory means in response to one of the phoneme components of the first group;
  
  means for retrieving a formant rule from said second memory means in response to one of said phoneme components of the second group and deriving a formant transition pattern from the retrieved formant rule;
  
  parameter converter means for converting a formant of said derived formant transition pattern into a corresponding speech parameter; and
  
  speech synthesizer means for synthesizing a human speech utterance from the speech parameter retrieved from said first memory means and synthesizing a human speech utterance from the speech parameter converted by said parameter converter means, wherein said speech parameters in said first memory means are represented by auto-regressive (AR) parameters and auto-regressive moving average (ARMA) parameters, and said formant rules in said second memory means being further capable of generating antiformant transition patterns, each of said formants and said antiformants being represented by frequency and bandwidth values, wherein said parameter converter means comprises:
  
  means for converting the frequency value of said formant into a value equal to C = cos(2πF/f_s), where F is said frequency value and f_s represents a sampling frequency, and converting the bandwidth value of said formant into a value equal to R = exp(-πB/f_s), where B is the bandwidth value;
  
  means for generating a first signal representative of a value 2×C×R and a second signal representative of a value R²;
  
  unit impulse generator means for generating a unit impulse; and
  
  a series of second-order transversal filters connected in series from said unit impulse generator to said speech synthesizer means, each of said second-order transversal filters including a tapped delay line, first and second tap-weight multipliers connected respectively to successive taps of said tapped delay line, and an adder for summing the outputs of said multipliers with said unit impulse, said first and second multipliers multiplying signals at said successive taps with said first and second signals, respectively.
  - 5. A text-to-speech synthesizer as claimed in claim 4, wherein said analyzer means comprises a table for mapping relationships between a plurality of phoneme component strings and corresponding indications classifying said phoneme component strings as falling into one of said first and second groups, and means for detecting a match between a decomposed phoneme component and a phoneme component in said phoneme component strings and classifying the decomposed phoneme component as one of said first and second groups according to the corresponding indication if said match is detected.
  - 6. A text-to-speech synthesizer as claimed in claim 4, wherein said speech synthesizer means comprises:
    - source wave generator means for generating a source wave;
      
      input and output adders connected in series from said source wave generator means to an output terminal of said text-to-speech synthesizer;
      
      a tapped delay line connected to the output of said input adder;
      
      a plurality of first tap-weight multipliers having input terminals respectively connected to successive taps of said tapped-delay line and output terminals connected to input terminals of said input adder, said first tap-weight multipliers respectively multiplying signals at said successive taps with signals supplied from said first memory means and said parameter converter means; and
      
      a plurality of second tap-weight multipliers having input terminals respectively connected to successive taps of said tapped-delay line and output terminals connected to input terminals of said output adder, said second tap-weight multipliers respectively multiplying signals at said successive taps with signals supplied from said first memory means and said parameter converter means.
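
The synthesizer structure of claims 3 and 6 (an input adder, a tapped delay line on its output, one set of tap weights fed back to the input adder, and a second set fed forward to the output adder) resembles a direct-form pole-zero filter. The sketch below is a hedged illustration under that reading, not the patented hardware: the function name `synthesize`, the argument names, and the choice to add the weighted taps with a positive sign are assumptions, since the claims do not state sign conventions.

```python
from typing import List, Sequence


def synthesize(source: Sequence[float],
               feedback: Sequence[float],
               feedforward: Sequence[float]) -> List[float]:
    """Filter a source wave through a shared tapped delay line.

    feedback[k] weights tap k into the input adder (recursive, AR part);
    feedforward[k] weights tap k into the output adder (MA part).
    Both sets are summed with a positive sign, which is an assumption.
    """
    order = max(len(feedback), len(feedforward))
    delay = [0.0] * order          # delay[k] holds w[n-1-k]
    output = []
    for x in source:
        # Input adder: source sample plus weighted past adder outputs.
        w = x + sum(a * d for a, d in zip(feedback, delay))
        # Output adder: current adder output plus weighted delayed taps.
        y = w + sum(b * d for b, d in zip(feedforward, delay))
        output.append(y)
        if order:
            delay = [w] + delay[:-1]   # shift the delay line by one sample
    return output
```

With an empty `feedforward` list the structure reduces to an all-pole (AR) filter; supplying both lists gives the pole-zero (ARMA) case that claim 4 associates with formants and antiformants.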

Current Assignee: NEC Corporation
Original Assignee: NEC Corporation
Inventors: Mitome, Yukio
Primary Examiner(s): Shaw, Dale M.
Assistant Examiner(s): Tung, Kee M.
Application Number: US07/529,421
Time in Patent Office: 1,057 Days
Field of Search: 381/51-53, 364/724.16, 364/724.17
US Class Current: 704/260
CPC Class Codes: G10L 13/02 Methods for producing synth...
