Method and apparatus for synthesizing speech without voicing or pitch information
Abstract
A channel bank speech synthesizer for reconstructing speech from externally-generated acoustic feature information without using externally-generated voicing or pitch information is disclosed. An N-channel pitch-excited channel bank synthesizer (340) is provided having a first low-frequency group of channel gain values (1 to M) and a second high-frequency group of channel gain values (M+1 to N). The first group controls a first group of amplitude modulators (950) excited by a periodic pitch pulse source (920), and the second group controls amplitude modulators excited by a noise source (930). Both groups of modulated excitation signals are applied to the bandpass filters (960) to reconstruct the speech channels, and then combined at the summation network (970) to form a reconstructed synthesized speech signal. Additionally, the pitch pulse source (920) varies the pitch pulse period such that the pitch pulse rate decreases over the length of the word.
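The excitation arrangement described in the abstract can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the disclosed implementation: the function names, the `m_split` parameter (standing in for M), the linear growth of the pulse period, and the sample values are all hypothetical, since the abstract states only that the pitch pulse rate decreases over the length of the word.

```python
import random


def pitch_pulse_train(n_samples, initial_period, final_period):
    """Return unit pulses whose spacing grows from initial_period toward
    final_period (both in samples), so the pulse rate falls over the word.

    Assumption: the period is interpolated linearly across the word.
    """
    if initial_period <= 0 or final_period <= 0:
        raise ValueError("pitch periods must be positive")
    pulses = [0.0] * n_samples
    position = 0.0
    while position < n_samples:
        index = int(position)
        pulses[index] = 1.0
        # Fraction of the word already generated, between 0 and 1.
        fraction = index / n_samples
        position += initial_period + fraction * (final_period - initial_period)
    return pulses


def channel_excitations(n_samples, n_channels, m_split,
                        initial_period, final_period, seed=0):
    """Give channels 1..M the pitch pulse source and channels M+1..N the
    noise source. No per-frame voicing or pitch input is needed."""
    rng = random.Random(seed)
    pulses = pitch_pulse_train(n_samples, initial_period, final_period)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    return [pulses if ch < m_split else noise for ch in range(n_channels)]


# Hypothetical word of 800 samples, 6 channels, with M = 3.
excitations = channel_excitations(800, 6, 3, initial_period=40, final_period=80)
pulse_positions = [n for n, v in enumerate(excitations[0]) if v == 1.0]
pulse_gaps = [b - a for a, b in zip(pulse_positions, pulse_positions[1:])]
```

Because the split between periodic and noise excitation is fixed by channel index, the same excitation sources serve every word; only the channel gain values and the word length change from word to word.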
43 Claims
1. A speech synthesizer for generating reconstructed speech signals from external acoustic information sets, without using external specific voicing or pitch information, each said acoustic feature information set comprising a plurality of modification signals, said speech synthesizer comprising:
means for generating a first and second excitation signal from an external acoustic information set, including a plurality of channel gain values, for each reconstructed speech signal using substantially common voicing or pitch information, said first excitation signal having an identifiable periodicity;
means for changing the periodicity of said first excitation signal from a predetermined initial first excitation signal period at a rate related to the length of said external acoustic feature information set; and
means for modifying an operating parameter of said first excitation signal in response to a first group of said modification signals, and for modifying an operating parameter of said second excitation signal in response to a second group of said modification signals, thereby producing corresponding first and second groups of modified outputs.
Dependent claims: 2, 3, 4, 5, 6, 7, 8

9. A channel bank speech synthesizer for generating reconstructed speech words from external acoustic feature information sets without using external specific voicing information, each said acoustic feature information set comprising a plurality of channel gain values, each representative of the acoustic energy in a specified frequency bandwidth, each said acoustic feature information set further comprising pitch information, said speech synthesizer comprising:
means for generating a first and second excitation signal for each reconstructed speech word using substantially common voicing information, said first excitation signal representative of periodic pulses of a rate determined by said pitch information, said second excitation signal representative of random noise;
means for changing the periodicity of said first excitation signal of a reconstructed speech word from a predetermined first excitation signal period at a rate related to the length of an external acoustic information set;
means for amplitude modulating said first excitation signal of a reconstructed speech word in response to a first group of said plurality of channel gain values, and for amplitude modulating said second excitation signal of said reconstructed speech word in response to a second group of said plurality of channel gain values, thereby producing corresponding first and second groups of channel outputs for said reconstructed speech word;
means for filtering said first and second groups of channel outputs to produce a plurality of filtered channel outputs; and
means for combining each of said plurality of filtered channel outputs to form said reconstructed speech word.
Dependent claims: 10, 11, 12, 13

14. A channel bank speech synthesizer for generating reconstructed speech words from external acoustic feature information sets without using external specific pitch information, each said acoustic feature information set comprising a plurality of channel gain values, each representative of the acoustic energy in a specified frequency bandwidth, each said acoustic feature information set further comprising voicing information, said speech synthesizer comprising:
means for generating at least one excitation signal for each reconstructed speech word in response to said voicing information using substantially common pitch information, said excitation signal representative of periodic pulses having a variable rate related to the length of an external acoustic information set for voiced sounds, said excitation signal representative of random noise for unvoiced sounds;
means for amplitude modulating said excitation signal of a reconstructed speech word in response to a plurality of channel gain values, thereby producing a corresponding plurality of channel outputs for said reconstructed speech word;
means for filtering said plurality of channel outputs to produce a plurality of filtered channel outputs; and
means for combining each of said plurality of filtered channel outputs to form said reconstructed speech word.
Dependent claims: 15, 16, 17, 18

19. A channel bank speech synthesizer for generating reconstructed speech words from external acoustic feature information sets without using external specific voicing or pitch information, each said acoustic feature information set comprising a plurality of channel gain values, each channel gain value representative of the acoustic energy in a specified frequency bandwidth, said speech synthesizer comprising:
means for generating a first and second excitation signal for each reconstructed speech word using substantially common voicing or pitch information, said first excitation signal representative of periodic pulses of a variable rate related to the length of an acoustic information set, said second excitation signal representative of random noise;
means for amplitude modulating said first excitation signal of a reconstructed speech word in response to a first group of said plurality of channel gain values, and for amplitude modulating said second excitation signal of said reconstructed speech word in response to a second group of said plurality of channel gain values, thereby producing corresponding first and second groups of channel outputs for said reconstructed speech word;
means for bandpass filtering said first and second groups of channel outputs to produce a plurality of filtered channel outputs; and
means for combining each of said plurality of filtered channel outputs to form said reconstructed speech word.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27

28. A method of synthesizing speech signals from external acoustic feature information sets without using external specific voicing or pitch information, each said acoustic feature information set comprising a plurality of modification signals, said speech synthesis method comprising the steps of:
generating a first and second excitation signal from an external acoustic feature information set, including a plurality of channel gain values, for each synthesized speech signal using substantially common voicing or pitch information, said first excitation signal having an identifiable periodicity;
changing the periodicity of said first excitation signal from a predetermined initial first excitation signal period at a rate related to the length of said external acoustic feature information set;
modifying an operating parameter of said first excitation signal of a reconstructed speech word in response to a first group of said modification signals, and modifying an operating parameter of said second excitation signal of said reconstructed speech word in response to a second group of said modification signals, thereby producing corresponding first and second groups of modified outputs for said synthesized speech signal;
filtering said first and second groups of modified outputs to produce a plurality of filtered outputs; and
combining each of said plurality of filtered outputs to form said synthesized speech signal.
Dependent claims: 29, 30, 31, 32, 33, 34

35. A method of synthesizing speech words from external acoustic feature information sets without using external specific voicing or pitch information, each said acoustic feature information set comprising a plurality of channel gain values, each gain value representative of the acoustic energy in a specified frequency bandwidth, said speech synthesis method comprising the steps of:
generating a first and second excitation signal for each synthesized speech word using substantially common voicing or pitch information, said first excitation signal representative of periodic pulses of a variable rate related to the length of an external acoustic information set, said second excitation signal representative of random noise;
amplitude modulating said first excitation signal of a synthesized speech word in response to a first group of said plurality of channel gain values, and amplitude modulating said second excitation signal of said synthesized speech word in response to a second group of said plurality of channel gain values, thereby producing corresponding first and second groups of channel outputs for said synthesized speech word;
bandpass filtering said first and second groups of channel outputs to produce a plurality of filtered channel outputs; and
combining each of said plurality of filtered channel outputs to form said synthesized speech word.
Dependent claims: 36, 37, 38, 39, 40, 41, 42, 43
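The amplitude modulating, bandpass filtering, and combining steps recited in the method claims can be sketched in Python as follows. This is a minimal illustration under assumptions, not the claimed implementation: the claims do not specify a filter design, so a second-order bandpass section with unity peak gain (the widely used audio biquad form) is substituted, and the function names, frame length, center frequencies, Q value, and sample rate are hypothetical.

```python
import math


def bandpass(signal, center_hz, q, sample_rate):
    """Second-order bandpass section with unity gain at center_hz.

    Assumption: this biquad stands in for one channel's bandpass filter.
    """
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha  # the b1 coefficient is zero for this form
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in signal:
        y0 = (b0 * x0 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def synthesize_word(excitations, frame_gains, frame_len, centers_hz,
                    sample_rate=8000, q=4.0):
    """Modulate each channel's excitation by its per-frame gain, bandpass
    filter it, and sum all channels into one word.

    excitations[ch] must hold at least len(frame_gains) * frame_len samples;
    frame_gains[frame][ch] is the channel gain held constant over a frame.
    """
    n_samples = len(frame_gains) * frame_len
    word = [0.0] * n_samples
    for ch, excitation in enumerate(excitations):
        modulated = [excitation[n] * frame_gains[n // frame_len][ch]
                     for n in range(n_samples)]
        filtered = bandpass(modulated, centers_hz[ch], q, sample_rate)
        for n, value in enumerate(filtered):
            word[n] += value
    return word


# Hypothetical two-channel word: one pulse-excited channel, one noise-like channel.
pulses = [1.0 if n % 40 == 0 else 0.0 for n in range(160)]
noise = [((n * 37) % 17) / 8.0 - 1.0 for n in range(160)]
gains = [[1.0, 0.5], [0.5, 1.0]]
word = synthesize_word([pulses, noise], gains, 80, [500.0, 2500.0])
```

Modulating before filtering follows the order recited in the claims, so each bandpass filter shapes an excitation that has already been scaled by its channel gain.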
Specification