Speech Synthesizer
First Claim
1. A method of synthesizing a set of digital speech samples corresponding to a selected voicing state from speech model parameters, the method comprising the steps of:
dividing the speech model parameters into frames, wherein a frame of speech model parameters includes pitch information, voicing information determining the voicing state in one or more frequency regions, and spectral information;
computing a first digital filter using a first frame of speech model parameters, wherein the frequency response of the first digital filter corresponds to the spectral information in frequency regions where the voicing state equals the selected voicing state;
computing a second digital filter using a second frame of speech model parameters, wherein the frequency response of the second digital filter corresponds to the spectral information in frequency regions where the voicing state equals the selected voicing state;
determining a set of pulse locations;
producing a set of first signal samples from the first digital filter and the pulse locations;
producing a set of second signal samples from the second digital filter and the pulse locations;
combining the first signal samples with the second signal samples to produce a set of digital speech samples corresponding to the selected voicing state;
using the set of digital speech samples corresponding to the selected voicing state to produce speech samples of a digital speech signal;
providing the speech samples of the digital speech signal to a digital-to-analog converter that converts the speech samples of the digital speech signal to an analog signal; and
providing the analog signal to a speaker that converts the analog signal into an acoustic signal suitable for human listening.
Abstract
Synthesizing a set of digital speech samples corresponding to a selected voicing state includes dividing speech model parameters into frames, with a frame of speech model parameters including pitch information, voicing information determining the voicing state in one or more frequency regions, and spectral information. First and second digital filters are computed using, respectively, first and second frames of speech model parameters, with the frequency responses of the digital filters corresponding to the spectral information in frequency regions for which the voicing state equals the selected voicing state. A set of pulse locations is determined, and sets of first and second signal samples are produced using the pulse locations and, respectively, the first and second digital filters. Finally, the sets of first and second signal samples are combined to produce a set of digital speech samples corresponding to the selected voicing state.
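As an illustration of the filter computation described in the abstract, the following is a minimal sketch, not the patented implementation. It assumes the spectral information is given as magnitudes on uniformly spaced DFT bins 0 to N/2, that the voicing decision for each frequency region has already been expanded to one boolean per bin, and that a zero-phase filter is acceptable. The function name `masked_impulse_response` and its parameters are hypothetical.

```python
import math

def masked_impulse_response(magnitudes, bin_is_voiced, select_voiced):
    """Zero-phase impulse response whose frequency response follows the
    spectral magnitudes only in bins whose voicing state equals the
    selected state (assumption: one voicing flag per DFT bin)."""
    n_bins = len(magnitudes)      # bins 0 .. N/2, so at least 2 bins
    n_fft = 2 * (n_bins - 1)      # implied even DFT size N
    # Keep the magnitude where the voicing state matches; zero it elsewhere.
    kept = [m if v == select_voiced else 0.0
            for m, v in zip(magnitudes, bin_is_voiced)]
    h = []
    for n in range(n_fft):
        # Inverse DFT of a real, symmetric spectrum, written with cosines.
        acc = kept[0] + kept[-1] * math.cos(math.pi * n)
        for k in range(1, n_bins - 1):
            acc += 2.0 * kept[k] * math.cos(2.0 * math.pi * k * n / n_fft)
        h.append(acc / n_fft)
    return h

# Hypothetical frame: low bins voiced, high bins unvoiced.
mags = [1.0, 0.8, 0.6, 0.4, 0.2]
mask = [True, True, True, False, False]
h_voiced = masked_impulse_response(mags, mask, True)
h_unvoiced = masked_impulse_response(mags, mask, False)
```

Because the masking is linear, the voiced and unvoiced responses in this sketch add up to the response of the unmasked spectrum, which is one way to check that no frequency region is counted twice or dropped.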
85 Claims
1. A method of synthesizing a set of digital speech samples corresponding to a selected voicing state from speech model parameters, the method comprising the steps of:
dividing the speech model parameters into frames, wherein a frame of speech model parameters includes pitch information, voicing information determining the voicing state in one or more frequency regions, and spectral information;
computing a first digital filter using a first frame of speech model parameters, wherein the frequency response of the first digital filter corresponds to the spectral information in frequency regions where the voicing state equals the selected voicing state;
computing a second digital filter using a second frame of speech model parameters, wherein the frequency response of the second digital filter corresponds to the spectral information in frequency regions where the voicing state equals the selected voicing state;
determining a set of pulse locations;
producing a set of first signal samples from the first digital filter and the pulse locations;
producing a set of second signal samples from the second digital filter and the pulse locations;
combining the first signal samples with the second signal samples to produce a set of digital speech samples corresponding to the selected voicing state;
using the set of digital speech samples corresponding to the selected voicing state to produce speech samples of a digital speech signal;
providing the speech samples of the digital speech signal to a digital-to-analog converter that converts the speech samples of the digital speech signal to an analog signal; and
providing the analog signal to a speaker that converts the analog signal into an acoustic signal suitable for human listening.
Dependent claims: 2-37
38. A method of decoding digital speech samples corresponding to a selected voicing state from a stream of bits, the method comprising:
dividing the stream of bits into a sequence of frames, wherein each frame contains one or more subframes;
decoding speech model parameters from the stream of bits for each subframe in a frame, the decoded speech model parameters including at least pitch information, voicing state information and spectral information;
computing a first impulse response from the decoded speech model parameters for a subframe and computing a second impulse response from the decoded speech model parameters for a previous subframe, wherein both the first impulse response and the second impulse response correspond to the selected voicing state;
computing a set of pulse locations for the subframe;
producing a set of first signal samples from the first impulse response and the pulse locations;
producing a set of second signal samples from the second impulse response and the pulse locations;
combining the first signal samples with the second signal samples to produce the digital speech samples for the subframe corresponding to the selected voicing state;
using the digital speech samples for the subframe corresponding to the selected voicing state to produce speech samples of a digital speech signal;
providing the speech samples of the digital speech signal to a digital-to-analog converter that converts the speech samples of the digital speech signal to an analog signal; and
providing the analog signal to a speaker that converts the analog signal into an acoustic signal suitable for human listening.
Dependent claims: 39-77
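The pulse placement and combining steps recited in claim 38 can be sketched as follows. This is only an illustrative reading under stated assumptions: the claim does not say how pulse locations are chosen or how the two sample sets are combined, so the sketch assumes an integer pitch period with evenly spaced pulses and a linear crossfade from the previous subframe's response to the current one. All function names and parameters are hypothetical.

```python
def pulse_locations(pitch_period, n_samples, first_offset=0):
    """Evenly spaced pulse positions (assumption: integer pitch period)."""
    return list(range(first_offset, n_samples, pitch_period))

def place_pulses(impulse_response, locations, n_samples):
    """Sum one copy of the impulse response at every pulse location,
    truncating anything that runs past the end of the subframe."""
    out = [0.0] * n_samples
    for p in locations:
        for i, h in enumerate(impulse_response):
            if 0 <= p + i < n_samples:
                out[p + i] += h
    return out

def synthesize_subframe(cur_h, prev_h, pitch_period, n_samples):
    """Combine samples built from the current (first) and previous (second)
    impulse responses. The linear crossfade is an assumed combining rule."""
    locs = pulse_locations(pitch_period, n_samples)
    first = place_pulses(cur_h, locs, n_samples)    # current subframe
    second = place_pulses(prev_h, locs, n_samples)  # previous subframe
    out = []
    for t in range(n_samples):
        w = t / n_samples                           # fades the current response in
        out.append(w * first[t] + (1.0 - w) * second[t])
    return out

# Hypothetical one-tap responses so the weighting is easy to follow.
samples = synthesize_subframe(cur_h=[4.0], prev_h=[2.0],
                              pitch_period=4, n_samples=8)
```

With these inputs the pulses fall at samples 0 and 4, so the output starts at the previous response's value and moves halfway toward the current one by the middle of the subframe. Sharing one set of pulse locations between both responses is what lets the two sample sets be mixed without misaligned pitch pulses.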
78. A device comprising:
a microphone that produces an analog speech signal in response to detected speech;
an analog-to-digital converter connected to receive the analog speech signal from the microphone and produce a digital speech signal from the analog speech signal; and
a speech encoder connected to receive the digital speech signal, wherein the speech encoder:
produces frames of speech model parameters from the digital speech signal, wherein a frame of speech model parameters includes pitch information, voicing information determining the voicing state in one or more frequency regions, and spectral information,
computes a first digital filter using a first frame of speech model parameters, wherein the frequency response of the first digital filter corresponds to the spectral information in frequency regions where the voicing state equals the selected voicing state,
computes a second digital filter using a second frame of speech model parameters, wherein the frequency response of the second digital filter corresponds to the spectral information in frequency regions where the voicing state equals the selected voicing state,
determines a set of pulse locations,
produces a set of first signal samples from the first digital filter and the pulse locations,
produces a set of second signal samples from the second digital filter and the pulse locations, and
combines the first signal samples with the second signal samples to produce a set of digital speech samples corresponding to the selected voicing state.
Dependent claims: 79-81
82. A mobile communication device comprising:
a speech decoder that processes frames of speech model parameters from a digital speech signal, wherein a frame of speech model parameters includes pitch information, voicing information determining the voicing state in one or more frequency regions, and spectral information, and wherein the speech decoder:
computes a first digital filter using a first frame of speech model parameters, wherein the frequency response of the first digital filter corresponds to the spectral information in frequency regions where the voicing state equals the selected voicing state,
computes a second digital filter using a second frame of speech model parameters, wherein the frequency response of the second digital filter corresponds to the spectral information in frequency regions where the voicing state equals the selected voicing state,
determines a set of pulse locations,
produces a set of first signal samples from the first digital filter and the pulse locations,
produces a set of second signal samples from the second digital filter and the pulse locations,
combines the first signal samples with the second signal samples to produce a set of digital speech samples corresponding to the selected voicing state, and
uses the digital speech samples for the subframe corresponding to the selected voicing state to produce speech samples of a digital speech signal;
a digital-to-analog converter that receives the speech samples of the digital speech signal and converts the speech samples of the digital speech signal to an analog signal; and
a speaker that receives the analog signal and converts the analog signal into an acoustic signal suitable for human listening.
Dependent claims: 83-85
Specification