Apparatus and method of speech coding and decoding using multiple frames
Abstract
An apparatus and method for speech compression includes dividing the speech signal into a plurality of frames, assigning frame classifications to the plurality of frames, and determining the speech modeling parameters based on the assigned frame classification. The voiced and unvoiced parts of the speech spectrum are synthesized separately using an Analysis by Synthesis method, which allows a correct correspondence between the voiced and unvoiced parts of the reconstructed signal. In particular, the frequency response of a special simulated signal based on the previous and current frames is used as the approximating function. The simulated signal is synthesized at the encoder side in the same way it will be generated at the decoder side. In addition, the better of two encoding methods is selected to encode the spectral magnitudes. A wavelet encoder and an inter-frame predictive encoder illustrate the invention's efficient yet accurate reconstruction of synthesized digital speech.
31 Claims
1. An Analysis by Synthesis method for determining the spectral envelope information in speech coding systems based on synthesizing a synthetic digital speech signal from a data structure produced by dividing an initial speech signal into a plurality of frames, determining a pitch frequency, determining voicing information representing whether each of a plurality of frequency bands of each frame should be synthesized as voiced or unvoiced frequency bands, and processing the frames to determine spectral envelope information representative of the magnitudes of a spectrum in the frequency bands, wherein the method of determining the spectral envelope information comprises the steps of:
a) forming a model set of the spectral magnitudes by assigning fixed values;
b) synthesizing a model speech signal for the model set of the spectral magnitudes using both pitch frequencies and a set of voicing decisions determined for previous and current frames;
c) calculating a spectrum of the model speech signal;
d) approximating a spectrum of the initial speech signal by the spectrum of the model speech signal; and
e) encoding coefficients obtained from the approximated spectrum.
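Steps a) to d) of claim 1 can be illustrated with a minimal Python sketch. It assumes one band per pitch harmonic, takes the fixed model magnitudes to be the etalon values of claim 25 (1 for voiced bands, 0 otherwise), interpolates pitch and amplitude linearly between the previous and current frames, and fits each voiced band by least squares on Hann-windowed DFT magnitudes. The band layout, window, and function names are illustrative assumptions, not the patented implementation; step e), encoding the resulting gains, is left out.

```python
import cmath
import math
from typing import List, Sequence


def synthesize_model(n: int, fs: float, f0_prev: float, f0_cur: float,
                     voiced_prev: Sequence[bool], voiced_cur: Sequence[bool]) -> List[float]:
    """Model voiced signal with etalon amplitudes (1.0 voiced, 0.0 unvoiced).

    Harmonic k sweeps linearly from k*f0_prev to k*f0_cur across the frame and
    its amplitude ramps from the previous to the current etalon value.
    """
    out = [0.0] * n
    for k in range(1, min(len(voiced_prev), len(voiced_cur)) + 1):
        a0 = 1.0 if voiced_prev[k - 1] else 0.0
        a1 = 1.0 if voiced_cur[k - 1] else 0.0
        phase = 0.0
        for t in range(n):
            alpha = t / n
            out[t] += ((1 - alpha) * a0 + alpha * a1) * math.cos(phase)
            phase += 2 * math.pi * k * ((1 - alpha) * f0_prev + alpha * f0_cur) / fs
    return out


def magnitude_spectrum(x: Sequence[float]) -> List[float]:
    """Hann-windowed DFT magnitudes for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(x)
    w = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    return [abs(sum(w[t] * x[t] * cmath.exp(-2j * math.pi * b * t / n) for t in range(n)))
            for b in range(n // 2 + 1)]


def fit_voiced_magnitudes(speech: Sequence[float], fs: float, f0_prev: float, f0_cur: float,
                          voiced_prev: Sequence[bool], voiced_cur: Sequence[bool]) -> List[float]:
    """Least-squares gain per harmonic band: min_g sum_b (|S(b)| - g*|M(b)|)^2."""
    n = len(speech)
    S = magnitude_spectrum(speech)
    M = magnitude_spectrum(synthesize_model(n, fs, f0_prev, f0_cur, voiced_prev, voiced_cur))
    gains = []
    for k, is_voiced in enumerate(voiced_cur, start=1):
        if not is_voiced:
            gains.append(0.0)  # unvoiced bands are fitted against a noise model instead
            continue
        # Band k spans [(k - 0.5) * f0, (k + 0.5) * f0) around the k-th harmonic.
        lo = int(round((k - 0.5) * f0_cur * n / fs))
        hi = min(int(round((k + 0.5) * f0_cur * n / fs)), len(S))
        num = sum(S[b] * M[b] for b in range(lo, hi))
        den = sum(M[b] * M[b] for b in range(lo, hi))
        gains.append(num / den if den > 0 else 0.0)
    return gains
```

Because the model signal is built the same way the decoder will build its voiced component, the fitted gains describe the spectrum the decoder can actually reproduce, which is the point of synthesizing the model at the encoder side.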
7. A hybrid method for spectral magnitudes encoding of each speech frame, comprising the steps of:
a) reducing a number of spectral magnitudes;
b) using different types of encoding schemes for simultaneously encoding the spectral magnitudes;
c) evaluating the encoding schemes; and
d) selecting from the evaluated encoding schemes the best encoding scheme for spectral magnitudes encoding as a base scheme.
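Steps b) to d) of claim 7 can be sketched as follows. Assumptions not in the claim: the magnitudes are already reduced (step a) and expressed as log values, the two candidate schemes are a direct uniform quantizer and an inter-frame predictor with a hypothetical coefficient `rho`, and "best" means the lowest squared reconstruction error at an equal bit budget. All names and parameter values are hypothetical.

```python
from typing import List, Sequence, Tuple


def uniform_quantize(x: float, lo: float, hi: float, bits: int) -> int:
    levels = (1 << bits) - 1
    x = min(max(x, lo), hi)  # clamp to the quantizer range
    return round((x - lo) / (hi - lo) * levels)


def uniform_dequantize(q: int, lo: float, hi: float, bits: int) -> float:
    return lo + q * (hi - lo) / ((1 << bits) - 1)


def encode_direct(v: Sequence[float], bits: int = 4, lo: float = 0.0,
                  hi: float = 8.0) -> Tuple[List[int], List[float]]:
    """Scheme 0: quantize each log magnitude on its own."""
    codes = [uniform_quantize(x, lo, hi, bits) for x in v]
    return codes, [uniform_dequantize(c, lo, hi, bits) for c in codes]


def encode_predictive(v: Sequence[float], prev: Sequence[float], rho: float = 0.9,
                      bits: int = 4, span: float = 1.0) -> Tuple[List[int], List[float]]:
    """Scheme 1: quantize the residual after predicting rho * previous frame."""
    codes, recon = [], []
    for x, p in zip(v, prev):
        pred = rho * p
        c = uniform_quantize(x - pred, -span, span, bits)
        codes.append(c)
        recon.append(pred + uniform_dequantize(c, -span, span, bits))
    return codes, recon


def hybrid_encode(v: Sequence[float], prev: Sequence[float]) -> Tuple[int, List[int], List[float]]:
    """Run both schemes and keep the one with the lower squared error.

    Returns (decision_bit, codes, reconstruction)."""
    candidates = [encode_direct(v), encode_predictive(v, prev)]
    errors = [sum((a - b) ** 2 for a, b in zip(v, recon)) for _, recon in candidates]
    bit = 0 if errors[0] <= errors[1] else 1
    codes, recon = candidates[bit]
    return bit, codes, recon
```

`prev` should be the decoder-side reconstruction of the previous frame, so that encoder and decoder form the same prediction; the returned bit tells the decoder which scheme to invert. Prediction wins on stationary speech, while the direct scheme recovers after abrupt spectral changes.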
10. A method for synthesizing a synthetic digital speech signal from a data structure produced by dividing an initial speech signal into a plurality of frames, determining a pitch frequency, determining voicing information representing whether each of a plurality of frequency bands of each frame should be synthesized as voiced or unvoiced frequency bands, and processing the frames to determine spectral envelope information representative of the magnitudes of a spectrum in the frequency bands, wherein the method for synthesizing the synthetic digital speech signal comprises the steps of:
a) building a frequency correspondence between bands of current and previous frames;
b) synthesizing speech components for the voiced frequency bands for couples of harmonics with the closest frequencies in the current and previous frames utilizing the built bands' frequency correspondence and lacing the coupled harmonics, wherein all uncoupled harmonics of the previous frame are smoothly decreased down to zero amplitude and wherein all uncoupled harmonics of the current frame are smoothly increased up to their own amplitudes;
c) synthesizing speech components for the unvoiced frequency bands; and
d) synthesizing the synthetic digital speech signal by combining the synthesized speech components for the voiced and the unvoiced frequency bands.
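A minimal sketch of steps a) and b) of claim 10, assuming the correspondence is built by greedily pairing the closest previous and current harmonic frequencies within a hypothetical tolerance `max_dev`, and that "lacing" a pair means interpolating its frequency and amplitude linearly across the frame. Uncoupled harmonics fade out or in as the claim describes. Unvoiced synthesis (step c) is omitted, and the names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Harmonic = Tuple[float, float]             # (frequency_hz, amplitude)
Track = Tuple[float, float, float, float]  # (f_start, f_end, a_start, a_end)


def couple_harmonics(prev: Sequence[Harmonic], cur: Sequence[Harmonic],
                     max_dev: float) -> List[Track]:
    """Greedily pair previous/current voiced harmonics by closest frequency."""
    pairs = sorted((abs(fp - fc), i, j)
                   for i, (fp, _) in enumerate(prev)
                   for j, (fc, _) in enumerate(cur))
    used_prev, used_cur, tracks = set(), set(), []
    for dist, i, j in pairs:
        if dist > max_dev or i in used_prev or j in used_cur:
            continue
        used_prev.add(i)
        used_cur.add(j)
        tracks.append((prev[i][0], cur[j][0], prev[i][1], cur[j][1]))  # laced pair
    # Uncoupled previous harmonics fade to zero; uncoupled current ones fade in.
    tracks += [(f, f, a, 0.0) for i, (f, a) in enumerate(prev) if i not in used_prev]
    tracks += [(f, f, 0.0, a) for j, (f, a) in enumerate(cur) if j not in used_cur]
    return tracks


def synthesize_voiced(tracks: Sequence[Track], n: int, fs: float) -> List[float]:
    """Sum of sinusoids with linearly interpolated frequency and amplitude."""
    out = [0.0] * n
    for f_start, f_end, a_start, a_end in tracks:
        phase = 0.0  # a real decoder would carry the phase over from the previous frame
        for t in range(n):
            alpha = t / n
            out[t] += ((1 - alpha) * a_start + alpha * a_end) * math.cos(phase)
            phase += 2 * math.pi * ((1 - alpha) * f_start + alpha * f_end) / fs
    return out
```

Pairing by closest frequency, rather than by harmonic index, keeps each sinusoid continuous when the pitch changes and the number of voiced harmonics differs between frames.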
12. A system for speech signal coding and decoding, comprising a speech signal coder and a speech signal decoder, wherein the speech signal coder comprises:
a processor dividing an input digital speech signal into a plurality of frames to be analyzed in time and frequency domains;
an orthogonal transforming unit transforming each frame to provide spectral data on the frequency axis;
a pitch determination unit determining a pitch frequency for each frame;
a voiced/unvoiced discrimination unit generating group voiced/unvoiced decisions utilizing the determined pitch frequencies;
a spectral magnitudes determination unit estimating spectral magnitudes by utilizing an Analysis by Synthesis method; and
a parameter encoding unit encoding the determined pitch frequency, the estimated spectral magnitudes and the voiced/unvoiced decisions for each of the plurality of frames, and combining the encoded data into a plurality of bits; and
wherein the speech signal decoder comprises:
a parameters decoding unit decoding the plurality of bits to provide the pitch frequency, spectral magnitudes and voiced/unvoiced decisions for each of the plurality of frames;
a bands' frequency correspondence map building unit building a bands' frequency correspondence map between bands of current and previous frames; and
a signal synthesizing unit synthesizing a speech signal from the pitch frequency, spectral magnitudes and voiced/unvoiced decisions, and utilizing the bands' frequency correspondence map.
a frame classification unit classifying each frame in the time domain and assigning it a frame classification according to the range and character of the signal value variation along the frame and the character of the signal oscillation in the first and second parts of the frame; and
wherein the voiced/unvoiced discrimination unit generates group voiced/unvoiced decisions based upon the assigned frame classification.
14. A system of claim 13, wherein the voiced/unvoiced discrimination unit utilizes an adaptive threshold depending on the assigned frame classification.
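The claims do not say which features or classes drive the frame classification, so the following sketch is only one hypothetical reading: the amplitude range flags silence, zero-crossing rates in the two halves of the frame characterize the oscillation, an energy jump between the halves flags an onset, and each class carries its own voicing threshold (the adaptive threshold of claim 14). The class names, features, and threshold values are all assumptions.

```python
from typing import Dict, Sequence

# Hypothetical class labels and per-class voicing thresholds (not from the source).
VOICING_THRESHOLDS: Dict[str, float] = {
    "silence": 0.9, "noisy": 0.75, "onset": 0.6, "steady": 0.5,
}


def zero_crossing_rate(x: Sequence[float]) -> float:
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))
    return crossings / max(len(x) - 1, 1)


def classify_frame(frame: Sequence[float], silence_range: float = 0.01,
                   noisy_zcr: float = 0.3, onset_ratio: float = 4.0) -> str:
    # Range of the signal value along the frame.
    if max(frame) - min(frame) < silence_range:
        return "silence"
    half = len(frame) // 2
    first, second = frame[:half], frame[half:]
    # Fast oscillation in both halves suggests a noise-like frame.
    if min(zero_crossing_rate(first), zero_crossing_rate(second)) > noisy_zcr:
        return "noisy"
    # Oscillation energy growing sharply from the first half to the second: onset.
    e1 = sum(v * v for v in first)
    e2 = sum(v * v for v in second)
    if e2 > onset_ratio * e1:
        return "onset"
    return "steady"


def is_voiced(voicing_measure: float, frame_class: str) -> bool:
    """Adaptive threshold: the bar for 'voiced' depends on the frame class."""
    return voicing_measure >= VOICING_THRESHOLDS[frame_class]
```

The same voicing measure can therefore yield different decisions in different frame types, for example stricter in noise-like frames than in steady vowels.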
15. A system of claim 13, wherein the pitch determination unit comprises:
a pitch candidates set determination unit determining a set of pitch candidates based upon an analysis of a normalized autocorrelation function computed in either direct or inverse order, depending on the assigned frame classification;
a best candidate selection unit estimating the set of pitch candidates in the frequency domain and selecting the best candidate from the set of pitch candidates; and
a best candidate refinement unit refining the selected best candidate in the frequency domain.
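Candidate generation in claim 15 might look like the sketch below. It assumes the normalized autocorrelation is computed over a fixed-length reference segment, so that "inverse order" (analyzing the time-reversed frame) takes the end of the frame as the reference instead of the start; local maxima within a hypothetical 60 to 400 Hz search range become candidates. The frequency-domain selection and refinement units are not shown.

```python
import math
from typing import List, Sequence, Tuple


def normalized_autocorr(x: Sequence[float], lag: int, width: int) -> float:
    """NACF between the reference segment x[0:width] and x[lag:lag+width]."""
    a, b = x[:width], x[lag:lag + width]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def pitch_candidates(frame: Sequence[float], fs: float, f_min: float = 60.0,
                     f_max: float = 400.0, reverse: bool = False,
                     n_best: int = 3) -> List[Tuple[float, float]]:
    """Return up to n_best (pitch_hz, nacf_score) pairs, best first."""
    x = list(frame)[::-1] if reverse else list(frame)
    lag_min, lag_max = int(fs / f_max), int(fs / f_min)
    width = len(x) - (lag_max + 1)  # one reference length shared by every lag
    if width <= 0:
        raise ValueError("frame too short for the requested pitch range")
    r = {lag: normalized_autocorr(x, lag, width) for lag in range(lag_min - 1, lag_max + 2)}
    # Candidates are positive local maxima of the NACF inside the search range.
    peaks = [lag for lag in range(lag_min, lag_max + 1)
             if r[lag] > 0 and r[lag] >= r[lag - 1] and r[lag] >= r[lag + 1]]
    peaks.sort(key=lambda lag: r[lag], reverse=True)
    return [(fs / lag, r[lag]) for lag in peaks[:n_best]]
```

For a frame whose voicing starts late (an onset), reversing the frame lets the reference segment sit in the periodic part, which is one plausible motivation for choosing the order per frame class.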
16. A system of claim 15, wherein the best candidate selection unit estimates the set of pitch candidates by a window function response scaled to obtain a predetermined sharpness of the window function in each band and to provide a final pitch candidate selection.
17. A system of claim 16, wherein the window function response is scaled for pitch frequencies lower than a predetermined frequency Fscale.
18. A system of claim 17, wherein the window function response is scaled by a procedure of proportional sharpening.
19. A system of claim 18, wherein the procedure of proportional sharpening is carried out by a linear interpolation.
20. A system of claim 19, wherein the window function responses scaled for different pitch frequencies are used as a look-up table.
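One way to read claims 16 to 20 is sketched here: for pitch below `Fscale`, the window's magnitude response is compressed along the frequency axis by the factor `Fscale / f0`, so its main lobe narrows in proportion to the harmonic spacing ("proportional sharpening"); the compressed curve is read off the tabulated base response by linear interpolation, and the results for a grid of pitch values are cached as a lookup table. The scaling rule, the base response, and the function names are assumptions.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def lerp_table(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Piecewise-linear interpolation of (xs, ys) at x; zero outside xs."""
    if x < xs[0] or x > xs[-1]:
        return 0.0
    i = bisect_right(xs, x) - 1
    if i >= len(xs) - 1:
        return ys[-1]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def scaled_response(offsets: Sequence[float], response: Sequence[float],
                    f0: float, f_scale: float) -> List[float]:
    """Window response at the same offsets (Hz), sharpened when f0 < f_scale."""
    factor = f_scale / f0 if f0 < f_scale else 1.0
    return [lerp_table(offsets, response, f * factor) for f in offsets]


def build_lookup(offsets: Sequence[float], response: Sequence[float],
                 pitches: Sequence[float], f_scale: float) -> Dict[float, List[float]]:
    """Precompute scaled responses for a grid of pitch values."""
    return {f0: scaled_response(offsets, response, f0, f_scale) for f0 in pitches}
```

At run time the analyzer would pick the table entry for the nearest tabulated pitch instead of rescaling the response for every frame.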
21. A system of claim 12, wherein the parameter encoding unit further comprises:
a scalar quantization unit quantizing a value of the pitch frequency;
a spectral magnitudes wavelet reduction unit reducing a dimension of a spectral magnitude vector;
a spectral magnitudes hybrid encoding unit encoding the reduced spectral magnitudes vector by a wavelet technique; and
a multiplexer unit combining the encoded data into a plurality of bits.
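The scalar pitch quantizer and the wavelet dimension reduction of claim 21 could be sketched as below. Assumptions: pitch is quantized uniformly on a log scale between hypothetical bounds, the variable-length magnitude vector is first resampled to a fixed power-of-two length, and the reduction keeps only Haar approximation coefficients. The actual wavelet, lengths, and bit allocations are not given in the claims.

```python
import math
from typing import List, Sequence


def quantize_pitch(f0: float, f_min: float = 50.0, f_max: float = 400.0, bits: int = 7) -> int:
    """Uniform scalar quantization of log pitch; returns the transmitted index."""
    levels = (1 << bits) - 1
    f0 = min(max(f0, f_min), f_max)
    return round(math.log(f0 / f_min) / math.log(f_max / f_min) * levels)


def dequantize_pitch(q: int, f_min: float = 50.0, f_max: float = 400.0, bits: int = 7) -> float:
    return f_min * (f_max / f_min) ** (q / ((1 << bits) - 1))


def resample(v: Sequence[float], length: int) -> List[float]:
    """Linearly resample v to a fixed length (the harmonic count varies with pitch)."""
    if len(v) == 1:
        return [float(v[0])] * length
    out = []
    for i in range(length):
        pos = i * (len(v) - 1) / (length - 1)
        k = min(int(pos), len(v) - 2)
        t = pos - k
        out.append((1 - t) * v[k] + t * v[k + 1])
    return out


def haar_reduce(v: Sequence[float], levels: int) -> List[float]:
    """Keep only the Haar approximation (low-pass) coefficients after `levels` stages."""
    a = list(v)
    for _ in range(levels):
        a = [(a[2 * i] + a[2 * i + 1]) / math.sqrt(2) for i in range(len(a) // 2)]
    return a
```

Each Haar stage halves the vector length, so two stages reduce a 32-point magnitude vector to 8 coefficients that capture its smooth envelope.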
22. A system of claim 21, wherein the spectral magnitudes hybrid encoding unit comprises:
a wavelet encoder unit encoding the reduced spectral magnitudes vector;
an inter-frame prediction encoder unit encoding the reduced spectral magnitudes vector; and
a comparator unit comparing the effectiveness of the wavelet encoder unit and the effectiveness of the inter-frame prediction encoder unit to select a better encoder unit, and outputting a decision bit and data corresponding to the selected better encoder unit to the multiplexer unit.
23. A system of claim 12, wherein the signal synthesizing unit comprises:
a voice synthesizing unit synthesizing speech components for voiced frequency bands for couples of harmonics with the closest frequencies in the current and previous frames utilizing the built bands' frequency correspondence and lacing the coupled harmonics, wherein all uncoupled harmonics of the previous frame are smoothly decreased down to zero amplitude and wherein all uncoupled harmonics of the current frame are smoothly increased up to their own amplitudes;
an unvoiced synthesis unit synthesizing speech components for unvoiced frequency bands; and
an adder synthesizing the speech signal by summing the synthesized speech components for the voiced and the unvoiced frequency bands.
24. A system of claim 12, wherein the spectral magnitudes determination unit comprises:
a bands' frequency correspondence map building unit building a frequency correspondence between bands of current and previous frames;
a voiced synthesis unit synthesizing a model voiced signal for a model set of the spectral magnitudes based upon the built bands' frequency correspondence, the pitch frequency and the set of voicing decisions for the previous and current frames;
a first windowing unit processing the model voiced signal;
an orthogonal transforming unit transforming a model voiced signal windowed by the first windowing unit into a frequency domain;
a voice magnitude evaluation unit evaluating voiced magnitudes of the transformed model voiced signal by a Least Square Method;
a synchronized noise generator producing a model white noise signal with a unit amplitude range;
a second windowing unit processing the model white noise signal;
an orthogonal transforming unit transforming the model white noise signal windowed by the second windowing unit to a frequency domain; and
an unvoiced magnitudes evaluation unit evaluating unvoiced magnitudes of the transformed model white noise signal by a Least Square Method.
25. A system of claim 24, wherein the voiced synthesis unit forms the model voiced signal for the model set of the spectral magnitudes by assigning fixed etalon values equal to 1 for voiced bands and 0 otherwise.
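The unvoiced branch of claim 24 can be sketched as a per-band least-squares fit of a synchronized noise spectrum to the input spectrum. "Synchronized" is assumed here to mean that encoder and decoder draw the noise from the same seeded generator, so the encoder fits exactly the noise the decoder will produce; the seed, the band edges (in DFT bins), and the Hann window are hypothetical. The voiced branch works the same way, with the etalon model signal of claim 25 in place of the noise.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple


def hann_dft_magnitudes(x: Sequence[float]) -> List[float]:
    """Magnitudes of the Hann-windowed DFT, bins 0..N/2 (naive O(N^2))."""
    n = len(x)
    w = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    return [abs(sum(w[t] * x[t] * cmath.exp(-2j * math.pi * b * t / n) for t in range(n)))
            for b in range(n // 2 + 1)]


def synchronized_noise(n: int, seed: int) -> List[float]:
    """White noise with unit amplitude range [-1, 1]; the decoder reuses the seed."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def fit_unvoiced_magnitudes(speech: Sequence[float], band_edges: Sequence[Tuple[int, int]],
                            voiced: Sequence[bool], seed: int = 1234) -> List[float]:
    """Per unvoiced band: min_g sum_b (|S(b)| - g*|N(b)|)^2 over the band's bins."""
    spec = hann_dft_magnitudes(speech)
    noise_spec = hann_dft_magnitudes(synchronized_noise(len(speech), seed))
    mags = []
    for (lo, hi), is_voiced in zip(band_edges, voiced):
        if is_voiced:
            mags.append(0.0)  # voiced bands get their magnitude from the harmonic model
            continue
        num = sum(spec[b] * noise_spec[b] for b in range(lo, hi))
        den = sum(noise_spec[b] ** 2 for b in range(lo, hi))
        mags.append(num / den if den > 0 else 0.0)
    return mags
```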
26. A system of claim 12, wherein the voiced/unvoiced discrimination unit generates the group voiced/unvoiced decisions utilizing a window function response scaled to obtain a predetermined sharpness of the window function in each band and to generate the final voiced/unvoiced decisions.
27. A system of claim 26, wherein the window function response is scaled for pitch frequencies lower than a predetermined frequency Fscale.
28. A system of claim 27, wherein the window function response is scaled by a procedure of proportional sharpening.
29. A system of claim 28, wherein the procedure of proportional sharpening is carried out by a linear interpolation.
30. A system of claim 29, wherein the window function responses scaled for different pitch frequencies are used as a look-up table.
31. A system of claim 30, wherein the voiced/unvoiced discrimination unit tunes a position of said scaled responses relative to the location of a frequency band peak.
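Claims 26 to 31 can be illustrated by a per-band voicing test: place the window response at the strongest bin near each harmonic (the position tuning of claim 31), fit its gain by least squares, and call the band voiced when the normalized fitting error is small. In practice the response function would come from the scaled lookup table of claim 20; the search width, the 0.2 error threshold, and the function name here are hypothetical.

```python
from typing import Callable, List, Sequence


def band_voicing(spectrum: Sequence[float], f0_bins: float,
                 response: Callable[[float], float], n_bands: int,
                 search: int = 1, threshold: float = 0.2) -> List[bool]:
    """Voiced/unvoiced decision per harmonic band.

    spectrum: magnitude spectrum in DFT bins; f0_bins: pitch in bins;
    response(d): (scaled) window magnitude response at an offset of d bins."""
    decisions = []
    for k in range(1, n_bands + 1):
        lo = max(int(round((k - 0.5) * f0_bins)), 0)
        hi = min(int(round((k + 0.5) * f0_bins)), len(spectrum))
        c = int(round(k * f0_bins))
        window = range(max(c - search, lo), min(c + search + 1, hi))
        if hi <= lo or not window:
            decisions.append(False)
            continue
        # Tune the response position to the strongest bin near the nominal harmonic.
        peak = max(window, key=lambda b: spectrum[b])
        model = [response(b - peak) for b in range(lo, hi)]
        actual = [spectrum[b] for b in range(lo, hi)]
        den = sum(m * m for m in model)
        gain = sum(a * m for a, m in zip(actual, model)) / den if den > 0 else 0.0
        err = sum((a - gain * m) ** 2 for a, m in zip(actual, model))
        energy = sum(a * a for a in actual)
        decisions.append(energy > 0 and err / energy < threshold)
    return decisions
```

Shifting the response to the observed peak keeps a slightly mistuned but genuinely harmonic band from being misjudged as unvoiced; a flat, noise-like band fits the peaked response poorly and is rejected.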
Specification