Apparatus and method of speech coding and decoding using multiple frames

US 6,496,797 B1
Filed: 04/01/1999
Issued: 12/17/2002
Est. Priority Date: 04/01/1999
Status: Expired due to Term

Abstract

An apparatus and method for speech compression includes dividing the speech spectrum into a plurality of frames, assigning frame classifications to the plurality of frames, and determining the speech modeling parameters based on the assigned frame classification. The voiced part of the speech spectrum and the unvoiced part of the speech spectrum are synthesized separately using an Analysis by Synthesis method, allowing a correct correspondence between voiced and unvoiced parts of the reconstructed signal. In particular, the frequency response of a special simulated signal based on the previous and current frames is used as an approximating function. The simulated signal is synthesized at the encoder side in the same way it will be generated at the decoder side. Also, the better of two encoding methods is selected to encode the spectral magnitudes. A wavelet encoder and an inter-frame predictive encoder illustrate the invention's efficient yet accurate reconstruction of synthesized digital speech.

31 Claims

1. An Analysis by Synthesis method for determining the spectral envelope information in speech coding systems based on synthesizing a synthetic digital speech signal from a data structure produced by dividing an initial speech signal into a plurality of frames, determining a pitch frequency, determining voicing information, representing whether each of a plurality of frequency bands of each frame should be synthesized as voiced or unvoiced frequency bands, and processing the frames to determine spectral envelope information representative of the magnitudes of a spectrum in the frequency bands, wherein the method of determining the spectral envelope information comprises the steps of:
- a) forming a model set of the spectral magnitudes by assigning fixed values;
  
  b) synthesizing a model speech signal for the model set of the spectral magnitudes using both pitch frequencies and a set of voicing decisions determined for previous and current frames;
  
  c) calculating a spectrum of the model speech signal;
  
  d) approximating a spectrum of the initial speech signal by the spectrum of the model speech signal; and
  
  e) encoding coefficients obtained from the approximated spectrum.
- 2. A method of claim 1, wherein in the step (a), the model set of the spectral magnitudes are formed separately for voiced and unvoiced parts of the model speech signal spectrum.
  - 3. A method of claim 2, wherein in the step a), a model set of the spectral magnitudes for the voiced part of the model speech signal spectrum is formed by assigning a fixed value equal to 1 during voiced bands and 0 otherwise.
  - 4. A method of claim 2, wherein in the step d), the voiced part of the model speech signal spectrum is approximated by position tuning a voiced excitation spectrum clip relatively to a frequency band position using a Least Square Method.
  - 5. A method of claim 2, wherein in the step b), the unvoiced part of the model speech signal spectrum is synthesized by producing a white noise signal of unit amplitude range and providing a synchronization property of the synthesis scheme.
  - 6. A method of claim 2, wherein in the step d), the unvoiced part of the model speech signal spectrum is approximated by an unvoiced excitation spectrum clip for every frequency band using a Least Square Method.
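
Steps a) to d) of claim 1 can be illustrated with a deliberately simplified sketch. It assumes a single fully voiced frame, a Hann window, and a magnitude-only least-squares fit per harmonic band; it ignores the previous frame, the unvoiced path, and the position tuning of claims 4 to 6. The function names `harmonic_spectrum` and `estimate_magnitudes` and all numeric values are hypothetical and do not come from the patent.

```python
import numpy as np

def harmonic_spectrum(f0, amplitudes, fs, n):
    """Magnitude spectrum of a Hann-windowed sum of harmonics of f0."""
    t = np.arange(n) / fs
    signal = np.zeros(n)
    for k, a in enumerate(amplitudes, start=1):
        signal += a * np.cos(2 * np.pi * k * f0 * t)
    return np.abs(np.fft.rfft(signal * np.hanning(n)))

def estimate_magnitudes(target_spec, f0, n_harm, fs, n):
    """Fit one gain per harmonic band so the model spectrum matches the target."""
    # Steps a) to c): fixed unit magnitudes, synthesize, take the spectrum.
    model_spec = harmonic_spectrum(f0, np.ones(n_harm), fs, n)
    bin_hz = fs / n
    est = np.zeros(n_harm)
    for k in range(1, n_harm + 1):
        # Band k spans half a pitch interval on each side of harmonic k.
        lo = int(round((k - 0.5) * f0 / bin_hz))
        hi = int(round((k + 0.5) * f0 / bin_hz))
        m, s = model_spec[lo:hi], target_spec[lo:hi]
        # Step d): scalar least-squares gain minimizing sum((s - g * m) ** 2).
        est[k - 1] = np.dot(m, s) / np.dot(m, m)
    return est

# Assumed test signal: ten harmonics of 200 Hz with known magnitudes.
fs, n, f0 = 8000, 256, 200.0
true_mags = np.linspace(1.0, 0.1, 10)
target = harmonic_spectrum(f0, true_mags, fs, n)
estimated = estimate_magnitudes(target, f0, len(true_mags), fs, n)
```

Because the model signal is synthesized with unit magnitudes, the fitted gain for each band is itself the magnitude estimate; in this toy setup `estimated` should land close to `true_mags`.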

7. A hybrid method for spectral magnitudes encoding of each speech frame, comprising the steps of:
- a) reducing a number of spectral magnitudes;
  
  b) using different types of encoding schemes for simultaneously encoding the spectral magnitudes;
  
  c) evaluating the encoding schemes; and
  
  d) selecting from the evaluated encoding schemes the best encoding scheme for spectral magnitudes encoding as a base scheme.
- 8. A method of claim 7, wherein in the step a), the number of the spectral magnitudes is reduced based upon a Wavelet Transform technique.
  - 9. A method of claim 8, wherein in the step b), the different types of encoding schemes include the Wavelet Transform technique and an inter-frame prediction.
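
The selection logic of claims 7 to 9 can be sketched as follows. This is only an illustration under strong assumptions: one level of Haar averaging stands in for the wavelet reduction, a coarse uniform quantizer stands in for the wavelet encoder, a fine quantizer on the frame-to-frame residual stands in for the inter-frame predictor, and squared error is the evaluation criterion. The names `haar_reduce`, `quantize`, and `encode_hybrid`, and the step sizes, are hypothetical.

```python
import numpy as np

def haar_reduce(mags):
    """Halve the vector length by keeping one level of Haar averages."""
    mags = np.asarray(mags, dtype=float)
    if len(mags) % 2:                      # pad odd lengths by repeating the last value
        mags = np.append(mags, mags[-1])
    return (mags[0::2] + mags[1::2]) / 2.0

def quantize(x, step, levels=16):
    """Uniform quantizer with a fixed number of codes, so both schemes cost the same bits."""
    half = levels // 2
    codes = np.clip(np.round(np.asarray(x, dtype=float) / step), -half, half - 1)
    return codes * step

def encode_hybrid(reduced, prev_decoded):
    """Run both schemes, compare reconstruction error, return (decision_bit, decoded)."""
    reduced = np.asarray(reduced, dtype=float)
    prev_decoded = np.asarray(prev_decoded, dtype=float)
    # Scheme 0 (stand-in for the wavelet encoder): coarse step, wide range.
    direct = quantize(reduced, step=1.0)
    # Scheme 1 (inter-frame prediction): fine step on the residual, narrow range.
    predicted = prev_decoded + quantize(reduced - prev_decoded, step=0.125)
    err_direct = np.sum((reduced - direct) ** 2)
    err_pred = np.sum((reduced - predicted) ** 2)
    if err_pred < err_direct:
        return 1, predicted
    return 0, direct

prev = [4.0, 3.0, 2.0, 1.0]
bit_similar, _ = encode_hybrid([4.1, 3.1, 1.9, 1.05], prev)   # frame close to the previous one
bit_abrupt, dec_abrupt = encode_hybrid([1.0, 6.0, 0.5, 5.0], prev)  # abrupt spectral change
```

With these assumed quantizers, prediction tends to win when consecutive frames are similar, and the direct scheme wins when the residual exceeds the narrow range of the predictive quantizer.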

10. A method for synthesizing a synthetic digital speech signal from a data structure produced by dividing an initial speech signal into a plurality of frames, determining a pitch frequency, determining voicing information, representing whether each of a plurality of frequency bands of each frame should be synthesized as voiced or unvoiced frequency bands, and processing the frames to determine spectral envelope information representative of the magnitudes of a spectrum in the frequency bands, wherein the method for synthesizing the synthetic digital speech signal comprises the steps of:
- a) building a frequency correspondence between bands of current and previous frames;
  
  b) synthesizing speech components for the voiced frequency bands for couples of harmonics with the closest frequencies in the current and previous frames utilizing the built bands' frequency correspondence and lacing the coupled harmonics, wherein all uncoupled harmonics of the previous frame are smoothly decreased down to zero amplitude and wherein all uncoupled harmonics of the current frame are smoothly increased up to their own amplitudes;
  
  c) synthesizing speech components for the unvoiced frequency bands; and
  
  d) synthesizing the synthetic digital speech signal by combining the synthesized speech components for the voiced and the unvoiced frequency bands.
- 11. A method of claim 10, wherein in the step a) the bands' frequency correspondence is built by forming direct and inverse maps of the frequency bands induced by the pitch frequency of the previous and current frames.
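
One possible reading of the coupling in claims 10 and 11 is sketched below. It assumes that a harmonic pair is "coupled" when the direct map (previous to current) and the inverse map (current to previous) both pick each other as nearest in frequency, and that "lacing" means linear interpolation of frequency and amplitude across the frame. Phase continuity between frames is not handled. The names `couple_harmonics` and `synthesize_voiced` are hypothetical, and the patent may define the maps differently.

```python
import numpy as np

def couple_harmonics(f0_prev, n_prev, f0_cur, n_cur):
    """Pair harmonics of two frames that are mutually closest in frequency."""
    fp = f0_prev * np.arange(1, n_prev + 1)
    fc = f0_cur * np.arange(1, n_cur + 1)
    dist = np.abs(fp[:, None] - fc[None, :])
    direct = dist.argmin(axis=1)    # previous harmonic -> nearest current harmonic
    inverse = dist.argmin(axis=0)   # current harmonic -> nearest previous harmonic
    pairs = [(i, int(j)) for i, j in enumerate(direct) if inverse[j] == i]
    lone_prev = [i for i in range(n_prev) if i not in {p for p, _ in pairs}]
    lone_cur = [j for j in range(n_cur) if j not in {c for _, c in pairs}]
    return pairs, lone_prev, lone_cur

def synthesize_voiced(f0_prev, a_prev, f0_cur, a_cur, fs, n):
    """Voiced frame: interpolate coupled harmonics, fade uncoupled ones out or in."""
    pairs, lone_prev, lone_cur = couple_harmonics(f0_prev, len(a_prev), f0_cur, len(a_cur))
    ramp = np.arange(n) / n         # rises from 0 toward 1 across the frame

    def tone(freq_track, amp_track):
        phase = 2 * np.pi * np.cumsum(freq_track) / fs
        return amp_track * np.cos(phase)

    out = np.zeros(n)
    for i, j in pairs:              # coupled: glide frequency and amplitude
        fa, fb = f0_prev * (i + 1), f0_cur * (j + 1)
        out += tone(fa + (fb - fa) * ramp, a_prev[i] + (a_cur[j] - a_prev[i]) * ramp)
    for i in lone_prev:             # uncoupled previous harmonics decay to zero
        out += tone(np.full(n, f0_prev * (i + 1)), a_prev[i] * (1 - ramp))
    for j in lone_cur:              # uncoupled current harmonics grow from zero
        out += tone(np.full(n, f0_cur * (j + 1)), a_cur[j] * ramp)
    return out

pairs, lone_prev, lone_cur = couple_harmonics(100.0, 4, 130.0, 3)
frame = synthesize_voiced(100.0, [1.0, 0.5, 0.3, 0.2], 130.0, [0.8, 0.4, 0.2], 8000, 160)
```

In this example the 200 Hz harmonic of the previous frame has no mutual partner, so it is the one faded out.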

12. A system for speech signal coding and decoding, comprising a speech signal coder and a speech signal decoder, wherein the speech signal coder comprises:
- a processor dividing an input digital speech signal into a plurality of frames to be analyzed in time and frequency domains;
  
  an orthogonal transforming unit transforming each frame to provide spectral data on the frequency axis;
  
  a pitch determination unit determining a pitch frequency for each frame;
  
  a voiced/unvoiced discrimination unit generating group voiced/unvoiced decisions utilizing the determined pitch frequencies;
  
  a spectral magnitudes determination unit estimating spectral magnitudes by utilizing an Analysis by Synthesis method; and
  
  a parameter encoding unit encoding the determined pitch frequency, the estimated spectral magnitude and the voiced/unvoiced decisions for each of the plurality of frames, and combining encoded data into a plurality of bits; and
  
  wherein the speech signal decoder comprises:
  
  a parameters decoding unit decoding the plurality of bits to provide the pitch frequency, spectral magnitudes and voiced/unvoiced decisions for each of the plurality of frames;
  
  a bands' frequency correspondence map building unit building a bands' frequency correspondence map between bands of current and previous frames; and
  
  a signal synthesizing unit synthesizing a speech signal from the pitch frequency, spectral magnitudes and voiced/unvoiced decision, and utilizing the bands' frequency correspondence map.
- 13. A system of claim 12, wherein the speech signal coder further comprises:
14. A system of claim 13, wherein the voiced/unvoiced discrimination unit utilizes an adaptive threshold depending on the assigned frame classification.
15. A system of claim 13, wherein the pitch determination unit comprises:
- a pitch candidates set determination unit determining a set of pitch candidates based upon an analysis of normalized auto-correlation function using either a direct or an inverse order depending on the assigned frame classification;
  
  a best candidate selection unit estimating the set of pitch candidates in the frequency domain and selecting the best candidate from the set of pitch candidates; and
  
  a best candidate refinement unit refining the selected best candidate in the frequency domain.
16. A system of claim 15, wherein the best candidate selection unit estimates the set of pitch candidates by a window function response scaled to obtain a predetermined sharpness of the window function in each band and to provide a final pitch candidate selection.
17. A system of claim 16, wherein the window function response is scaled for pitch frequencies lower than a predetermined frequency F_scale.
18. A system of claim 17, wherein the window function response is scaled by a procedure of proportional sharpening.
19. A system of claim 18, wherein the procedure of proportional sharpening is carried out by a linear interpolation.
20. A system of claim 19, wherein the window function responses scaled for different pitch frequencies are used as a look-up table.
21. A system of claim 12, wherein the parameter encoding unit further comprises:
- a scalar quantization unit quantizing a value of the pitch frequency;
  
  a spectral magnitudes wavelet reduction unit reducing a dimension of a spectral magnitude vector;
  
  a spectral magnitudes hybrid encoding unit encoding the reduced spectral magnitudes vector by a wavelet technique; and
  
  a multiplexer unit combining the encoded data into a plurality of bits.
22. A system of claim 21, wherein the spectral magnitudes hybrid encoding unit comprises:
- a wavelet encoder unit encoding the reduced spectral magnitudes vector;
  
  an inter-frame prediction encoder unit encoding the reduced spectral magnitudes vector; and
  
  a comparator unit comparing the effectiveness of the wavelet encoder unit and the effectiveness of the inter-frame prediction encoder unit to select a better encoder unit, and outputting a decision bit and data corresponding to the selected better encoder unit to the multiplexer unit.
23. A system of claim 12, wherein the signal synthesizing unit comprises:
- a voice synthesizing unit synthesizing speech components for voiced frequency bands for couples of harmonics with the closest frequencies in the current and previous frames utilizing the built bands' frequency correspondence and lacing the coupled harmonics, wherein all uncoupled harmonics of the previous frame are smoothly decreased down to zero amplitude and wherein all uncoupled harmonics of the current frame are smoothly increased up to their own amplitudes;
  
  an unvoiced synthesis unit synthesizing speech components for unvoiced frequency bands; and
  
  an adder synthesizing the speech signal by summing the synthesized speech components for the voiced and the unvoiced frequency bands.
24. A system of claim 12, wherein the spectral magnitudes determination unit comprises:
- a bands' frequency correspondence map building unit building a frequency correspondence between bands of current and previous frames;
  
  a voiced synthesis unit synthesizing a model voiced signal for a model set of the spectral magnitudes based upon the built bands' frequency correspondence, the pitch frequency and the set of voicing decisions for the previous and current frames;
  
  a first windowing unit processing the model voiced signal;
  
  an orthogonal transforming unit transforming a model voiced signal windowed by the first windowing unit into a frequency domain;
  
  a voice magnitude evaluation unit evaluating voiced magnitudes of the transformed model voiced signal by a Least Square Method;
  
  a synchronized noise generator producing a model white noise signal with a unit amplitude range;
  
  a second windowing unit processing the model white noise signal;
  
  an orthogonal transforming unit transforming the model white noise signal windowed by the second windowing unit to a frequency domain; and
  
  an unvoiced magnitudes evaluation unit evaluating unvoiced magnitudes of the transformed model white noise signal by a Least Square Method.
25. A system of claim 24, wherein the voiced synthesis unit forms the model voiced signal for the model set of the spectral magnitudes by assigning fixed etalon values equal to 1 for voiced bands and 0 otherwise.
26. A system of claim 12, wherein the voiced/unvoiced discrimination unit generates the group voiced/unvoiced decisions utilizing a window function response scaled to obtain a predetermined sharpness of the window function in each band and to provide a final voiced/unvoiced decisions generation.
27. A system of claim 26, wherein the window function response is scaled for pitch frequencies lower than a predetermined frequency F_scale.
28. A system of claim 27, wherein the window function response is scaled by a procedure of proportional sharpening.
29. A system of claim 28, wherein the procedure of proportional sharpening is carried out by a linear interpolation.
30. A system of claim 29, wherein the window function responses scaled for different pitch frequencies are used as a look-up table.
31. A system of claim 30, wherein the voiced/unvoiced discrimination unit tunes a position of said scaled responses relative to the location of a frequency band peak.
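
The pitch candidates set determination unit of claim 15 relies on a normalized auto-correlation function. A minimal time-domain sketch of that one step is given below. It does not model the direct or inverse ordering tied to frame classification, nor the frequency-domain selection and refinement units. The function name `pitch_candidates`, the search range, and the number of candidates kept are assumptions made for illustration.

```python
import numpy as np

def pitch_candidates(frame, fs, f_min=60.0, f_max=400.0, n_best=3):
    """Return (frequency_hz, score) pairs for the strongest autocorrelation peaks."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    scores = []
    for lag in range(lag_min, lag_max + 1):
        a, b = x[:-lag], x[lag:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        # Normalized correlation lies in [-1, 1]; values near 1 suggest a pitch period.
        scores.append(np.dot(a, b) / denom if denom > 0 else 0.0)
    scores = np.array(scores)
    # Keep interior local maxima only, then rank them by score.
    peaks = [i for i in range(1, len(scores) - 1)
             if scores[i] > scores[i - 1] and scores[i] >= scores[i + 1]]
    peaks.sort(key=lambda i: scores[i], reverse=True)
    return [(fs / (lag_min + i), float(scores[i])) for i in peaks[:n_best]]

# Assumed test frame: three harmonics of a 200 Hz pitch, sampled at 8 kHz.
fs = 8000
t = np.arange(512) / fs
frame = (np.cos(2 * np.pi * 200 * t)
         + 0.5 * np.cos(2 * np.pi * 400 * t)
         + 0.25 * np.cos(2 * np.pi * 600 * t))
cands = pitch_candidates(frame, fs)
```

For a periodic frame, the true pitch and its sub-multiples score almost equally in the time domain, which is one reason a later frequency-domain selection stage, as claimed, is useful for picking among the candidates.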

Current Assignee: LG Electronics, Inc. (LG Corporation)
Original Assignee: LG Electronics, Inc. (LG Corporation)
Inventors: Maiboroda, Alexandr L.; Redkov, Victor V.; Djourinski, Eugene V.; Tikhotski, Anatoli I.
Primary Examiner(s): Chawan, Vijay
Assistant Examiner(s): Storm, Donald L.
Application Number: US09/283,578
Time in Patent Office: 1,356 Days
Field of Search: 704/208, 704/221, 704/223, 704/220
US Class Current: 704/220
CPC Class Codes:
G10L 19/00   Speech or audio signals ana...
G10L 19/08   Determination or coding of ...
G10L 25/27   characterised by the analys...
