Parametric speech codec for representing synthetic speech in the presence of background noise
First Claim
1. A system for processing an encoded audio signal having a number of frames, the system comprising:
- a decoder comprising:
means for unquantizing at least three of a pitch period, a voicing probability, a mid-frame pitch period, and a mid-frame voicing probability of the audio signal;
means for producing a spectral magnitude envelope and a minimum phase envelope;
means for generating at least one control parameter using a signal-to-noise ratio computed using a gain and the voicing probability of the audio signal;
means for analyzing the spectral magnitude envelope and the minimum phase envelope, wherein the spectral magnitude envelope and the minimum phase envelope are analyzed using the at least one control parameter and at least one of the unquantized pitch period, the unquantized voicing probability, the unquantized mid-frame pitch period, and the unquantized mid-frame voicing probability; and
means for producing a synthetic speech signal corresponding to the input audio signal using the analysis of the spectral magnitude envelope and the minimum phase envelope.
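The claim does not say how the signal-to-noise ratio is derived from the gain and the voicing probability. The following is a minimal sketch under stated assumptions, not the patented method: the class name, the 0.3 voicing threshold, the smoothing factor, and the mapping of SNR onto a 0..1 control value are all hypothetical. It treats frames with low voicing probability as noise, tracks a smoothed gain for noise and for speech, and derives a control parameter from their ratio.

```python
import math


class SnrControl:
    """Hypothetical SNR-driven control parameter (illustration only)."""

    def __init__(self, unvoiced_threshold=0.3, smoothing=0.9,
                 snr_low_db=5.0, snr_high_db=25.0):
        # All four constants are assumptions, not values from the patent.
        self.unvoiced_threshold = unvoiced_threshold
        self.smoothing = smoothing
        self.snr_low_db = snr_low_db
        self.snr_high_db = snr_high_db
        self.noise_gain = None   # smoothed gain of frames judged to be noise
        self.speech_gain = None  # smoothed gain of frames judged to be speech

    def update(self, gain, voicing_probability):
        """Process one frame; return (snr_db, control) with control in [0, 1]."""
        gain = max(gain, 1e-12)  # floor the linear gain so log10 stays defined
        if voicing_probability < self.unvoiced_threshold:
            self.noise_gain = self._smooth(self.noise_gain, gain)
        else:
            self.speech_gain = self._smooth(self.speech_gain, gain)

        if self.noise_gain is None or self.speech_gain is None:
            # Not enough history yet: report no SNR and assume clean input.
            return None, 1.0

        snr_db = 20.0 * math.log10(self.speech_gain / self.noise_gain)
        span = self.snr_high_db - self.snr_low_db
        control = min(1.0, max(0.0, (snr_db - self.snr_low_db) / span))
        return snr_db, control

    def _smooth(self, previous, value):
        """First-order recursive average; the first sample seeds the state."""
        if previous is None:
            return value
        return self.smoothing * previous + (1.0 - self.smoothing) * value
```

In this sketch a control value near 0 would indicate a noisy input and a value near 1 a clean one; how a decoder would use that value when analyzing the spectral magnitude and minimum phase envelopes is left open here.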
Abstract
A system and method are provided for processing audio and speech signals using a pitch- and voicing-dependent spectral estimation algorithm (voicing algorithm) that accurately represents, with a single model, voiced speech, unvoiced speech, and mixed speech in the presence of background noise, as well as background noise itself. The invention also modifies the synthesis model based on an estimate of the current input signal to improve the perceptual quality of the speech and background noise under a variety of input conditions. It further improves the robustness of the voicing-dependent spectral estimation algorithm by using a Multi-Layer Neural Network in the estimation process. The algorithm provides an accurate and robust estimate of the voicing probability under a variety of background noise conditions, which is essential to providing high-quality, intelligible speech in the presence of background noise.
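The abstract states that a Multi-Layer Neural Network is used in estimating the voicing probability but gives no architecture. As a rough illustration only, the sketch below assumes a single hidden layer with tanh units and a sigmoid output; the function names, the feature vector, and all weights are hypothetical and would in practice come from training.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def voicing_probability(features, hidden_weights, hidden_biases,
                        output_weights, output_bias):
    """Forward pass of an assumed one-hidden-layer network.

    features:        per-frame input values (hypothetical, e.g. spectral cues)
    hidden_weights:  one weight row per hidden unit
    hidden_biases:   one bias per hidden unit
    output_weights:  one weight per hidden unit
    output_bias:     scalar bias of the output unit
    """
    hidden = [
        math.tanh(sum(w * f for w, f in zip(row, features)) + bias)
        for row, bias in zip(hidden_weights, hidden_biases)
    ]
    activation = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return sigmoid(activation)
```

The sigmoid output keeps the result strictly between 0 and 1, so it can be read as a probability; with all weights and biases set to zero the function returns 0.5.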
37 Citations
4 Claims
1. Independent claim, reproduced in full under First Claim. Dependent claims: 2, 3, 4.
Specification