Audio classification based on perceptual quality for low or medium bit rates
First Claim
1. A method for encoding signals, the method comprising:
receiving, by an audio encoder, a digital signal comprising audio data, wherein the audio data includes data of speech and non-speech sounds;
classifying, by the audio encoder, the digital signal as an AUDIO signal based on the audio data in the digital signal;
determining, by the audio encoder, whether classifying conditions are satisfied, wherein the classifying conditions include:
pitch differences between sub-frames in the digital signal are less than a first threshold, a coding rate of the digital signal is below a second threshold, an average normalized pitch correlation value for the sub-frames in the digital signal is greater than a third threshold and a smoothed pitch correlation obtained according to the average normalized pitch correlation value is greater than a fourth threshold, wherein each of the pitch differences is an absolute value of the difference between two pitch values corresponding to two sub-frames respectively;
re-classifying, by the audio encoder, the digital signal as a VOICED signal when the classifying conditions are satisfied;
encoding, by the audio encoder, the digital signal in the time-domain if the digital signal is classified as a VOICED signal; and
encoding, by the audio encoder, the digital signal in the frequency-domain if the digital signal is classified as an AUDIO signal.
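The decision logic recited in the claim can be sketched as follows. This is a minimal illustration, not the patented implementation: the claim gives no numeric thresholds, so every value in `Thresholds` is a hypothetical placeholder, and pairing adjacent sub-frames for the pitch differences is an assumption (the claim only requires a difference between two sub-frame pitch values).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Thresholds:
    # All four values are hypothetical placeholders; the claim gives no numbers.
    max_pitch_diff: float = 3.0       # "first threshold" (pitch lag units)
    max_bit_rate: float = 24_000.0    # "second threshold" (bits per second)
    min_avg_corr: float = 0.8         # "third threshold"
    min_smoothed_corr: float = 0.75   # "fourth threshold"


def pitch_differences(pitches: Sequence[float]) -> List[float]:
    """Absolute pitch differences between adjacent sub-frames (assumed pairing)."""
    return [abs(b - a) for a, b in zip(pitches, pitches[1:])]


def reclassify(pitches: Sequence[float], bit_rate: float, avg_corr: float,
               smoothed_corr: float, th: Thresholds = Thresholds()) -> str:
    """Return "VOICED" only if all four classifying conditions hold, else "AUDIO".

    The input frame is assumed to have been classified as AUDIO already.
    """
    conditions_met = (
        all(d < th.max_pitch_diff for d in pitch_differences(pitches))
        and bit_rate < th.max_bit_rate
        and avg_corr > th.min_avg_corr
        and smoothed_corr > th.min_smoothed_corr
    )
    return "VOICED" if conditions_met else "AUDIO"


def choose_domain(label: str) -> str:
    """VOICED frames go to time-domain coding, AUDIO frames to frequency-domain."""
    return "time-domain" if label == "VOICED" else "frequency-domain"
```

For example, with these placeholder thresholds, `reclassify([57, 58, 57, 58], 12_000, 0.9, 0.85)` yields `"VOICED"`, while raising the coding rate to 64,000 leaves the frame classified as `"AUDIO"`.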
Abstract
The quality of encoded signals can be improved by reclassifying AUDIO signals carrying non-speech data as VOICED signals when periodicity parameters of the signal satisfy one or more criteria. In some embodiments, only low or medium bit rate signals are considered for re-classification. The periodicity parameters can include any characteristic or set of characteristics indicative of periodicity. For example, the periodicity parameter may include pitch differences between subframes in the audio signal, a normalized pitch correlation for one or more subframes, an average normalized pitch correlation for the audio signal, or combinations thereof. Audio signals which are re-classified as VOICED signals may be encoded in the time-domain, while audio signals that remain classified as AUDIO signals may be encoded in the frequency-domain.
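As an illustration of the periodicity parameters mentioned above, the sketch below computes a normalized pitch correlation per subframe and averages it. The abstract does not define the correlation, so the formula used here (cross-correlation between a subframe and the segment one pitch lag earlier, divided by the geometric mean of their energies) is an assumed, commonly used definition; the function names and arguments are hypothetical.

```python
import math
from typing import Sequence


def normalized_pitch_correlation(signal: Sequence[float], start: int,
                                 length: int, lag: int) -> float:
    """Correlation in [-1, 1] between a subframe and the segment `lag` samples earlier."""
    if lag <= 0 or start - lag < 0 or start + length > len(signal):
        raise ValueError("subframe or lagged segment falls outside the signal")
    current = signal[start:start + length]
    past = signal[start - lag:start - lag + length]
    numerator = sum(x * y for x, y in zip(current, past))
    denominator = math.sqrt(sum(x * x for x in current) * sum(y * y for y in past))
    # A silent segment has zero energy; treat it as uncorrelated.
    return numerator / denominator if denominator > 0.0 else 0.0


def average_normalized_pitch_correlation(signal: Sequence[float],
                                         starts: Sequence[int], length: int,
                                         lags: Sequence[int]) -> float:
    """Mean of the per-subframe correlations; one pitch lag per subframe."""
    values = [normalized_pitch_correlation(signal, s, length, p)
              for s, p in zip(starts, lags)]
    return sum(values) / len(values)


# A sine wave with a 40-sample period is highly correlated at lag 40.
tone = [math.sin(2.0 * math.pi * n / 40.0) for n in range(400)]
avg = average_normalized_pitch_correlation(tone, [80, 144, 208, 272], 64, [40] * 4)
```

A strongly periodic input such as the test tone gives an average close to 1, which is the kind of signal the abstract proposes to move from frequency-domain to time-domain coding.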
14 Claims
1. A method for encoding signals, the method comprising:
receiving, by an audio encoder, a digital signal comprising audio data, wherein the audio data includes data of speech and non-speech sounds;
classifying, by the audio encoder, the digital signal as an AUDIO signal based on the audio data in the digital signal;
determining, by the audio encoder, whether classifying conditions are satisfied, wherein the classifying conditions include:
pitch differences between sub-frames in the digital signal are less than a first threshold, a coding rate of the digital signal is below a second threshold, an average normalized pitch correlation value for the sub-frames in the digital signal is greater than a third threshold and a smoothed pitch correlation obtained according to the average normalized pitch correlation value is greater than a fourth threshold, wherein each of the pitch differences is an absolute value of the difference between two pitch values corresponding to two sub-frames respectively;
re-classifying, by the audio encoder, the digital signal as a VOICED signal when the classifying conditions are satisfied;
encoding, by the audio encoder, the digital signal in the time-domain if the digital signal is classified as a VOICED signal; and
encoding, by the audio encoder, the digital signal in the frequency-domain if the digital signal is classified as an AUDIO signal.
Dependent claims: 2, 3, 4, 5, 6, 7
8. An audio encoder comprising:
at least one processor; and
a computer readable storage medium storing programming for execution by the at least one processor, the programming including instructions to:
receive a digital signal comprising audio data, wherein the audio data includes data of speech and non-speech sounds;
classify the digital signal as an AUDIO signal based on the audio data in the digital signal;
determine whether classifying conditions are satisfied, wherein the classifying conditions include:
pitch differences between sub-frames in the digital signal are less than a first threshold, a coding rate of the digital signal is below a second threshold, an average normalized pitch correlation value for the sub-frames in the digital signal is greater than a third threshold and a smoothed pitch correlation obtained according to the average normalized pitch correlation value is greater than a fourth threshold;
wherein each of the pitch differences is an absolute value of the difference between two pitch values corresponding to two sub-frames respectively;
re-classify the digital signal as a VOICED signal when the classifying conditions are satisfied;
encode the digital signal in the time-domain if the digital signal is classified as a VOICED signal; and
encode the digital signal in the frequency-domain if the digital signal is classified as an AUDIO signal.
Dependent claims: 9, 10, 11, 12, 13, 14
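The stored programming of claim 8 could be organized roughly as in the sketch below, which keeps the smoothed pitch correlation as state across frames and dispatches each frame to one of two coders. Several points are assumptions rather than claim language: the claim only says the smoothed value is "obtained according to" the average normalized pitch correlation, so the exponential smoothing and its factor `alpha` are hypothetical, as are the class name, the threshold defaults, and the two coder callables.

```python
from typing import Callable, Sequence, Tuple

Coder = Callable[[Sequence[float]], bytes]


class ReclassifyingEncoder:
    """Hypothetical encoder front end for frames already classified as AUDIO."""

    def __init__(self, time_coder: Coder, freq_coder: Coder, alpha: float = 0.75,
                 max_pitch_diff: float = 3.0, max_bit_rate: float = 24_000.0,
                 min_avg_corr: float = 0.8, min_smoothed_corr: float = 0.75):
        # Every default above is an illustrative placeholder, not a claimed value.
        self.time_coder = time_coder
        self.freq_coder = freq_coder
        self.alpha = alpha
        self.max_pitch_diff = max_pitch_diff
        self.max_bit_rate = max_bit_rate
        self.min_avg_corr = min_avg_corr
        self.min_smoothed_corr = min_smoothed_corr
        self.smoothed_corr = 0.0  # carried from frame to frame

    def encode_audio_frame(self, frame: Sequence[float], pitches: Sequence[float],
                           avg_corr: float, bit_rate: float) -> Tuple[str, bytes]:
        """Update the smoothed correlation, test the conditions, and encode."""
        # Assumed smoothing rule: first-order recursive average.
        self.smoothed_corr = (self.alpha * self.smoothed_corr
                              + (1.0 - self.alpha) * avg_corr)
        diffs = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
        voiced = (all(d < self.max_pitch_diff for d in diffs)
                  and bit_rate < self.max_bit_rate
                  and avg_corr > self.min_avg_corr
                  and self.smoothed_corr > self.min_smoothed_corr)
        if voiced:
            return "VOICED", self.time_coder(frame)
        return "AUDIO", self.freq_coder(frame)
```

With this assumed smoothing rule, the smoothed value lags the per-frame average, so the first frames of a newly periodic passage remain AUDIO and re-classification only begins once the periodicity has persisted for several frames.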
Specification