Enhancement of multichannel audio
Abstract
The invention relates to audio signal processing. More specifically, the invention relates to enhancing multichannel audio, such as television audio, by applying a gain to the audio that has been smoothed between portions of the audio. The invention relates to methods, apparatus for performing such methods, and to software stored on a computer-readable medium for causing a computer to perform such methods.
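To illustrate what "a gain ... that has been smoothed between portions of the audio" could look like in practice, here is a minimal Python sketch. It is not the patented implementation: the one-pole smoother, the per-block gain layout, and the names `smooth_gains`, `apply_gains`, and `alpha` are all assumptions made for this example.

```python
import numpy as np

def smooth_gains(target_gains_db, alpha=0.2):
    """One-pole smoothing of per-block gains given in dB.

    alpha is a hypothetical smoothing coefficient in (0, 1];
    smaller values make the gain change more slowly between blocks.
    """
    smoothed = np.empty(len(target_gains_db), dtype=float)
    state = float(target_gains_db[0])  # start at the first target gain
    for i, gain_db in enumerate(target_gains_db):
        state += alpha * (gain_db - state)  # move part of the way to the target
        smoothed[i] = state
    return smoothed

def apply_gains(audio, gains_db, block_size):
    """Apply one gain per block to every channel.

    audio has shape (channels, samples); the same gain is used for all
    channels of a block (an assumption of this sketch).
    """
    out = audio.astype(float).copy()
    for i, gain_db in enumerate(gains_db):
        start = i * block_size
        out[:, start:start + block_size] *= 10.0 ** (gain_db / 20.0)
    return out
```

With `alpha=0.5`, a jump in target gain from 0 dB to 6 dB is spread over several blocks (3 dB, 4.5 dB, 5.25 dB, ...) rather than applied at once, which is the kind of rate control the abstract refers to.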
84 Citations
19 Claims
1. A method for enhancing an audio signal, wherein the audio signal comprises two or more channels of audio content, the method comprising:
- examining a portion of the audio signal to determine whether the portion contains one or more characteristics of speech, and if the portion contains one or more characteristics of speech, classifying the portion as a speech portion, said examining including:
  - applying a first portion of the audio signal to a speech versus other sound (SVO) detector configured to generate, using one or more signal descriptors of the first portion of the audio signal, an SVO output indicating a likelihood estimate that the first portion of the audio signal contains speech, or indicating a hard speech/no-speech decision in the first portion of the audio signal,
  - applying a second portion of the audio signal to a voice activity detector (VAD) operable to determine the presence of voice based on a sudden increase in power in the second portion of the audio signal, and
  - biasing a decision by the VAD based on the SVO output;
- calculating a gain for the speech portion based at least in part on an estimated loudness associated with a previous speech portion of the audio signal;
- smoothing the calculated gain to control the rate at which the calculated gain changes from the speech portion to a second portion of the audio signal; and
- applying the smoothed gain to the audio signal.
Dependent claims: 2, 3, 4, 5, 6, 7
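The detection step of claim 1 (a VAD that reacts to a sudden increase in power, with its decision biased by the SVO output) can be sketched roughly as follows. This is one possible reading, not the claimed implementation: the threshold values, the linear mapping from SVO likelihood to a threshold shift, and the names `block_power_db`, `biased_vad`, `base_jump_db`, and `max_bias_db` are all hypothetical.

```python
import numpy as np

def block_power_db(block, eps=1e-12):
    """Mean power of a (channels, samples) block, in dB."""
    return 10.0 * np.log10(np.mean(np.square(block)) + eps)

def biased_vad(power_db, prev_power_db, svo_likelihood,
               base_jump_db=6.0, max_bias_db=3.0):
    """Return True if a voice onset is detected.

    The VAD looks for a jump in power relative to the previous block.
    The SVO likelihood (0.0 to 1.0) shifts the required jump: a high
    likelihood lowers the threshold, a low likelihood raises it.
    Both default values are assumptions for illustration only.
    """
    bias_db = (2.0 * svo_likelihood - 1.0) * max_bias_db
    threshold_db = base_jump_db - bias_db
    return (power_db - prev_power_db) >= threshold_db
```

For a 5 dB power jump, this sketch reports voice when the SVO likelihood is 1.0 (threshold 3 dB) but not when it is 0.0 (threshold 9 dB), so the same power change is interpreted differently depending on the SVO output.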
8. A system for enhancing an audio signal, wherein the audio signal comprises two or more channels of audio content, the system comprising:
- a controller that receives a first portion of the audio signal;
- a detection module that determines whether the first portion contains characteristics of speech, and if the first portion is determined to contain characteristics of speech, identifies the first portion as a speech portion, said detection module including a speech-versus-other (SVO) detector applied to a first portion of the audio signal and configured to generate, using one or more signal descriptors of the first portion of the audio signal, an SVO output indicating a likelihood estimate that the first portion of the audio signal contains speech, or indicating a hard speech/no-speech decision in the first portion of the audio signal, the SVO driving a voice activity detector (VAD) applied to a second portion of the audio signal as a function of an output of the SVO, the VAD operable to determine the presence of voice based on a sudden increase in power in the second portion of the audio signal, said driving including biasing a decision by the VAD based on the SVO output; and
- an enhancement processor that calculates a gain for the speech portion and smoothes the calculated gain to control the rate at which the gain changes from the speech portion to a second portion of the audio signal, the gain being calculated based at least in part on an estimated loudness associated with a previous speech portion of the audio signal.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
19. A method for signal processing, comprising:
- receiving an audio signal, wherein the audio signal comprises two or more channels of audio content;
- analyzing features of the audio signal;
- classifying a portion of the audio signal as a speech portion if the portion contains one or more features of speech, said classifying including:
  - applying a first portion of the audio signal to a speech versus other sound (SVO) detector configured to generate, using one or more signal descriptors of the first portion of the audio signal, an SVO output indicating a likelihood estimate that the first portion of the audio signal contains speech, or indicating a hard speech/no-speech decision in the first portion of the audio signal, and
  - applying a second portion of the audio signal to a voice activity detector (VAD) operable to determine the presence of voice based on a sudden increase in power in the second portion of the audio signal, and
  - biasing a decision by the VAD based on the SVO output;
- calculating a gain for the speech portion based at least in part on an estimated loudness associated with a previous speech portion; and
- smoothing the calculated gain to control the rate at which the calculated gain changes from the speech portion to a second portion of the audio signal.
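The gain calculation in the independent claims is stated only as being "based at least in part on an estimated loudness associated with a previous speech portion". A minimal Python sketch of one way to do that is given here. The RMS level stands in for a real (perceptual) loudness estimate, and the target level, the gain limit, and the names `estimate_loudness_db`, `speech_gain_db`, `target_db`, and `max_gain_db` are assumptions of this example, not details from the claims.

```python
import numpy as np

def estimate_loudness_db(portion, eps=1e-12):
    """Crude loudness proxy: RMS level in dB across all channels.

    A real system would likely use a perceptual loudness model;
    RMS is used here only to keep the sketch short.
    """
    rms = np.sqrt(np.mean(np.square(portion)))
    return 20.0 * np.log10(rms + eps)

def speech_gain_db(prev_speech_loudness_db, target_db=-24.0, max_gain_db=12.0):
    """Gain (dB) for the current speech portion.

    The gain moves the loudness measured on the previous speech portion
    toward a hypothetical target level, limited to +/- max_gain_db.
    """
    gain_db = target_db - prev_speech_loudness_db
    return float(np.clip(gain_db, -max_gain_db, max_gain_db))
```

In this sketch, a previous speech portion measured at -30 dB yields a +6 dB gain for the current speech portion, while very quiet or very loud previous speech is held to the 12 dB limit.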
Specification