Enhancement of multichannel audio

US 8,972,250 B2
Filed: 08/10/2012
Issued: 03/03/2015
Est. Priority Date: 02/26/2007
Status: Active Grant

Abstract

The invention relates to audio signal processing. More specifically, the invention relates to enhancing multichannel audio, such as television audio, by applying a gain to the audio that has been smoothed between portions of the audio. The invention relates to methods, apparatus for performing such methods, and to software stored on a computer-readable medium for causing a computer to perform such methods.
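
The idea of a gain that is smoothed between portions can be illustrated with a minimal sketch. It assumes per-block target gains are already available and uses a one-pole smoother; the function names, the `alpha` rate parameter, and the nested-list block layout are illustrative assumptions, not details taken from the patent.

```python
def smooth_gains(target_gains, alpha=0.2, initial=1.0):
    """One-pole smoothing of per-block linear gains.

    alpha in (0, 1] limits how fast the applied gain may move toward
    each block's target gain (hypothetical parameter).
    """
    smoothed = []
    gain = initial
    for target in target_gains:
        gain += alpha * (target - gain)  # move a fraction of the way to the target
        smoothed.append(gain)
    return smoothed


def apply_gains(blocks, gains):
    """Scale every channel of every block by that block's gain.

    blocks: list of blocks; each block is a list of channels;
    each channel is a list of samples (assumed layout).
    """
    return [[[sample * gain for sample in channel] for channel in block]
            for block, gain in zip(blocks, gains)]
```

A smaller `alpha` gives slower, less audible gain changes at the cost of a longer settling time after a loudness change.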

19 Claims

1. A method for enhancing an audio signal, wherein the audio signal comprises two or more channels of audio content, the method comprising:
- examining a portion of the audio signal to determine whether the portion contains one or more characteristics of speech, and if the portion contains one or more characteristics of speech, classifying the portion as a speech portion, said examining including:
  - applying a first portion of the audio signal to a speech versus other sound (SVO) detector configured to generate, using one or more signal descriptors of the first portion of the audio signal, an SVO output indicating a likelihood estimate that the first portion of the audio signal contains speech, or indicating a hard speech/no-speech decision in the first portion of the audio signal,
  - applying a second portion of the audio signal to a voice activity detector (VAD) operable to determine the presence of voice based on a sudden increase in power in the second portion of the audio signal, and
  - biasing a decision by the VAD based on the SVO output;
- calculating a gain for the speech portion based at least in part on an estimated loudness associated with a previous speech portion of the audio signal;
- smoothing the calculated gain to control the rate at which the calculated gain changes from the speech portion to a second portion of the audio signal; and
- applying the smoothed gain to the audio signal.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1 wherein the applying the smoothed gain creates a substantially uniform perceived loudness between at least two speech portions of the audio signal.
  - 3. The method of claim 1 wherein the one or more characteristics of speech includes a speech frequency band.
  - 4. The method of claim 1 wherein the one or more characteristics of speech includes interchannel phase difference.
  - 5. The method of claim 1 wherein the one or more characteristics of speech includes correlation.
  - 6. The method of claim 1 wherein the portion comprises one or more blocks of the audio signal.
  - 7. A non-transitory computer-readable storage medium encoded with a computer program for causing a computer to perform the method of claim 1.
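
The detection step of claim 1 (a power-onset VAD whose decision is biased by the SVO output) can be pictured with a rough sketch. The claim does not specify how the biasing is done; the linear threshold shift below, the function names, the default thresholds, and the mono block layout are all assumptions made for illustration.

```python
import math


def block_power_db(block):
    """Mean power of one block (list of samples) in dB, floored to avoid log(0)."""
    power = sum(s * s for s in block) / max(len(block), 1)
    return 10.0 * math.log10(max(power, 1e-12))


def biased_vad(blocks, svo_likelihoods, base_threshold_db=6.0, max_bias_db=4.0):
    """Flag blocks whose power jumps above a slowly tracked running level.

    The required jump is lowered when the SVO likelihood is high and raised
    when it is low (hypothetical linear biasing rule, not from the patent).
    """
    decisions = []
    running_db = None
    for block, p_speech in zip(blocks, svo_likelihoods):
        level_db = block_power_db(block)
        if running_db is None:
            running_db = level_db  # initialise the tracker on the first block
        threshold_db = base_threshold_db - max_bias_db * (2.0 * p_speech - 1.0)
        decisions.append(level_db - running_db > threshold_db)
        running_db += 0.1 * (level_db - running_db)  # slow running-level update
    return decisions
```

With these defaults, the same power jump of roughly 8 dB is accepted as voice when the SVO likelihood is 0.9 and rejected when it is 0.1, which is the sense in which the SVO output biases the VAD here.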

8. A system for enhancing an audio signal, wherein the audio signal comprises two or more channels of audio content, the system comprising:
- a controller that receives a first portion of the audio signal;
- a detection module that determines whether the first portion contains characteristics of speech, and if the first portion is determined to contain characteristics of speech, identifies the first portion as a speech portion, said detection module including a speech-versus-other (SVO) detector applied to a first portion of the audio signal and configured to generate, using one or more signal descriptors of the first portion of the audio signal, an SVO output indicating a likelihood estimate that the first portion of the audio signal contains speech, or indicating a hard speech/no-speech decision in the first portion of the audio signal, the SVO driving a voice activity detector (VAD) applied to a second portion of the audio signal as a function of an output of the SVO, the VAD operable to determine the presence of voice based on a sudden increase in power in the second portion of the audio signal, said driving including biasing a decision by the VAD based on the SVO output; and
- an enhancement processor that calculates a gain for the speech portion and smoothes the calculated gain to control the rate at which the gain changes from the speech portion to a second portion of the audio signal, the gain being calculated based at least in part on an estimated loudness associated with a previous speech portion of the audio signal.
- Dependent claims (9, 10, 11, 12, 13, 14, 15, 16, 17, 18):
  - 9. The system of claim 8 wherein the first portion comprises a block of the audio signal.
  - 10. The system of claim 8 wherein the first portion comprises a frame of the audio signal.
  - 11. The system of claim 8 wherein the two or more channels are processed independently of each other.
  - 12. The system of claim 8 wherein the enhancement processor operates in accordance with one or more processing parameters and adjustment of the parameters is operative to urge a metric of speech intelligibility of the audio content above a desired threshold level.
  - 13. The system of claim 8 wherein the enhancement processor calculates the gain based in part on the level of noise in the speech portion.
  - 14. The system of claim 8 wherein the enhancement processor is operative to perform an enhancement operation selected from the group consisting of dynamic range control, dynamic equalization, dynamic gain modification, spectral sharpening, speech extraction, and noise reduction.
  - 15. The system of claim 8 wherein the system is implemented in one of an audio decoder, an audio encoder, and a non-transitory computer-readable storage medium.
  - 16. The system of claim 8 wherein the first portion comprises a fixed quantity of audio samples of the audio signal.
  - 17. The system of claim 8 wherein the first portion and the second portion are from the same audio channel.
  - 18. The system of claim 8 wherein the system is operative to generate an output audio stream with a substantially constant perceived loudness of speech despite loudness level changes in the audio signal.
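
The enhancement processor's gain calculation, based on the estimated loudness of a previous speech portion, might look roughly like the sketch below. It stands in a plain RMS level in dB for the loudness estimate, which is a simplifying assumption (a perceptual loudness model would be more involved); the target level, the gain limit, and the function names are likewise hypothetical.

```python
import math


def rms_db(samples):
    """RMS level of a portion in dB, used here as a crude loudness stand-in."""
    mean_square = sum(s * s for s in samples) / max(len(samples), 1)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def speech_gains_db(portions, is_speech, target_db=-20.0, max_gain_db=12.0):
    """Gain in dB for each portion.

    A speech portion gets the gain that would bring the previous speech
    portion's level to target_db, clamped to +/- max_gain_db. Non-speech
    portions, and the first speech portion, get 0 dB.
    """
    gains = []
    prev_speech_db = None
    for samples, speech in zip(portions, is_speech):
        if speech and prev_speech_db is not None:
            gain = target_db - prev_speech_db
            gain = max(-max_gain_db, min(max_gain_db, gain))
        else:
            gain = 0.0
        gains.append(gain)
        if speech:
            prev_speech_db = rms_db(samples)  # remembered for the next speech portion
    return gains
```

In a full chain these per-portion gains would still pass through a smoothing stage before being applied, so that the gain does not step abruptly between portions.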

19. A method for signal processing, comprising:
- receiving an audio signal, wherein the audio signal comprises two or more channels of audio content;
- analyzing features of the audio signal;
- classifying a portion of the audio signal as a speech portion if the portion contains one or more features of speech, said classifying including:
  - applying a first portion of the audio signal to a speech versus other sound (SVO) detector configured to generate, using one or more signal descriptors of the first portion of the audio signal, an SVO output indicating a likelihood estimate that the first portion of the audio signal contains speech, or indicating a hard speech/no-speech decision in the first portion of the audio signal, and
  - applying a second portion of the audio signal to a voice activity detector (VAD) operable to determine the presence of voice based on a sudden increase in power in the second portion of the audio signal, and
  - biasing a decision by the VAD based on the SVO output;
- calculating a gain for the speech portion based at least in part on an estimated loudness associated with a previous speech portion; and
- smoothing the calculated gain to control the rate at which the calculated gain changes from the speech portion to a second portion of the audio signal.

Current Assignee: Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Original Assignee: Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Inventors: Muesch, Hannes
Primary Examiner(s): Godbold, Douglas
Application Number: US13/571,344
Publication Number: US 20120310635A1
Time in Patent Office: 935 Days
Field of Search: 704/224-230
US Class Current: 704/225

CPC Class Codes
- G10L 19/012   Comfort noise or silence co...
- G10L 19/018   Audio watermarking, i.e. em...
- G10L 2025/932   Decision in previous or fol...
- G10L 2025/937   Signal energy in various fr...
- G10L 21/02   Speech enhancement, e.g. no...
- G10L 21/0364   for improving intelligibility
- G10L 25/78   Detection of presence or ab...
- G10L 25/93   Discriminating between voic...
