Hybrid waveform-coded and parametric-coded speech enhancement

US 10,141,004 B2
Filed: 08/27/2014
Issued: 11/27/2018
Est. Priority Date: 08/28/2013
Status: Active Grant

Abstract

A method for hybrid speech enhancement which employs parametric-coded enhancement (or a blend of parametric-coded and waveform-coded enhancement) under some signal conditions and waveform-coded enhancement (or a different blend of parametric-coded and waveform-coded enhancement) under other signal conditions. Other aspects are methods for generating a bitstream indicative of an audio program including speech and other content, such that hybrid speech enhancement can be performed on the program; a decoder including a buffer which stores at least one segment of an encoded audio bitstream generated by any embodiment of the inventive method; and a system or device (e.g., an encoder or decoder) configured (e.g., programmed) to perform any embodiment of the inventive method. At least some of the speech enhancement operations are performed by a recipient audio decoder with Mid/Side speech enhancement metadata generated by an upstream audio encoder.
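
The switching between enhancement types described in the abstract can be illustrated with a minimal sketch. The abstract does not say which signal conditions favor which type, so the direction used here (waveform-coded at low speech-to-background SNR, parametric-coded at high SNR), the linear crossfade, the function name `blend_weights`, and the thresholds `low_db` and `high_db` are all assumptions made for illustration, not details taken from this record.

```python
def blend_weights(snr_db, low_db=0.0, high_db=20.0):
    """Return (waveform_weight, parametric_weight); the two always sum to 1.

    Assumption (not stated in the abstract): low SNR favors waveform-coded
    enhancement and high SNR favors parametric-coded enhancement. The
    thresholds low_db and high_db are hypothetical.
    """
    if high_db <= low_db:
        raise ValueError("high_db must be greater than low_db")
    # Position of the SNR between the two thresholds, clamped to [0, 1].
    t = (snr_db - low_db) / (high_db - low_db)
    parametric = min(1.0, max(0.0, t))
    return 1.0 - parametric, parametric


if __name__ == "__main__":
    for snr in (-5.0, 10.0, 30.0):
        print(snr, blend_weights(snr))
```

Below `low_db` the sketch uses only waveform-coded enhancement, above `high_db` only parametric-coded enhancement, and in between a blend of the two.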

150 Citations

12 Claims

1. A method, comprising:
- receiving mixed audio content, in a reference audio channel representation, that are distributed over a plurality of audio channels of the reference audio channel representation, the mixed audio content having a mix of speech content and non-speech audio content;
  
  transforming one or more portions of the mixed audio content that are distributed over two or more non-Mid/Side (non-M/S) channels in the plurality of audio channels of the reference audio channel representation into one or more portions of the transformed mixed audio content in an M/S audio channel representation that are distributed over one or more channels of the M/S audio channel representation, wherein the M/S audio channel representation comprises at least a mid-channel signal and a side-channel signal, wherein the mid-channel signal represents a weighted or non-weighted sum of two channels of the reference audio channel representation, and wherein the side-channel signal represents a weighted or non-weighted difference of two channels of the reference audio channel representation;
  
  determining metadata for speech enhancement of the one or more portions of the transformed mixed audio content in the M/S audio channel representation, wherein a first type of speech enhancement is waveform-encoded speech enhancement of a reduced quality version of the mid-channel signal in the M/S audio channel representation, and a second type of speech enhancement is parametric-encoded speech enhancement of a reconstructed version of the mid-channel signal in the M/S audio channel representation, the metadata including a mid-channel prediction parameter to reconstruct the mid-channel signal, a first gain parameter for waveform-encoded speech enhancement of the mid-channel signal, and a second gain parameter for parametric-encoded speech enhancement of the reconstructed mid-channel signal; and
  
  generating an audio signal that comprises the mixed audio content and the metadata for speech enhancement of the one or more portions of the transformed mixed audio content in the M/S audio channel representation;
  
  wherein the method is performed by one or more computing devices.
- Dependent claims (2, 3, 4, 5, 6, 9, 10):
  - 2. The method of claim 1, wherein the mixed audio content is in a non-M/S audio channel representation.
  - 3. The method of claim 1, further comprising:
    - generating a version of the speech content, in the M/S audio channel representation, separate from the mixed audio content; and
      
      outputting the audio signal encoded with the version of the speech content in the M/S audio channel representation.
  - 4. The method of claim 3, further comprising:
    - generating blend indicating data indicating a specific quantitative combination of the first and second types of speech enhancement to be generated by a recipient audio decoder; and
      
      outputting the audio signal encoded with the blend indicating data.
  - 5. The method of claim 4, wherein the blend indicating data is generated based at least in part on one or more signal-to-noise (SNR) values for the one or more portions of the transformed mixed audio content in the M/S audio channel representation, wherein the one or more SNR values represents one or more of ratios of power of speech content and non-speech audio content of the one or more portions of the transformed mixed audio content in the M/S audio channel representation, or ratios of power of speech content and total audio content of the one or more portions of the transformed mixed audio content in the M/S audio channel representation.
  - 6. The method of claim 4, wherein the specific quantitative combination of the first and second types of speech enhancement is determined with an auditory masking model in which the first type of speech enhancement represents a greatest relative amount of speech enhancement in a plurality of combinations of the first and second types of speech enhancement that ensures that coding noise in an output speech-enhanced audio program is not objectionably audible.
  - 9. An apparatus comprising a processor and configured to perform the method recited in claim 1.
  - 10. A non-transitory computer readable storage medium, comprising software instructions, which when executed by one or more processors cause performance of the method recited in claim 1.
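
The M/S transform of claim 1 and the SNR values of claim 5 can be sketched as follows. This is an illustration under stated assumptions, not the encoder the patent describes: the equal weight of 0.5, the left/right channel pair, and the helper names `to_mid_side`, `power`, and `snr_db` are hypothetical, and the SNR shown is the first variant in claim 5 (speech power over non-speech power).

```python
import math


def to_mid_side(left, right, weight=0.5):
    """Mid is a weighted sum, side a weighted difference, of two channels.

    The weight of 0.5 is an assumption; the claim allows weighted or
    non-weighted sums and differences.
    """
    mid = [weight * (l + r) for l, r in zip(left, right)]
    side = [weight * (l - r) for l, r in zip(left, right)]
    return mid, side


def power(samples):
    """Mean squared amplitude of a block of samples."""
    return sum(v * v for v in samples) / len(samples)


def snr_db(speech, non_speech):
    """Ratio of speech power to non-speech power, in decibels."""
    p_speech, p_other = power(speech), power(non_speech)
    if p_other == 0.0:
        return math.inf
    if p_speech == 0.0:
        return -math.inf
    return 10.0 * math.log10(p_speech / p_other)


if __name__ == "__main__":
    # Center-panned speech plus background that is opposite in the two channels.
    speech = [1.0, -1.0, 1.0, -1.0]
    background = [0.5, 0.5, -0.5, -0.5]
    left = [s + b for s, b in zip(speech, background)]
    right = [s - b for s, b in zip(speech, background)]
    mid, side = to_mid_side(left, right)
    print(mid)   # speech only
    print(side)  # background only
    print(snr_db(speech, background))
```

In this toy input the center-panned speech lands entirely in the mid channel and the out-of-phase background entirely in the side channel, which is why enhancement metadata keyed to the mid channel can be compact.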

7. A method, comprising:
- receiving an audio signal that comprises mixed audio content in a reference audio channel representation and metadata for speech enhancement, the mixed audio content having a mix of speech content and non-speech audio content;
  
  transforming one or more portions of the mixed audio content that spread over two or more non-M/S channels in a plurality of audio channels of the reference audio channel representation into one or more portions of transformed mixed audio content in an M/S audio channel representation that spread over one or more M/S channels of the M/S audio channel representation, wherein the M/S audio channel representation comprises at least a mid-channel signal and a side-channel signal, wherein the mid-channel signal represents a weighted or non-weighted sum of two channels of the reference audio channel representation, and wherein the side-channel signal represents a weighted or non-weighted difference of two channels of the reference audio channel representation;
  
  determining metadata for speech enhancement of the one or more portions of the transformed mixed audio content in the M/S audio channel representation, wherein a first type of speech enhancement is waveform-encoded speech enhancement of a reduced quality version of the mid-channel signal in the M/S audio channel representation, and a second type of speech enhancement is parametric-encoded speech enhancement of a reconstructed version of the mid-channel signal in the M/S audio channel representation, the metadata including a mid-channel prediction parameter to reconstruct the mid-channel signal, a first gain parameter for waveform-encoded speech enhancement of the mid-channel signal, and a second gain parameter for parametric-encoded speech enhancement of the reconstructed mid-channel signal;
  
  performing one or more speech enhancement operations, based on the metadata for speech enhancement, on the one or more portions of the transformed mixed audio content in the M/S audio channel representation to generate one or more portions of enhanced speech content in the M/S representation;
  
  combining the one or more portions of the transformed mixed audio content in the M/S audio channel representation with the one or more portions of the enhanced speech content in the M/S representation to generate one or more portions of speech enhanced mixed audio content in the M/S representation;
  
  wherein the method is performed by one or more computing devices.
- Dependent claims (8, 11, 12):
  - 8. The method of claim 7, wherein the one or more speech enhancement operations are represented by a single matrix.
  - 11. An apparatus comprising a processor and configured to perform the method recited in claim 7.
  - 12. A non-transitory computer readable storage medium, comprising software instructions, which when executed by one or more processors cause performance of the method recited in claim 7.
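
Claim 8 states that the enhancement operations can be represented by a single matrix but does not give its form. The sketch below shows one plausible reading, in which the parametric path scales the mixed mid channel by a prediction parameter and the waveform path adds a separately decoded speech signal. The matrix layout, the function names, and the inverse transform (which assumes mid = 0.5(L + R) and side = 0.5(L - R)) are assumptions for illustration only.

```python
def enhancement_matrix(g_waveform, g_parametric, prediction):
    """2x3 matrix mapping (mixed mid, mixed side, decoded speech) to enhanced (mid, side).

    Assumed model: enhanced_mid = mid + g_parametric * prediction * mid
                                      + g_waveform * decoded_speech,
    and the side channel passes through unchanged.
    """
    return [
        [1.0 + g_parametric * prediction, 0.0, g_waveform],
        [0.0, 1.0, 0.0],
    ]


def apply_matrix(matrix, mid, side, speech_mid):
    """Apply the same matrix to every sample triple (mid, side, speech)."""
    out_mid, out_side = [], []
    for triple in zip(mid, side, speech_mid):
        out_mid.append(sum(a * x for a, x in zip(matrix[0], triple)))
        out_side.append(sum(a * x for a, x in zip(matrix[1], triple)))
    return out_mid, out_side


def to_left_right(mid, side):
    """Inverse of mid = 0.5 * (L + R), side = 0.5 * (L - R)."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    matrix = enhancement_matrix(g_waveform=0.5, g_parametric=0.5, prediction=1.0)
    enhanced_mid, enhanced_side = apply_matrix(matrix, [2.0], [1.0], [4.0])
    print(to_left_right(enhanced_mid, enhanced_side))
```

Folding both gains and the prediction parameter into one matrix means the decoder performs the enhancement and the combination with the mixed content in a single multiply per sample.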

Current Assignee: Dolby International AB (Dolby Laboratories Incorporated), Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Original Assignee: Dolby International AB (Dolby Laboratories Incorporated), Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Inventors: Koppens, Jeroen; Muesch, Hannes
Primary Examiner(s): Patel, Yogeshkumar
Application Number: US14/914,572
Publication Number: US 20160225387A1
Time in Patent Office: 1,553 Days
Field of Search: 704/270.1, 704/E11.001, 704/E15.039, 704/E21.004, 704/228, 381/80, 381/119, 381/94.2, 381/22, 381/23
CPC Class Codes

G10L 19/008   Multichannel audio signal c...

G10L 19/20   using sound class specific ...

G10L 19/22   Mode decision, i.e. based o...

G10L 21/0324   Details of processing therefor

G10L 21/0364   for improving intelligibility

H04R 5/04   Circuit arrangements, e.g. ...

H04S 2400/15   Aspects of sound capture an...

H04S 2420/03   Application of parametric c...

H04S 3/008   in which the audio signals ...
