HYBRID WAVEFORM-CODED AND PARAMETRIC-CODED SPEECH ENHANCEMENT

US 20160225387A1
Filed: 08/27/2014
Published: 08/04/2016
Est. Priority Date: 08/28/2013
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A method for hybrid speech enhancement that employs parametric-coded enhancement (or a blend of parametric-coded and waveform-coded enhancement) under some signal conditions and waveform-coded enhancement (or a different blend of parametric-coded and waveform-coded enhancement) under other signal conditions. Other aspects are methods for generating a bitstream indicative of an audio program including speech and other content, such that hybrid speech enhancement can be performed on the program; a decoder including a buffer that stores at least one segment of an encoded audio bitstream generated by any embodiment of the inventive method; and a system or device (e.g., an encoder or decoder) configured (e.g., programmed) to perform any embodiment of the inventive method. At least some of the speech enhancement operations are performed by a recipient audio decoder using Mid/Side speech enhancement metadata generated by an upstream audio encoder.
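The hybrid idea in the abstract can be pictured, for one frame of one channel, as adding a gain-scaled speech estimate to the mix, where that estimate is a separately coded speech waveform (waveform-coded enhancement), a speech signal reconstructed from the mix using transmitted parameters (parametric-coded enhancement), or a blend of the two. The sketch below is a minimal illustration under stated assumptions, not the patent's formulation: the function names are hypothetical, the parametric reconstruction is reduced to a single scalar `p` applied to the mix, and the linear blend weight `alpha` stands in for the blend-indicating data.

```python
from typing import List


def parametric_speech_estimate(mix: List[float], p: float) -> List[float]:
    # Hypothetical parametric reconstruction (assumption): the speech is
    # approximated by scaling the mix with one transmitted parameter p.
    return [p * x for x in mix]


def hybrid_enhance(mix: List[float], coded_speech: List[float], p: float,
                   gain: float, alpha: float) -> List[float]:
    """Return mix + gain * (alpha * coded_speech + (1 - alpha) * p * mix).

    alpha = 1.0 gives purely waveform-coded enhancement,
    alpha = 0.0 gives purely parametric-coded enhancement,
    values in between give a blend of the two (assumed linear blend).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(mix) != len(coded_speech):
        raise ValueError("mix and coded_speech must have equal length")
    param_speech = parametric_speech_estimate(mix, p)
    return [x + gain * (alpha * sw + (1.0 - alpha) * sp)
            for x, sw, sp in zip(mix, coded_speech, param_speech)]


# Usage: an equal blend of the two enhancement types for a 3-sample frame.
mix = [0.5, -0.25, 1.0]
coded = [0.4, -0.2, 0.6]
print(hybrid_enhance(mix, coded, p=0.5, gain=1.0, alpha=0.5))
```

Here `alpha` is simply passed in; in the claimed method the corresponding choice would come from blend-indicating data produced by the encoder (claims 4 to 7).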

163 Citations

34 Claims

1. A method, comprising:

   receiving mixed audio content, in a reference audio channel representation, that are distributed over a plurality of audio channels of the reference audio channel representation, the mixed audio content having a mix of speech content and non-speech audio content;

   transforming one or more portions of the mixed audio content that are distributed over two or more non-Mid/Side (non-M/S) channels in the plurality of audio channels of the reference audio channel representation into one or more portions of transformed mixed audio content in an M/S audio channel representation that are distributed over one or more channels of the M/S audio channel representation, wherein the M/S audio channel representation comprises at least a mid-channel and a side-channel, wherein the mid-channel represents a weighted or non-weighted sum of two channels of the reference audio channel representation, and wherein the side-channel represents a weighted or non-weighted difference of two channels of the reference audio channel representation;

   determining metadata for speech enhancement of the one or more portions of transformed mixed audio content in the M/S audio channel representation; and

   generating an audio signal that comprises the mixed audio content and the metadata for speech enhancement of the one or more portions of transformed mixed audio content in the M/S audio channel representation;

   wherein the method is performed by one or more computing devices.

2. The method of claim 1, wherein the mixed audio content is in a non-M/S audio channel representation.

3. The method of claim 1, further comprising:

   generating a version of the speech content, in the M/S audio channel representation, separate from the mixed audio content; and

   outputting the audio signal encoded with the version of the speech content in the M/S audio channel representation.

4. The method of claim 3, further comprising:

   generating blend indicating data indicating a specific quantitative combination of first and second types of speech enhancement to be generated by a recipient audio decoder, wherein the first type of speech enhancement is speech enhancement based on the version of the speech content in the M/S audio channel representation, and wherein the second type of speech enhancement is parametric speech enhancement based on a reconstructed version of the speech content in the M/S audio channel representation; and

   outputting the audio signal encoded with the blend indicating data.

5. The method of claim 4, wherein at least a portion of the metadata for speech enhancement enables a recipient audio decoder to reconstruct the reconstructed version of the speech content in the M/S representation from the mixed audio content in the reference audio channel representation.

6. The method of claim 4, wherein the blend indicating data is generated based at least in part on one or more SNR values for the one or more portions of transformed mixed audio content in the M/S audio channel representation, wherein the one or more SNR values represents one or more of ratios of power of speech content and non-speech audio content of the one or more portions of transformed mixed audio content in the M/S audio channel representation, or ratios of power of speech content and total audio content of the one or more portions of transformed mixed audio content in the M/S audio channel representation.

7. The method of claim 4, wherein the specific quantitative combination of the first and second types of speech enhancement is determined with an auditory masking model in which the first type of speech enhancement represents a greatest relative amount of speech enhancement in a plurality of combinations of the first and second types of speech enhancement that ensures that coding noise in an output speech-enhanced audio program is not objectionably audible.

8. The method of claim 1, wherein at least a portion of the metadata for speech enhancement enables a recipient audio decoder to reconstruct a version of the speech content in the M/S representation from the mixed audio content in the reference audio channel representation.

9. The method of claim 1, wherein the metadata for speech enhancement comprises metadata relating to one or more of speech enhancement operations in the M/S audio channel representation based on the version of the speech content, or parametric speech enhancement operations in the M/S audio channel representation.

33. An apparatus comprising a processor and configured to perform any one of the methods recited in claim 1.

34. A non-transitory computer readable storage medium, comprising software instructions, which when executed by one or more processors cause performance of any one of the methods recited in claim 1.
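Claim 1 defines the mid-channel as a weighted or non-weighted sum, and the side-channel as a weighted or non-weighted difference, of two reference channels, and claim 6 bases the blend-indicating data on speech-to-non-speech (or speech-to-total) power ratios in the M/S representation. The Python sketch below illustrates both under assumptions that are not taken from the patent: a default weight `w = 0.5`, mean-square power per portion, SNR expressed in decibels, and hypothetical helper names. In this example the speech is panned to the center, so it lands entirely in the mid-channel and the mid-channel SNR comes out higher than the left-channel SNR.

```python
import math
from typing import List, Tuple


def to_mid_side(left: List[float], right: List[float],
                w: float = 0.5) -> Tuple[List[float], List[float]]:
    """Mid = w*(L + R), Side = w*(L - R) for one portion of two channels.

    w = 0.5 is an assumed default weighting; w = 1.0 gives the
    non-weighted sum and difference.
    """
    if len(left) != len(right):
        raise ValueError("channels must have equal length")
    mid = [w * (lft + rgt) for lft, rgt in zip(left, right)]
    side = [w * (lft - rgt) for lft, rgt in zip(left, right)]
    return mid, side


def power(x: List[float]) -> float:
    # Mean-square power of one portion (e.g. one frame) of a signal.
    if not x:
        raise ValueError("empty signal")
    return sum(v * v for v in x) / len(x)


def snr_db(speech: List[float], other: List[float]) -> float:
    """Speech power over `other` power, in dB.

    `other` may be the non-speech content or the total (speech plus
    non-speech) content, matching the two ratios named in claim 6.
    """
    ps, po = power(speech), power(other)
    if po == 0.0:
        return math.inf
    if ps == 0.0:
        return -math.inf
    return 10.0 * math.log10(ps / po)


# Example: center-panned speech, partially correlated background.
speech = [1.0, -1.0, 1.0, -1.0]            # identical in L and R
noise_l = [0.5, 0.5, -0.5, -0.5]
noise_r = [0.3, 0.3, -0.3, -0.3]

# The transform is linear, so speech and non-speech can be mapped separately.
speech_mid, speech_side = to_mid_side(speech, speech)
noise_mid, noise_side = to_mid_side(noise_l, noise_r)

print(round(snr_db(speech, noise_l), 2))          # left channel: 6.02
print(round(snr_db(speech_mid, noise_mid), 2))    # mid channel: 7.96
print(snr_db(speech_side, noise_side))            # side channel: -inf (no speech)
```

How such SNR values are turned into a specific blend is left open here; claim 7 names an auditory masking model for that step, which this sketch does not attempt to reproduce.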

10-16. (canceled)

17. A method, comprising:

   receiving an audio signal that comprises mixed audio content in a reference audio channel representation and metadata for speech enhancement, the mixed audio content having a mix of speech content and non-speech audio content;

   transforming one or more portions of the mixed audio content that spread over two or more non-M/S channels in a plurality of audio channels of the reference audio channel representation into one or more portions of transformed mixed audio content in an M/S audio channel representation that spread over one or more M/S channels of the M/S audio channel representation, wherein the M/S audio channel representation comprises at least a mid-channel and a side-channel, wherein the mid-channel represents a weighted or non-weighted sum of two channels of the reference audio channel representation, and wherein the side-channel represents a weighted or non-weighted difference of two channels of the reference audio channel representation;

   performing one or more speech enhancement operations, based on the metadata for speech enhancement, on the one or more portions of transformed mixed audio content in the M/S audio channel representation to generate one or more portions of enhanced speech content in the M/S representation;

   combining the one or more portions of transformed mixed audio content in the M/S audio channel representation with the one or more portions of enhanced speech content in the M/S representation to generate one or more portions of speech enhanced mixed audio content in the M/S representation;

   wherein the method is performed by one or more computing devices.

18. The method of claim 17, wherein the steps of transforming, performing and combining are implemented in a single operation that is performed on the one or more portions of the mixed audio content that spread over two or more non-M/S channels in the plurality of audio channels of the reference audio channel representation.
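Because the M/S transform is linear, the three steps of claim 17 (transforming, performing, combining) can be folded into one matrix applied directly to the reference channels, which is one plausible reading of the single operation in claim 18. The sketch below assumes a hypothetical parametric operation that adds `g * p` times the mid-channel back to the mid-channel, with `g` an enhancement gain and `p` a single speech parameter taken from the metadata; practical systems would use per-band, per-frame parameters. All function names are illustrative, and the check simply confirms that the stepwise and single-matrix paths produce the same M/S output.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product a @ b.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def apply(m: Matrix, x: List[float]) -> List[float]:
    # Matrix-vector product m @ x for one multichannel sample.
    return [sum(m[i][k] * x[k] for k in range(len(x))) for i in range(len(m))]


def ms_matrix(w: float) -> Matrix:
    # L/R -> M/S: mid = w*(L + R), side = w*(L - R).
    return [[w, w], [w, -w]]


def enhance_stepwise(lr: List[float], w: float, g: float, p: float) -> List[float]:
    ms = apply(ms_matrix(w), lr)                # transforming
    enhanced_speech = [g * p * ms[0], 0.0]      # performing (hypothetical, mid only)
    return [ms[0] + enhanced_speech[0],         # combining
            ms[1] + enhanced_speech[1]]


def enhance_single_op_matrix(w: float, g: float, p: float) -> Matrix:
    # (I + E) @ T, where T maps L/R to M/S and E applies g*p to the mid row.
    i_plus_e = [[1.0 + g * p, 0.0], [0.0, 1.0]]
    return matmul(i_plus_e, ms_matrix(w))


# Usage: both paths give the same speech-enhanced M/S sample.
sample = [0.7, 0.2]                             # one (L, R) sample pair
combined = enhance_single_op_matrix(w=0.5, g=2.0, p=0.6)
print(enhance_stepwise(sample, w=0.5, g=2.0, p=0.6))
print(apply(combined, sample))
```

Precomputing one small matrix per frame (or per band) avoids three separate passes over the samples. If the output were needed back in the reference representation, the inverse M/S matrix could be folded into the same product.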

19-32. (canceled)

Current Assignee
Dolby International AB (Dolby Laboratories Incorporated), Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Original Assignee
Dolby International AB (Dolby Laboratories Incorporated), Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Inventors
KOPPENS, Jeroen; MUESCH, Hannes

Granted Patent

US 10,141,004 B2
US Class Current

1/1
CPC Class Codes

G10L 19/008   Multichannel audio signal c...

G10L 19/20   using sound class specific ...

G10L 19/22   Mode decision, i.e. based o...

G10L 21/0324   Details of processing therefor

G10L 21/0364   for improving intelligibility

H04R 5/04   Circuit arrangements, e.g. ...

H04S 2400/15   Aspects of sound capture an...

H04S 2420/03   Application of parametric c...

H04S 3/008   in which the audio signals ...