Methods and apparatus for decoding based on speech enhancement metadata

US 10,607,629 B2
Filed: 10/22/2018
Issued: 03/31/2020
Est. Priority Date: 08/28/2013
Status: Active Grant

Abstract

A method for hybrid speech enhancement which employs parametric-coded enhancement (or a blend of parametric-coded and waveform-coded enhancement) under some signal conditions and waveform-coded enhancement (or a different blend of parametric-coded and waveform-coded enhancement) under other signal conditions. Other aspects are methods for generating a bitstream indicative of an audio program including speech and other content, such that hybrid speech enhancement can be performed on the program, a decoder including a buffer which stores at least one segment of an encoded audio bitstream generated by any embodiment of the inventive method, and a system or device (e.g., an encoder or decoder) configured (e.g., programmed) to perform any embodiment of the inventive method. At least some of the speech enhancement operations are performed by a recipient audio decoder with Mid/Side speech enhancement metadata generated by an upstream audio encoder.
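The blending idea in the abstract can be illustrated with a minimal sketch. It assumes the decoder already has two speech estimates of equal length (one waveform-coded, one parametric) and a blend weight chosen elsewhere from the signal conditions. The function name `hybrid_enhance`, the parameter `waveform_weight`, and the linear blend itself are assumptions for illustration, not details taken from the patent.

```python
from typing import List, Sequence


def hybrid_enhance(
    mix: Sequence[float],
    waveform_speech: Sequence[float],
    parametric_speech: Sequence[float],
    waveform_weight: float,
    gain: float = 1.0,
) -> List[float]:
    """Add a blended speech estimate to the mixed content (illustrative only).

    waveform_weight = 1.0 uses only the waveform-coded estimate,
    waveform_weight = 0.0 uses only the parametric estimate,
    and values in between blend the two linearly (an assumed blend rule).
    """
    if not 0.0 <= waveform_weight <= 1.0:
        raise ValueError("waveform_weight must lie in [0, 1]")
    if not (len(mix) == len(waveform_speech) == len(parametric_speech)):
        raise ValueError("all signals must have the same length")

    enhanced = []
    for m, w, p in zip(mix, waveform_speech, parametric_speech):
        # Blend the two speech estimates, then boost speech in the mix.
        speech = waveform_weight * w + (1.0 - waveform_weight) * p
        enhanced.append(m + gain * speech)
    return enhanced
```

How the weight is derived from the signal conditions is left open here; the abstract only states that different blends are used under different conditions.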

152 Citations

15 Claims

1. A method, comprising:
- receiving mixed audio content, wherein the mixed audio content includes at least a mid-channel mixed content signal and a side-channel mixed content signal, wherein the mid-channel signal represents a weighted or non-weighted sum of two channels of a reference audio channel representation, and wherein the side-channel signal represents a weighted or non-weighted difference of two channels of the reference audio channel representation;
  
  decoding, by an audio decoder, the mid-channel signal and the side-channel signal into a left channel signal and a right channel signal, wherein the decoding includes decoding based on speech enhancement metadata, wherein the speech enhancement metadata includes a preference flag which indicates at least a type of speech enhancement operation to be performed on the mid-channel signal and the side-channel signal during decoding, and wherein the enhancement metadata further indicates a first type of speech enhancement for the mid-channel signal and a second type of speech enhancement of the mid-channel signal; and
  
  generating an audio signal that comprises the left channel signal and the right channel signal for the one or more portions of the decoded mid channel signal and side-channel signal of the mixed audio content, wherein the method is performed by one or more computing devices.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the speech enhancement metadata comprises metadata relating to one or more of waveform-coded speech enhancement operations, or parametric speech enhancement operations.
  - 3. The method of claim 1, wherein the mixed audio content includes a reference audio channel representation that comprises audio channels relating to surround speakers.
  - 4. The method of claim 1, wherein the speech enhancement metadata comprises a single set of speech enhancement metadata relating to the mid-channel signal.
  - 5. The method of claim 1, wherein the speech enhancement metadata represents a part of overall audio metadata of the mixed audio content.
  - 6. The method of claim 1, wherein audio metadata encoded in the mixed audio content, comprises a data field to indicate a presence of the speech enhancement metadata.
  - 7. The method of claim 1, wherein the mixed audio content is a part of an audiovisual signal.

8. A non-transitory computer readable storage medium, comprising software instructions, which when executed by one or more processors cause performance of any one of the methods recited in claims 1-7.

9. An apparatus, comprising:
- a receiver configured to receive mixed audio content, wherein the mixed audio content includes at least a mid-channel mixed content signal and a side-channel mixed content signal, wherein the mid-channel signal represents a weighted or non-weighted sum of two channels of a reference audio channel representation, and wherein the side-channel signal represents a weighted or non-weighted difference of two channels of the reference audio channel representation;
  
  a decoder configured to decode the mid-channel signal and the side-channel signal into a left channel signal and a right channel signal, wherein the decoding includes decoding based on speech enhancement metadata, wherein the speech enhancement metadata includes a preference flag which indicates at least a type of speech enhancement operation to be performed on the mid-channel signal and the side-channel signal during decoding, and wherein the enhancement metadata further indicates a first type of speech enhancement for the mid-channel signal and a second type of speech enhancement of the mid-channel signal; and
  
  a processor configured to generate an audio signal that comprises the left channel signal and the right channel signal for the one or more portions of the decoded mid channel signal and side-channel signal of the mixed audio content.
- Dependent claims (10, 11, 12, 13, 14, 15):
  - 10. The apparatus of claim 9, wherein the speech enhancement metadata comprises metadata relating to one or more of waveform-coded speech enhancement operations, or parametric speech enhancement operations.
  - 11. The apparatus of claim 9, wherein the mixed audio content includes a reference audio channel representation that comprises audio channels relating to surround speakers.
  - 12. The apparatus of claim 9, wherein the speech enhancement metadata comprises a single set of speech enhancement metadata relating to the mid-channel signal.
  - 13. The apparatus of claim 9, wherein the speech enhancement metadata represents a part of overall audio metadata of the mixed audio content.
  - 14. The apparatus of claim 9, wherein audio metadata encoded in the mixed audio content, comprises a data field to indicate a presence of the speech enhancement metadata.
  - 15. The apparatus of claim 9, wherein the mixed audio content is a part of an audiovisual signal.
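The decoding step recited in claims 1 and 9 can be sketched as follows. The sketch assumes the convention M = (L + R) / 2 and S = (L - R) / 2 (one possible weighted sum and difference), so that L = M + S and R = M - S. The enum `EnhancementType`, the dataclass `SpeechEnhancementMetadata`, its field names, and the simple enhancement rules are all hypothetical; for brevity only the mid-channel signal is enhanced, in the spirit of claims 4 and 12.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Sequence, Tuple


class EnhancementType(Enum):
    """Hypothetical values for the claimed preference flag."""
    NONE = 0
    WAVEFORM = 1
    PARAMETRIC = 2


@dataclass
class SpeechEnhancementMetadata:
    """Hypothetical container; field names are illustrative only."""
    preference: EnhancementType
    gain: float = 1.0              # boost applied to the speech estimate
    prediction_coeff: float = 0.0  # parametric case: speech ~= coeff * mid


def enhance_mid(
    mid: Sequence[float],
    meta: SpeechEnhancementMetadata,
    waveform_speech: Optional[Sequence[float]] = None,
) -> List[float]:
    """Apply the enhancement type selected by the preference flag."""
    if meta.preference is EnhancementType.WAVEFORM:
        if waveform_speech is None or len(waveform_speech) != len(mid):
            raise ValueError("waveform enhancement needs a matching speech signal")
        return [m + meta.gain * s for m, s in zip(mid, waveform_speech)]
    if meta.preference is EnhancementType.PARAMETRIC:
        # Assumed rule: the speech estimate is a scaled copy of the mid signal.
        return [m + meta.gain * meta.prediction_coeff * m for m in mid]
    return list(mid)


def ms_to_lr(
    mid: Sequence[float], side: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Convert mid/side to left/right, assuming L = M + S and R = M - S."""
    if len(mid) != len(side):
        raise ValueError("mid and side must have the same length")
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def decode(
    mid: Sequence[float],
    side: Sequence[float],
    meta: SpeechEnhancementMetadata,
    waveform_speech: Optional[Sequence[float]] = None,
) -> Tuple[List[float], List[float]]:
    """Enhance in the mid/side domain, then produce left and right channels."""
    return ms_to_lr(enhance_mid(mid, meta, waveform_speech), side)
```

For example, with `mid = [0.5, 0.5]` and `side = [0.5, 0.0]`, calling `decode` with `EnhancementType.NONE` recovers the original left and right channels, while the `WAVEFORM` setting adds the scaled speech signal equally to both output channels.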

Current Assignee: Dolby International AB (Dolby Laboratories Incorporated); Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Original Assignee: Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Inventors: Koppens, Jeroen; Muesch, Hannes
Primary Examiner(s): Patel, Yogeshkumar
Application Number: US16/167,373
Publication Number: US 20190057713A1
Time in Patent Office: 526 Days
CPC Class Codes

G10L 19/008   Multichannel audio signal c...

G10L 19/20   using sound class specific ...

G10L 19/22   Mode decision, i.e. based o...

G10L 21/0324   Details of processing therefor

G10L 21/0364   for improving intelligibility

H04R 5/04   Circuit arrangements, e.g. ...

H04S 2400/15   Aspects of sound capture an...

H04S 2420/03   Application of parametric c...

H04S 3/008   in which the audio signals ...
