×

Hybrid waveform-coded and parametric-coded speech enhancement

  • US 10,141,004 B2
  • Filed: 08/27/2014
  • Issued: 11/27/2018
  • Est. Priority Date: 08/28/2013
  • Status: Active Grant
First Claim
Patent Images

1. A method, comprising:

  • receiving mixed audio content, in a reference audio channel representation, that are distributed over a plurality of audio channels of the reference audio channel representation, the mixed audio content having a mix of speech content and non-speech audio content;

    transforming one or more portions of the mixed audio content that are distributed over two or more non-Mid/Side (non-M/S) channels in the plurality of audio channels of the reference audio channel representation into one or more portions of the transformed mixed audio content in an M/S audio channel representation that are distributed over one or more channels of the M/S audio channel representation, wherein the M/S audio channel representation comprises at least a mid-channel signal and a side-channel signal, wherein the mid-channel signal represents a weighted or non-weighted sum of two channels of the reference audio channel representation, and wherein the side-channel signal represents a weighted or non-weighted difference of two channels of the reference audio channel representation;

    determining metadata for speech enhancement of the one or more portions of the transformed mixed audio content in the M/S audio channel representation, wherein a first type of speech enhancement is waveform-encoded speech enhancement of a reduced quality version of the mid-channel signal in the M/S audio channel representation, and a second type of speech enhancement is parametric-encoded speech enhancement of a reconstructed version of the mid-channel signal in the M/S audio channel representation, the metadata including a mid-channel prediction parameter to reconstruct the mid-channel signal, a first gain parameter for waveform-encoded speech enhancement of the mid-channel signal, and a second gain parameter for parametric-encoded speech enhancement of the reconstructed mid-channel signal; and

    generating an audio signal that comprises the mixed audio content and the metadata for speech enhancement of the one or more portions of the transformed mixed audio content in the M/S audio channel representation;

    wherein the method is performed by one or more computing devices.

View all claims
  • 2 Assignments
Timeline View
Assignment View
    ×
    ×