Hybrid waveform-coded and parametric-coded speech enhancement
First Claim
1. A method, comprising:
receiving mixed audio content, in a reference audio channel representation, that are distributed over a plurality of audio channels of the reference audio channel representation, the mixed audio content having a mix of speech content and non-speech audio content;
transforming one or more portions of the mixed audio content that are distributed over two or more non-Mid/Side (non-M/S) channels in the plurality of audio channels of the reference audio channel representation into one or more portions of the transformed mixed audio content in an M/S audio channel representation that are distributed over one or more channels of the M/S audio channel representation, wherein the M/S audio channel representation comprises at least a mid-channel signal and a side-channel signal, wherein the mid-channel signal represents a weighted or non-weighted sum of two channels of the reference audio channel representation, and wherein the side-channel signal represents a weighted or non-weighted difference of two channels of the reference audio channel representation;
determining metadata for speech enhancement of the one or more portions of the transformed mixed audio content in the M/S audio channel representation, wherein a first type of speech enhancement is waveform-encoded speech enhancement of a reduced quality version of the mid-channel signal in the M/S audio channel representation, and a second type of speech enhancement is parametric-encoded speech enhancement of a reconstructed version of the mid-channel signal in the M/S audio channel representation, the metadata including a mid-channel prediction parameter to reconstruct the mid-channel signal, a first gain parameter for waveform-encoded speech enhancement of the mid-channel signal, and a second gain parameter for parametric-encoded speech enhancement of the reconstructed mid-channel signal; and
generating an audio signal that comprises the mixed audio content and the metadata for speech enhancement of the one or more portions of the transformed mixed audio content in the M/S audio channel representation;
wherein the method is performed by one or more computing devices.
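The transform and metadata elements of claim 1 can be illustrated with a minimal sketch. It assumes a two-channel (left/right) reference representation and an equal weight of 0.5 for both the sum and the difference; the claim allows weighted or non-weighted forms and does not fix these values. The names `to_mid_side`, `from_mid_side`, and `SpeechEnhancementMetadata` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class SpeechEnhancementMetadata:
    """Hypothetical container for the three parameters named in claim 1."""
    mid_prediction_param: float  # used to reconstruct the mid-channel signal
    waveform_gain: float         # first gain: waveform-encoded enhancement
    parametric_gain: float       # second gain: parametric-encoded enhancement


def to_mid_side(left, right, weight=0.5):
    """Return (mid, side) as a weighted sum and difference of two channels.

    The 0.5 default weight is an assumption made for this sketch.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    mid = [weight * (l + r) for l, r in zip(left, right)]
    side = [weight * (l - r) for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side, weight=0.5):
    """Invert to_mid_side for the same non-zero weight."""
    if len(mid) != len(side):
        raise ValueError("channels must have the same number of samples")
    scale = 1.0 / (2.0 * weight)
    left = [scale * (m + s) for m, s in zip(mid, side)]
    right = [scale * (m - s) for m, s in zip(mid, side)]
    return left, right
```

With these assumptions, content that is identical in both channels (as centered dialogue often is) lands entirely in the mid channel, which is consistent with the claim directing the speech enhancement parameters at the mid-channel signal.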
Abstract
A method for hybrid speech enhancement that employs parametric-coded enhancement (or a blend of parametric-coded and waveform-coded enhancement) under some signal conditions and waveform-coded enhancement (or a different blend of parametric-coded and waveform-coded enhancement) under other signal conditions. Other aspects are: methods for generating a bitstream indicative of an audio program including speech and other content, such that hybrid speech enhancement can be performed on the program; a decoder including a buffer which stores at least one segment of an encoded audio bitstream generated by any embodiment of the inventive method; and a system or device (e.g., an encoder or decoder) configured (e.g., programmed) to perform any embodiment of the inventive method. At least some of the speech enhancement operations are performed by a recipient audio decoder using Mid/Side speech enhancement metadata generated by an upstream audio encoder.
12 Claims
1. A method, comprising the steps recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 9, 10
7. A method, comprising:
receiving an audio signal that comprises mixed audio content in a reference audio channel representation and metadata for speech enhancement, the mixed audio content having a mix of speech content and non-speech audio content;
transforming one or more portions of the mixed audio content that spread over two or more non-M/S channels in a plurality of audio channels of the reference audio channel representation into one or more portions of transformed mixed audio content in an M/S audio channel representation that spread over one or more M/S channels of the M/S audio channel representation, wherein the M/S audio channel representation comprises at least a mid-channel signal and a side-channel signal, wherein the mid-channel signal represents a weighted or non-weighted sum of two channels of the reference audio channel representation, and wherein the side-channel signal represents a weighted or non-weighted difference of two channels of the reference audio channel representation;
determining metadata for speech enhancement of the one or more portions of the transformed mixed audio content in the M/S audio channel representation, wherein a first type of speech enhancement is waveform-encoded speech enhancement of a reduced quality version of the mid-channel signal in the M/S audio channel representation, and a second type of speech enhancement is parametric-encoded speech enhancement of a reconstructed version of the mid-channel signal in the M/S audio channel representation, the metadata including a mid-channel prediction parameter to reconstruct the mid-channel signal, a first gain parameter for waveform-encoded speech enhancement of the mid-channel signal, and a second gain parameter for parametric-encoded speech enhancement of the reconstructed mid-channel signal;
performing one or more speech enhancement operations, based on the metadata for speech enhancement, on the one or more portions of the transformed mixed audio content in the M/S audio channel representation to generate one or more portions of enhanced speech content in the M/S representation;
combining the one or more portions of the transformed mixed audio content in the M/S audio channel representation with the one or more portions of the enhanced speech content in the M/S representation to generate one or more portions of speech enhanced mixed audio content in the M/S representation;
wherein the method is performed by one or more computing devices.
Dependent claims: 8, 11, 12
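The last two steps of claim 7 (performing speech enhancement from the metadata, then combining the result with the mixed content) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it assumes the parametric path estimates speech by scaling the mixed mid-channel signal with the prediction parameter, that the waveform path supplies a separately decoded low-quality speech signal, and that combining is a sample-wise addition. The function and parameter names are hypothetical.

```python
def enhanced_speech_mid(mixed_mid, waveform_speech, prediction_param,
                        waveform_gain, parametric_gain):
    """Build the enhanced speech contribution for the mid channel.

    Assumptions of this sketch (not stated in the claim):
      - parametric path: speech estimate = prediction_param * mixed mid sample
      - waveform path: waveform_speech is a decoded reduced-quality speech copy
      - the two paths are weighted by their gains and summed
    """
    if len(mixed_mid) != len(waveform_speech):
        raise ValueError("signals must have the same number of samples")
    speech = []
    for m, w in zip(mixed_mid, waveform_speech):
        predicted = prediction_param * m
        speech.append(waveform_gain * w + parametric_gain * predicted)
    return speech


def combine(mixed_mid, speech_mid):
    """Add the enhanced speech to the mixed mid channel, sample by sample."""
    if len(mixed_mid) != len(speech_mid):
        raise ValueError("signals must have the same number of samples")
    return [m + s for m, s in zip(mixed_mid, speech_mid)]
```

Setting `waveform_gain` to zero leaves a purely parametric enhancement, and setting `parametric_gain` to zero leaves a purely waveform-coded one; intermediate values give the kind of blend the abstract describes, under the assumptions above.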
Specification