Coherence-based audio coding and synthesis
Abstract
An auditory scene is synthesized from a mono audio signal by modifying, for each critical band, an auditory scene parameter (e.g., an inter-aural level difference (ILD) and/or an inter-aural time difference (ITD)) for each sub-band within the critical band, where the modification is based on an average estimated coherence for the critical band. The coherence-based modification produces auditory scenes having objects whose widths more accurately match the widths of the objects in the original input auditory scene.
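The central quantity throughout the claims is the estimate of coherence between channels within a frequency band. A common way to estimate it is the normalized magnitude of the cross-spectrum; the sketch below is an illustrative assumption in that style, not the patent's specified estimator, and the function name `band_coherence` is invented here.

```python
import numpy as np

def band_coherence(L, R):
    """Estimate inter-channel coherence for one frequency band.

    L, R: complex spectral coefficients (sub-band bins) of two channels.
    Returns a value in [0, 1]; 1 means the channels are fully coherent
    (a narrow, point-like source), lower values suggest a wider source.
    This normalized cross-spectrum form is an assumed estimator, not the
    one specified in the patent.
    """
    cross = np.abs(np.sum(L * np.conj(R)))                       # |cross-spectrum|
    power = np.sqrt(np.sum(np.abs(L) ** 2) * np.sum(np.abs(R) ** 2))
    return cross / power if power > 0 else 0.0

# Identical channels are fully coherent:
x = np.exp(1j * np.linspace(0, 3, 8))
print(round(band_coherence(x, x), 6))  # 1.0
```

By construction the estimate is 1 for identical channels and drops toward 0 as the channels decorrelate, which is why it can serve as a proxy for perceived source width.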
120 Citations
39 Claims
1. A method for processing two or more input audio signals, comprising the steps of:
(a) converting M input audio signals from a time domain into a frequency domain, where M > 1;
(b) generating a set of one or more auditory scene parameters for each of one or more different frequency bands in the M converted input audio signals, where each set of one or more auditory scene parameters comprises information corresponding to an estimate of coherence between the M input audio signals, wherein the estimate of coherence is related to perceived width of an audio source corresponding to the M input audio signals;
(c) combining the M input audio signals to generate N combined audio signals, where M > N; and
(d) transmitting the information corresponding to the estimate of coherence along with the N combined audio signals.
View Dependent Claims: 2, 3, 4, 5, 6
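Steps (a)-(d) of claim 1 can be sketched as follows. This is a hedged reconstruction, not the patented implementation: the function name `encode`, the single-frame FFT, the first-channel-pair coherence, and the equal-weight downmix to N = 1 are all illustrative assumptions.

```python
import numpy as np

def encode(channels, band_edges, fft_size=256):
    """Illustrative sketch of steps (a)-(d): transform M input channels,
    estimate per-band inter-channel coherence, and downmix M -> N = 1.
    channels: array of shape (M, samples); band_edges: FFT-bin boundaries."""
    M = channels.shape[0]
    assert M > 1
    # (a) time domain -> frequency domain (one frame, for simplicity)
    spec = np.fft.rfft(channels[:, :fft_size], axis=1)
    # (b) one coherence estimate per band (first channel pair only, assumed)
    params = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        a, b = spec[0, lo:hi], spec[1, lo:hi]
        num = np.abs(np.sum(a * np.conj(b)))
        den = np.sqrt(np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2))
        params.append(num / den if den > 0 else 0.0)
    # (c) combine the M signals into a single downmix (N = 1)
    downmix = channels.mean(axis=0)
    # (d) the downmix and coherence parameters are transmitted together
    return downmix, params

t = np.arange(512) / 512.0
left = np.sin(2 * np.pi * 10 * t)
right = np.sin(2 * np.pi * 10 * t + 0.3)
downmix, params = encode(np.stack([left, right]), band_edges=[1, 20, 64])
```

The key point of step (d) is that the side information (the per-band coherence) rides alongside the reduced channel set, so a decoder can restore spatial width the downmix itself no longer carries.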
7. An apparatus for processing two or more input audio signals, comprising:
(a) an audio analyzer comprising:
(1) one or more time-frequency transformers configured to convert M input audio signals from a time domain into a frequency domain, where M > 1; and
(2) a coherence estimator configured to generate a set of one or more auditory scene parameters for each of one or more different frequency bands in the M converted input audio signals, where each set of one or more auditory scene parameters comprises information corresponding to an estimate of coherence between the M input audio signals, wherein the estimate of coherence is related to perceived width of an audio source corresponding to the M input audio signals; and
(b) a combiner configured to combine the M input audio signals to generate N combined audio signals, where M > N, and transmit the information corresponding to the estimate of coherence along with the N combined audio signals.
View Dependent Claims: 8
9. An encoded audio bitstream generated by:
(a) converting M input audio signals from a time domain into a frequency domain, where M > 1;
(b) generating a set of one or more auditory scene parameters for each of one or more different frequency bands in the M converted input audio signals, where each set of one or more auditory scene parameters comprises information corresponding to an estimate of coherence between the M input audio signals, wherein the estimate of coherence is related to perceived width of an audio source corresponding to the M input audio signals;
(c) combining the M input audio signals to generate N combined audio signals of the encoded audio bitstream, where M > N; and
(d) encoding the information corresponding to the estimate of coherence into the encoded audio bitstream.
10. A method for synthesizing an auditory scene, comprising the steps of:
(a) dividing an input audio signal into one or more frequency bands, wherein each band comprises a plurality of sub-bands; and
(b) applying an auditory scene parameter to each band to generate two or more output audio signals, wherein the auditory scene parameter is modified for each different sub-band in the band based on a coherence value, wherein the coherence value is related to perceived width of a synthesized audio source corresponding to the two or more output audio signals.
View Dependent Claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
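The synthesis side of claim 10 can be sketched with one band and an ILD as the auditory scene parameter: when coherence is high the same level difference is applied to every sub-band (a point source); as coherence drops, the per-sub-band ILDs are perturbed more strongly, widening the image. The function name `synthesize_band`, the +/-6 dB perturbation range, and the random perturbation itself are illustrative assumptions, not the patent's specified modification rule.

```python
import numpy as np

def synthesize_band(band, ild_db, coherence, rng=None):
    """Illustrative sketch of claim 10, step (b): spread one frequency band
    of a mono signal to two output channels, perturbing the inter-channel
    level difference per sub-band more strongly when coherence is low.
    band: complex sub-band coefficients; ild_db: base level difference (dB)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(band)
    # Per-sub-band variation grows as coherence drops toward 0 (assumed rule)
    spread = (1.0 - coherence) * rng.uniform(-6.0, 6.0, n)   # dB offsets
    ild = ild_db + spread                                    # modified ILD per sub-band
    g = 10.0 ** (ild / 40.0)                                 # split the gain between channels
    return band * g, band / g                                # (left, right)
```

With `coherence=1.0` every sub-band carries exactly the base ILD; with `coherence=0.0` the ILDs vary across sub-bands, which is the width-restoring effect the abstract describes.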
23. An apparatus for synthesizing an auditory scene, comprising:
(1) a time-frequency transformer configured to convert an input audio signal from a time domain into one or more frequency bands in a frequency domain, wherein each band comprises a plurality of sub-bands;
(2) an auditory scene synthesizer configured to apply an auditory scene parameter to each band to generate two or more output audio signals, wherein the auditory scene parameter is modified for each different sub-band in the band based on a coherence value, wherein the coherence value is related to perceived width of a synthesized audio source corresponding to the two or more output audio signals; and
(3) one or more inverse time-frequency transformers configured to convert the two or more output audio signals from the frequency domain into the time domain.
View Dependent Claims: 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35
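Elements (1) and (3) of claim 23 are a forward/inverse transform pair bracketing the synthesizer. A minimal sketch, assuming an FFT-based pair (the patent does not mandate a specific transform, and the names `to_bands`/`to_time` are invented here):

```python
import numpy as np

def to_bands(x, fft_size=64):
    """Element (1), assumed FFT-based: time domain -> frequency bins."""
    return np.fft.rfft(x[:fft_size])

def to_time(bins, fft_size=64):
    """Element (3): frequency bins -> time domain, one output channel."""
    return np.fft.irfft(bins, n=fft_size)

x = np.sin(2 * np.pi * 5 * np.arange(64) / 64)
assert np.allclose(to_time(to_bands(x)), x)  # lossless round trip
```

Perfect reconstruction of the round trip matters because all spatial modification happens between these two stages, in the frequency domain.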
36. A method for processing two or more input audio signals, comprising the steps of:
(a) converting M input audio signals from a time domain into a frequency domain, where M > 1;
(b) generating a set of one or more auditory scene parameters for each of one or more different frequency bands in the M converted input audio signals, where each set of one or more auditory scene parameters comprises an estimate of coherence between the M input audio signals, wherein the estimate of coherence is related to perceived width of an audio source corresponding to the M input audio signals; and
(c) combining the M input audio signals to generate N combined audio signals, where M > N,
wherein step (b) comprises the steps of:
(1) generating an estimated coherence between at least two input audio signals for one or more sub-bands; and
(2) generating an average estimated coherence for one or more critical bands, wherein each critical band comprises one or more sub-bands.
View Dependent Claims: 37, 38
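Sub-steps (1) and (2) of claim 36 reduce many per-sub-band coherence estimates to one value per critical band. Averaging over an index grouping is the natural reading; the function name `average_band_coherence` and the example grouping are illustrative assumptions.

```python
import numpy as np

def average_band_coherence(sub_coh, critical_bands):
    """Sketch of claim 36, step (b)(2): average per-sub-band coherence
    estimates over each critical band. critical_bands maps each critical
    band to the sub-band indices it contains (an assumed grouping)."""
    sub_coh = np.asarray(sub_coh, dtype=float)
    return [float(sub_coh[idx].mean()) for idx in critical_bands]

# Two critical bands: {sub-band 0} and {sub-bands 1-3}
print(average_band_coherence([1.0, 0.5, 0.5, 0.5], [[0], [1, 2, 3]]))  # [1.0, 0.5]
```

One averaged value per critical band is what keeps the side information compact enough to transmit alongside the downmix.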
39. A method for processing two or more input audio signals, comprising the steps of:
(a) converting M input audio signals from a time domain into a frequency domain, where M > 1;
(b) generating a set of one or more auditory scene parameters for each of one or more different frequency bands in the M converted input audio signals, where each set of one or more auditory scene parameters comprises an estimate of coherence between the M input audio signals, wherein the estimate of coherence is related to perceived width of an audio source corresponding to the M input audio signals; and
(c) combining the M input audio signals to generate N combined audio signals, where M > N,
wherein the auditory scene parameters further comprise one or more of an inter-aural level difference (ILD), an inter-aural time difference (ITD), and a head-related transfer function (HRTF).
Specification