Coherence-based audio coding and synthesis
Abstract
An auditory scene is synthesized from a mono audio signal by modifying, for each critical band, an auditory scene parameter (e.g., an inter-aural level difference (ILD) and/or an inter-aural time difference (ITD)) for each sub-band within the critical band, where the modification is based on an average estimated coherence for the critical band. The coherence-based modification produces auditory scenes having objects whose widths more accurately match the widths of the objects in the original input auditory scene.
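The central quantity throughout the claims is the estimate of coherence between channels within a frequency band. A common way to estimate it is the normalized magnitude of the cross-spectrum; the sketch below is an illustrative assumption in that style, not the patent's specified estimator, and the function name `band_coherence` is invented here.

```python
import numpy as np

def band_coherence(L, R):
    """Estimate inter-channel coherence for one frequency band.

    L, R: complex spectral coefficients (sub-band bins) of two channels.
    Returns a value in [0, 1]; 1 means the channels are fully coherent
    (a narrow, point-like source), lower values suggest a wider source.
    This normalized cross-spectrum form is an assumed estimator, not the
    one specified in the patent.
    """
    cross = np.abs(np.sum(L * np.conj(R)))                       # |cross-spectrum|
    power = np.sqrt(np.sum(np.abs(L) ** 2) * np.sum(np.abs(R) ** 2))
    return cross / power if power > 0 else 0.0

# Identical channels are fully coherent:
x = np.exp(1j * np.linspace(0, 3, 8))
print(round(band_coherence(x, x), 6))  # 1.0
```

By construction the estimate is 1 for identical channels and drops toward 0 as the channels decorrelate, which is why it can serve as a proxy for perceived source width.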
120 Citations
39 Claims
1. A method for processing two or more input audio signals, comprising the steps of:
(a) converting M input audio signals from a time domain into a frequency domain, where M > 1;
(b) generating a set of one or more auditory scene parameters for each of one or more different frequency bands in the M converted input audio signals, where each set of one or more auditory scene parameters comprises information corresponding to an estimate of coherence between the M input audio signals, wherein the estimate of coherence is related to perceived width of an audio source corresponding to the M input audio signals;
(c) combining the M input audio signals to generate N combined audio signals, where M > N; and
(d) transmitting the information corresponding to the estimate of coherence along with the N combined audio signals.
View Dependent Claims: 2, 3, 4, 5, 6
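Steps (a)-(d) of claim 1 can be sketched as follows. This is a hedged reconstruction, not the patented implementation: the function name `encode`, the single-frame FFT, the first-channel-pair coherence, and the equal-weight downmix to N = 1 are all illustrative assumptions.

```python
import numpy as np

def encode(channels, band_edges, fft_size=256):
    """Illustrative sketch of steps (a)-(d): transform M input channels,
    estimate per-band inter-channel coherence, and downmix M -> N = 1.
    channels: array of shape (M, samples); band_edges: FFT-bin boundaries."""
    M = channels.shape[0]
    assert M > 1
    # (a) time domain -> frequency domain (one frame, for simplicity)
    spec = np.fft.rfft(channels[:, :fft_size], axis=1)
    # (b) one coherence estimate per band (first channel pair only, assumed)
    params = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        a, b = spec[0, lo:hi], spec[1, lo:hi]
        num = np.abs(np.sum(a * np.conj(b)))
        den = np.sqrt(np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2))
        params.append(num / den if den > 0 else 0.0)
    # (c) combine the M signals into a single downmix (N = 1)
    downmix = channels.mean(axis=0)
    # (d) the downmix and coherence parameters are transmitted together
    return downmix, params

t = np.arange(512) / 512.0
left = np.sin(2 * np.pi * 10 * t)
right = np.sin(2 * np.pi * 10 * t + 0.3)
downmix, params = encode(np.stack([left, right]), band_edges=[1, 20, 64])
```

The key point of step (d) is that the side information (the per-band coherence) rides alongside the reduced channel set, so a decoder can restore spatial width the downmix itself no longer carries.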
7. An apparatus for processing two or more input audio signals, comprising:
(a) an audio analyzer comprising:
(1) one or more time-frequency transformers configured to convert M input audio signals from a time domain into a frequency domain, where M > 1; and
(2) a coherence estimator configured to generate a set of one or more auditory scene parameters for each of one or more different frequency bands in the M converted input audio signals, where each set of one or more auditory scene parameters comprises information corresponding to an estimate of coherence between the M input audio signals, wherein the estimate of coherence is related to perceived width of an audio source corresponding to the M input audio signals; and
(b) a combiner configured to combine the M input audio signals to generate N combined audio signals, where M > N, and transmit the information corresponding to the estimate of coherence along with the N combined audio signals.
View Dependent Claims: 8
9. An encoded audio bitstream generated by:
(a) converting M input audio signals from a time domain into a frequency domain, where M > 1;
(b) generating a set of one or more auditory scene parameters for each of one or more different frequency bands in the M converted input audio signals, where each set of one or more auditory scene parameters comprises information corresponding to an estimate of coherence between the M input audio signals, wherein the estimate of coherence is related to perceived width of an audio source corresponding to the M input audio signals;
(c) combining the M input audio signals to generate N combined audio signals of the encoded audio bitstream, where M > N; and
(d) encoding the information corresponding to the estimate of coherence into the encoded audio bitstream.
10. A method for synthesizing an auditory scene, comprising the steps of:
(a) dividing an input audio signal into one or more frequency bands, wherein each band comprises a plurality of sub-bands; and
(b) applying an auditory scene parameter to each band to generate two or more output audio signals, wherein the auditory scene parameter is modified for each different sub-band in the band based on a coherence value, wherein the coherence value is related to perceived width of a synthesized audio source corresponding to the two or more output audio signals.
View Dependent Claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
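The synthesis side of claim 10 can be sketched with one band and an ILD as the auditory scene parameter: when coherence is high the same level difference is applied to every sub-band (a point source); as coherence drops, the per-sub-band ILDs are perturbed more strongly, widening the image. The function name `synthesize_band`, the +/-6 dB perturbation range, and the random perturbation itself are illustrative assumptions, not the patent's specified modification rule.

```python
import numpy as np

def synthesize_band(band, ild_db, coherence, rng=None):
    """Illustrative sketch of claim 10, step (b): spread one frequency band
    of a mono signal to two output channels, perturbing the inter-channel
    level difference per sub-band more strongly when coherence is low.
    band: complex sub-band coefficients; ild_db: base level difference (dB)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(band)
    # Per-sub-band variation grows as coherence drops toward 0 (assumed rule)
    spread = (1.0 - coherence) * rng.uniform(-6.0, 6.0, n)   # dB offsets
    ild = ild_db + spread                                    # modified ILD per sub-band
    g = 10.0 ** (ild / 40.0)                                 # split the gain between channels
    return band * g, band / g                                # (left, right)
```

With `coherence=1.0` every sub-band carries exactly the base ILD; with `coherence=0.0` the ILDs vary across sub-bands, which is the width-restoring effect the abstract describes.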
23. An apparatus for synthesizing an auditory scene, comprising:
(1) a time-frequency transformer configured to convert an input audio signal from a time domain into one or more frequency bands in a frequency domain, wherein each band comprises a plurality of sub-bands;
(2) an auditory scene synthesizer configured to apply an auditory scene parameter to each band to generate two or more output audio signals, wherein the auditory scene parameter is modified for each different sub-band in the band based on a coherence value, wherein the coherence value is related to perceived width of a synthesized audio source corresponding to the two or more output audio signals; and
(3) one or more inverse time-frequency transformers configured to convert the two or more output audio signals from the frequency domain into the time domain.
View Dependent Claims: 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35
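Elements (1) and (3) of claim 23 are a forward/inverse transform pair bracketing the synthesizer. A minimal sketch, assuming an FFT-based pair (the patent does not mandate a specific transform, and the names `to_bands`/`to_time` are invented here):

```python
import numpy as np

def to_bands(x, fft_size=64):
    """Element (1), assumed FFT-based: time domain -> frequency bins."""
    return np.fft.rfft(x[:fft_size])

def to_time(bins, fft_size=64):
    """Element (3): frequency bins -> time domain, one output channel."""
    return np.fft.irfft(bins, n=fft_size)

x = np.sin(2 * np.pi * 5 * np.arange(64) / 64)
assert np.allclose(to_time(to_bands(x)), x)  # lossless round trip
```

Perfect reconstruction of the round trip matters because all spatial modification happens between these two stages, in the frequency domain.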
36. A method for processing two or more input audio signals, comprising the steps of:
(a) converting M input audio signals from a time domain into a frequency domain, where M > 1;
(b) generating a set of one or more auditory scene parameters for each of one or more different frequency bands in the M converted input audio signals, where each set of one or more auditory scene parameters comprises an estimate of coherence between the M input audio signals, wherein the estimate of coherence is related to perceived width of an audio source corresponding to the M input audio signals; and
(c) combining the M input audio signals to generate N combined audio signals, where M > N,
wherein step (b) comprises the steps of:
(1) generating an estimated coherence between at least two input audio signals for one or more sub-bands; and
(2) generating an average estimated coherence for one or more critical bands, wherein each critical band comprises one or more sub-bands.
View Dependent Claims: 37, 38
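Sub-steps (1) and (2) of claim 36 reduce many per-sub-band coherence estimates to one value per critical band. Averaging over an index grouping is the natural reading; the function name `average_band_coherence` and the example grouping are illustrative assumptions.

```python
import numpy as np

def average_band_coherence(sub_coh, critical_bands):
    """Sketch of claim 36, step (b)(2): average per-sub-band coherence
    estimates over each critical band. critical_bands maps each critical
    band to the sub-band indices it contains (an assumed grouping)."""
    sub_coh = np.asarray(sub_coh, dtype=float)
    return [float(sub_coh[idx].mean()) for idx in critical_bands]

# Two critical bands: {sub-band 0} and {sub-bands 1-3}
print(average_band_coherence([1.0, 0.5, 0.5, 0.5], [[0], [1, 2, 3]]))  # [1.0, 0.5]
```

One averaged value per critical band is what keeps the side information compact enough to transmit alongside the downmix.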
39. A method for processing two or more input audio signals, comprising the steps of:
(a) converting M input audio signals from a time domain into a frequency domain, where M > 1;
(b) generating a set of one or more auditory scene parameters for each of one or more different frequency bands in the M converted input audio signals, where each set of one or more auditory scene parameters comprises an estimate of coherence between the M input audio signals, wherein the estimate of coherence is related to perceived width of an audio source corresponding to the M input audio signals; and
(c) combining the M input audio signals to generate N combined audio signals, where M > N,
wherein the auditory scene parameters further comprise one or more of an inter-aural level difference (ILD), an inter-aural time difference (ITD), and a head-related transfer function (HRTF).
Specification