Method and apparatus for improved matching of auditory space to visual space in video teleconferencing applications using window-based displays

US 8,571,192 B2
Filed: 06/30/2009
Issued: 10/29/2013
Est. Priority Date: 06/30/2009
Status: Active Grant

Abstract

A method and apparatus for enabling an improved experience by better matching of the auditory space to the visual space in video viewing applications such as those that may be used in video teleconferencing systems using window-based displays. In particular, in accordance with certain illustrative embodiments of the present invention, one or more desired sound source locations are determined based on a location of a window in a video teleconference display device (which may, for example, comprise the image of a teleconference participant within the given window), and a plurality of audio signals which accurately locate the sound sources at the desired sound source locations (based on the location of the given window in the display) are advantageously generated.
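The step of deriving a desired sound source location from a window's position on the display can be illustrated with a minimal sketch. It is not taken from the patent: the function name, the pixel and meter parameters, and the assumption that the listener sits centered in front of the screen at a known distance are all hypothetical choices made for illustration.

```python
import math


def window_center_azimuth(win_left_px, win_width_px, screen_width_px,
                          screen_width_m, listener_distance_m):
    """Return the horizontal angle (degrees) from the listener to a window's center.

    Assumptions (hypothetical, not from the patent text): the listener is
    centered in front of the screen at listener_distance_m, and only the
    horizontal position of the window is considered.
    Negative angles are to the listener's left, positive to the right.
    """
    # Horizontal center of the window in pixels.
    center_px = win_left_px + win_width_px / 2.0
    # Offset of that center from the middle of the screen, in meters.
    offset_m = (center_px / screen_width_px - 0.5) * screen_width_m
    # Angle subtended at the listener's position.
    return math.degrees(math.atan2(offset_m, listener_distance_m))


if __name__ == "__main__":
    # A window in the right quarter of a 1920-pixel, 1-meter-wide screen.
    angle = window_center_azimuth(1440, 480, 1920, 1.0, 0.6)
    print(f"desired azimuth: {angle:.1f} degrees")
```

When a window is moved, calling the function again with the new pixel position yields the new desired location, which a spatial renderer could then use.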

24 Claims

1. A method for generating a spatial rendering of a real-time audio sound from a conference participant to a remote real-time video teleconference participant in real-time using a plurality of speakers, the audio sound related a real time to video being displayed to said remote video teleconference participant on a window-based video display screen having a given physical location of said conference participant, the method comprising:
- receiving one or more real-time video input signals of said conference participant for use in displaying said real-time video to said remote video teleconference participant on said window-based video display screen, each of said received video input signals being displayed in a corresponding window on said video display screen;
  
  receiving one or more real-time audio input signals related to said one or more video input signals, one of said audio input signals including said audio sound;
  
  determining a desired physical location of said conference participant for spatially rendering said audio sound relative to said video display screen, the desired physical location being determined based on a position on the video display screen at which a particular one of said windows is being displayed, the particular one of said windows corresponding to the received video input signal related to the received audio input signal which includes said audio sound; and
  
  generating a plurality of real-time audio output signals based on said determined desired physical location for spatially rendering said audio sound, said plurality of audio signals being generated such that when delivered to said remote video teleconference participant using said plurality of speakers, the remote video teleconference participant hears said audio sound as being rendered from said determined desired physical location for spatially rendering said audio sound.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16)
- - 2. The method of claim 1 wherein said remote video teleconference participant and the video display screen are located in a remote room, and wherein the one or more video input signals and the one or more audio input signals are received in said remote room from one or more separate rooms each having one or more other video teleconference participants located therein.
  - 3. The method of claim 2 wherein the plurality of audio output signals is generated further based on a position on the video screen within said particular one of said windows where a given other one of said video teleconference participants is being displayed.
  - 4. The method of claim 2 further comprising enlarging a display size of the particular one of said windows corresponding to the received video input signal related to the received audio input signal which includes said audio sound, when it is determined that said audio sound comprises a given other one of said video teleconference participants who is currently speaking.
  - 5. The method of claim 1 further comprising:
    - relocating, to a new position on the video display screen, the particular one of said windows corresponding to the received video input signal related to the received audio input signal which includes said audio sound;
      
      determining a new desired physical location relative to said video display screen for spatially rendering said audio sound, the new desired physical location being determined based on the new position on the video display screen at which the particular one of said windows has been relocated; and
      
      generating a new plurality of audio output signals based on said determined new desired physical location for spatially rendering said audio sound, said new plurality of audio signals being generated such that when delivered to said remote video teleconference participant using said plurality of speakers, the remote video teleconference participant hears said audio sound as being rendered from said determined new desired physical location for spatially rendering said audio sound.
  - 6. The method of claim 1 wherein the desired physical location relative to said video display screen for spatially rendering said audio sound is further determined based on a display size of the particular one of said windows corresponding to the received video input signal related to the received audio input signal which includes said audio sound, the method further comprising:
    - resizing, to a new display size, the particular one of said windows corresponding to the received video input signal related to the received audio input signal which includes said audio sound;
      
      determining a new desired physical location relative to said video display screen for spatially rendering said audio sound, the new desired physical location being determined based on the new display size of the particular one of said windows; and
      
      generating a new plurality of audio output signals based on said determined new desired physical location for spatially rendering said audio sound, said new plurality of audio signals being generated such that when delivered to said remote video teleconference participant using said plurality of speakers, the remote video teleconference participant hears said audio sound as being rendered from said determined new desired physical location for spatially rendering said audio sound.
  - 7. The method of claim 1 wherein the plurality of audio output signals is generated further based on a display size of the particular one of said windows corresponding to the received video input signal related to the received audio input signal which includes said audio sound, such that a volume level thereof is adjusted based on said display size of the particular one of said windows.
  - 8. The method of claim 1 wherein the desired physical location relative to said video display screen for spatially rendering said audio sound is determined further based on a display size of the particular one of said windows corresponding to the received video input signal related to the received audio input signal which includes said audio sound, such that an apparent depth of said desired physical location relative to said video display screen for spatially rendering said audio sound is adjusted based on said display size of the particular one of said windows.
  - 9. The method of claim 1 further comprising determining a current physical location of the remote video teleconference participant relative to said video display screen, and wherein the plurality of audio output signals is generated further based on said determined current physical location of the remote video teleconference participant relative to said video display screen.
  - 10. The method of claim 1 wherein the plurality of speakers comprises a headphone set worn by the video conference participant, wherein the headphone set comprises at least a left speaker for providing sound to a left ear of the video conference participant and a right speaker for providing sound to a right ear of the video conference participant, and wherein said generating the plurality of audio output signals comprises generating binaural audio signals comprising at least a left audio output signal which is used to drive the left speaker and a right audio output signal which is used to drive the right speaker.
  - 11. The method of claim 1 wherein the plurality of speakers comprises a plurality of loudspeakers placed in predetermined physical locations relative to the given physical location of the video display screen, wherein the plurality of loudspeakers includes at least a left loudspeaker whose predetermined physical location comprises a position left of the video display screen and a right loudspeaker whose predetermined physical location comprises a position right of the video display screen, and wherein said generating the plurality of audio output signals comprises generating binaural audio signals comprising at least a left audio output signal which is used to drive the left loudspeaker and a right audio output signal which is used to drive the right loudspeaker.
  - 12. The method of claim 11 wherein the left audio output signal has been adapted to reduce crosstalk from the right loudspeaker to a left ear of the video conference participant, and wherein the right audio output signal has been adapted to reduce crosstalk from the left loudspeaker to a right ear of the video conference participant.
  - 14. The apparatus of claim 1 wherein said apparatus comprises a portion of a video teleconferencing system located in a remote room, and wherein the one or more video input signals and the one or more audio input signals are received by said apparatus from one or more separate rooms each having one or more other video teleconference participants located therein.
  - 15. The apparatus of claim 14 wherein the audio output signal generator generates said plurality of audio output signals further based on a position on the video screen within said particular one of said windows where a given other one of said video teleconference participants is being displayed.
  - 16. The apparatus of claim 14 wherein said processor further directs said video display screen to enlarge a display size of the particular one of said windows corresponding to the received video input signal related to the received audio input signal which includes said audio sound, when it is determined that said audio sound comprises a given other one of said video teleconference participants who is currently speaking.
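As a rough illustration of how claims 1, 5 and 7 fit together, the sketch below derives two output signals from a window's horizontal position and scales their level by the window's display size. It deliberately substitutes constant-power stereo panning for the binaural rendering recited in claims 10 and 11, because panning is far simpler to show. The function names, the `min_gain` floor, and the square-root size law are assumptions chosen for illustration and do not come from the patent.

```python
import math


def pan_gains(center_frac):
    """Constant-power pan gains for a window center in [0.0, 1.0].

    0.0 is the left edge of the screen, 1.0 the right edge.
    Returns (left_gain, right_gain) with left^2 + right^2 == 1.
    """
    center_frac = min(max(center_frac, 0.0), 1.0)
    theta = center_frac * math.pi / 2.0
    return math.cos(theta), math.sin(theta)


def size_gain(win_area_px, screen_area_px, min_gain=0.25):
    """Volume factor that grows with window size (hypothetical size law)."""
    frac = min(max(win_area_px / screen_area_px, 0.0), 1.0)
    return min_gain + (1.0 - min_gain) * math.sqrt(frac)


def render(samples, center_frac, win_area_px, screen_area_px):
    """Produce left and right output signals for one mono input signal."""
    left_gain, right_gain = pan_gains(center_frac)
    volume = size_gain(win_area_px, screen_area_px)
    left = [s * left_gain * volume for s in samples]
    right = [s * right_gain * volume for s in samples]
    return left, right


if __name__ == "__main__":
    mono = [0.0, 0.5, 1.0, 0.5, 0.0]
    # Window centered at 80% of the screen width, covering 1/4 of the screen.
    left, right = render(mono, 0.8, 480 * 270, 960 * 540)
    print(left, right)
```

Relocating or resizing a window, as in claims 5 and 6, would correspond to calling `render` again with the updated `center_frac` or window area.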

13. An apparatus for generating a spatial rendering of a real-time audio sound from a conference participant to a remote video teleconference participant in real-time, the apparatus comprising:
- a plurality of speakers;
  
  a real-time window-based video display screen having a given physical location of said conference participant, the window-based video display screen for displaying a real-time video to the video teleconference participant, the real-time audio sound being related to the video being displayed to said video teleconference participant;
  
  a video input signal receiver which receives one or more real-time video input signals of said conference participant for use in displaying said video to said remote video teleconference participant on said window-based video display screen, each of said received video input signals being displayed in a corresponding window on said video display screen;
  
  an audio input signal receiver which receives one or more real-time audio input signals related to said one or more video input signals, one of said received audio input signals including said audio sound;
  
  a processor which determines a desired physical location of said conference participant for spatially rendering said audio sound relative to said video display screen, the desired physical location being determined based on a position on the video display screen at which a particular one of said windows is being displayed, the particular one of said windows corresponding to the received video input signal related to the received audio input signal which includes said audio sound; and
  
  an audio output signal generator which generates a plurality of real-time audio output signals based on said determined desired physical location for spatially rendering said audio sound, said plurality of audio signals being generated such that when delivered to said remote video teleconference participant using said plurality of speakers, the remote video teleconference participant hears said audio sound as being rendered from said determined desired physical location for spatially rendering said audio sound.
- View Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24)
- - 17. The apparatus of claim 13 wherein the processor further directs said video display screen to relocate, to a new position on the video display screen, the particular one of said windows corresponding to the received video input signal related to the received audio input signal which includes said audio sound, and further determines a new desired physical location relative to said video display screen for spatially rendering said audio sound, the new desired physical location being determined based on the new position on the video display screen at which the particular one of said windows has been relocated, and also wherein said audio output signal generator further generates a new plurality of audio output signals based on said determined new desired physical location for spatially rendering said audio sound, said new plurality of audio signals being generated such that when delivered to said remote video teleconference participant using said plurality of speakers, the remote video teleconference participant hears said audio sound as being rendered from said determined new desired physical location for spatially rendering said audio sound.
  - 18. The apparatus of claim 13 wherein the processor determines said desired physical location relative to said video display screen for spatially rendering said audio sound further based on a display size of the particular one of said windows corresponding to the received video input signal related to the received audio input signal which includes said audio sound, and wherein the processor further directs said video display screen to resize, to a new display size, the particular one of said windows corresponding to the received video input signal related to the received audio input signal which includes said audio sound, and further determines a new desired physical location relative to said video display screen for spatially rendering said audio sound, the new desired physical location being determined based on the new display size of the particular one of said windows, and also wherein said audio output signal generator further generates a new plurality of audio output signals based on said determined new desired physical location for spatially rendering said audio sound, said new plurality of audio signals being generated such that when delivered to said remote video teleconference participant using said plurality of speakers, the remote video teleconference participant hears said audio sound as being rendered from said determined new desired physical location for spatially rendering said audio sound.
  - 19. The apparatus of claim 13 wherein the audio output signal generator generates said plurality of audio output signals further based on a display size of the particular one of said windows corresponding to the received video input signal related to the received audio input signal which includes said audio sound, such that a volume level thereof is adjusted based on said display size of the particular one of said windows.
  - 20. The apparatus of claim 13 wherein the processor determines said desired physical location relative to said video display screen for spatially rendering said audio sound further based on a display size of the particular one of said windows corresponding to the received video input signal related to the received audio input signal which includes said audio sound, such that an apparent depth of said desired physical location relative to said video display screen for spatially rendering said audio sound is adjusted based on said display size of the particular one of said windows.
  - 21. The apparatus of claim 13 wherein the processor further determines a current physical location of the remote video teleconference participant relative to said video display screen, and wherein the audio output signal generator generates said plurality of audio output signals further based on said determined current physical location of the remote video teleconference participant relative to said video display screen.
  - 22. The apparatus of claim 13 wherein the plurality of speakers comprises a headphone set worn by the video conference participant, wherein the headphone set comprises at least a left speaker for providing sound to a left ear of the video conference participant and a right speaker for providing sound to a right ear of the video conference participant, and wherein said audio output signal generator generates the plurality of audio output signals by generating binaural audio signals comprising at least a left audio output signal which is used to drive the left speaker and a right audio output signal which is used to drive the right speaker.
  - 23. The apparatus of claim 13 wherein the plurality of speakers comprises a plurality of loudspeakers placed in predetermined physical locations relative to the given physical location of the video display screen, wherein the plurality of loudspeakers includes at least a left loudspeaker whose predetermined physical location comprises a position left of the video display screen and a right loudspeaker whose predetermined physical location comprises a position right of the video display screen, and wherein said audio output signal generator generates the plurality of audio output signals by generating binaural audio signals comprising at least a left audio output signal which is used to drive the left loudspeaker and a right audio output signal which is used to drive the right loudspeaker.
  - 24. The apparatus of claim 23 wherein the left audio output signal has been adapted to reduce crosstalk from the right loudspeaker to a left ear of the video conference participant, and wherein the right audio output signal has been adapted to reduce crosstalk from the left loudspeaker to a right ear of the video conference participant.
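The crosstalk reduction mentioned in claims 12 and 24 can be illustrated with a toy model. The sketch below assumes, purely for illustration, that each loudspeaker reaches the near ear unchanged and the far ear as an attenuated copy delayed by a whole number of samples. Under that idealized model, a simple recursive filter cancels the crosstalk exactly. The function names, the `atten` and `delay` parameters, and the acoustic model itself are assumptions and are not taken from the patent, which does not specify a particular filter here.

```python
def crosstalk_cancel(x_left, x_right, atten=0.7, delay=3):
    """Pre-filter two ear signals so that crosstalk cancels at the ears.

    Assumed toy model: the opposite loudspeaker arrives at each ear
    scaled by `atten` and delayed by `delay` samples (delay >= 1).
    """
    n = len(x_left)
    y_left = [0.0] * n
    y_right = [0.0] * n
    for i in range(n):
        # Earlier loudspeaker outputs that will leak into the opposite ear now.
        prev_right = y_right[i - delay] if i >= delay else 0.0
        prev_left = y_left[i - delay] if i >= delay else 0.0
        # Subtract the predicted leakage from each channel.
        y_left[i] = x_left[i] - atten * prev_right
        y_right[i] = x_right[i] - atten * prev_left
    return y_left, y_right


def simulate_ears(y_left, y_right, atten=0.7, delay=3):
    """Apply the same toy acoustic model to get the signals at each ear."""
    n = len(y_left)
    ear_left = [y_left[i] + (atten * y_right[i - delay] if i >= delay else 0.0)
                for i in range(n)]
    ear_right = [y_right[i] + (atten * y_left[i - delay] if i >= delay else 0.0)
                 for i in range(n)]
    return ear_left, ear_right


if __name__ == "__main__":
    # An impulse intended for the left ear only.
    x_left = [1.0] + [0.0] * 11
    x_right = [0.0] * 12
    y_left, y_right = crosstalk_cancel(x_left, x_right)
    print(simulate_ears(y_left, y_right))
```

Real rooms and heads do not follow this two-parameter model, so a practical system would need measured or modeled transfer functions; the sketch only shows why pre-subtracting the predicted leakage can work.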

Current Assignee: Alcatel-Lucent SA (Nokia Corporation)
Original Assignee: Alcatel-Lucent SA (Nokia Corporation)
Inventors: Etter, Walter
Primary Examiner(s): Elahee, Md S
Application Number: US12/459,366
Publication Number: US 20100328423A1
Time in Patent Office: 1,582 Days
Field of Search: 379/406, 379/93.21, 379/158, 379/202.01, 379/205.01, 348/14.08, 348/14.16, 345/565, 381/80, 370/401, 370/260-262, 370/271, 455/416
US Class Current: 379/202.01
CPC Class Codes:
H04N 7/142 Constructional details of t...
H04N 7/15 Conference systems