Acoustic source location using a microphone array
Abstract
An apparatus and method in a video conference system provide accurate determination of the position of a speaking participant by measuring the difference in arrival times of a sound originating from the speaking participant, using as few as four microphones in a 3-dimensional configuration. In one embodiment, a set of simultaneous equations relating the position of the sound source to each microphone, and relating the distances of the microphones to each other, is solved off-line and programmed into a host computer. In one embodiment, the set of simultaneous equations provides multiple solutions, and the median of such solutions is picked as the final position. In another embodiment, an average of the multiple solutions is provided as the final position.
8 Claims
1. A method for locating a speaking participant in a video conference, comprising:
receiving sound from said speaking participant at each of a plurality of microphones positioned in a predetermined 3-dimensional configuration;
from said received sound, computing three or more time delays using a cross-correlation function computed in a frequency domain, each time delay being representative of the difference in arrival times of said sound at a selected pair of said microphones;
based on the positions of said microphones and said time delays, determining one or more possible positions of said speaking participant; and
deriving from said possible positions of said speaking participant a final position, said deriving said final position comprising:
(a) computing for each of said possible positions a radial distance from a reference position;
(b) obtaining from said possible positions a group of positions by discarding positions corresponding to the p1 furthest and the p2 closest radial distances from said reference position; and
(c) computing a weighted average position from said group of positions.
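Steps (a) through (c) of claim 1 can be illustrated with a minimal sketch. The function name `final_position`, the default reference position at the origin, and the per-candidate `weights` argument are assumptions made for illustration; the claim does not state how the weights are chosen, so uniform weights are used when none are given.

```python
import math


def final_position(candidates, reference=(0.0, 0.0, 0.0), p1=1, p2=1, weights=None):
    """Trimmed weighted average of candidate 3-D positions (hypothetical helper).

    candidates: list of (x, y, z) tuples, the "possible positions".
    p1: number of furthest candidates to discard.
    p2: number of closest candidates to discard.
    weights: optional per-candidate weights (assumed; uniform by default).
    """
    if weights is None:
        weights = [1.0] * len(candidates)
    if len(weights) != len(candidates):
        raise ValueError("weights and candidates must have the same length")

    # (a) rank candidates by radial distance from the reference position
    ranked = sorted(zip(candidates, weights),
                    key=lambda cw: math.dist(cw[0], reference))

    # (b) drop the p2 closest and the p1 furthest candidates
    kept = ranked[p2:len(ranked) - p1]
    if not kept:
        raise ValueError("no candidates left after discarding outliers")

    # (c) weighted average of the remaining group, per coordinate
    total = sum(w for _, w in kept)
    return tuple(sum(w * pos[i] for pos, w in kept) / total for i in range(3))
```

With five candidates along one axis at 0.1, 1, 2, 3 and 10 units and p1 = p2 = 1, the sketch discards the 0.1 and 10 outliers and averages the remaining three.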
2. A video conference system, comprising:
a plurality of microphones and a camera positioned in a predetermined configuration, each microphone providing an audio signal representative of sound received at said microphone;
a time delay module receiving said audio signals of said microphones and based on said audio signals, providing for each pair of said microphones a time delay estimate associated with said pair of microphones using a cross correlation function in a frequency domain;
a position determination module, based on said time delay estimates and said predetermined configuration, providing a plurality of possible positions of a sound source, and selecting from said possible positions a final position of said sound source, said position computation module
(1) computing, for each of said possible positions, a radial distance from a reference position;
(2) obtaining from said possible positions a group of positions by discarding positions corresponding to the p1 farthest and the p2 closest radial distances from said reference position; and
(3) computing a weighted average position from said group of positions; and
a camera control module directing said camera towards said sound source using said final position of said sound source.
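The time delay module of claim 2 estimates a delay for each microphone pair from a cross-correlation computed in the frequency domain. The following is a minimal sketch of that general technique, not the patented implementation: it multiplies one spectrum by the conjugate of the other, transforms back, and takes the lag of the correlation peak. The names `fft` and `estimate_delay` are hypothetical, no spectral weighting is applied, and a small radix-2 FFT is included only to keep the example self-contained.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def estimate_delay(sig_a, sig_b, sample_rate):
    """Delay of sig_b relative to sig_a in seconds (positive when sig_b lags)."""
    # Zero-pad to a power of two long enough to avoid circular wrap-around.
    n = 1
    while n < len(sig_a) + len(sig_b):
        n *= 2
    fa = fft(list(sig_a) + [0.0] * (n - len(sig_a)))
    fb = fft(list(sig_b) + [0.0] * (n - len(sig_b)))

    # Cross-spectrum, then inverse transform to get the cross-correlation.
    cross = [b * a.conjugate() for a, b in zip(fa, fb)]
    corr = [c.real / n for c in fft(cross, inverse=True)]

    # Indices in the upper half of the output represent negative lags.
    peak = max(range(n), key=lambda i: corr[i])
    lag = peak if peak < n // 2 else peak - n
    return lag / sample_rate
```

Running this once per microphone pair yields the set of delay estimates that the position determination module consumes. The resolution here is one sample period; any finer interpolation around the peak would be an additional design choice.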
3. A video conference system, comprising:
a plurality of microphones and a camera positioned in a predetermined configuration, each microphone providing an audio signal representative of sound received at said microphone;
a time delay module receiving said audio signals of said microphones and based on said audio signals, providing for each pair of said microphones a time delay estimate associated with said pair of microphones using a cross correlation function in a frequency domain;
a position determination module, based on said time delay estimates and said predetermined configuration, providing a plurality of possible positions of a sound source, and selecting from said possible positions a final position of said sound source, said position determination module
(a) solving a set of simultaneous equations relating the location of said sound source to the positions of said microphones, and relating positions of said microphones to each other; and
(b) applying said computed time delays to said solutions; and
a camera control module directing said camera towards said sound source using said final position of said sound source, wherein when said final position corresponds to a position outside a predetermined boundary, said camera control module directs said camera to an adjusted position within said boundary.
Dependent claim: 4
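The boundary adjustment in claim 3 can be pictured with a short sketch. The claim does not define the shape of the predetermined boundary, so an axis-aligned box given by `lower` and `upper` corners is assumed here, and `clamp_to_boundary` is a hypothetical name.

```python
def clamp_to_boundary(position, lower, upper):
    """Move a position to the nearest point inside an axis-aligned box.

    position, lower, upper: (x, y, z) tuples; the box shape is an assumption.
    Positions already inside the box are returned unchanged.
    """
    return tuple(min(max(p, lo), hi) for p, lo, hi in zip(position, lower, upper))
```

For a room modeled as a 4 x 3 x 2.5 box, a computed position of (5.0, -1.0, 1.2) would be adjusted to (4.0, 0.0, 1.2) before the camera is directed.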
5. A video conference system, comprising:
a plurality of microphones and a camera positioned in a predetermined configuration, each microphone providing an audio signal representative of sound received at said microphone;
a time delay module receiving said audio signals of said microphones and based on said audio signals, providing for each pair of said microphones a time delay estimate associated with said pair of microphones using a cross correlation function in a frequency domain;
a position determination module, based on said time delay estimates and said predetermined configuration, providing a plurality of possible positions of a sound source, and selecting from said possible positions a final position of said sound source, said position determination module
(a) solving a set of simultaneous equations relating the location of said sound source to the positions of said microphones, and relating positions of said microphones to each other; and
(b) applying said computed time delays to said solutions; and
a camera control module directing said camera towards said sound source using said final position of said sound source, wherein said camera control module divides the view from said camera into 3-dimensional zones, each zone being specified by a radial distance from said camera and angular positions from said camera in two orthogonal directions.
Dependent claims: 6, 7, 8
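The zones of claim 5 are specified by a radial distance and two orthogonal angles as seen from the camera. One way to picture this is to convert a Cartesian position into range, pan, and tilt, and then quantize each value. In the sketch below the function names, the axis convention (x forward, y sideways, z up), and the zone sizes are all assumptions chosen for illustration.

```python
import math


def to_spherical(position, camera=(0.0, 0.0, 0.0)):
    """Return (range, pan_degrees, tilt_degrees) of a position seen from the camera."""
    dx, dy, dz = (p - c for p, c in zip(position, camera))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    pan = math.degrees(math.atan2(dy, dx))                   # horizontal angle
    tilt = math.degrees(math.atan2(dz, math.hypot(dx, dy)))  # vertical angle
    return r, pan, tilt


def zone_index(position, camera=(0.0, 0.0, 0.0),
               range_step=1.0, pan_step=10.0, tilt_step=10.0):
    """Quantize a position into a (range, pan, tilt) zone index triple.

    The step sizes are hypothetical; the claim does not give zone dimensions.
    """
    r, pan, tilt = to_spherical(position, camera)
    return (math.floor(r / range_step),
            math.floor(pan / pan_step),
            math.floor(tilt / tilt_step))
```

Under these assumptions, two final positions that fall in the same zone produce the same index, which a camera controller could use to decide whether a move is needed at all.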
Specification