Audio-visual object localization and tracking system and method therefor
First Claim
1. An integrated audio-visual method for localizing and tracking at least one object, comprising the steps of:
capturing and transmitting an image of a video scene using a camera at an instant of time;
identifying an object contained in said image having a preselected visual feature;
estimating a location of the object by determining an angular orientation relative to the image plane of the camera of an imaginary line extending from an optical center of the camera to a point on the image plane of the camera representing a portion of the object;
converting acoustic waves from an audio source into audio signals using at least two microphones at substantially said instant of time;
identifying the audio source by determining on the basis of the audio signals a locus of points representing an estimate of the location of the audio source; and
computing the location of a region of intersection between the imaginary line and the locus, the intersection region being an improved estimate of the location of the object.
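The final step of claim 1 can be illustrated with a minimal two-dimensional sketch. It assumes a pinhole camera (focal length `focal_px` and principal-point column `cx`, both in pixels), a speed of sound of 343 m/s, and an audio locus given by a time difference of arrival (TDOA) between two microphones. Under that assumption, the source lies on one branch of a hyperbola whose foci are the microphones. The claim does not say how the locus is formed or how the intersection is computed. The TDOA hyperbola and the scan-plus-bisection root search are illustrative choices, and every function and parameter name below (`pixel_to_bearing`, `intersect_ray_with_locus`, `r_max`, `steps`, `iters`) is hypothetical. The sketch also returns a single point rather than the claimed intersection region.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def pixel_to_bearing(u, cx, focal_px, heading):
    """World-frame bearing (radians, counter-clockwise from +x) of the ray
    through pixel column u of a 2-D pinhole camera whose optical axis points
    along `heading`. Pixels right of the principal point cx turn the ray clockwise."""
    return heading - math.atan2(u - cx, focal_px)


def range_difference(p, m1, m2):
    """|p - m1| - |p - m2|, which is constant on a TDOA hyperbola."""
    return math.dist(p, m1) - math.dist(p, m2)


def intersect_ray_with_locus(cam, bearing, m1, m2, tdoa,
                             r_max=20.0, steps=2000, iters=60):
    """First point on the ray from `cam` along `bearing` that lies on the locus
    range_difference(p, m1, m2) == SPEED_OF_SOUND * tdoa, or None if no sign
    change is found within r_max metres (two roots inside one scan step are missed)."""
    target = SPEED_OF_SOUND * tdoa
    dx, dy = math.cos(bearing), math.sin(bearing)

    def point(r):
        return (cam[0] + r * dx, cam[1] + r * dy)

    def g(r):
        return range_difference(point(r), m1, m2) - target

    # Coarse scan along the ray for a sign change of g, then refine by bisection.
    step = r_max / steps
    lo, g_lo = 0.0, g(0.0)
    if g_lo == 0.0:
        return point(lo)
    for k in range(1, steps + 1):
        hi = k * step
        g_hi = g(hi)
        if g_hi == 0.0:
            return point(hi)
        if (g_lo < 0) != (g_hi < 0):  # root bracketed in [lo, hi]
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                g_mid = g(mid)
                if (g_mid < 0) == (g_lo < 0):
                    lo, g_lo = mid, g_mid
                else:
                    hi = mid
            return point(0.5 * (lo + hi))
        lo, g_lo = hi, g_hi
    return None
```

For example, consider a camera at the midpoint of two microphones 1 m apart and a talker at (1, 3). The pixel column fixes only the bearing, and the TDOA fixes only the hyperbola. Their intersection recovers the range along the ray, which a single camera cannot supply.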
Abstract
A method for integrated audio-visual localizing and tracking of at least one object. The method includes the steps of capturing and transmitting an image of a video scene using a camera at an instant of time, identifying an object contained in the image having a preselected visual feature, and estimating a location of the object by determining an angular orientation relative to the image plane of the camera of an imaginary line extending from an optical center of the camera to a point on the image plane of the camera representing a portion of the object. The method further includes the steps of converting acoustic waves from an audio source into audio signals using at least two microphones at substantially the same instant of time, and identifying the audio source by determining, on the basis of the audio signals, a locus of points representing an estimate of the location of the audio source. An improved estimate of the location of the object is computed by determining the location of a region of intersection between the imaginary line and the locus.
16 Claims
3. An apparatus for integrated audio-visual localizing and tracking of at least one object, comprising:
a camera for capturing an image of a video scene and generating video signals representing the image at an instant of time;
visual object localizer means connected to said camera for receiving the video signals, identifying an object having a preselected visual cue contained in said image, and for estimating a location of the object by determining the angular orientation of an imaginary line extending from an optical center of the camera to a point on an image plane of the camera representing a portion of the object;
at least two microphones operative to pick up and convert acoustic waves from an audio source into audio signals at substantially said instant of time;
audio source localizer means connected to said at least two microphones for determining a locus of points representing an estimate of the location of the audio source generating the acoustical waves on the basis of the audio signals; and
integrated localizer means operatively connected to said visual object localizer means and said audio source localizer means for receiving information relating to said imaginary line and said locus for computing an improved estimate of the location of the object by determining the location of a region of intersection between said imaginary line and said locus of points.
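The audio source localizer means of claim 3 must turn two microphone signals into a locus. One common approach, assumed here rather than taken from the claim, is to estimate the TDOA as the lag that maximizes the cross-correlation of the two signals. The locus is then the set of points whose range difference to the two microphones equals the speed of sound times that delay. The function names (`max_physical_lag`, `estimate_tdoa`, `locus_residual`) and the brute-force correlation loop are hypothetical illustrations.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def max_physical_lag(mic_spacing, sample_rate):
    """Largest possible delay, in samples, for microphones mic_spacing metres apart."""
    return math.ceil(mic_spacing / SPEED_OF_SOUND * sample_rate)


def estimate_tdoa(sig1, sig2, sample_rate, max_lag):
    """Delay of sig1 relative to sig2, in seconds: the lag in [-max_lag, max_lag]
    that maximizes the cross-correlation sum(sig1[i] * sig2[i - lag]).

    A positive value means the sound reached microphone 1 after microphone 2.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        lo = max(0, lag)                       # keep i - lag >= 0
        hi = min(len(sig1), len(sig2) + lag)   # keep i - lag < len(sig2)
        score = sum(sig1[i] * sig2[i - lag] for i in range(lo, hi))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def locus_residual(p, m1, m2, tdoa):
    """Zero exactly on the locus |p - m1| - |p - m2| = SPEED_OF_SOUND * tdoa,
    one branch of a hyperbola with the microphones as foci."""
    return math.dist(p, m1) - math.dist(p, m2) - SPEED_OF_SOUND * tdoa
```

The lag search is limited by `max_physical_lag` because the range difference can never exceed the microphone spacing. A practical system would more likely use an FFT-based or weighted correlation (for example GCC-PHAT) instead of this quadratic loop.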
6. An integrated audio-visual method for localizing and tracking multiple objects, comprising the steps of:
capturing and transmitting images of a video scene using a plurality of cameras at an instant of time;
identifying objects having preselected visual features contained in said images;
determining an angular orientation of each of a plurality of imaginary lines extending from an optical center of each of the plurality of cameras to each point on an image plane of the camera representing a portion of each of the identified objects;
grouping points of intersection of said imaginary lines according to a distance measure;
converting acoustic waves from a plurality of audio sources into audio signals using a plurality of microphones at substantially said instant of time;
determining a plurality of loci of points representing estimates of locations of the audio sources on the basis of the audio signals from pairs of microphones selected from said plurality of microphones; and
determining a region of intersection between each of said plurality of loci and each of said grouped points of intersection of said imaginary lines according to another distance measure, said region of intersection being an improved estimate of a location of an identified object.
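Claim 6 combines several cameras with several microphone pairs. The sketch below is one hypothetical two-dimensional reading of it. Pairwise intersections of camera rays are grouped whenever they fall within `group_radius` of a group's centroid, which serves as the first distance measure. A group centroid is then accepted as an object location only if its range-difference residual against a microphone-pair locus is within `tol` metres, which serves as the second distance measure. The claim leaves both distance measures unspecified. The greedy grouping, the residual measure, the TDOA loci, and all names here are assumptions.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def ray_intersection(c1, b1, c2, b2):
    """Intersect two 2-D rays given as (origin, bearing in radians).
    Returns None for (near-)parallel rays or a crossing behind either camera."""
    d1 = (math.cos(b1), math.sin(b1))
    d2 = (math.cos(b2), math.sin(b2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2-D cross product d1 x d2
    if abs(denom) < 1e-12:
        return None
    wx, wy = c2[0] - c1[0], c2[1] - c1[1]
    t = (wx * d2[1] - wy * d2[0]) / denom  # distance along ray 1
    s = (wx * d1[1] - wy * d1[0]) / denom  # distance along ray 2
    if t <= 0 or s <= 0:
        return None
    return (c1[0] + t * d1[0], c1[1] + t * d1[1])


def centroid(points):
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def group_points(points, group_radius):
    """Assumed first distance measure: a point joins the first group whose
    centroid is within group_radius, otherwise it starts a new group."""
    groups = []
    for p in points:
        for g in groups:
            if math.dist(p, centroid(g)) <= group_radius:
                g.append(p)
                break
        else:
            groups.append([p])
    return [centroid(g) for g in groups]


def locus_residual(p, locus):
    """Assumed second distance measure: mismatch, in metres, between the range
    difference of p to a microphone pair and SPEED_OF_SOUND * tdoa."""
    m1, m2, tdoa = locus
    return abs(math.dist(p, m1) - math.dist(p, m2) - SPEED_OF_SOUND * tdoa)


def localize(cameras, loci, group_radius=0.2, tol=0.05):
    """cameras: list of (position, [bearings of detected objects]).
    loci: list of (mic1, mic2, tdoa). Returns audio-confirmed object locations."""
    points = []
    for (ci, bearings_i), (cj, bearings_j) in combinations(cameras, 2):
        for bi in bearings_i:
            for bj in bearings_j:
                p = ray_intersection(ci, bi, cj, bj)
                if p is not None:
                    points.append(p)
    centroids = group_points(points, group_radius)
    confirmed = []
    for locus in loci:
        best = min(centroids, key=lambda c: locus_residual(c, locus), default=None)
        if best is not None and locus_residual(best, locus) <= tol:
            confirmed.append(best)
    return confirmed
```

With two cameras viewing two objects, the four rays cross at four points, and two of those points are ghosts. In this sketch, the audio locus rejects the ghosts unless one of them happens to lie within `tol` of the locus. With three or more cameras, the true crossings coincide, and the grouping step merges them.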
Specification