Apparatus and method performing audio-video sensor fusion for object localization, tracking, and separation
Abstract
An apparatus for tracking and identifying objects includes an audio likelihood module which determines corresponding audio likelihoods for each of a plurality of sounds received from corresponding different directions, each audio likelihood indicating a likelihood that a sound is an object to be tracked; a video likelihood module which receives a video and determines video likelihoods for each of a plurality of images disposed in corresponding different directions in the video, each video likelihood indicating a likelihood that the image is an object to be tracked; and an identification and tracking module which determines correspondences between the audio likelihoods and the video likelihoods and, if a correspondence is determined to exist between one of the audio likelihoods and one of the video likelihoods, identifies and tracks a corresponding one of the objects using each determined pair of audio and video likelihoods.
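The pairing logic summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the text here does not say how a correspondence is determined or how a pair of likelihoods is combined, so the sketch assumes that a correspondence means the audio and video directions agree within a hypothetical threshold (`max_gap_deg`), and that a pair is scored by the product of its two likelihoods. The names `Detection`, `angular_gap`, and `fuse` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    direction_deg: float  # direction the sound or image comes from
    likelihood: float     # likelihood that it is an object to be tracked


def angular_gap(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two directions, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def fuse(audio, video, max_gap_deg=10.0):
    """Greedily pair audio and video detections by direction.

    Assumption: a correspondence exists when the directions differ by at
    most max_gap_deg. Returns (tracked, rejected), where tracked holds
    (audio, video, score) tuples and rejected holds unpaired detections,
    i.e. sources treated as not being objects to be tracked.
    """
    tracked, rejected = [], []
    used_video = set()
    for a in audio:
        best_index, best_gap = None, None
        for i, v in enumerate(video):
            if i in used_video:
                continue
            gap = angular_gap(a.direction_deg, v.direction_deg)
            if gap <= max_gap_deg and (best_gap is None or gap < best_gap):
                best_index, best_gap = i, gap
        if best_index is None:
            rejected.append(a)  # sound with no matching image
        else:
            v = video[best_index]
            used_video.add(best_index)
            # Assumed scoring rule: product of the two likelihoods.
            tracked.append((a, v, a.likelihood * v.likelihood))
    # Images with no matching sound are rejected as well.
    rejected.extend(v for i, v in enumerate(video) if i not in used_video)
    return tracked, rejected


if __name__ == "__main__":
    audio = [Detection(30.0, 0.9), Detection(200.0, 0.7)]
    video = [Detection(33.0, 0.8), Detection(120.0, 0.6)]
    tracked, rejected = fuse(audio, video)
    print("tracked:", tracked)
    print("rejected:", rejected)
```

In this example the sound at 30 degrees and the image at 33 degrees form a pair, while the sound at 200 degrees and the image at 120 degrees have no counterpart and are rejected. The greedy nearest-direction match is chosen only for brevity; a real system could use any assignment rule.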
58 Claims
1. An apparatus for tracking and identifying objects using received sounds and video, comprising:
an audio likelihood module which determines corresponding audio likelihoods for each of a plurality of the sounds received from corresponding different directions, each audio likelihood indicating a likelihood the sound is an object to be tracked;
a video likelihood module which determines video likelihoods for each of a plurality of images disposed in corresponding different directions in the video, each video likelihood indicating a likelihood that the image in the video is an object to be tracked; and
an identification and tracking module which:
determines correspondences between the audio likelihoods and the video likelihoods,
if a correspondence is determined to exist between one of the audio likelihoods and one of the video likelihoods, identifies and tracks a corresponding one of the objects using each determined pair of audio and video likelihoods, and
if a correspondence does not exist between a corresponding one of the audio likelihoods and a corresponding one of the video likelihoods, identifies a source of the sound or image as not being an object to be tracked.
Dependent claims: 2-29
30. A method of tracking and identifying objects using at least one computer receiving audio and video data, the method comprising:
for each of a plurality of sounds received from corresponding different directions, determining in the at least one computer corresponding audio likelihoods, each audio likelihood indicating a likelihood the sound is an object to be tracked;
for each of a plurality of images disposed in corresponding different directions in a video, determining in the at least one computer video likelihoods, each video likelihood indicating a likelihood that the image in the video is an object to be tracked;
if a correspondence is determined to exist between one of the audio likelihoods and one of the video likelihoods, identifying and tracking in the at least one computer a corresponding one of the objects using each determined pair of audio and video likelihoods, and
if a correspondence does not exist between a corresponding one of the audio likelihoods and a corresponding one of the video likelihoods, identifying in the at least one computer a source of the sound or image as not being an object to be tracked.
Dependent claims: 31-58
Specification