Method and apparatus for tracking moving objects using combined video and audio information in video conferencing and other applications
Abstract
A video processing system tracks a moving person or other object of interest using a combined audio-video tracking system. The audio-video tracking system comprises an audio locator, a video locator, and a set of rules for determining the manner in which settings of a camera are adjusted based on outputs of the audio locator and video locator. The set of rules may be configured such that only the audio locator output is used to adjust the camera settings if the audio locator and video locator outputs are not sufficiently close and a confidence indicator generated by the audio locator is above a specified threshold. For example, in such a situation, the audio locator output alone may be used to direct the camera to a new speaker in a video conference. If the audio locator and video locator outputs are sufficiently close, the system determines whether a confidence indicator generated by the video locator is above a specified level, and if so, the video locator output may be used to adjust the camera settings. For example, the camera may be zoomed in such that the face of a video conference participant is centered in and occupies a designated portion of a video frame generated by the camera.
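The rule set described in the abstract can be sketched as a small decision function. This is a minimal sketch, not the patented implementation. The (pan, tilt) location format, the `LocatorOutput` type, the numeric thresholds, and the choice to leave the camera unchanged when neither rule fires are all hypothetical, since the abstract names the conditions but gives no values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocatorOutput:
    # Hypothetical representation: estimated location as (pan, tilt) in degrees.
    pan: float
    tilt: float
    confidence: float  # confidence indicator, assumed to lie in [0, 1]


# Hypothetical tuning values; the abstract only says "sufficiently close"
# and "specified threshold/level" without giving numbers.
CLOSENESS_DEG = 5.0
AUDIO_CONF_THRESHOLD = 0.7
VIDEO_CONF_THRESHOLD = 0.6


def outputs_close(a: LocatorOutput, v: LocatorOutput, tol: float = CLOSENESS_DEG) -> bool:
    """True if the two location estimates agree within tol degrees on both axes."""
    return abs(a.pan - v.pan) <= tol and abs(a.tilt - v.tilt) <= tol


def choose_source(a: LocatorOutput, v: LocatorOutput) -> Optional[str]:
    """Return 'audio', 'video', or None (assumed: leave the camera unchanged)."""
    if not outputs_close(a, v):
        # Outputs disagree: use audio alone only if it is confident (e.g. a new speaker).
        return "audio" if a.confidence > AUDIO_CONF_THRESHOLD else None
    # Outputs agree: refine framing with video if its confidence is high enough.
    return "video" if v.confidence > VIDEO_CONF_THRESHOLD else None
```

For instance, a confident audio estimate 30 degrees away from the video estimate selects `"audio"`, which corresponds to the abstract's example of swinging the camera to a new speaker.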
23 Claims
1. A method for tracking an object of interest in a video processing system, the method comprising the steps of:
generating for particular ones of successive plural measurement intervals an audio locator output from an audio input derived from detecting sound from an object, and a video locator output from a video input derived partly from a camera detecting movement of an object, each indicative of a location of the object of interest;
applying a set of confidence level rules to each of the audio locator output and video locator output to determine which one of the audio locator output and the video locator output has a higher confidence level, whereby the one having the highest confidence level will be utilized independently from the other to adjust a setting of the camera during each one of said successive plural measurement intervals, but if in a measurement interval the confidence levels are equivalent, the video locator output is used if above an established threshold, otherwise the audio locator output is utilized; and
adjusting the camera setting utilizing only the selected one of the audio locator output and the video locator output in accordance with the applied set of confidence level rules.
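The per-interval selection rule in claim 1 reduces to a comparison plus a tie-break. A minimal sketch, assuming confidences are plain floats and reading "the video locator output is used if above an established threshold" as the video confidence exceeding that threshold. `select_locator`, `video_threshold`, and the tolerance `eps` used to decide when two confidences count as "equivalent" are hypothetical names.

```python
from typing import Literal

Source = Literal["audio", "video"]


def select_locator(audio_conf: float, video_conf: float,
                   video_threshold: float, eps: float = 1e-9) -> Source:
    """Pick the single locator whose output drives the camera this interval.

    - The locator with the higher confidence is used on its own.
    - On a tie (within eps), video is used if its confidence exceeds
      video_threshold; otherwise audio is used.
    """
    if abs(audio_conf - video_conf) <= eps:
        # Equivalent confidences: fall back to the claimed tie-break.
        return "video" if video_conf > video_threshold else "audio"
    return "audio" if audio_conf > video_conf else "video"
```

Because the function returns exactly one source, the subsequent adjusting step can ignore the other locator entirely, matching the claim's "utilizing only the selected one".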
12. An apparatus for tracking an object of interest in a video processing system, the apparatus comprising:
a camera; and
a processor coupled to the camera and operative (i) to process an audio locator output from an audio input signal, and a video locator output from a video input signal derived partly from movement of the object, each indicative of a location of the object of interest for particular ones of given measurement intervals of a plurality of successive measurement intervals; and
(ii) to apply a set of confidence level rules to each of the audio locator output and the video locator output to determine which one of the audio locator output and the video locator output has a higher confidence level, whereby the one having the highest confidence level will be utilized independently of the other to adjust a setting of the camera based on the given measurement interval.
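Claim 12 places the same selection inside a processor that handles a sequence of measurement intervals, only some of which yield locator outputs. The sketch below models that loop under stated assumptions: `Camera`, `Interval`, and `track` are hypothetical, locations are (pan, tilt) pairs, and an interval with no output leaves the camera where it is. Claim 12 as quoted recites no tie rule, so this sketch simply lets audio win ties.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

Location = Tuple[float, float]  # hypothetical (pan, tilt) in degrees


@dataclass
class Camera:
    """Stand-in for a pan/tilt camera; a real system would drive hardware here."""
    pan: float = 0.0
    tilt: float = 0.0

    def point_at(self, loc: Location) -> None:
        self.pan, self.tilt = loc


@dataclass
class Interval:
    """Locator outputs for one measurement interval; either may be absent."""
    audio: Optional[Tuple[Location, float]] = None  # (location, confidence)
    video: Optional[Tuple[Location, float]] = None


def track(camera: Camera, intervals: Iterable[Interval]) -> None:
    """For each interval, steer the camera using only the more confident locator."""
    for iv in intervals:
        candidates = [c for c in (iv.audio, iv.video) if c is not None]
        if not candidates:
            continue  # no locator output this interval: leave the camera alone
        # max() returns the first maximal item, so audio wins exact ties here.
        loc, _conf = max(candidates, key=lambda c: c[1])
        camera.point_at(loc)
```

Swapping the `max` call for the tie-break recited in claim 1 would make the apparatus behave like the method claim.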
23. An article of manufacture comprising a storage medium for storing one or more programs for tracking an object of interest in a video processing system, wherein the one or more programs when executed by a processor implement the steps of:
generating for particular ones of given measurement intervals of a plurality of successive measurement intervals, an audio locator output from an audio input, and a video locator output from a video input derived partly from detection of movement of the object, each indicative of a location of the object of interest;
applying a set of confidence level rules to each of the audio locator output and the video locator output to determine which one of the audio locator output and the video locator output has a higher confidence level, whereby the one having the highest confidence level will be utilized independently of the other to adjust a setting of the camera based on the given measurement interval; and
adjusting the camera setting utilizing only the selected one of the audio locator output and the video locator output in accordance with the applied set of confidence level rules.
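The claims do not specify which camera setting the adjusting step changes. The abstract gives one example: zooming so that a participant's face is centered in, and occupies a designated portion of, the frame. The sketch below computes such an adjustment from a face bounding box. It is illustrative only: `FaceBox`, `framing_adjustment`, the 30% target width, the zoom clamp, and the linear pixel-to-angle mapping (reasonable only for small offsets) are all assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FaceBox:
    """Face bounding box from a video locator, in pixels (hypothetical format)."""
    cx: float     # center x
    cy: float     # center y
    width: float  # box width


def framing_adjustment(face: FaceBox, frame_w: int, frame_h: int,
                       hfov_deg: float, vfov_deg: float,
                       target_fraction: float = 0.3,
                       max_zoom_step: float = 4.0) -> Tuple[float, float, float]:
    """Return (d_pan, d_tilt, zoom_factor) that centers the face and scales it
    so its width becomes target_fraction of the frame width."""
    # Linear approximation: pixel offset from center maps proportionally to angle.
    d_pan = (face.cx - frame_w / 2) / frame_w * hfov_deg
    d_tilt = (frame_h / 2 - face.cy) / frame_h * vfov_deg  # image y grows downward
    zoom = target_fraction * frame_w / face.width
    # Clamp to a plausible per-step zoom range.
    zoom = max(1.0 / max_zoom_step, min(max_zoom_step, zoom))
    return d_pan, d_tilt, zoom
```

A face 192 px wide already centered in a 1920x1080 frame yields no pan or tilt change and a zoom factor of about 3, bringing its width to roughly 30% of the frame.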
Specification