Visually tracking an object in real world using 2D appearance and multicue depth estimations
First Claim
1. A method for visually tracking a moving real-world object by estimating a three dimensional position and a two dimensional velocity of the moving object, the method comprising:
capturing a first image at a camera at a first time;
processing an image region of the first captured image to obtain first visual cues representing first input features of the moving object to be tracked, the image region in the first image being the region the moving object is expected to be located at the first time;
initializing a 2D tracker template based on the first input features;
capturing a second image at the camera at a second time subsequent to the first time;
processing the second captured image to obtain second visual cues representing second input features;
at a processor, determining a two dimensional position of the moving object to be tracked from the second captured image by estimating parameters of a two dimensional transformation on the determined object in the first captured image or the first input features that best match the determined object in the second captured image or the second input features;
determining a two dimensional velocity of the object to be tracked based on the position of the moving object in the first and second captured images;
estimating a depth of the moving object to be tracked at the first time based on a third cue obtained at the first time;
estimating a change of the depth of the moving object to be tracked between the first time and the second time based on the third cue and a fourth cue obtained at the second time;
determining a relative change of size of the moving object to be tracked from the first captured image and the determined object from the second captured image by performing a two dimensional transformation on the determined object in the first and second captured images or the first and second input features;
updating the estimated depth and the change of the depth of the moving object to be tracked based on the relative change of the size of the moving object to be tracked;
combining the updated depth of the moving object to be tracked, the change of the depth of the moving object to be tracked, two dimensional position of the moving object to be tracked, and the two dimensional velocity of the moving object to be tracked to obtain a three dimensional coordinate of the object to be tracked at the second time; and
outputting the three dimensional coordinate of the object to be tracked and tracking the moving object.
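The position and size-change steps above both rest on estimating the parameters of a 2D transformation that best maps the tracker template onto the second image. The following is a minimal sketch of that idea, not the patented implementation: it assumes the transformation is a translation plus an isotropic scale, that the match criterion is a mean squared difference, and that the parameters are found by brute-force search. The names `estimate_transform`, `match_cost`, `sample`, and `velocity_2d`, the search radius, and the candidate scales are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def sample(img: Image, x: float, y: float) -> float:
    """Bilinear lookup at (x, y); pixels outside the image read as 0."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(xi: int, yi: int) -> float:
        if 0 <= yi < len(img) and 0 <= xi < len(img[0]):
            return img[yi][xi]
        return 0.0

    top = (1 - fx) * px(x0, y0) + fx * px(x0 + 1, y0)
    bottom = (1 - fx) * px(x0, y0 + 1) + fx * px(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom


def match_cost(template: Image, image: Image,
               cx: float, cy: float, scale: float) -> float:
    """Mean squared difference between the template and the image patch
    centred at (cx, cy), with the template stretched by `scale`."""
    h, w = len(template), len(template[0])
    total = 0.0
    for ty in range(h):
        for tx in range(w):
            # Map each template pixel into the image (translation + scale).
            ix = cx + scale * (tx - (w - 1) / 2)
            iy = cy + scale * (ty - (h - 1) / 2)
            diff = template[ty][tx] - sample(image, ix, iy)
            total += diff * diff
    return total / (h * w)


def estimate_transform(template: Image, image: Image,
                       prev_xy: Tuple[int, int], search: int = 3,
                       scales: Sequence[float] = (0.8, 1.0, 1.25)
                       ) -> Tuple[Tuple[int, int], float]:
    """Return the centre position and scale with the lowest match cost.
    The scale is the relative size change with respect to the template."""
    best = None
    for s in scales:
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                cx, cy = prev_xy[0] + dx, prev_xy[1] + dy
                cost = match_cost(template, image, cx, cy, s)
                if best is None or cost < best[0]:
                    best = (cost, cx, cy, s)
    _, cx, cy, s = best
    return (cx, cy), s


def velocity_2d(p1: Tuple[float, float], p2: Tuple[float, float],
                dt: float) -> Tuple[float, float]:
    """Finite-difference image-plane velocity in pixels per time unit."""
    return ((p2[0] - p1[0]) / dt, (p2[1] - p1[1]) / dt)
```

Calling `estimate_transform(template, second_image, previous_position)` yields both the new 2D position and a scale factor. The scale factor plays the role of the relative change of size in the claim, and `velocity_2d` turns two successive positions into the 2D velocity.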
Abstract
Estimating the dynamic states of a real-world object over time using a camera, two dimensional (2D) image information, and a combination of different measurements of the distance between the object and the camera. The 2D image information is used to track the 2D position of the object as well as the 2D size of its appearance and the change in that size. In addition, the distance between the object and the camera is obtained from one or more direct depth measurements. The 2D position, the 2D size, and the depth of the object are coupled to obtain an improved estimation of the three dimensional (3D) position and 3D velocity of the object. The object tracking apparatus uses the improved estimation to track real-world objects. The object tracking apparatus may be used on a moving platform, such as a robot or a car with mounted cameras, for dynamic visual scene analysis.
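The coupling of 2D position, 2D size, and depth described in the abstract can be illustrated with a small worked sketch. It is an illustration under stated assumptions, not the method of the patent: it assumes an ideal pinhole camera (so apparent size is inversely proportional to depth), a known focal length and principal point, and an inverse-variance weighted average as the way to merge the size-based depth prediction with a direct depth measurement. The names `Pinhole`, `depth_from_size_change`, `fuse`, `back_project`, and `velocity_3d`, and the variances used, are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Pinhole:
    """Assumed ideal pinhole camera; all values in pixels."""
    f: float   # focal length
    cx: float  # principal point, x
    cy: float  # principal point, y


def depth_from_size_change(z_prev: float, size_ratio: float) -> float:
    """Predict the current depth from the previous depth and the relative
    size change size_now / size_prev. Under the pinhole assumption the
    apparent size scales with 1 / depth, so depth scales with 1 / ratio."""
    if size_ratio <= 0:
        raise ValueError("size_ratio must be positive")
    return z_prev / size_ratio


def fuse(z_a: float, var_a: float, z_b: float, var_b: float) -> float:
    """Inverse-variance weighted average of two depth estimates
    (an assumed fusion rule; the source does not specify one)."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    return (w_a * z_a + w_b * z_b) / (w_a + w_b)


def back_project(cam: Pinhole, u: float, v: float,
                 z: float) -> Tuple[float, float, float]:
    """Convert an image position (u, v) and a depth z into a 3D point
    in the camera frame."""
    return ((u - cam.cx) * z / cam.f, (v - cam.cy) * z / cam.f, z)


def velocity_3d(p1: Tuple[float, float, float],
                p2: Tuple[float, float, float],
                dt: float) -> Tuple[float, float, float]:
    """Finite-difference 3D velocity between two 3D positions."""
    return tuple((b - a) / dt for a, b in zip(p1, p2))
```

For instance, an object at depth 10 whose appearance grows by a factor of 1.25 is predicted at depth 8. Averaging that with a direct measurement of 9 at equal variance gives 8.5, which `back_project` then combines with the tracked 2D position to give the 3D coordinate.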
18 Claims
1. The method recited in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A tracking apparatus including a processor and a memory storing instructions, the instructions when executed by the processor cause the processor to:
capture a first image at a camera at a first time;
process an image region of the first captured image to obtain first visual cues representing first input features of the moving object to be tracked, the image region in the first image being the region the moving object is expected to be located at the first time;
initialize a 2D tracker template based on the first input features;
capture a second image at the camera at a second time subsequent to the first time;
process the second captured image to obtain second visual cues representing second input features;
determine a two dimensional position of the moving object to be tracked from the second captured image by estimating parameters of a two dimensional transformation on the determined object in the first captured image or the first input features that best match the determined object in the second captured image or the second input features;
estimate a depth of the moving object to be tracked at the first time based on a third cue obtained at the first time;
estimate a change of the depth of the moving object between the first time and the second time based on the third cue and a fourth cue obtained at the second time;
determine a relative change of size of the moving object to be tracked from the first captured image and the determined object from the second captured image by performing a two dimensional transformation on the determined object in the first and second images or the first and second input features;
update the estimated depth and the change of the depth of the moving object to be tracked based on the relative change of the size of the moving object to be tracked;
combine the updated depth of the moving object to be tracked, the change of the depth of the moving object to be tracked, two dimensional position of the moving object to be tracked, and the two dimensional velocity of the moving object to obtain a three dimensional coordinate of the object at the second time; and
output the three dimensional coordinate of the object to be tracked and track the moving object.
13. A humanoid robot including a processor and a memory storing instructions, the instructions when executed by the processor cause the processor to:
capture a first image at a camera at a first time;
process an image region of the first captured image to obtain first visual cues representing first input features of the moving object to be tracked, the image region in the first image being the region the moving object is expected to be located at the first time;
initialize a 2D tracker template based on the first input features;
capture a second image at the camera at a second time subsequent to the first time;
process the second captured image to obtain second visual cues representing second input features;
determine a two dimensional position of the moving object to be tracked from the second captured image by estimating parameters of a two dimensional transformation on the determined object in the first captured image or the first input features that best match the determined object in the second captured image or the second input features;
determine a two dimensional velocity of the object to be tracked based on the position of the moving object in the first and second captured images;
estimate a depth of the moving object to be tracked at the first time based on a third cue obtained at the first time;
estimate a change of the depth of the moving object to be tracked between the first time and the second time based on the third cue and a fourth cue obtained at the second time;
determine a relative change of size of the moving object to be tracked from the first captured image and the determined object from the second captured image by performing a two dimensional transformation on the determined object in the first and second captured images or the first and second input features;
update the estimated depth and the change of the depth of the moving object to be tracked based on the relative change of the size of the moving object to be tracked;
combine the updated depth of the moving object to be tracked, the change of the depth of the moving object to be tracked, two dimensional position of the moving object to be tracked, and the two dimensional velocity of the moving object to be tracked to obtain a three dimensional coordinate of the object to be tracked at the second time; and
output the three dimensional coordinate of the object to be tracked and track the moving object.
Dependent claims: 14
15. An automobile including a processor and a memory storing instructions, the instructions when executed by the processor cause the processor to:
capture a first image at a camera at a first time;
process an image region of the first captured image to obtain first visual cues representing first input features of the moving object to be tracked, the image region in the first image being the region the moving object is expected to be located at the first time;
initialize a 2D tracker template based on the first input features;
capture a second image at the camera at a second time subsequent to the first time;
process the second captured image to obtain second visual cues representing second input features;
determine a two dimensional position of the moving object to be tracked from the second captured image by estimating parameters of a two dimensional transformation on the determined object in the first captured image or the first input features that best match the determined object in the second captured image or the second input features;
determine a two dimensional velocity of the object to be tracked based on the position of the moving object in the first and second captured images;
estimate a depth of the moving object to be tracked at the first time based on a third cue obtained at the first time;
estimate a change of the depth of the moving object to be tracked between the first time and the second time based on the third cue and a fourth cue obtained at the second time;
determine a relative change of size of the moving object to be tracked from the first captured image and the determined object from the second captured image by performing a two dimensional transformation on the determined object in the first and second captured images or the first and second input features;
update the estimated depth and the change of the depth of the moving object to be tracked based on the relative change of the size of the moving object to be tracked;
combine the updated depth of the moving object to be tracked, the change of the depth of the moving object to be tracked, two dimensional position of the moving object to be tracked, and the two dimensional velocity of the moving object to be tracked to obtain a three dimensional coordinate of the object to be tracked at the second time; and
output the three dimensional coordinate of the object to be tracked and track the moving object.
Dependent claims: 16
17. A non-transitory computer readable storage medium configured to store instructions, the instructions when executed by a processor cause the processor to:
capture a first image at a camera at a first time;
process an image region of the first captured image to obtain first visual cues representing first input features of the moving object to be tracked, the image region in the first image being the region the moving object is expected to be located at the first time;
initialize a 2D tracker template based on the first input features;
capture a second image at the camera at a second time subsequent to the first time;
process the second captured image to obtain second visual cues representing second input features;
determine a two dimensional position of the moving object to be tracked from the second captured image by estimating parameters of a two dimensional transformation on the determined object in the first captured image or the first input features that best match the determined object in the second captured image or the second input features;
determine a two dimensional velocity of the object to be tracked based on the position of the moving object in the first and second captured images;
estimate a depth of the moving object to be tracked at the first time based on a third cue obtained at the first time;
estimate a change of the depth of the moving object to be tracked between the first time and the second time based on the third cue and a fourth cue obtained at the second time;
determine a relative change of size of the moving object to be tracked from the first captured image and the determined object from the second captured image by performing a two dimensional transformation on the determined object in the first and second captured images or the first and second input features;
update the estimated depth and the change of the depth of the moving object to be tracked based on the relative change of the size of the moving object to be tracked;
combine the updated depth of the moving object to be tracked, the change of the depth of the moving object to be tracked, two dimensional position of the moving object to be tracked, and the two dimensional velocity of the moving object to be tracked to obtain a three dimensional coordinate of the object to be tracked at the second time; and
output the three dimensional coordinate of the object to be tracked and track the moving object.
Dependent claims: 18
Specification