VISUALLY TRACKING AN OBJECT IN REAL WORLD USING 2D APPEARANCE AND MULTICUE DEPTH ESTIMATIONS
Abstract
The dynamic states of a real-world object are estimated over time using a camera, two-dimensional (2D) image information, and a combination of different measurements of the distance between the object and the camera. The 2D image information is used to track the 2D position of the object, the 2D size of its appearance, and the change in that size. In addition, the distance between the object and the camera is obtained from one or more direct depth measurements. The 2D position, the 2D size, and the depth of the object are coupled to obtain an improved estimate of the three-dimensional (3D) position and 3D velocity of the object. The object tracking apparatus uses this improved estimate to track real-world objects. The apparatus may be used on a moving platform with mounted cameras, such as a robot or a car, for dynamic visual scene analysis.
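The coupling of 2D size and depth described above follows from the pinhole-camera model: an object of physical width W at depth Z appears with image width w = f·W/Z, so for a rigid object Z2 = Z1·w1/w2, and a pixel (u, v) at known depth Z back-projects to X = (u − cx)·Z/fx, Y = (v − cy)·Z/fy. The following is a minimal sketch under those assumptions only (ideal pinhole camera, known intrinsics, object of constant physical size). The names `Intrinsics`, `depth_from_size_change`, `backproject`, and `velocity_3d` are hypothetical and are not taken from the patent.

```python
"""Pinhole-camera relations coupling 2D appearance with depth (illustrative sketch).

Assumes an ideal pinhole camera with focal lengths fx, fy and principal point
(cx, cy) in pixels. All names here are hypothetical, not from the patent.
"""
from dataclasses import dataclass


@dataclass
class Intrinsics:
    fx: float  # focal length along x, in pixels
    fy: float  # focal length along y, in pixels
    cx: float  # principal point x, in pixels
    cy: float  # principal point y, in pixels


def depth_from_size_change(z1: float, w1: float, w2: float) -> float:
    """Propagate depth from image size: w = f*W/Z implies Z2 = Z1 * w1 / w2."""
    if w1 <= 0 or w2 <= 0:
        raise ValueError("image sizes must be positive")
    return z1 * w1 / w2


def backproject(u: float, v: float, z: float, k: Intrinsics) -> tuple:
    """Lift pixel (u, v) observed at depth z to camera coordinates (X, Y, Z)."""
    return ((u - k.cx) * z / k.fx, (v - k.cy) * z / k.fy, z)


def velocity_3d(p1: tuple, p2: tuple, dt: float) -> tuple:
    """Finite-difference 3D velocity between two camera-frame positions."""
    return tuple((b - a) / dt for a, b in zip(p1, p2))
```

For example, with fx = fy = 500 and (cx, cy) = (320, 240), an object measured at 4.0 m whose image width shrinks from 50 to 40 pixels is propagated to 5.0 m. If its center moves from u = 370 to u = 360 over 0.5 s, both back-projected positions have X = 0.4 m, giving a 3D velocity of (0, 0, 2) m/s: the object is receding straight along the optical axis even though its image center moved. This is the kind of ambiguity that coupling 2D position with depth resolves.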
13 Claims
1. A method for visually tracking a real-world object by estimating three dimensional parameters of the object, the method comprising:
capturing a first image at a camera at a first time;
preprocessing a first region of the captured first image to obtain first input features, the object expected to be in the first region at the first time based on a first cue;
initializing a tracker template based on the first input features responsive to indication of a second region of the first image received from a source other than the first cue;
capturing a second image at the camera at a second time subsequent to the first time;
preprocessing a third region of the second image to obtain second input features, the object expected to be in the third region at the second time based on a second cue;
estimating an appearance of the object at the second time based on a two dimensional position and a two dimensional velocity of the object determined by the first input features and the second input features;
estimating a depth of the object at the first time based on a third cue obtained at the first time;
estimating a change of the depth of the object between the first time and the second time based on the third cue and a fourth cue obtained at the second time;
determining a relative change of size of the object by performing a two dimensional transformation on the first and second images or the first and second input features;
updating the estimated depth and the change of the depth of the object based on the relative change of the size of the object;
combining the updated depth of the object, the change of the depth of the object, two dimensional position of the object, and the two dimensional velocity of the object to obtain a three dimensional coordinate of the object at the second time; and
outputting the three dimensional coordinate of the object.
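Claim 1 determines a relative change of size by a 2D transformation between the two frames, then uses it to update the depth and depth change obtained from other cues. A minimal sketch of those two steps follows. It rests on assumptions that are not from the patent: matched feature points are given as (x, y) pairs, the 2D transformation is a least-squares similarity (scale, rotation, translation), and the size-propagated depth is fused with the cue-based depth by inverse-variance weighting, a generic technique swapped in here rather than the filter the specification may actually use. Treating points as complex numbers, the least-squares fit q ≈ a·p + b has a = Σ conj(p̃)·q̃ / Σ|p̃|² over centered points p̃, q̃, and |a| is the scale. The function names `similarity_scale`, `fuse`, and `update_depth` are hypothetical.

```python
"""Relative size change from a 2D similarity fit, then a depth update (sketch).

Assumptions (not from the patent): matched (x, y) feature points, a
least-squares similarity transform, and inverse-variance depth fusion.
"""


def similarity_scale(pts1, pts2) -> float:
    """Least-squares scale of q ~ a*p + b, treating 2D points as complex numbers."""
    if len(pts1) != len(pts2) or len(pts1) < 2:
        raise ValueError("need at least two matched point pairs")
    p = [complex(x, y) for x, y in pts1]
    q = [complex(x, y) for x, y in pts2]
    pm, qm = sum(p) / len(p), sum(q) / len(q)
    pc = [z - pm for z in p]  # center both point sets to remove translation
    qc = [z - qm for z in q]
    denom = sum(abs(z) ** 2 for z in pc)
    if denom == 0:
        raise ValueError("first point set is degenerate")
    a = sum(zp.conjugate() * zq for zp, zq in zip(pc, qc)) / denom
    return abs(a)  # |a| is the scale factor; the phase of a is the rotation


def fuse(z_a: float, var_a: float, z_b: float, var_b: float):
    """Inverse-variance weighted mean of two depth estimates and its variance."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    return (w_a * z_a + w_b * z_b) / (w_a + w_b), 1.0 / (w_a + w_b)


def update_depth(z1, pts1, pts2, z2_cue, var_scale, var_cue):
    """Fuse the size-propagated depth with a cue-based depth at the second time.

    Returns (updated depth at the second time, updated depth change z2 - z1).
    """
    s = similarity_scale(pts1, pts2)
    z2_scale = z1 / s  # the image grows (s > 1) when the object approaches
    z2, _ = fuse(z2_scale, var_scale, z2_cue, var_cue)
    return z2, z2 - z1
```

For instance, if the features of a square centered at the origin reappear rotated by 90°, doubled in size, and shifted, `similarity_scale` recovers a scale of 2 regardless of the rotation and translation. With z1 = 6.0 m this propagates to 3.0 m, and fusing it (variance 0.04) with a cue-based 3.2 m (variance 0.16) gives approximately 3.04 m, a depth change of approximately −2.96 m. The fused value leans toward whichever estimate is assumed to be more reliable.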
10. A tracking apparatus including a processor and a memory storing instructions, the instructions, when executed by the processor, causing the processor to:
capture a first image at a camera at a first time;
preprocess a first region of the captured first image to obtain first input features, the object expected to be in the first region at the first time based on a first cue;
initialize a tracker template based on the first input features responsive to indication of a second region of the first image received from a source other than the first cue;
capture a second image at the camera at a second time subsequent to the first time;
preprocess a third region of the second image to obtain second input features, the object expected to be in the third region at the second time based on a second cue;
estimate an appearance of the object at the second time based on a two dimensional position and a two dimensional velocity of the object determined by the first input features and the second input features;
estimate a depth of the object at the first time based on a third cue obtained at the first time;
estimate a change of the depth of the object between the first time and the second time based on the third cue and a fourth cue obtained at the second time;
determine a relative change of size of the object by performing a two dimensional transformation on the first and second images or the first and second input features;
update the estimated depth and the change of the depth of the object based on the relative change of the size of the object;
combine the updated depth of the object, the change of the depth of the object, two dimensional position of the object, and the two dimensional velocity of the object to obtain a three dimensional coordinate of the object at the second time; and
output the three dimensional coordinate of the object.
11. A humanoid robot including a processor and a memory storing instructions, the instructions, when executed by the processor, causing the processor to:
capture a first image at a camera at a first time;
preprocess a first region of the captured first image to obtain first input features, the object expected to be in the first region at the first time based on a first cue;
initialize a tracker template based on the first input features responsive to indication of a second region of the first image received from a source other than the first cue;
capture a second image at the camera at a second time subsequent to the first time;
preprocess a third region of the second image to obtain second input features, the object expected to be in the third region at the second time based on a second cue;
estimate an appearance of the object at the second time based on a two dimensional position and a two dimensional velocity of the object determined by the first input features and the second input features;
estimate a depth of the object at the first time based on a third cue obtained at the first time;
estimate a change of the depth of the object between the first time and the second time based on the third cue and a fourth cue obtained at the second time;
determine a relative change of size of the object by performing a two dimensional transformation on the first and second images or the first and second input features;
update the estimated depth and the change of the depth of the object based on the relative change of the size of the object;
combine the updated depth of the object, the change of the depth of the object, two dimensional position of the object, and the two dimensional velocity of the object to obtain a three dimensional coordinate of the object at the second time; and
output the three dimensional coordinate of the object.
12. An automobile including a processor and a memory storing instructions, the instructions, when executed by the processor, causing the processor to:
capture a first image at a camera at a first time;
preprocess a first region of the captured first image to obtain first input features, the object expected to be in the first region at the first time based on a first cue;
initialize a tracker template based on the first input features responsive to indication of a second region of the first image received from a source other than the first cue;
capture a second image at the camera at a second time subsequent to the first time;
preprocess a third region of the second image to obtain second input features, the object expected to be in the third region at the second time based on a second cue;
estimate an appearance of the object at the second time based on a two dimensional position and a two dimensional velocity of the object determined by the first input features and the second input features;
estimate a depth of the object at the first time based on a third cue obtained at the first time;
estimate a change of the depth of the object between the first time and the second time based on the third cue and a fourth cue obtained at the second time;
determine a relative change of size of the object by performing a two dimensional transformation on the first and second images or the first and second input features;
update the estimated depth and the change of the depth of the object based on the relative change of the size of the object;
combine the updated depth of the object, the change of the depth of the object, two dimensional position of the object, and the two dimensional velocity of the object to obtain a three dimensional coordinate of the object at the second time; and
output the three dimensional coordinate of the object.
13. A computer readable storage medium configured to store instructions, the instructions, when executed by a processor, causing the processor to:
capture a first image at a camera at a first time;
preprocess a first region of the captured first image to obtain first input features, the object expected to be in the first region at the first time based on a first cue;
initialize a tracker template based on the first input features responsive to indication of a second region of the first image received from a source other than the first cue;
capture a second image at the camera at a second time subsequent to the first time;
preprocess a third region of the second image to obtain second input features, the object expected to be in the third region at the second time based on a second cue;
estimate an appearance of the object at the second time based on a two dimensional position and a two dimensional velocity of the object determined by the first input features and the second input features;
estimate a depth of the object at the first time based on a third cue obtained at the first time;
estimate a change of the depth of the object between the first time and the second time based on the third cue and a fourth cue obtained at the second time;
determine a relative change of size of the object by performing a two dimensional transformation on the first and second images or the first and second input features;
update the estimated depth and the change of the depth of the object based on the relative change of the size of the object;
combine the updated depth of the object, the change of the depth of the object, two dimensional position of the object, and the two dimensional velocity of the object to obtain a three dimensional coordinate of the object at the second time; and
output the three dimensional coordinate of the object.
Specification