Pose determination and tracking by matching 3D objects to a 2D sensor
Abstract
An improved method of pose determination and tracking does away with conventional segmentation, instead taking advantage of multi-degree-of-freedom numerical fitting or matched filtering as opposed to a syntactic, segment- or feature-oriented combinatorial match. The technique may be used to improve image-database queries based on object shape descriptors by allowing the user to request images from a database or video sequence that contain a key object matching a geometric description the user designates or supplies. The approach is also applicable to target or object acquisition and tracking based on the matching of one or a set of object shape data structures.
Claims (8)
1. A method of recognizing three-dimensional objects through parameter zoning, comprising the steps of:
a) receiving data characterizing a three-dimensional object;
b) performing a transformation on the data to generate a projected image of at least a portion of the object;
c) subdividing the projected image into a zoned image containing a single object feature event describable in terms of X, Y, Z estimation space and pitch, roll, yaw angle space;
d) receiving a digitized scene from one or more sensors;
e) selecting an area of interest from the scene;
f) subdividing the area of interest into a zoned area of interest by:
zoning the initial pitch, roll, yaw angle space into a predetermined number of angular steps, and zoning the initial X, Y, Z estimation space using angular subdivisions;
g) comparing the zoned image to the zoned area of interest; and
h) repeating steps b), c), d) and g) to determine if the portion of the object is contained within the zoned area of interest.

2. The method of claim 1, further including the step of repeating the zoning of the X, Y, Z and pitch, roll, yaw spaces using fixed and reducing step sizes.
3. The method of claim 1, further including the step of tuning parameters associated with the X, Y, Z and pitch, roll, yaw spaces so as to optimize the spaces one parameter dimension at a time.
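Claims 1 through 3 describe zoning an X, Y, Z and pitch, roll, yaw parameter space, searching it, and then re-zoning with reducing step sizes. The coarse-to-fine idea can be sketched as follows; the function names and the quadratic test cost are illustrative, not taken from the patent:

```python
import itertools

def zone_search(cost, center, half_width, steps, levels):
    """Coarse-to-fine zoning: evaluate `cost` on a grid of `steps` points
    per dimension around `center`, then recenter on the best zone and
    halve the zone width (the "reducing step sizes" of claim 2)."""
    best = None
    for _ in range(levels):
        axes = [
            [c - half_width + 2 * half_width * i / (steps - 1) for i in range(steps)]
            for c in center
        ]
        for point in itertools.product(*axes):
            value = cost(point)
            if best is None or value < best[0]:
                best = (value, point)
        center = best[1]        # recenter on the best cell found so far
        half_width *= 0.5       # shrink the zone for the next pass
    return best

# Example: recover the minimum of a quadratic bowl in a 2-D slice
val, loc = zone_search(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
                       center=(0.0, 0.0), half_width=4.0, steps=5, levels=6)
```

In a full six-dimensional pose search the same loop runs over X, Y, Z, pitch, roll, yaw; the cost would be the zoned-image comparison of step g) rather than a closed-form function.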
4. A method of recognizing three-dimensional objects through parameter zoning, comprising the steps of:
a) receiving data characterizing a three-dimensional object;
b) performing a transformation on the data to generate a projected image of at least a portion of the object by:
i. determining an initial guess as to pitch, roll, yaw,
ii. determining an initial guess as to X, Y, and Z by exploiting the constraints from the pitch, roll, yaw estimates,
iii. determining an initial X, Y, Z, pitch, roll, yaw refinement based on a fixed step size minimum seeking algorithm,
iv. refining the final optimum X, Y, Z, pitch, roll, yaw from the fixed step size algorithm with a variable step size algorithm, and
v. using the current and past X, Y, Z, pitch, roll, yaw values to predict the next X, Y, Z, pitch, roll, yaw value, followed by refinement through steps iii and iv;
c) subdividing the projected image into a zoned image;
d) receiving a digitized scene from one or more sensors;
e) selecting an area of interest from the scene;
f) subdividing the area of interest into a zoned area of interest;
g) comparing the zoned image to the zoned area of interest; and
h) repeating steps b), c), d) and g) to determine if the portion of the object is contained within the zoned area of interest.
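Sub-steps iii through v of claim 4's step b) can be sketched as a pipeline: a fixed-step coordinate search, a variable-step refinement that halves the step when progress stalls, and a linear prediction of the next pose for tracking. All names here are illustrative, and the constant-velocity predictor is one simple reading of "using the current and past values to predict the next":

```python
def fixed_step_min(cost, pose, step, iters=50):
    """Step iii: fixed step size minimum seeking over each parameter
    dimension (pitch, roll, yaw, X, Y, Z modelled as one list)."""
    pose = list(pose)
    for _ in range(iters):
        moved = False
        for d in range(len(pose)):
            for delta in (-step, step):
                trial = list(pose)
                trial[d] += delta
                if cost(trial) < cost(pose):
                    pose, moved = trial, True
        if not moved:
            break
    return pose

def variable_step_refine(cost, pose, step, min_step=1e-4):
    """Step iv: halve the step whenever no improving move remains."""
    while step > min_step:
        better = fixed_step_min(cost, pose, step)
        if better == pose:
            step *= 0.5
        pose = better
    return pose

def predict(current, previous):
    """Step v: constant-velocity prediction from current and past poses."""
    return [2 * c - p for c, p in zip(current, previous)]
```

In use, `predict` seeds the next frame's pose, which is then refined through `fixed_step_min` and `variable_step_refine` against that frame's cost.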
5. A method of recognizing an object in an input image, comprising the steps of:
a) receiving data representative of a three-dimensional object;
b) performing a numerical transformation on the data to generate a projected image of at least a portion of the object by:
i. approximating a multi-degree-of-freedom matched filter using numerical methods,
ii. using a cost function to solve for translations with respect to each degree of freedom so as to evaluate the global minimum, and
iii. using a minimum seeking method with the cost function at specific stages of matching;
c) subdividing the projected image into a zoned image of the object;
d) receiving an input image from one or more sensors;
e) determining a region of interest within the input image;
f) partitioning the region of interest into a plurality of multiple pixel grids, each grid covering an object signature with a plurality of distinct zones;
g) analyzing each zone to determine the following:
1) the presence or absence of a primitive feature, and, if present, 2) the location and orientation of the primitive feature;
h) transforming the location and orientation of the primitive feature into a two-dimensional R-α array;
i) scanning the R-α array for a maximum value;
j) comparing the value to a significance threshold;
k) if the maximum value is greater than a significance threshold, assuming the grid includes a primitive feature at the R-α location;
l) comparing the grid to the zoned image of the object to determine if the portion of the object is contained within the grid.
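Steps h) through k) of claim 5 amount to accumulating zone-level edge features into an R-α array, scanning it for a peak, and testing the peak against a significance threshold. A minimal sketch, assuming the standard Hough normal-line parameterization R = x·cos α + y·sin α (the patent's exact relation is not reproduced in this excerpt), with all names illustrative:

```python
import math
from collections import defaultdict

def r_alpha_vote(features, r_bin=1.0, a_bin=math.radians(5)):
    """Accumulate primitive edge features into a 2-D R-alpha array.
    Each feature is (x, y, alpha, strength); collinear features with a
    common normal direction vote into the same (R, alpha) bin."""
    acc = defaultdict(float)
    for x, y, alpha, strength in features:
        r = x * math.cos(alpha) + y * math.sin(alpha)  # assumed Hough form
        key = (round(r / r_bin), round(alpha / a_bin))
        acc[key] += strength
    return acc

def peak(acc, threshold):
    """Scan the array for its maximum and compare it to the
    significance threshold; return None if no bin is significant."""
    key, value = max(acc.items(), key=lambda kv: kv[1])
    return (key, value) if value >= threshold else None
```

Three features lying on one line reinforce a single bin, while an isolated outlier stays below the threshold, which is what lets step k) assume a primitive feature exists at the winning R-α location.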
6. A method of recognizing an object in an input image, comprising the steps of:
a) receiving data representative of a three-dimensional object;
b) performing a numerical transformation on the data to generate a projected image of at least a portion of the object;
c) subdividing the projected image into a zoned image of the object;
d) receiving an input image from one or more sensors;
e) determining a region of interest within the input image;
f) partitioning the region of interest into a plurality of multiple pixel grids, each grid covering an object signature with a plurality of distinct zones;
g) analyzing each zone to determine the following:
1) the presence or absence of a primitive feature, and, if present, 2) the location and orientation of the primitive feature;
h) transforming the location and orientation of the primitive feature into a two-dimensional R-α array using a Sobel operator to evaluate each pixel within the grid, resulting in an edge strength, L, a location, Xp, Yp, and a direction angle, α;
converting the location (Xp, Yp) and the angle α into an R and an α according to the relation,
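Claim 6's per-pixel Sobel evaluation, yielding an edge strength L and direction angle α, can be sketched as below. The 3×3 kernels are the standard Sobel masks, assumed here since the excerpt does not reproduce them (the R-α conversion relation itself is omitted from this excerpt and is not filled in):

```python
import math

# Standard Sobel masks for horizontal (SX) and vertical (SY) gradients
SX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel(image, x, y):
    """Return (L, alpha) at interior pixel (x, y) of a 2-D list image:
    L is the gradient magnitude (edge strength), alpha the direction."""
    gx = sum(SX[j][i] * image[y + j - 1][x + i - 1]
             for j in range(3) for i in range(3))
    gy = sum(SY[j][i] * image[y + j - 1][x + i - 1]
             for j in range(3) for i in range(3))
    return math.hypot(gx, gy), math.atan2(gy, gx)
```

A vertical step edge, for instance, produces a purely horizontal gradient: maximal L with α = 0.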
7. A method of recognizing an object in an input image, comprising the steps of:
a) receiving data representative of a three-dimensional object;
b) performing a numerical transformation on the data to generate a projected image of at least a portion of the object;
c) subdividing the projected image into a zoned image of the object;
d) receiving an input image from one or more sensors;
e) determining a region of interest within the input image;
f) partitioning the region of interest into a plurality of multiple pixel grids, each grid covering an object signature with a plurality of distinct zones;
g) analyzing each zone to determine the following:
1) the presence or absence of a primitive feature, and, if present, 2) the location and orientation of the primitive feature;
h) transforming the location and orientation of the primitive feature into a two-dimensional R-α array;
i) scanning the R-α array for a maximum value utilizing a bisection approach which evaluates the cost metric while changing a single parameter at a time between two limits at a small step size, s, the parameters including pitch, roll, yaw, X, Y and Z, and applying the previous step in round-robin fashion with respect to pitch, roll, yaw, X, Y, or Z until the interval on each is less than a desired size, or until the process fails to converge;
j) comparing the value to a significance threshold;
k) if the maximum value is greater than a significance threshold, assuming the grid includes a primitive feature at the R-α location;
l) comparing the grid to the zoned image of the object to determine if the portion of the object is contained within the grid.
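The bisection approach of claim 7's step i) sweeps one parameter at a time between two limits at a small step size, then shrinks that parameter's interval around the best value, cycling round-robin over pitch, roll, yaw, X, Y, Z until every interval is below a desired size. A minimal sketch under those assumptions, with all names illustrative and a quadratic stand-in for the cost metric:

```python
def round_robin_bisection(cost, pose, limits, s=0.1, tol=1e-3, max_rounds=100):
    """Sweep each parameter between its limits at relative step s, keep
    the best value, halve that parameter's interval around it, and
    repeat round-robin until every interval is narrower than tol."""
    pose = list(pose)
    limits = [list(l) for l in limits]
    for _ in range(max_rounds):
        if all(hi - lo < tol for lo, hi in limits):
            break                         # every interval small enough
        for d, (lo, hi) in enumerate(limits):
            step = max(s * (hi - lo), tol / 2)
            best_v, best_x = None, pose[d]
            x = lo
            while x <= hi:                # sweep parameter d alone
                trial = list(pose)
                trial[d] = x
                v = cost(trial)
                if best_v is None or v < best_v:
                    best_v, best_x = v, x
                x += step
            pose[d] = best_x
            half = (hi - lo) / 4          # bisect: halve the interval
            limits[d] = [best_x - half, best_x + half]
    return pose
```

Because each interval halves per round while the sweep error stays below the new half-width, the optimum remains bracketed as the search narrows; the `max_rounds` cap models the claim's "fails to be convergent" exit.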
8. A method of recognizing an object in an input image, comprising the steps of:
a) receiving data representative of a three-dimensional object;
b) performing a numerical transformation on the data to generate a projected image of at least a portion of the object;
c) subdividing the projected image into a zoned image of the object;
d) receiving an input image from one or more sensors;
e) determining a region of interest within the input image;
f) partitioning the region of interest into a plurality of multiple pixel grids, each grid covering an object signature with a plurality of distinct zones;
g) analyzing each zone to determine the following:
1) the presence or absence of a primitive feature, and, if present, 2) the location and orientation of the primitive feature;
h) transforming the location and orientation of the primitive feature into a two-dimensional R-α array;
i) scanning the R-α array for a maximum value using a descent approach which includes the following steps:
i. evaluating the cost function at +/− km steps along one or more parameter dimensions, including pitch, roll, yaw, X, Y, or Z;
ii. checking each parameter against the minimum value found so far in this trial;
iii. if the new value is a minimum, its location and value supersede the current minimum; and
iv. after evaluating all variations possible within +/− km steps from the current hypothetical object location/orientation, jumping to the minimum value and recursing until the same location is picked as the minimum for several iterations;
j) comparing the value to a significance threshold;
k) if the maximum value is greater than a significance threshold, assuming the grid includes a primitive feature at the R-α location;
l) comparing the grid to the zoned image of the object to determine if the portion of the object is contained within the grid.
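The descent approach of claim 8's step i) evaluates the cost at every variation within +/− k steps of the current hypothetical pose, jumps to the minimum found, and repeats until the same location wins for several consecutive iterations. A minimal sketch under those assumptions, with illustrative names and a quadratic stand-in cost:

```python
import itertools

def neighborhood_descent(cost, pose, step, k=1, stable_needed=3):
    """Evaluate cost at every +/- k-step variation of `pose`, jump to
    the minimum, and recurse until the same location is picked as the
    minimum for `stable_needed` consecutive iterations."""
    pose = tuple(pose)
    stable = 0
    while stable < stable_needed:
        best_v, best_p = cost(pose), pose
        # all offset combinations within +/- k steps per dimension
        for off in itertools.product(range(-k, k + 1), repeat=len(pose)):
            trial = tuple(p + o * step for p, o in zip(pose, off))
            v = cost(trial)
            if v < best_v:                # new value supersedes the minimum
                best_v, best_p = v, trial
        stable = stable + 1 if best_p == pose else 0
        pose = best_p                     # jump to the minimum and recurse
    return pose
```

With six pose parameters and k = 1 this evaluates 3^6 = 729 variations per jump, which is the combinatorial cost the fixed step size keeps bounded.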
Specification