Computer vision system for accurate monitoring of object pose
First Claim
1. A sensing system for producing at successive time instants digital signals expressing positions and orientations of a three dimensional (3-D) object defined by a translation vector and a rotation matrix grouped into a pose matrix that in turn effects changes in a peripheral device comprising:
a single electronic camera having an image plane, an optical axis, a center of projection, a focal length, and a camera reference coordinate frame being centered at the center of projection with x and y axes parallel to the image plane, and a z-axis parallel to the optical axis, the single electronic camera producing an analog video signal;
at least four noncoplanar light sources rigidly attached to the 3-D object, the light sources having light source coordinates in an object reference coordinate frame of the 3-D object, the light sources projecting bright spots onto the image plane of the camera;
means for processing the analog video signal and determining a list of bright spot locations in the camera reference coordinate frame;
a computing means that includes memory means, processing means and output means;
the memory means storing a pseudo-inverse matrix B of a matrix A, wherein each row of the matrix A consists of four homogeneous coordinates of each of the light sources in a coordinate frame of reference of the object;
the memory means also storing a list of x-coordinates and a list of y-coordinates of the bright spots in the image plane of the camera;
the memory means also storing a list of correction factors to be applied to the list of x-coordinates and the list of y-coordinates, the list of correction factors depending on the position and orientation of the object, each element of the list of correction factors being initially set to zero if no knowledge about the position and orientation of the object is available, and being initially estimated otherwise;
the memory means also containing an iterative pose computing task for accurately computing the position and orientation of the object in the reference coordinate frame of the camera;
the iterative pose computing task comprising subtasks of;
(a) applying the correction factors to the list of x-coordinates to obtain a corrected list of x-coordinates and to the list of y-coordinates to obtain a corrected list of y-coordinates,
(b) multiplying the matrix B by the corrected list of x-coordinates and by the corrected list of y-coordinates to obtain a vector Q1 and a vector Q2,
(c) finding a norm N1 of a vector R1 whose three coordinates are first three coordinates of vector Q1, and a norm N2 of a vector R2 whose three coordinates are first three coordinates of vector Q2,
(d) dividing vector Q1 by N1 to obtain a first row of the pose matrix of the object and vector Q2 by N2 to obtain a second row of the pose matrix of the object,
(e) computing a vector k as a cross-product of two vectors respectively defined by first three elements of the first row of the pose matrix and by first three elements of the second row of the pose matrix,
(f) dividing the norm N1 by the focal length of the camera to obtain an inverse of a coordinate Tz of the translation vector of the object along the optical axis of the camera,
(g) complementing the vector k with a fourth coordinate equal to the coordinate Tz of the translation vector to obtain a third row of the pose matrix of the object,
(h) completing the pose matrix with a fourth row containing elements 0, 0, 0, and 1, and
(i) computing a new list of correction factors as a vector obtained by multiplying the matrix A by the third row of the pose matrix, dividing each coordinate by Tz, and subtracting 1 from each coordinate;
and repeatedly using the iterative pose computing task by repeating the steps (a)-(i) until the new list of correction factors is equal to a previous list of the correction factors whereby for each new image of the camera, the iterative pose computing task produces a pose matrix of the object after a few iterations of the pose computing task; and
providing to the output means for each frame of the video signal three coordinates of the translation vector of the 3D object and nine elements of the rotation matrix of the 3-D object in digital form which is the computed pose matrix to effect changes in the peripheral device.
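The subtasks (a) through (i) recited above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names `mat_vec`, `pseudo_inverse`, `iterate_pose`, `max_iter`, and `tol` are hypothetical; the correction factors are assumed to be applied by multiplying each coordinate by (1 + factor), which is consistent with subtask (i) but is not spelled out in the claim; image coordinates are assumed to be expressed in the same unit as the focal length and measured from the optical axis; and the stopping test uses a small tolerance instead of exact equality because of floating-point arithmetic.

```python
import math


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def pseudo_inverse(a):
    """Pseudo-inverse B = (A^T A)^-1 A^T of an N x 4 matrix A of full column rank."""
    n_cols = len(a[0])
    at = [list(col) for col in zip(*a)]  # A^T, shape 4 x N
    ata = [[sum(x * y for x, y in zip(r, c)) for c in at] for r in at]  # A^T A
    # Gauss-Jordan elimination on [A^T A | A^T] with partial pivoting.
    aug = [ata[i] + at[i] for i in range(n_cols)]
    for col in range(n_cols):
        pivot = max(range(col, n_cols), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n_cols):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n_cols:] for row in aug]


def iterate_pose(object_points, xs, ys, focal_length, max_iter=50, tol=1e-9):
    """Return a 4 x 4 pose matrix from light source coordinates and spot coordinates."""
    a = [[px, py, pz, 1.0] for px, py, pz in object_points]  # matrix A
    b = pseudo_inverse(a)                                    # matrix B
    eps = [0.0] * len(a)  # correction factors, zero when the pose is unknown
    for _ in range(max_iter):
        xc = [x * (1.0 + e) for x, e in zip(xs, eps)]        # (a), assumed form
        yc = [y * (1.0 + e) for y, e in zip(ys, eps)]
        q1, q2 = mat_vec(b, xc), mat_vec(b, yc)              # (b)
        n1 = math.sqrt(sum(v * v for v in q1[:3]))           # (c)
        n2 = math.sqrt(sum(v * v for v in q2[:3]))
        row1 = [v / n1 for v in q1]                          # (d)
        row2 = [v / n2 for v in q2]
        i, j = row1[:3], row2[:3]
        k = [i[1] * j[2] - i[2] * j[1],                      # (e) cross-product
             i[2] * j[0] - i[0] * j[2],
             i[0] * j[1] - i[1] * j[0]]
        tz = focal_length / n1                               # (f): N1 / f = 1 / Tz
        row3 = k + [tz]                                      # (g)
        pose = [row1, row2, row3, [0.0, 0.0, 0.0, 1.0]]      # (h)
        new_eps = [v / tz - 1.0 for v in mat_vec(a, row3)]   # (i)
        converged = max(abs(n - e) for n, e in zip(new_eps, eps)) < tol
        eps = new_eps
        if converged:
            break
    return pose
```

In this sketch matrix B is recomputed on every call for brevity; the claim stores it in memory, which suggests computing it once for a given arrangement of light sources and reusing it for every frame.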
Abstract
A sensing system for accurately monitoring the position and orientation of an object (28). At least 4 point light sources (26) are mounted on the surface of the object (28). A single electronic camera (20) captures images (92) of the point light sources (26). Locations of these images (92) are detected in each camera image, and a computer runs an iterative task using these locations to obtain accurate estimates of the pose of the object (28) in a camera coordinate system (90) at video rate. The object is held by an operator (40) for cursor (60) control, for interaction with virtual reality scenes on computer displays (22), or for remote interactive control of teleoperated mechanisms.
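The detection of bright spot locations in each camera image can be illustrated with a short software sketch. This is only a stand-in under assumptions: the claims recite means for processing an analog video signal, whereas the code below works on an already digitized grayscale frame given as a list of rows. The function name `bright_spot_centroids`, the `threshold` parameter, the use of 4-connected regions, and the choice of an origin at the image centre with y pointing up are all hypothetical. Matching each spot to its light source is not addressed here.

```python
def bright_spot_centroids(frame, threshold):
    """Return (x, y) centroids of connected bright regions, origin at the image centre."""
    height, width = len(frame), len(frame[0])
    seen = [[False] * width for _ in range(height)]
    spots = []
    for r0 in range(height):
        for c0 in range(width):
            if seen[r0][c0] or frame[r0][c0] < threshold:
                continue
            # Flood-fill one 4-connected region of pixels at or above the threshold.
            stack, pixels = [(r0, c0)], []
            seen[r0][c0] = True
            while stack:
                r, c = stack.pop()
                pixels.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and not seen[nr][nc] and frame[nr][nc] >= threshold):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            mean_r = sum(p[0] for p in pixels) / len(pixels)
            mean_c = sum(p[1] for p in pixels) / len(pixels)
            # Convert (row, column) to centred coordinates, y pointing up (assumed).
            spots.append((mean_c - (width - 1) / 2.0, (height - 1) / 2.0 - mean_r))
    return spots
```

The returned coordinates are in pixel units; before they are combined with a focal length they would need to be scaled to the same unit as that focal length.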
9 Claims
1. Independent claim, recited in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. An apparatus for three-dimensional (3-D) cursor control by an operator comprising:
a plurality of light sources at least four in number configured in any noncoplanar arrangement;
handle means for allowing an operator to hold and move the plurality of noncoplanar light sources freely in space;
a single electronic camera having an image plane, a center of projection and an optical axis, the single electronic camera producing an analog video signal;
processing means for processing the analog video signal and determining a list of positions of image projections of the light sources onto the image plane in a reference coordinate frame of the single electronic camera, the reference coordinate frame of the single electronic camera being centered at the single electronic camera's center of projection with x and y axes parallel to the image plane, a z-axis parallel to the single electronic camera's optical axis;
a computing means for repeatedly combining the list of positions of image projections of the light sources with coordinates of the light sources in a coordinate frame of reference of the plurality of noncoplanar light sources, the computing means including memory means, processing means and output means;
the memory means storing a pseudo-inverse matrix B of a matrix A, wherein each row of the matrix A consists of four homogeneous coordinates of each of the light sources in a coordinate frame of reference of the object;
the memory means also storing a list of x-coordinates and a list of y-coordinates of the bright spots in the image plane of the camera;
the memory means also storing a list of correction factors to be applied to the list of x-coordinates and the list of y-coordinates, the list of correction factors depending on the position and orientation of the object, each element of the list of correction factors being initially set to zero if no knowledge about the position and orientation of the object is available, and being initially estimated otherwise;
the memory means also containing an iterative pose computing task for accurately computing the position and orientation of the object in the reference coordinate frame of the camera;
the iterative pose computing task comprising subtasks of;
(a) applying the correction factors to the list of x-coordinates to obtain a corrected list of x-coordinates and to the list of y-coordinates to obtain a corrected list of y-coordinates,
(b) multiplying the matrix B by the corrected list of x-coordinates and by the corrected list of y-coordinates to obtain a vector Q1 and a vector Q2,
(c) finding a norm N1 of a vector R1 whose three coordinates are first three coordinates of vector Q1, and a norm N2 of a vector R2 whose three coordinates are first three coordinates of vector Q2,
(d) dividing vector Q1 by N1 to obtain a first row of the pose matrix of the object and vector Q2 by N2 to obtain a second row of the pose matrix of the object,
(e) computing a vector k as a cross-product of two vectors respectively defined by first three elements of the first row of the pose matrix and by first three elements of the second row of the pose matrix,
(f) dividing the norm N1 by the focal length of the camera to obtain an inverse of a coordinate Tz of the translation vector of the object along the optical axis of the camera,
(g) complementing the vector k with a fourth coordinate equal to the coordinate Tz of the translation vector to obtain a third row of the pose matrix of the object,
(h) completing the pose matrix with a fourth row containing elements 0, 0, 0, and 1, and
(i) computing a new list of correction factors as a vector obtained by multiplying the matrix A by the third row of the pose matrix, dividing each coordinate by Tz, and subtracting 1 from each coordinate;
and repeatedly using the iterative pose computing task by repeating the steps (a)-(i) until the new list of correction factors is equal to a previous list of the correction factors whereby for each new image of the camera, the iterative pose computing task produces a pose matrix of the object after a few iterations of the pose computing task; and
repeatedly outputting onto a display means in front of the operator's eyes a perspective projection of a 3-D virtual cursor defined by the rotation matrix and the translation vector.
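The final element of claim 9, the perspective projection of a 3-D virtual cursor defined by the rotation matrix and the translation vector, can be sketched as follows. This is an illustrative assumption of how such a projection might be computed, not the method of the patent: the function name `project_cursor`, the `view_distance` parameter (the distance from the assumed viewpoint to the display plane), and the representation of the cursor as a list of vertices in the object coordinate frame are all hypothetical.

```python
def project_cursor(pose, cursor_points, view_distance):
    """Project cursor vertices, given in the object frame, onto a display plane.

    pose is a 4 x 4 pose matrix whose first three rows hold the rotation
    matrix and, in the fourth column, the translation vector.
    """
    projected = []
    for px, py, pz in cursor_points:
        h = (px, py, pz, 1.0)  # homogeneous coordinates of the vertex
        # Rigid transform into the viewing frame using the first three rows.
        xc, yc, zc = (sum(r * v for r, v in zip(row, h)) for row in pose[:3])
        if zc <= 0.0:
            raise ValueError("vertex is not in front of the viewpoint")
        # Perspective division onto the plane at distance view_distance.
        projected.append((view_distance * xc / zc, view_distance * yc / zc))
    return projected
```

For example, with an identity rotation and a translation of 10 units along the z-axis, a vertex at (1, 2, 0) viewed at `view_distance` 2 projects to approximately (0.2, 0.4).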
Specification