Frame by frame, pixel by pixel matching of model-generated graphics images to camera frames for computer vision
First Claim
1. A method for tracking the location and view angle of a calibrated camera in real-time comprising the steps of:
(a) Creating an a priori model of the world in which the camera exists;
(b) Taking each raw, unprocessed video frame from the camera, wherein each video frame is subdivided into rectangular or square areas that overlap by zero or up to 100 per cent of the pixels in each video frame;
(c) For each video frame, hypothesizing a small set of possible locations and view angles at which such frame is taken, the count of hypothesized locations and view angles being limited by computing the most probable motion vector and view angle of the camera from two frames, one preceding another, such computation comprising the steps of:
(i) computing a fast Fourier transform of each area in a current camera frame, each area being processed independently of another to form a fast Fourier transform matrix;
(ii) taking the phase components of the resulting fast Fourier transform matrix to form a pure phase component matrix;
(iii) storing said pure phase component matrix in a memory;
(iv) utilizing said pure phase component matrices from the current frame and a previous frame, taking the phase differences between each area of the current camera frame and the corresponding area of the previous camera frame to form a phase difference matrix;
(v) computing an inverse fast Fourier transform of the phase difference matrix, resulting in a phase correlation surface;
(vi) determining a 2D position of the maximum in said phase correlation surface in each area, said 2D position forming a 2D optical flow vector for each area; and
(vii) calculating the most probable 3D motion vectors and view angles of the camera from optical flow vectors of all areas, comprising the steps of:
(1) determining the heading or direction of movement in the world frame of reference, to define a line along which the most probable next positions lie;
(2) using the previous calculation of speed to determine a candidate next position along the line of heading;
(3) picking a number of most probable positions from a cubical selection of points around the calculated candidate; and
(4) using gradient descent to select the best next position within the cubical selection of points;
(d) For each video frame, rendering images using a graphics processor and vertex data from the a priori model, one rendered image for each hypothesized location and view angle, wherein each of the video frames and rendered images is of equal resolution; and
the rendered images are subdivided into rectangular or square areas that overlap by zero or up to 100 per cent of the pixels; and
(e) For each video frame, picking the best location and view angle by finding the best matching rendered image to the video frame.
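Steps (c)(i)-(vi) describe per-area phase correlation. A minimal sketch in Python with NumPy, assuming the patent's "pure phase component" means a unit-magnitude spectrum (the function name, the `eps` guard against division by zero, and the peak-unwrapping convention are illustrative, not taken from the claim):

```python
import numpy as np

def phase_correlation_shift(prev_area, curr_area):
    """Estimate the 2D translation between two equally sized image areas
    via phase correlation, following steps (c)(i)-(vi) for one area."""
    # (i) fast Fourier transform of each area, processed independently
    F_prev = np.fft.fft2(prev_area)
    F_curr = np.fft.fft2(curr_area)
    # (ii) keep only the phase components (unit-magnitude spectra)
    eps = 1e-12
    P_prev = F_prev / (np.abs(F_prev) + eps)
    P_curr = F_curr / (np.abs(F_curr) + eps)
    # (iv) phase difference between current and previous area
    D = P_curr * np.conj(P_prev)
    # (v) inverse fast Fourier transform yields the phase correlation surface
    surface = np.real(np.fft.ifft2(D))
    # (vi) the 2D position of the maximum is the area's optical flow vector
    dy, dx = np.unravel_index(np.argmax(surface), surface.shape)
    # unwrap circular shifts larger than half the area size to negative values
    h, w = surface.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return dx, dy
```

For a purely (circularly) shifted area the surface is an impulse at the shift, so the peak recovers the motion exactly; on real frames the peak spreads but its position still gives the dominant per-area translation.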
Abstract
The invention employs state-of-the-art computer graphics to advance the field of computer vision. It uses model-generated graphics in image processing, matching image frames rendered by a graphics engine to those from a camera, in real-time, frame by frame, pixel by pixel. An a priori model of the world is required, but the benefit is a very accurate position and pose of the camera for every frame.
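The pixel-level matching the abstract describes, one rendered image per hypothesis with the best match giving the camera pose, can be sketched as follows. Mean squared pixel difference is an assumed similarity measure; the patent does not commit to a specific one:

```python
import numpy as np

def pick_best_hypothesis(camera_frame, rendered_images, hypotheses):
    """Return the hypothesized (location, view angle) whose rendering
    best matches the camera frame, pixel by pixel."""
    # The video frame and every rendered image are of equal resolution.
    errors = [np.mean((camera_frame - img) ** 2) for img in rendered_images]
    # The best-matching rendering identifies the camera's pose for this frame.
    return hypotheses[int(np.argmin(errors))]
```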
17 Claims
1. A method for tracking the location and view angle of a calibrated camera in real-time comprising the steps of: (text set forth in full under First Claim above.) Dependent claims: 2, 3, 4, 5, 6, 7, 8, 17.
9. An apparatus for tracking the location and view angle of a camera in real-time comprising:
(a) A video camera and its frame buffer whose contents are updated at a constant frame rate, wherein each video frame from the camera is subdivided into rectangular or square areas that overlap by zero or up to 100 per cent of the pixels in each video frame;
(b) Digital processing means for computing an optic flow from one video frame to another, and from such optic flow analysis hypothesizing a number of trial camera locations and view angles, the count of hypothesized locations and view angles being limited by computing a most probable motion vector and view angle of the camera from two frames, one preceding another, such computation comprising the steps of:
(i) computing a fast Fourier transform of each area in a current camera frame, each area being processed independently of another to form a fast Fourier transform matrix;
(ii) taking the phase components of the resulting fast Fourier transform matrix to form a pure phase component matrix;
(iii) storing this pure phase component matrix in memory;
(iv) utilizing the pure phase component matrices from the current camera frame and a previous camera frame, taking the phase differences between each area of the current camera frame and the corresponding area of the previous camera frame, such differences forming a phase correlation matrix;
(v) computing an inverse fast Fourier transform of the phase correlation matrix, resulting in a phase correlation surface;
(vi) determining the 2D position of the maximum in the phase correlation surface in each area, such 2D position forming an optical flow vector for each area; and
(vii) calculating the most probable 3D motion vectors and view angles of the camera from optical flow vectors of all areas, comprising the steps of:
(1) determining the heading or direction of movement in the world frame of reference, to define a line along which the most probable next positions lie;
(2) using the previous calculation of speed to determine a candidate next position along the line of heading;
(3) picking a number of most probable positions from a cubical selection of points around the calculated candidate; and
(4) using gradient descent to select the best next position within the cubical selection of points;
(c) An a priori model of the world;
(d) A graphics processor or a plurality of graphics processors capable of multiple renderings of the world model in a fraction of the time it takes the camera to update the frame buffer;
(e) A plurality of graphics surfaces or image buffers to store the rendered surfaces, each rendered surface corresponding to one of said trial locations and one of said trial view angles in the world model, wherein the video frames and rendered images are of equal resolution and the rendered images are subdivided into rectangular or square areas that overlap by zero or up to 100 per cent of the pixels; and
(f) Digital processing means for comparing each rendered image with the video frame buffer and then selecting the best matching rendered image, thereby also determining the most accurate instantaneous location and view angle of the camera.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.
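Sub-steps (vii)(1)-(4) predict a next position along the line of heading, sample a cubical selection of candidates around it, and keep the best. A sketch under stated assumptions: the grid spacing `step` and the cost function are illustrative, and the claim's gradient descent is approximated here by simply picking the minimum-cost candidate in the cube:

```python
import itertools
import numpy as np

def candidate_positions(prev_pos, heading, speed, step=0.05):
    """Predict the next camera position along the line of heading, then
    pick candidates from a 3x3x3 cubical selection around that point."""
    heading = np.asarray(heading, dtype=float)
    heading = heading / np.linalg.norm(heading)              # (1) direction of movement
    predicted = np.asarray(prev_pos, dtype=float) + speed * heading  # (2) along the heading
    offsets = itertools.product((-step, 0.0, step), repeat=3)        # (3) cube of points
    return [predicted + np.array(o) for o in offsets]

def best_position(candidates, cost):
    # (4) stand-in for gradient descent: keep the lowest-cost candidate;
    # a fuller implementation would refine iteratively within the cube.
    return min(candidates, key=cost)
```

In the claimed apparatus the cost of each candidate would come from comparing the rendering at that trial pose against the camera frame; any scalar cost function plugs into `best_position`.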
Specification