FRAME BY FRAME, PIXEL BY PIXEL MATCHING OF MODEL-GENERATED GRAPHICS IMAGES TO CAMERA FRAMES FOR COMPUTER VISION
First Claim
1. A method for tracking the location and view angle of a calibrated camera in real-time (ego-motion) comprising the steps of:
- Creating an a priori model of the world in which the camera exists;
- Taking each raw, unprocessed video frame from the camera;
- For each video frame, hypothesizing a small set of possible locations and view angles at which such frame is taken;
- For each video frame, rendering images using a graphics processor and vertex data from the a priori model, one image for each hypothesized location and view angle;
- For each video frame, picking the best location and view angle by finding the best matching image to the video frame.
Abstract
There are two distinct tasks in vision or image processing. On the one hand there is the difficult task of image analysis and feature recognition, and on the other there is the less difficult task of computing the 3D world position of the camera given an input image. In biological vision, these two tasks are so intertwined that it is difficult to distinguish one from the other. We perceive our position in world coordinates by recognizing and triangulating from features around us. It seems we cannot triangulate unless we first identify the features we triangulate from, and we cannot really identify a feature unless we can place it somewhere in the 3D world we live in.

Most, if not all, vision systems in the prior art attempt to implement both tasks in the same system. For instance, U.S. Pat. No. 5,801,970 comprises both tasks; U.S. Pat. No. 6,704,621 seems to comprise triangulation alone, but it actually requires recognition of the road. If the triangulation task can indeed be made separate from and independent of the analysis and feature recognition tasks, then a system that does not perform the latter tasks would need half as many computing resources.

By taking advantage of current advances in graphics processing, this invention allows for triangulation of the camera position without the usual scene analysis and feature recognition. It utilizes an accurate a priori model of the world within the field of vision. The 3D model is rendered onto a graphics surface using the latest graphics processing units. Each frame coming from the camera is then compared against a number of candidate renderings on the graphics surface to find the best match. The number of rendered images to compare against is kept small by computing the change in camera position and angle of view from one frame to the next, and then using the results of such computations to limit the positions and angles of view from which the a priori world model is next rendered.
The main advantage of this invention over the prior art is the mapping of the real world onto a world model. One application for which this is most suited is robotic programming. A robot that is guided by an a priori map and that knows its position in that map is far superior to one that is not so guided. It is superior with regard to navigation, homing, path finding, obstacle avoidance, aiming at a point of attention, and other robotic tasks.
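The best-match search described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `render_toy` and `best_match` are hypothetical, the "renderer" is a NumPy stand-in for GPU rendering of the a priori model, the pose is reduced to a 2D pixel offset instead of a 3D location and view angle, and the sum of squared pixel differences is one plausible matching score, not necessarily the one the patent uses.

```python
import numpy as np

def render_toy(pose, size=32):
    """Hypothetical stand-in for GPU rendering of the a priori model.

    Here a "pose" is just the (x, y) top-left corner of a bright square;
    a real system would render the 3D model from a location and view angle.
    """
    x, y = pose
    img = np.zeros((size, size), dtype=np.float32)
    img[y:y + 8, x:x + 8] = 1.0
    return img

def best_match(frame, candidate_poses, render):
    """Return the candidate pose whose rendering is closest to the frame.

    Closeness is the pixel-by-pixel sum of squared differences (an assumed metric).
    """
    best_pose, best_err = None, float("inf")
    for pose in candidate_poses:
        rendered = render(pose)
        err = float(np.sum((rendered - frame) ** 2))
        if err < best_err:
            best_pose, best_err = pose, err
    return best_pose, best_err

# Simulate a camera frame taken at a pose the tracker does not know.
true_pose = (11, 7)
frame = render_toy(true_pose)

# A small set of hypothesized poses around the previous estimate.
candidates = [(x, y) for x in range(9, 14) for y in range(5, 10)]

pose, err = best_match(frame, candidates, render_toy)
print(pose, err)
```

In this toy setting the search recovers the simulated pose because one candidate rendering reproduces the frame exactly; with real camera frames the minimum error would be nonzero and the candidate set would have to be dense enough to bracket the true pose.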
23 Claims
1. A method for tracking the location and view angle of a calibrated camera in real-time (ego-motion) comprising the steps of:
- Creating an a priori model of the world in which the camera exists;
- Taking each raw, unprocessed video frame from the camera;
- For each video frame, hypothesizing a small set of possible locations and view angles at which such frame is taken;
- For each video frame, rendering images using a graphics processor and vertex data from the a priori model, one image for each hypothesized location and view angle;
- For each video frame, picking the best location and view angle by finding the best matching image to the video frame.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 23)
12. An apparatus for tracking the location and view angle of a camera in real-time (ego-motion) comprising:
- A video camera and its frame buffer whose contents are updated at a constant frame rate;
- Digital processing means for computing optic flow from one video frame to another, and from such optic flow analysis hypothesizing a number of trial camera locations and view angles;
- An a priori model of the world;
- A graphics processor or a plurality of graphics processors capable of multiple renderings of the world model at a fraction of the time it takes the camera to update the frame buffer;
- A plurality of graphics surfaces or image buffers to store the rendered surfaces, each rendered surface corresponding to a trial location and view angle in the world model;
- Digital processing means for comparing each rendered image with the video frame buffer and then selecting the best matching rendered image, thereby also determining the most accurate instantaneous location and view angle of the camera.
View Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
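The hypothesizing of trial camera locations and view angles recited above can be sketched as follows. This is only an illustration under stated assumptions: the function `hypothesize_poses` is hypothetical, the pose is simplified to (x, y, heading), the inter-frame motion `delta` is assumed to have been estimated already (for example from optic flow), and the regular grid of trial poses is one simple choice rather than the claimed procedure.

```python
from itertools import product

def hypothesize_poses(prev_pose, delta, step, spread=1):
    """Generate trial poses on a small grid around the predicted pose.

    prev_pose: previous pose estimate, e.g. (x, y, heading)  (assumed layout)
    delta:     estimated change in pose since the previous frame
    step:      grid spacing for each pose component
    spread:    number of grid steps on each side of the prediction
    """
    predicted = [p + d for p, d in zip(prev_pose, delta)]
    offsets = range(-spread, spread + 1)
    return [
        tuple(c + k * s for c, k, s in zip(predicted, ks, step))
        for ks in product(offsets, repeat=len(predicted))
    ]

candidates = hypothesize_poses(
    prev_pose=(10.0, 4.0, 90.0),   # x, y, heading in degrees
    delta=(0.5, 0.0, 2.0),         # motion estimated between frames
    step=(0.1, 0.1, 1.0),
)
print(len(candidates))  # 3 offsets per component, 3 components
```

Each returned tuple would be handed to the renderer as one trial location and view angle, so the number of renderings per frame grows as (2 * spread + 1) raised to the number of pose components; keeping `spread` small is what keeps the per-frame rendering count manageable.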
Specification