Frame by frame, pixel by pixel matching of model-generated graphics images to camera frames for computer vision
First Claim
1. A method for tracking the location and view angle of a calibrated camera in real-time comprising the steps of:
(a) Creating an a priori model of the world in which the camera exists;
(b) Taking each raw, unprocessed video frame from the camera, wherein each video frame is subdivided into rectangular or square areas that overlap by zero or up to 100 per cent of the pixels in each video frame;
(c) For each video frame, hypothesizing a small set of possible locations and view angles at which such frame is taken, the count of hypothesized locations and view angles being limited by computing the most probable motion vector and view angle of the camera from two frames, one preceding another, such computation comprising the steps of:
(i) computing a fast Fourier transform of each area in a current camera frame, each area being processed independently of another to form a fast Fourier transform matrix;
(ii) taking the phase components of the resulting fast Fourier transform matrix to form a pure phase component matrix;
(iii) storing said pure phase component matrix in a memory;
(iv) utilizing said pure phase component matrices from the current frame and a previous frame, taking the phase differences between each area of the current camera frame and the corresponding area of the previous camera frame to form a phase difference matrix;
(v) computing an inverse fast Fourier transform of the phase difference matrix, resulting in a phase correlation surface;
(vi) determining a 2D position of the maximum in said phase correlation surface in each area, said 2D position forming a 2D optical flow vector for each area; and
(vii) calculating the most probable 3D motion vectors and view angles of the camera from optical flow vectors of all areas, comprising the steps of:
(1) determining the heading or direction of movement in the world frame of reference, to define a line along which the most probable next positions lie;
(2) using the previous calculation of speed to determine a candidate next position along the line of heading;
(3) picking a number of most probable positions from a cubical selection of points around the calculated candidate; and
(4) using gradient descent to select the best next position within the cubical selection of points;
(d) For each video frame, rendering images using a graphics processor and vertex data from the a priori model, one rendered image for each hypothesized location and view angle, wherein each of the video frames and rendered images is of equal resolution; and
the rendered images are subdivided into rectangular or square areas that overlap by zero or up to 100 per cent of the pixels; and
(e) For each video frame, picking the best location and view angle by finding the best matching rendered image to the video frame.
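Steps (c)(i)-(vi) describe per-area phase correlation. A minimal sketch in Python with NumPy, assuming the patent's "pure phase component" means a unit-magnitude spectrum (the function name, the `eps` guard against division by zero, and the peak-unwrapping convention are illustrative, not taken from the claim):

```python
import numpy as np

def phase_correlation_shift(prev_area, curr_area):
    """Estimate the 2D translation between two equally sized image areas
    via phase correlation, following steps (c)(i)-(vi) for one area."""
    # (i) fast Fourier transform of each area, processed independently
    F_prev = np.fft.fft2(prev_area)
    F_curr = np.fft.fft2(curr_area)
    # (ii) keep only the phase components (unit-magnitude spectra)
    eps = 1e-12
    P_prev = F_prev / (np.abs(F_prev) + eps)
    P_curr = F_curr / (np.abs(F_curr) + eps)
    # (iv) phase difference between current and previous area
    D = P_curr * np.conj(P_prev)
    # (v) inverse fast Fourier transform yields the phase correlation surface
    surface = np.real(np.fft.ifft2(D))
    # (vi) the 2D position of the maximum is the area's optical flow vector
    dy, dx = np.unravel_index(np.argmax(surface), surface.shape)
    # unwrap circular shifts larger than half the area size to negative values
    h, w = surface.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return dx, dy
```

For a purely (circularly) shifted area the surface is an impulse at the shift, so the peak recovers the motion exactly; on real frames the peak spreads but its position still gives the dominant per-area translation.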
Abstract
The invention employs state-of-the-art computer graphics to advance the field of computer vision. It uses model-generated graphics in image processing, matching image frames rendered by a graphics engine to those from a camera, in real-time, frame by frame, pixel by pixel. An a priori model of the world is required, but the benefit is a very accurate position and pose of the camera for every frame.
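The pixel-level matching the abstract describes, one rendered image per hypothesis with the best match giving the camera pose, can be sketched as follows. Mean squared pixel difference is an assumed similarity measure; the patent does not commit to a specific one:

```python
import numpy as np

def pick_best_hypothesis(camera_frame, rendered_images, hypotheses):
    """Return the hypothesized (location, view angle) whose rendering
    best matches the camera frame, pixel by pixel."""
    # The video frame and every rendered image are of equal resolution.
    errors = [np.mean((camera_frame - img) ** 2) for img in rendered_images]
    # The best-matching rendering identifies the camera's pose for this frame.
    return hypotheses[int(np.argmin(errors))]
```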
17 Claims
1. A method for tracking the location and view angle of a calibrated camera in real-time comprising the steps of: (text set forth in full under First Claim above.) Dependent claims: 2, 3, 4, 5, 6, 7, 8, 17.
9. An apparatus for tracking the location and view angle of a camera in real-time comprising:
(a) A video camera and its frame buffer whose contents are updated at a constant frame rate, wherein each video frame from the camera is subdivided into rectangular or square areas that overlap by zero or up to 100 per cent of the pixels in each video frame;
(b) Digital processing means for computing an optic flow from one video frame to another, and from such optic flow analysis hypothesizing a number of trial camera locations and view angles, the count of hypothesized locations and view angles being limited by computing a most probable motion vector and view angle of the camera from two frames, one preceding another, such computation comprising the steps of:
(i) computing a fast Fourier transform of each area in a current camera frame, each area being processed independently of another to form a fast Fourier transform matrix;
(ii) taking the phase components of the resulting fast Fourier transform matrix to form a pure phase component matrix;
(iii) storing this pure phase component matrix in memory;
(iv) utilizing the pure phase component matrices from the current camera frame and a previous camera frame, taking the phase differences between each area of the current camera frame and the corresponding area of the previous camera frame, such differences forming a phase correlation matrix;
(v) computing an inverse fast Fourier transform of the phase correlation matrix, resulting in a phase correlation surface;
(vi) determining the 2D position of the maximum in the phase correlation surface in each area, such 2D position forming an optical flow vector for each area; and
(vii) calculating the most probable 3D motion vectors and view angles of the camera from optical flow vectors of all areas, comprising the steps of:
(1) determining the heading or direction of movement in the world frame of reference, to define a line along which the most probable next positions lie;
(2) using the previous calculation of speed to determine a candidate next position along the line of heading;
(3) picking a number of most probable positions from a cubical selection of points around the calculated candidate; and
(4) using gradient descent to select the best next position within the cubical selection of points;
(c) An a priori model of the world;
(d) A graphics processor or a plurality of graphics processors capable of multiple renderings of the world model in a fraction of the time it takes the camera to update the frame buffer;
(e) A plurality of graphics surfaces or image buffers to store the rendered surfaces, each rendered surface corresponding to one of said trial locations and one of said trial view angles in the world model, wherein the video frames and rendered images are of equal resolution and the rendered images are subdivided into rectangular or square areas that overlap by zero or up to 100 per cent of the pixels; and
(f) Digital processing means for comparing each rendered image with the video frame buffer and then selecting the best matching rendered image, thereby also determining the most accurate instantaneous location and view angle of the camera.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.
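Sub-steps (vii)(1)-(4) predict a next position along the line of heading, sample a cubical selection of candidates around it, and keep the best. A sketch under stated assumptions: the grid spacing `step` and the cost function are illustrative, and the claim's gradient descent is approximated here by simply picking the minimum-cost candidate in the cube:

```python
import itertools
import numpy as np

def candidate_positions(prev_pos, heading, speed, step=0.05):
    """Predict the next camera position along the line of heading, then
    pick candidates from a 3x3x3 cubical selection around that point."""
    heading = np.asarray(heading, dtype=float)
    heading = heading / np.linalg.norm(heading)              # (1) direction of movement
    predicted = np.asarray(prev_pos, dtype=float) + speed * heading  # (2) along the heading
    offsets = itertools.product((-step, 0.0, step), repeat=3)        # (3) cube of points
    return [predicted + np.array(o) for o in offsets]

def best_position(candidates, cost):
    # (4) stand-in for gradient descent: keep the lowest-cost candidate;
    # a fuller implementation would refine iteratively within the cube.
    return min(candidates, key=cost)
```

In the claimed apparatus the cost of each candidate would come from comparing the rendering at that trial pose against the camera frame; any scalar cost function plugs into `best_position`.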
Specification