Method of pose estimation and model refinement for video representation of a three dimensional scene

US 20010043738A1
Filed: 03/07/2001
Published: 11/22/2001
Est. Priority Date: 03/07/2000
Status: Active Grant

Abstract

The present invention is embodied in a video flashlight method. This method creates virtual images of a scene using a dynamically updated three-dimensional model of the scene and at least one video sequence of images. An estimate of the camera pose is generated by comparing a present image to the three-dimensional model. Next, relevant features of the model are selected based on the estimated pose. The relevant features are then virtually projected onto the estimated pose and matched to features of the image. Matching errors are measured between the relevant features of the virtual projection and the features of the image. The estimated pose is then updated to reduce these matching errors. The model is also refined with updated information from the image. Meanwhile, a viewpoint for a virtual image is selected. The virtual image is then created by projecting the dynamically updated three-dimensional model onto the selected virtual viewpoint.
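
The estimate, project, match, and update cycle described above can be illustrated with a toy sketch. It is not the method of the patent: it assumes a 2D rigid transform in place of a real camera projection, known feature correspondences, and a simple pattern search as the update rule. All names (`project`, `matching_errors`, `refine_pose`) are hypothetical.

```python
import math

def project(model_pts, pose):
    """Toy stand-in for the virtual projection: rigid 2D transform by pose (tx, ty, theta)."""
    tx, ty, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in model_pts]

def matching_errors(model_pts, image_pts, pose):
    """Distance between each projected model feature and its matched image feature."""
    projected = project(model_pts, pose)
    return [math.hypot(px - ix, py - iy)
            for (px, py), (ix, iy) in zip(projected, image_pts)]

def total_error(model_pts, image_pts, pose):
    """Sum of squared matching errors for a candidate pose."""
    return sum(e * e for e in matching_errors(model_pts, image_pts, pose))

def refine_pose(model_pts, image_pts, pose, tol=1e-8, step=0.5, max_iters=500):
    """Repeat project / match / update until the error drops below tol (assumed criterion)."""
    pose = list(pose)
    err = total_error(model_pts, image_pts, pose)
    for _ in range(max_iters):
        if err < tol or step < 1e-9:
            break
        improved = False
        for i in range(3):                      # perturb tx, ty, theta in turn
            for delta in (step, -step):
                trial = list(pose)
                trial[i] += delta
                trial_err = total_error(model_pts, image_pts, trial)
                if trial_err < err:             # keep only updates that reduce the error
                    pose, err, improved = trial, trial_err, True
                    break
        if not improved:
            step *= 0.5                         # no direction helped: search more finely
    return tuple(pose), err

MODEL = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
IMAGE = project(MODEL, (1.0, -0.5, 0.3))        # synthetic "observed" features
initial_err = total_error(MODEL, IMAGE, (0.0, 0.0, 0.0))
final_pose, final_err = refine_pose(MODEL, IMAGE, (0.0, 0.0, 0.0))
```

A production system would more likely use a gradient-based or Gauss-Newton update; the pattern search is used here only because it is short and never increases the error.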

147 Citations

42 Claims

1. A method for accurately estimating a pose of a camera within a scene using a three dimension model of the scene, comprising the steps of:
- (a) generating an initial estimate of the pose;
  
  (b) selecting a set of relevant features of the three dimensional model based on the initial estimate of the pose;
  
  (c) creating a virtual projection of the set of relevant features responsive to the initial estimate of the pose;
  
  (d) matching a plurality of features of an image received from the camera to the virtual projection of the set of relevant features and measuring a plurality of matching errors; and
  
  (e) updating the estimate of the pose to reduce the plurality of matching errors.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
- - 2. The method of claim 1, further comprising the step of:
    - (f) repeating steps (c), (d), and (e) using the updated estimate of the pose until the plurality of matching errors are less than predetermined matching criteria.
  - 3. The method of claim 1, further comprising the steps of:
    - (f) adding the updates to the estimate of the pose made in step (e) to a total change value of the estimate of the pose;
      
      (g) if the total change value is less than predetermined pose change criteria, resetting the total change value and repeating steps (b), (c), (d), (e) and (f) using the updated estimate of the pose; and
      
      (h) repeating steps (c), (d), (e), (f) and (g) using the updated estimate of the pose until the plurality of matching errors are less than predetermined matching criteria.
  - 4. The method of claim 1, wherein step (a) includes the step of comparing an image received from the camera to the three dimensional model of the scene to generate the estimate of the pose.
  - 5. The method of claim 1, wherein step (a) includes the step of using a preceding pose estimate of a preceding image received from the camera to generate the estimate of the pose.
  - 6. The method of claim 1, wherein step (b) includes the steps of:
    - (b1) Z-buffering the three dimensional model based on the estimated pose to determine a set of visible features of the three dimensional model; and
      
      (b2) selecting the set of relevant features from the set of visible features.
  - 7. The method of claim 1, wherein step (b) includes the steps of:
    - (b1) creating a set of features of the three dimensional model including a plurality of edges of at least one object represented in the three dimensional model, each edge having a dihedral angle of greater than a predetermined angle;
      
      (b2) selecting the set of relevant features from the set of features of the three dimensional model.
  - 8. The method of claim 1, wherein step (c) includes the step of removing a model feature from the set of relevant features if the model feature is less than a predetermined distance from a remaining relevant feature in the virtual projection.
  - 9. The method of claim 1, further comprising the step of:
    - (f) perturbing the estimate of the pose and repeating steps (c), (d), and (e) using the perturbed estimate of the pose until the plurality of matching errors are less than predetermined matching criteria.
  - 10. The method of claim 1, wherein step (d) includes the steps of:
    - (d1) computing an oriented energy image of the image received from the camera; and
      
      (d2) integrating the oriented energy image along the virtual projection of the set of relevant features and measuring the plurality of matching errors.
  - 11. The method of claim 1, wherein step (d) includes the steps of:
    - (d1) computing an oriented energy image of the image received from the camera;
      
      (d2) generating a set of scaled oriented energy images;
      
      (d3) selecting a scaled oriented energy image from the set of scaled oriented energy images; and
      
      (d4) integrating the selected scaled oriented energy image along the virtual projection of the set of relevant features and measuring the plurality of matching errors.
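
The edge-based feature set of claim 7 can be illustrated with a small sketch. It assumes a triangle mesh with consistent vertex winding and measures the fold at each shared edge as the angle between the two face normals (0 degrees for coplanar faces), which is one of several conventions for a dihedral angle. The function names and the 30 degree threshold are hypothetical.

```python
import math

def _sub(p, q):
    return tuple(p[i] - q[i] for i in range(3))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit_normal(tri):
    """Unit normal of a triangle given as three (x, y, z) vertices."""
    a, b, c = tri
    n = _cross(_sub(b, a), _sub(c, a))
    length = math.sqrt(sum(x * x for x in n))
    return tuple(x / length for x in n)

def fold_angle_deg(tri1, tri2):
    """Angle in degrees between the normals of two faces that share an edge."""
    n1, n2 = unit_normal(tri1), unit_normal(tri2)
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(n1, n2))))
    return math.degrees(math.acos(dot))

def select_sharp_edges(edges, min_angle_deg):
    """Keep the edges whose adjacent faces fold by more than min_angle_deg.

    edges maps an edge label to the pair of triangles that share that edge.
    """
    return [name for name, (t1, t2) in edges.items()
            if fold_angle_deg(t1, t2) > min_angle_deg]

FLOOR_A = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
FLOOR_B = ((1, 0, 0), (1, 1, 0), (0, 1, 0))     # coplanar with FLOOR_A
WALL = ((0, 0, 0), (1, 0, 0), (0, 0, -1))       # perpendicular to FLOOR_A
EDGES = {"flat_seam": (FLOOR_A, FLOOR_B), "crease": (FLOOR_A, WALL)}
sharp = select_sharp_edges(EDGES, 30.0)
```

Here the seam between the two coplanar floor triangles is discarded and only the floor-to-wall crease is kept as a candidate feature.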

12. A method for accurately estimating a pose of an image of a scene using a three dimension model of the scene, comprising the steps of:
- (a) performing a pyramid decomposition of the image to generate a set of pyramid levels of the image;
  
  (b) selecting one pyramid level from the set of pyramid levels;
  
  (c) generating an estimate of the pose using the selected pyramid level;
  
  (d) selecting a set of relevant features of the three dimensional model based on the estimate of the pose;
  
  (e) creating a virtual projection of the set of relevant features responsive to the estimate of the pose;
  
  (f) matching a plurality of features of the selected pyramid level to the virtual projection of the set of relevant features and measuring a plurality of matching errors;
  
  (g) updating the estimate of the pose to reduce the plurality of matching errors; and
  
  (h) repeating steps (e), (f), and (g) using the updated estimate of the pose until the plurality of matching errors are less than predetermined matching criteria which is responsive to the selected pyramid level.
- View Dependent Claims (13, 14)
- - 13. The method of claim 12, further comprising the steps of:
    - (i) removing the selected pyramid level from the set of pyramid levels; and
      
      (j) repeating steps (b), (c), (d), (e), (f), (g), (h) and (i) until the set of pyramid levels contains no elements;
      
      wherein the selected level in step (b) has a lowest resolution level of the pyramid levels in the set of pyramid levels.
  - 14. The method of claim 12, wherein step (e) includes the step of removing a model feature from the set of relevant features if the model feature is less than a predetermined distance from another relevant feature in the virtual projection, the predetermined distance being responsive to the selected pyramid level.
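
The coarse-to-fine ordering of claims 12 and 13 can be sketched as follows. This is an illustration under stated assumptions: the image is a grayscale list of rows, and each pyramid level is built by plain 2x2 block averaging rather than the low-pass filtering a real pyramid decomposition would normally use. `build_pyramid` and `coarse_to_fine` are hypothetical names.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1] +
              img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)]
            for r in range(h)]

def build_pyramid(img, num_levels):
    """Return [full resolution, half resolution, ...] with up to num_levels entries."""
    levels = [img]
    while (len(levels) < num_levels
           and len(levels[-1]) >= 2 and len(levels[-1][0]) >= 2):
        levels.append(downsample(levels[-1]))
    return levels

def coarse_to_fine(levels):
    """Yield levels lowest resolution first, removing each from the set once used."""
    remaining = list(levels)
    while remaining:
        i = min(range(len(remaining)),
                key=lambda k: len(remaining[k]) * len(remaining[k][0]))
        yield remaining.pop(i)

IMAGE = [[float(4 * r + c) for c in range(4)] for r in range(4)]
PYRAMID = build_pyramid(IMAGE, 3)
order = [len(level) for level in coarse_to_fine(PYRAMID)]
```

In a full pipeline the pose found at each level would seed the estimate at the next finer level, with looser matching criteria at the coarse levels.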

15. A method for refining a three dimensional model of a scene using an image of the scene taken by a camera having an unknown pose, comprising the steps of:
- (a) comparing the image to the three dimension model of the scene to generate an estimate of the pose; and
  
  (b) updating the three dimensional model of the scene based on data from the image and the estimate of the pose.
- View Dependent Claims (16)
- - 16. The method of claim 15, wherein step (b) includes the step of:
    - (b1) projecting a plurality of color values of the image onto the three dimension model of the scene to update a texture of a surface of the three dimensional model.

17. A method for accurately estimating a position of a remote vehicle using a three dimension model and an image from a camera having a known orientation relative to the remote vehicle, comprising the steps of:
- (a) comparing the image to the three dimension model of the scene to generate an estimate of the pose;
  
  (b) selecting a set of relevant features of the three dimensional model based on the estimate of the pose;
  
  (c) matching a plurality of features of the image to the set of relevant features and measuring a plurality of matching errors;
  
  (d) updating the estimate of the pose based on the plurality of matching errors; and
  
  (e) determining the position of the remote vehicle based on the estimate of the pose and the orientation of the camera.
- View Dependent Claims (18)
- - 18. The method of claim 17, further comprising the step of:
    - (f) updating the three dimensional model based on data from the image and the updated estimate of the pose.
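
Step (e) of claim 17 amounts to composing the estimated camera pose with the known camera mounting. A minimal planar sketch follows, assuming poses are (x, y, heading) tuples in radians and the mounting is given as the camera offset and angle in the vehicle frame. These conventions and the name `vehicle_pose_from_camera` are assumptions, not taken from the patent.

```python
import math

def vehicle_pose_from_camera(camera_pose, mount):
    """Recover the vehicle pose from the camera pose and the camera mounting.

    camera_pose: (x, y, heading) of the camera in the scene frame.
    mount: (offset_x, offset_y, angle) of the camera in the vehicle frame.
    """
    cam_x, cam_y, cam_heading = camera_pose
    off_x, off_y, mount_angle = mount
    heading = cam_heading - mount_angle          # undo the mounting rotation
    c, s = math.cos(heading), math.sin(heading)
    # The camera sits at vehicle_position + R(heading) * offset, so subtract that term.
    x = cam_x - (c * off_x - s * off_y)
    y = cam_y - (s * off_x + c * off_y)
    return (x, y, heading)

# Camera mounted 2 units ahead of the vehicle origin, facing forward.
vehicle = vehicle_pose_from_camera((10.0, 7.0, math.pi / 2), (2.0, 0.0, 0.0))
```

With the vehicle heading along +y, a camera observed at (10, 7) places the vehicle origin at approximately (10, 5).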

19. A method for refining a three dimension model of a scene containing an object using a plurality of images of the scene, each image including the object, comprising the steps of:
- (a) comparing a first image of the plurality of images to the three dimension model of the scene to generate an estimate of a first viewpoint corresponding to the first image;
  
  (b) comparing a second image of the plurality of images to the three dimension model of the scene to generate an estimate of a second viewpoint corresponding to the second image;
  
  (c) selecting a first set of relevant features of the three dimensional model based on the first viewpoint;
  
  (d) matching a plurality of first features of the first image to the first set of relevant features and measuring a plurality of first matching errors;
  
  (e) selecting a second set of relevant features of the three dimensional model based on the second viewpoint;
  
  (f) matching a plurality of second features of the second image to the second set of relevant features and measuring a plurality of second matching errors; and
  
  (g) updating a position estimate of the object within the three dimensional model of the scene based on the plurality of first matching errors and the plurality of second matching errors.
- View Dependent Claims (20, 21, 22, 23)
- - 20. The method of claim 19, further including the steps of:
    - comparing the first image and the second image to generate relative pose constraints; and
      
      updating the first viewpoint and the second viewpoint based on the relative pose constraints.
  - 21. The method of claim 20, wherein the step of comparing the first image to the second image includes the steps of:
    - matching the plurality of first features of the first image to the plurality of second features of the second image to create a set of matched features;
      
      calculating optical flow for each matched feature; and
      
      generating the relative pose constraints responsive to the calculated optical flow.
  - 22. The method of claim 20, wherein the step of comparing the first image to the second image includes the steps of:
    - generating a plurality of parallax measurements between the plurality of first features and the plurality of second features; and
      
      generating the relative pose constraints responsive to the plurality of parallax measurements.
  - 23. The method of claim 20, wherein the step of comparing the first image to the second image includes the step of generating the relative pose constraints using an epipolar geometry of the first image and the second image.
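
One common way to express an epipolar relative pose constraint, as in claim 23, is the essential matrix residual. The sketch below is illustrative only: it assumes normalized image coordinates and the convention X2 = R X1 + t between the two camera frames, so that E = [t]x R and a correct match satisfies x2^T E x1 = 0. The function names are hypothetical.

```python
def cross_matrix(t):
    """Skew-symmetric matrix [t]x such that [t]x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def essential_matrix(rotation, translation):
    """E = [t]x R for the assumed convention X2 = R X1 + t."""
    return matmul3(cross_matrix(translation), rotation)

def epipolar_residual(E, x1, x2):
    """x2^T E x1 for homogeneous normalized points; zero for a consistent match."""
    Ex1 = [sum(E[r][c] * x1[c] for c in range(3)) for r in range(3)]
    return sum(x2[r] * Ex1[r] for r in range(3))

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
E = essential_matrix(IDENTITY, (1.0, 0.0, 0.0))   # second camera shifted along x
x1 = (0.0, 0.0, 1.0)                              # scene point (0, 0, 5) seen in image 1
x2_good = (0.2, 0.0, 1.0)                         # the same point, (1, 0, 5), in image 2
x2_bad = (0.2, 0.3, 1.0)                          # a mismatched feature
```

Residuals of matched features under candidate viewpoints can then act as constraints when the first and second viewpoints are updated.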

24. A method for refining a three dimension model of a scene containing an object using a plurality of images of the scene, each image including the object, comprising the steps of:
- (a) selecting a subset of images from the plurality of images of the scene, the subset of images containing at least two of the images;
  
  (b) determining a plurality of approximate relative viewpoints of the subset of images;
  
  (c) comparing each image in the subset of images to the three dimensional model to generate a subset of estimated viewpoints corresponding to the subset of images, the subset of estimated viewpoints constrained by the plurality of approximate relative viewpoints;
  
  (d) selecting a set of relevant features of the three dimensional model corresponding to each estimated viewpoint;
  
  (e) matching a plurality of features of each image in the subset of images to the corresponding set of relevant features and measuring a plurality of matching errors; and
  
  (f) updating a position estimate of the object within the three dimensional model of the scene based on the plurality of matching errors.
- View Dependent Claims (25, 26)
- - 25. The method of claim 24, further comprising the step of:
    - (g) repeating steps (c), (d), (e), and (f) using the updated position estimate of the object until the plurality of matching errors are less than predetermined matching criteria.
  - 26. The method of claim 24, wherein the subset of images is selected to have viewpoints within a predetermined range.

27. A method for creating a hybrid three dimension model of a scene using a plurality of images of the scene, comprising the steps of:
- (a) creating a polyhedral model of the scene including at least one polygonal surface;
  
  (b) determining a first set of images containing at least a section of a first polygonal surface; and
  
  (c) comparing a plurality of images selected from the first set of images to generate a local surface shape map corresponding to the first polygonal surface.
- View Dependent Claims (28, 29, 30, 31)
- - 28. The method of claim 27, wherein step (c) includes the steps of:
    - (c1) calculating optical flow in regions corresponding to the first polygonal surface of pairs of the first set of images; and
      
      (c2) generating the local surface shape map responsive to the calculated optical flow.
  - 29. The method of claim 27, wherein step (c) includes the steps of:
    - (c1) calculating parallax in regions corresponding to the first polygonal surface of pairs of the first set of images; and
      
      (c2) generating the local surface shape map responsive to the calculated parallax.
  - 30. The method of claim 27, further comprising the steps of:
    - (d) separating the local surface shape map of the first polygonal surface into a plurality of portions of not greater than the predetermined size;
      
      (e) determining a first subset of images from the first set of images, each image of the first subset of images containing a first portion of the first polygonal surface;
      
      (f) selecting at least one selected image of the first subset of images;
      
      (g) projecting a corresponding section of the at least one selected image onto the first portion of the local surface shape map of the first polygonal surface of the hybrid three dimension model as a local color map; and
      
      (h) repeating steps (e), (f), and (g) for each remaining portion of the local surface shape map of the first polygonal surface.
  - 31. The method of claim 30, further comprising the step of:
    - (i) blending the plurality of local color maps corresponding to the plurality of portions of the local surface shape map of the first polygonal surface.

32. A method for creating a textured three dimension model of a scene using a plurality of images of the scene, comprising the steps of:
- (a) creating a polyhedral model of the scene including at least one polygonal surface;
  
  (b) identifying at least one portion of the one polygonal surface;
  
  (c) determining a first subset of images containing the at least one portion of the one polygonal surface;
  
  (d) selecting at least one selected image of the first subset of images;
  
  (e) projecting a corresponding section of each of the at least one selected image onto the at least one portion of the one polygonal surface of the textured three dimension model as a local color map; and
  
  (f) repeating steps (c), (d), and (e) for each remaining portion of the one polygonal surface of the polyhedral model.
- View Dependent Claims (33, 34, 35, 36)
- - 33. The method of claim 32, wherein step (d) includes the steps of:
    - (d1) determining a resolution level for the at least one portion of the one polygonal surface of each image of the first subset of images; and
      
      (d2) selecting a selected image of the first subset of images corresponding to a largest resolution level for the at least one portion of the one polygonal surface.
  - 34. The method of claim 32, wherein:
    - the at least one selected image contains at least two selected images; and
      
      step (e) includes the step of at least one of combining and selecting at least one color value of the at least one corresponding section of the at least two selected images to create the local color map of the at least one portion of the one polygonal surface.
  - 35. The method of claim 32, wherein the at least one portion includes a plurality of portions and the method further comprises the step of:
    - (g) blending the plurality of local color maps corresponding to the plurality of portions of the one polygonal surface.
  - 36. The method of claim 35, wherein step (g) includes the step of:
    - (g1) creating a plurality of resolution levels of the plurality of local color maps; and
      
      (g2) blending the plurality of local color maps over each resolution level.
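
The image selection of claim 33 can be sketched by treating the resolution level of a surface portion in an image as the pixel area that portion covers there. That interpretation, the shoelace area formula, and the names `polygon_area` and `select_best_image` are assumptions made for illustration.

```python
def polygon_area(pts):
    """Area of a 2D polygon from its ordered vertices (shoelace formula)."""
    twice_area = 0.0
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0

def select_best_image(projections):
    """Pick the image in which the surface portion covers the most pixels.

    projections maps an image id to the outline of the surface portion as it
    appears in that image, in pixel coordinates.
    """
    return max(projections, key=lambda image_id: polygon_area(projections[image_id]))

PROJECTIONS = {
    "frame_a": [(0, 0), (10, 0), (10, 10), (0, 10)],    # covers 100 pixels
    "frame_b": [(5, 5), (25, 5), (25, 20), (5, 20)],    # covers 300 pixels
}
best = select_best_image(PROJECTIONS)
```

The color values of the selected frame would then be projected onto the surface portion as its local color map.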

37. A method for creating a dynamic sequence of virtual images of a scene using a dynamically updated three dimension model of the scene, comprising the steps of:
- (a) updating the three dimension model using a video sequence of images of the scene including the steps of:
  
  (a1) determining a present viewpoint of a present image of the video sequence of images;
  
  (a2) determining a relevant portion of the three dimensional model corresponding to the present image of the video sequence of images;
  
  (a3) updating the relevant portion of the three dimensional model by projecting the present image onto the relevant portion of the three dimension model;
  
  (b) selecting a first virtual viewpoint of a first virtual image of the dynamic sequence of virtual images;
  
  (c) creating the first virtual image by projecting the dynamic three dimensional model onto the first virtual viewpoint; and
  
  (d) repeating steps (b) and (c) for each remaining virtual image of the dynamic sequence of virtual images.
- View Dependent Claims (38, 39, 40)
- - 38. The method of claim 37, wherein step (a) further includes the step of:
    - (a4) repeating step (a) using a remaining image of the sequence of images as the present image.
  - 39. The method of claim 37, wherein the selection of the virtual viewpoint in step (b) is responsive to at least one of:
    - an external signal;
      
      a predetermined trajectory;
      
      a motion of a selected pattern of the dynamic three-dimensional model;
      
      a motion of a selected object of the dynamic three-dimensional model;
      
      a motion of a selected pattern of a plurality of images of the sequence of video images; and
      
      a motion of a selected object of a plurality of images of the sequence of video images.
  - 40. The method of claim 37, wherein the determination of the present viewpoint of the video sequence of images in step (a) is responsive to at least one of:
    - an external signal;
      
      a predetermined trajectory;
      
      a motion of a selected pattern of the dynamic three-dimensional model;
      
      a motion of a selected object of the dynamic three-dimensional model;
      
      a motion of a selected pattern of a plurality of images of the sequence of video images;
      
      a motion of a selected object of a plurality of images of the sequence of video images; and
      
      the virtual viewpoint selected in step (b).
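
Steps (b) through (d) of claim 37 can be illustrated by projecting model points onto a sequence of virtual viewpoints. The sketch assumes an ideal pinhole camera that looks along +z with no rotation, so a viewpoint is just a camera position, and it represents the model as bare 3D points. `render_points`, the focal length, and the image center are hypothetical.

```python
def render_points(points, camera_pos, focal=100.0, center=(320.0, 240.0)):
    """Project 3D model points into a virtual pinhole camera at camera_pos."""
    cx, cy, cz = camera_pos
    pixels = []
    for X, Y, Z in points:
        x, y, z = X - cx, Y - cy, Z - cz       # point in the camera frame
        if z <= 0:
            continue                            # behind the camera: not visible
        pixels.append((focal * x / z + center[0], focal * y / z + center[1]))
    return pixels

MODEL_POINTS = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 0.0, -2.0)]
TRAJECTORY = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # a predetermined virtual trajectory
frames = [render_points(MODEL_POINTS, viewpoint) for viewpoint in TRAJECTORY]
```

Each entry of `frames` stands for one virtual image; in the claimed method the model would also carry the texture updated from the present video image.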

41. A computer readable medium adapted to instruct a general purpose computer to update a three dimensional model of a scene using the three dimensional model of the scene and an image received from a camera having an unknown pose, the method comprising the steps of:
- (a) generating an estimate of the pose;
  
  (b) selecting a set of relevant features of the three dimensional model based on the estimate of the pose;
  
  (c) creating a virtual projection of the set of relevant features responsive to the estimate of the pose;
  
  (d) matching a plurality of features of an image received from the camera to the virtual projection of the set of relevant features and measuring a plurality of matching errors;
  
  (e) updating the estimate of the pose to reduce the plurality of matching errors; and
  
  (f) updating the three dimensional model of the scene based on data from the image and the estimate of the pose.

42. An automatic three-dimensional model updating apparatus for accurately estimating a point of view of an image of a scene, relative to a three-dimensional model of the scene, and updating the three-dimensional model comprising:
- (a) estimating means for providing an estimate of the point of view of the image;
  
  (b) relevant feature selecting means for selecting a set of relevant features of the three dimensional model based on the estimate of the point of view;
  
  (c) virtual projection means for creating a virtual projection of the set of relevant features responsive to the estimate of the point of view;
  
  (d) matching means for matching a plurality of features of the image to the virtual projection of the set of relevant features;
  
  (e) measurement means for measuring a plurality of matching errors;
  
  (f) point of view refinement means for updating the estimate of the point of view to reduce the plurality of matching errors; and
  
  (g) model refinement means, responsive to the estimated point of view and to the image, for updating the three-dimensional model.

Current Assignee: SRI International, Inc.
Original Assignee: Sarnoff Corporation (SRI International, Inc.)
Inventors: Samarasekera, Supun; Hsu, Steve; Kumar, Rakesh; Sawhney, Harpreet Singh

Granted Patent: US 6,985,620 B2
US Class Current: 382/154
CPC Class Codes

G01S 5/163   Determination of attitude u...

G06T 15/20   Perspective computation

G06T 17/10   Constructive solid geometry...

G06T 2200/08   involving all processing st...

G06T 2207/30244   Camera pose

G06T 7/74   involving reference images ...
