Camera/object pose from predicted coordinates

US 9,940,553 B2
Filed: 02/22/2013
Issued: 04/10/2018
Est. Priority Date: 02/22/2013
Status: Active Grant

Abstract

Camera or object pose calculation is described, for example, to relocalize a mobile camera (such as on a smart phone) in a known environment or to compute the pose of an object moving relative to a fixed camera. The pose information is useful for robotics, augmented reality, navigation and other applications. In various embodiments where camera pose is calculated, a trained machine learning system associates image elements from an image of a scene, with points in the scene's 3D world coordinate frame. In examples where the camera is fixed and the pose of an object is to be calculated, the trained machine learning system associates image elements from an image of the object with points in an object coordinate frame. In examples, the image elements may be noisy and incomplete and a pose inference engine calculates an accurate estimate of the pose.

20 Claims

1. A method of calculating pose of an entity comprising:
- receiving, at a processor, at least one image where the image is of a scene captured by an entity comprising a mobile camera;
  
  applying image elements of the at least one image to a trained machine learning system to obtain a plurality of associations between image elements and three-dimensional (3D) points in a scene space, the trained machine learning system optimizing an energy function comprising the 3D points in the scene space predicted by at least one tree in at least one random decision forest and 3D coordinates in camera space;
  
  determining whether a pose of the entity has been calculated;
  
  based on a determination that the pose has been calculated, refining the pose of the entity from the plurality of associations and the optimized function; and
  
  based on a determination that the pose of the entity has not been calculated, calculating an initial pose of the entity from the plurality of associations and the optimized function; and
  
  generating map display data based at least in part on the initial pose of the entity, wherein the energy function comprises:
  
  $$E(H) = \sum_{i \in I} \rho\left( \min_{m \in M_i} \lVert m - H x_i \rVert_2 \right)$$
  
  wherein $i \in I$ is an index of the image elements, $\rho$ is an error function, $m \in M_i$ represents the predicted 3D points in the scene space, $x_i$ are the 3D coordinates in the camera space, and $H$ is the pose of the entity.
  - 2. A method as claimed in claim 1, further comprising calculating the initial pose of the entity as parameters having six degrees of freedom, three indicating rotation of the entity and three indicating position of the entity.
  - 3. A method as claimed in claim 1, the machine learning system having been trained using images with image elements labeled with scene coordinates.
  - 4. A method as claimed in claim 1, wherein the machine learning system comprises a plurality of trained random forests and the method further comprises:
    - applying the image elements of the at least one image to the plurality of trained random forests, the trained random forests having been trained using images from a different one of a plurality of scenes; and
      
      calculating which of the scenes the mobile camera was in when the at least one image was captured.
  - 5. A method as claimed in claim 1, wherein the machine learning system is trained using images of a plurality of scenes with image elements labeled with scene identifiers and labeled with scene coordinates of points in the scene the image elements depict.
  - 6. A method as claimed in claim 1, further comprising calculating the pose by searching amongst a set of possible pose candidates and using samples of the plurality of associations between image elements and points to assess the set of possible pose candidates.
  - 7. A method as claimed in claim 1, further comprising receiving at the processor, a stream of images, and calculating the pose by searching amongst a set of possible pose candidates which includes a second pose calculated from another image in the stream.
  - 8. A method as claimed in claim 1 at least partially carried out using hardware logic selected from one or more of the following:
    - a field-programmable gate array, a program-specific integrated circuit, a program-specific standard product, a system-on-a-chip, a complex programmable logic device, and a graphics processing unit.
  - 9. A method as claimed in claim 1, wherein the entity is a mobile camera and the pose of the mobile camera is calculated, the method further comprising accessing a 3D model of the scene and refining the pose of the mobile camera using the accessed 3D model.
  - 14. The method as claimed in claim 1, further comprising prior to applying the image elements, removing a set of image elements that are spurious or noisy image elements.
  - 17. The method as claimed in claim 1, further comprising improving an accuracy of the calculated pose by enforcing a minimum distance separation between the image elements.
  - 18. The method as claimed in claim 1, further comprising enabling a downstream system to use the initial pose of the entity to determine an updated pose of the entity or to use the initial pose of the entity in one or more other applications by providing the initial pose of the entity to the downstream system.
  - 19. The method as claimed in claim 1, further comprising:
    - setting a threshold for refining the pose of the entity from the plurality of associations and the optimized function; and
      
      stopping the refinement once the threshold has been reached.
  - 20. The method as claimed in claim 7, further comprising:
    - sampling the set of possible pose candidates for noise or missing values; and
      
      based on the sampling, determining whether a pose candidate is an inlier or outlier.
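
The energy function recited in claim 1 sums, over the image elements, an error function applied to the distance between the transformed camera-space point and its nearest forest prediction. It can be evaluated directly once a pose, the camera-space coordinates, and the per-element predictions are in hand. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the names `apply_pose`, `top_hat`, and `energy`, the 4x4 nested-list pose representation, and the top-hat error function with a width of 0.1 are all chosen here for illustration. The claim itself only requires that ρ be an error function.

```python
import math

def apply_pose(H, x):
    """Apply a 4x4 rigid transform H (row-major nested lists) to a 3D point x."""
    return tuple(
        H[r][0] * x[0] + H[r][1] * x[1] + H[r][2] * x[2] + H[r][3]
        for r in range(3)
    )

def top_hat(distance, width=0.1):
    """Assumed error function rho: 0 inside `width`, 1 otherwise."""
    return 0.0 if distance < width else 1.0

def energy(H, camera_points, predictions, rho=top_hat):
    """E(H) = sum over i of rho(min over m in M_i of ||m - H x_i||_2)."""
    total = 0.0
    for x_i, M_i in zip(camera_points, predictions):
        hx = apply_pose(H, x_i)
        nearest = min(math.dist(m, hx) for m in M_i)
        total += rho(nearest)
    return total

IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
SHIFT_X = [[1, 0, 0, 1.0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# x_i: 3D coordinates in camera space, one per image element.
camera_points = [(0, 0, 1), (1, 0, 1), (0, 1, 2)]
# M_i: scene-space points predicted for element i (one per tree, for example).
predictions = [
    [(1, 0, 1), (5, 5, 5)],  # two predictions for element 0
    [(2, 0, 1)],
    [(9, 9, 9)],             # a prediction no plausible pose explains
]

print(energy(SHIFT_X, camera_points, predictions))   # 1.0
print(energy(IDENTITY, camera_points, predictions))  # 3.0
```

With this choice of ρ the energy counts the image elements that disagree with the pose, so a lower value is better. The shifted pose explains two of the three elements, while the identity pose explains none.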

10. A pose tracker comprising:
- a processor arranged to;
  
  receive at least one image of a scene captured by an entity comprising a mobile camera; and
  
  apply image elements of the at least one image to a trained machine learning system to obtain a plurality of associations between image elements and three-dimensional (3D) points in a scene space; and
  
  a pose inference engine arranged to;
  
  optimize an energy function comprising the 3D points in the scene space predicted by at least one tree in at least one random decision forest and 3D coordinates in camera space;
  
  determine whether a pose of the entity has been calculated;
  
  based on a determination that the pose has been calculated, refining the pose of the entity from the plurality of associations and the optimized function; and
  
  based on a determination that the pose of the entity has not been calculated, calculate an initial pose of the mobile camera from the plurality of associations, the calculation being based at least in part on the optimized function;
  
  wherein the energy function comprises:
  
  $$E(H) = \sum_{i \in I} \rho\left( \min_{m \in M_i} \lVert m - H x_i \rVert_2 \right)$$
  
  wherein $i \in I$ is an index of the image elements, $\rho$ is an error function, $m \in M_i$ represents the predicted 3D points in the scene space, $x_i$ are the 3D coordinates in the camera space, and $H$ is the pose of the entity.
  - 11. The pose tracker as claimed in claim 10, the pose inference engine further arranged to calculate the initial pose by searching amongst a set of possible pose candidates and using samples of the plurality of associations between image elements and points in scene coordinates to assess the set of possible pose candidates.
  - 12. The pose tracker as claimed in claim 10, the processor further arranged to receive a stream of images, and the pose tracker further comprising a pose inference engine arranged to calculate the initial pose by searching amongst a set of possible pose candidates which includes a second pose calculated from another image in the stream of images.
  - 13. The pose tracker as claimed in claim 10 at least partially implemented using hardware logic selected from one or more of the following:
    - a field-programmable gate array, a program-specific integrated circuit, a program-specific standard product, a system-on-a-chip, a complex programmable logic device, and a graphics processing unit.
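
Claims 11 and 12 describe searching amongst a set of pose candidates, assessing them with samples of the associations, and optionally including a pose calculated from another image in the stream. One way this could look is sketched below in Python. It is an illustration under assumptions, not the claimed pose inference engine: the helpers `translation`, `transform`, `count_inliers`, and `select_pose`, the inlier width of 0.1, and the single-sample scoring scheme are all hypothetical.

```python
import math
import random

def translation(tx, ty, tz):
    """Hypothetical helper: a 4x4 pose with identity rotation and a translation."""
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

def transform(H, x):
    """Map a camera-space point x into scene space with pose H."""
    return tuple(sum(H[r][c] * x[c] for c in range(3)) + H[r][3] for r in range(3))

def count_inliers(H, associations, width=0.1):
    """Count associations whose nearest predicted point lies within `width` of H x_i."""
    return sum(
        1
        for x_i, M_i in associations
        if min(math.dist(m, transform(H, x_i)) for m in M_i) < width
    )

def select_pose(candidates, associations, sample_size, rng):
    """Score every candidate on one random sample of associations; keep the best."""
    sample = rng.sample(associations, min(sample_size, len(associations)))
    return max(candidates, key=lambda H: count_inliers(H, sample))

# Five associations agree with a 1-unit shift along x; the last is an outlier.
camera_points = [(0, 0, 1), (1, 0, 1), (0, 1, 2), (2, 1, 3), (1, 1, 1)]
associations = [(x, [(x[0] + 1.0, x[1], x[2])]) for x in camera_points]
associations.append(((3, 3, 3), [(9.0, 9.0, 9.0)]))

# A pose carried over from another image in the stream joins the candidate set.
previous_frame_pose = translation(1.0, 0.0, 0.0)
candidates = [translation(0.0, 0.0, 0.0), translation(0.0, 2.0, 0.0), previous_frame_pose]

best = select_pose(candidates, associations, sample_size=4, rng=random.Random(0))
print(best is previous_frame_pose)  # True
```

Scoring on a sample rather than on every association keeps the cost per candidate low, and a single outlier association cannot change the winner here because the other candidates explain none of the sampled points.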

15. One or more computer-readable storage devices having computer-executable instructions that when executed by a processor, cause the processor to:
- receive at least one image that is of a scene captured by an entity comprising a mobile camera;
  
  apply image elements of the at least one image to a trained machine learning system to obtain a plurality of associations between a set of image elements and three dimensional (3D) points in a scene space, the trained machine learning system optimizing an energy function comprising the 3D points in the scene space predicted by at least one tree in at least one random decision forest and 3D coordinates in camera space;
  
  determine whether a pose of the entity has been calculated;
  
  based on a determination that the pose has been calculated, refine the pose of the entity from the plurality of associations and the optimized function;
  
  based on a determination that the pose of the entity has not been calculated, calculate an initial pose of the entity from the plurality of associations and the optimized function; and
  
  generate map display data based at least in part on the initial pose of the entity;
  
  wherein the energy function comprises:
  
  $$E(H) = \sum_{i \in I} \rho\left( \min_{m \in M_i} \lVert m - H x_i \rVert_2 \right)$$
  
  wherein $i \in I$ is an index of the image elements, $\rho$ is an error function, $m \in M_i$ represents the predicted 3D points in the scene space, $x_i$ are the 3D coordinates in the camera space, and $H$ is the pose of the entity.
  - 16. The one or more computer-readable storage devices of claim 15, wherein applying the set of image elements comprises applying at least three image elements.
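
Claim 16 recites at least three image elements, and claim 2 describes the pose as six degrees of freedom (three of rotation, three of position). Three non-collinear correspondences between camera space and scene space are enough to fix such a rigid pose. The Python sketch below shows one way to see this by aligning an orthonormal frame built from the three points in each space. It is an assumption-laden illustration, not the method of the patent: the helper names are hypothetical, the correspondences are taken to be exact, and with noisy predictions a least-squares fit would be the more usual choice.

```python
import math

def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def frame(p0, p1, p2):
    """Right-handed orthonormal axes built from three non-collinear points."""
    e1 = unit(sub(p1, p0))
    e3 = unit(cross(e1, sub(p2, p0)))
    e2 = cross(e3, e1)
    return e1, e2, e3

def pose_from_three(camera_pts, scene_pts):
    """Return a 4x4 pose H such that scene = H * camera for the three points."""
    A = frame(*camera_pts)  # axes expressed in camera space
    B = frame(*scene_pts)   # the same axes expressed in scene space
    # R = sum_k B_k A_k^T rotates each camera-space axis onto its scene-space twin.
    R = [[sum(B[k][r] * A[k][c] for k in range(3)) for c in range(3)]
         for r in range(3)]
    x0, m0 = camera_pts[0], scene_pts[0]
    # The translation carries the rotated first camera point onto its scene point.
    t = [m0[r] - sum(R[r][c] * x0[c] for c in range(3)) for r in range(3)]
    return [R[r] + [t[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

# Scene points are the camera points rotated 90 degrees about z, then moved by (1, 2, 3).
camera_pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
scene_pts = [(1, 2, 3), (1, 3, 3), (0, 2, 3)]

H = pose_from_three(camera_pts, scene_pts)
for row in H:
    print(row)
```

With fewer than three correspondences, or with three collinear ones, the cross product in `frame` degenerates and the rotation about the remaining axis is left undetermined.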

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors
Shotton, Jamie Daniel Joseph; Glocker, Benjamin Michael; Zach, Christopher; Izadi, Shahram; Criminisi, Antonio; Fitzgibbon, Andrew William
Primary Examiner(s)
CONNER, SEAN M

Application Number

US13/774,145
Publication Number

US 20140241617A1
Time in Patent Office

1,873 Days
Field of Search

None
CPC Class Codes

G06F 18/214   Generating training pattern...

G06F 18/231   Hierarchical techniques, i....

G06F 18/24323   Tree-organised classifiers

G06V 10/7625   Hierarchical techniques, i....

G06V 10/764   using classification, e.g. ...

G06V 10/774   Generating sets of training...

G06V 20/20   in augmented reality scenes
