Camera/object pose from predicted coordinates
First Claim
1. A method of calculating pose of an entity comprising:
receiving, at a processor, at least one image where the image is of a scene captured by an entity comprising a mobile camera;
applying image elements of the at least one image to a trained machine learning system to obtain a plurality of associations between image elements and three-dimensional (3D) points in a scene space, the trained machine learning system optimizing an energy function comprising the 3D points in the scene space predicted by at least one tree in at least one random decision forest and 3D coordinates in camera space;
determining whether a pose of the entity has been calculated;
based on a determination that the pose has been calculated, refining the pose of the entity from the plurality of associations and the optimized function;
based on a determination that the pose of the entity has not been calculated, calculating an initial pose of the entity from the plurality of associations and the optimized function; and
generating map display data based at least in part on the initial pose of the entity, wherein the energy function comprises:
$$E(H) = \sum_{i \in I} \rho\left( \min_{m \in M_i} \lVert m - H x_i \rVert_2 \right)$$
wherein $i \in I$ is an index of the image elements, $\rho$ is an error function, $m \in M_i$ represents the predicted 3D points in the scene space, $x_i$ are the 3D coordinates in the camera space, and $H$ is the pose of the entity.
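A short NumPy sketch can make the energy function concrete. It rests on several assumptions. H is taken to be a 4x4 rigid transform that maps camera space into scene space. Each set M_i is passed in as an array of candidate scene points, for example one prediction per tree. ρ is taken to be a truncated quadratic with a hypothetical threshold `tau`, because the claim says only that ρ is "an error function". The names `energy` and `rho` are illustrative and do not come from the patent.

```python
import numpy as np


def rho(r, tau=0.1):
    # Hypothetical robust error function (truncated quadratic): residuals beyond tau
    # contribute a constant tau**2, so outlier predictions cannot dominate E(H).
    return np.minimum(r * r, tau * tau)


def energy(H, xs, modes, tau=0.1):
    """E(H) = sum_i rho(min_{m in M_i} ||m - H x_i||_2).

    H     -- 4x4 rigid transform taking camera space to scene space (assumed form of the pose)
    xs    -- (N, 3) array of camera-space coordinates x_i
    modes -- list of (K_i, 3) arrays; modes[i] holds the predicted scene points M_i
    """
    R, t = H[:3, :3], H[:3, 3]
    total = 0.0
    for x, M in zip(xs, modes):
        hx = R @ x + t                           # H x_i expressed in scene space
        dists = np.linalg.norm(M - hx, axis=1)   # ||m - H x_i||_2 for every m in M_i
        total += float(rho(dists.min(), tau))    # keep only the closest prediction
    return total
```

The inner `min` lets each image element carry several candidate scene points. Only the candidate that agrees best with the current pose contributes to E(H).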
Abstract
Camera or object pose calculation is described, for example, to relocalize a mobile camera (such as one on a smartphone) in a known environment or to compute the pose of an object moving relative to a fixed camera. The pose information is useful for robotics, augmented reality, navigation and other applications. In various embodiments where camera pose is calculated, a trained machine learning system associates image elements from an image of a scene with points in the scene's 3D world coordinate frame. In examples where the camera is fixed and the pose of an object is to be calculated, the trained machine learning system associates image elements from an image of the object with points in an object coordinate frame. In examples, the image elements may be noisy and incomplete, and a pose inference engine calculates an accurate estimate of the pose.
20 Claims
1. A method of calculating pose of an entity, as set out in full under First Claim above.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 14, 17, 18, 19, 20.)
10. A pose tracker comprising:
a processor arranged to:
receive at least one image of a scene captured by an entity comprising a mobile camera; and
apply image elements of the at least one image to a trained machine learning system to obtain a plurality of associations between image elements and three-dimensional (3D) points in a scene space; and
a pose inference engine arranged to:
optimize an energy function comprising the 3D points in the scene space predicted by at least one tree in at least one random decision forest and 3D coordinates in camera space;
determine whether a pose of the entity has been calculated;
based on a determination that the pose has been calculated, refine the pose of the entity from the plurality of associations and the optimized function; and
based on a determination that the pose of the entity has not been calculated, calculate an initial pose of the mobile camera from the plurality of associations, the calculation being based at least in part on the optimized function;
wherein the energy function comprises:
$$E(H) = \sum_{i \in I} \rho\left( \min_{m \in M_i} \lVert m - H x_i \rVert_2 \right)$$
wherein $i \in I$ is an index of the image elements, $\rho$ is an error function, $m \in M_i$ represents the predicted 3D points in the scene space, $x_i$ are the 3D coordinates in the camera space, and $H$ is the pose of the entity.
(Dependent claims: 11, 12, 13.)
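The pose inference engine in claim 10 either calculates an initial pose or refines an existing one, depending on whether a pose has already been calculated. A minimal sketch of that branching follows. It is an illustration under stated assumptions and should not be read as the patented implementation. The initial pose comes from an assumed RANSAC-style hypothesize-and-verify loop over random three-element samples. Refinement alternates between two steps: pick the closest predicted point in each M_i, then refit H with the Kabsch algorithm on inliers only. The class `PoseTracker`, its inlier threshold, its hypothesis count and its iteration count are all hypothetical.

```python
import numpy as np


def kabsch(src, dst):
    """Rigid transform H (4x4) minimising sum ||dst_j - (R src_j + t)||^2 (Kabsch algorithm)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against returning a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    H = np.eye(4)
    H[:3, :3], H[:3, 3] = R, cd - R @ cs
    return H


class PoseTracker:
    """Hypothetical tracker: initialise when no pose exists yet, otherwise refine it."""

    def __init__(self, inlier_thresh=0.1, hypotheses=256, refine_iters=5, seed=0):
        self.H = None  # None means "pose not yet calculated"
        self.thresh = inlier_thresh
        self.hypotheses = hypotheses
        self.refine_iters = refine_iters
        self.rng = np.random.default_rng(seed)

    def _closest(self, H, xs, modes):
        # For each element i: the m in M_i nearest to H x_i, and that distance.
        hx = xs @ H[:3, :3].T + H[:3, 3]
        best = np.array([M[np.argmin(np.linalg.norm(M - p, axis=1))]
                         for M, p in zip(modes, hx)])
        return best, np.linalg.norm(best - hx, axis=1)

    def _initial_pose(self, xs, modes):
        # Assumed RANSAC-style scheme: fit H to 3 random elements, using one random
        # predicted point for each, and keep the hypothesis with the most inliers.
        best_H, best_count = np.eye(4), -1
        for _ in range(self.hypotheses):
            idx = self.rng.choice(len(xs), size=3, replace=False)
            dst = np.array([modes[i][self.rng.integers(len(modes[i]))] for i in idx])
            H = kabsch(xs[idx], dst)
            count = int((self._closest(H, xs, modes)[1] < self.thresh).sum())
            if count > best_count:
                best_H, best_count = H, count
        return best_H

    def _refine(self, H, xs, modes):
        # Alternate: choose the closest prediction per element, refit H on inliers.
        for _ in range(self.refine_iters):
            m, dist = self._closest(H, xs, modes)
            inliers = dist < self.thresh
            if inliers.sum() < 3:
                break
            H = kabsch(xs[inliers], m[inliers])
        return H

    def update(self, xs, modes):
        if self.H is None:  # pose has not been calculated: compute an initial pose
            self.H = self._initial_pose(xs, modes)
        else:               # pose has been calculated: refine the previous estimate
            self.H = self._refine(self.H, xs, modes)
        return self.H
```

`self.H` stays `None` until the first estimate is made, which mirrors the claim's test of whether a pose "has been calculated". After that, refinement needs only correspondences near the previous estimate. It does not resample hypotheses on every frame.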
15. One or more computer-readable storage devices having computer-executable instructions that when executed by a processor, cause the processor to:
receive at least one image that is of a scene captured by an entity comprising a mobile camera;
apply image elements of the at least one image to a trained machine learning system to obtain a plurality of associations between a set of image elements and three-dimensional (3D) points in a scene space, the trained machine learning system optimizing an energy function comprising the 3D points in the scene space predicted by at least one tree in at least one random decision forest and 3D coordinates in camera space;
determine whether a pose of the entity has been calculated;
based on a determination that the pose has been calculated, refine the pose of the entity from the plurality of associations and the optimized function;
based on a determination that the pose of the entity has not been calculated, calculate an initial pose of the entity from the plurality of associations and the optimized function; and
generate map display data based at least in part on the initial pose of the entity;
wherein the energy function comprises:
$$E(H) = \sum_{i \in I} \rho\left( \min_{m \in M_i} \lVert m - H x_i \rVert_2 \right)$$
wherein $i \in I$ is an index of the image elements, $\rho$ is an error function, $m \in M_i$ represents the predicted 3D points in the scene space, $x_i$ are the 3D coordinates in the camera space, and $H$ is the pose of the entity.
(Dependent claims: 16.)
Specification