MONOCULAR 3D POSE ESTIMATION AND TRACKING BY DETECTION
Abstract
Methods and apparatus are described for monocular 3D human pose estimation and tracking, which are able to recover poses of people in realistic street conditions captured using a monocular, potentially moving camera. Embodiments of the present invention provide a three-stage process involving estimating (10, 60, 110) a 3D pose of each of the multiple objects using an output of 2D tracking-by-detection (50) and 2D viewpoint estimation (46). The present invention provides a sound Bayesian formulation to address the above problems. The present invention can provide articulated 3D tracking in realistic street conditions.
The present invention provides methods and apparatus for people detection and 2D pose estimation combined with a dynamic motion prior. The present invention provides not only 2D pose estimation for people in side views; it goes beyond this by estimating poses in 3D from multiple viewpoints. The estimation of poses is done in monocular images and does not require stereo images. Also, the present invention does not require detection of characteristic poses of people.
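To illustrate how 2D tracking that exploits temporal coherency could improve per-frame viewpoint estimates, the following is a minimal sketch and not the method of the specification. It assumes that the viewpoint detector outputs a probability over a small circular set of discrete viewpoints for each frame of one tracked person, and that a viewpoint rarely jumps between non-adjacent directions from one frame to the next. The function name `smooth_viewpoints`, the parameter `stay_prob`, the transition model, and the use of Viterbi decoding are all assumptions introduced for illustration.

```python
import numpy as np


def smooth_viewpoints(scores, stay_prob=0.8):
    """Viterbi decoding of the most likely viewpoint sequence for one track.

    scores: (T, K) array, scores[t, k] is the (hypothetical) viewpoint
            detector probability of viewpoint k in frame t; K >= 3 assumed.
    stay_prob: assumed probability of keeping the same viewpoint between
               consecutive frames; the rest is split between the two
               circular neighbours.
    Returns a list of T viewpoint indices.
    """
    scores = np.asarray(scores, dtype=float)
    T, K = scores.shape

    # Assumed transition model: stay, or move to an adjacent viewpoint.
    # All other transitions get a very small probability.
    trans = np.full((K, K), 1e-6)
    for k in range(K):
        trans[k, k] = stay_prob
        trans[k, (k - 1) % K] = (1.0 - stay_prob) / 2.0
        trans[k, (k + 1) % K] = (1.0 - stay_prob) / 2.0
    log_trans = np.log(trans)
    log_obs = np.log(scores + 1e-12)

    best = log_obs[0].copy()            # best log-score ending in each viewpoint
    back = np.zeros((T, K), dtype=int)  # back-pointers for path recovery
    for t in range(1, T):
        cand = best[:, None] + log_trans        # cand[i, j]: from i to j
        back[t] = np.argmax(cand, axis=0)       # best predecessor of each j
        best = cand[back[t], np.arange(K)] + log_obs[t]

    # Trace the best path backwards from the final frame.
    path = [int(np.argmax(best))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With eight viewpoints and a track in which a single frame wrongly favours the opposite direction, the decoded sequence keeps the viewpoint supported by the neighbouring frames, which is one way the output of a per-frame viewpoint detector might be improved along a track.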
38 Citations
29 Claims
1-15. (canceled)
16. An image processor (10) for detection and tracking of a 3D pose of each of multiple objects in a sequence of monocular images, the 3D pose representing at least a 3D configuration of movable parts of the object, the image processor comprising:
one or more 2D pose detectors (44) for estimating a pose of each of the multiple objects in an image;
a 2D tracking and viewpoint estimation computation part (50) for receiving the outputs of the 2D pose detectors and being adapted to apply 2D tracking by detection, the 2D tracking exploiting temporal coherency; and
a 3D pose estimation computation part (60) for estimating and tracking the 3D poses of the multiple objects in the sequence of images from the output of the 2D tracking and viewpoint estimation computation part;
being characterized in that the image processor further comprises a 2D viewpoint detector (46) for estimating a viewpoint of each of the multiple objects in the image; and
the 2D tracking and viewpoint estimation computation part (50) further receiving the outputs of the 2D viewpoint detector and being adapted to at least improve the outputs of the 2D viewpoint detector;
the 2D tracking and viewpoint estimation computation part (50) using the 2D tracking by detection for viewpoint tracking; and
the 3D pose estimation computation part being adapted to lift 2D poses to recover 3D poses images by relying on the output of the 2D tracking and viewpoint estimation computation part.
Dependent claims: 17, 18, 19, 20, 21
22. A method of using an image processor for detection of a 3D pose of each of multiple objects in a sequence of monocular images, the 3D pose representing at least a 3D configuration of movable parts of the object, the method having the steps of:
estimating (104) a 2D pose of each of the multiple objects in an image;
applying 2D tracking-by-detection (109) to the estimated 2D pose detector, the 2D tracking exploiting temporal coherency; and
estimating (110) a 3D pose of each of the multiple objects using an output of the 2D tracking-by-detection and viewpoint estimation, the estimating being adapted to lift 2D poses to recover 3D poses images by relying on the output of the 2D tracking and viewpoint estimation computation part;
the method further being characterized in that it comprises a step of:
estimating (105) a 2D viewpoint of each of the multiple objects in the image; and
in that the 2D tracking-by-detection (109) is applied to the estimated 2D viewpoint to at least improve the estimated 2D viewpoint.
Dependent claims: 23, 24, 25, 26, 27, 28, 29
Specification