MONOCULAR 3D POSE ESTIMATION AND TRACKING BY DETECTION

US 20130142390A1
Filed: 06/14/2011
Published: 06/06/2013
Est. Priority Date: 06/12/2010
Status: Active Grant

Abstract

Methods and apparatus are described for monocular 3D human pose estimation and tracking, which are able to recover poses of people in realistic street conditions captured using a monocular, potentially moving camera. Embodiments of the present invention provide a three-stage process involving estimating (10, 60, 110) a 3D pose of each of the multiple objects using an output of 2D tracking-by-detection (50) and 2D viewpoint estimation (46). The present invention provides a sound Bayesian formulation to address the above problems. The present invention can provide articulated 3D tracking in realistic street conditions.

The present invention provides methods and apparatus for people detection and 2D pose estimation combined with a dynamic motion prior. The present invention not only provides 2D pose estimation for people in side views but also goes beyond this by estimating poses in 3D from multiple viewpoints. The estimation of poses is done in monocular images and does not require stereo images. Also, the present invention does not require detection of characteristic poses of people.

38 Citations

29 Claims

1-15. (canceled)

16. An image processor (10) for detection and tracking of a 3D pose of each of multiple objects in a sequence of monocular images, the 3D pose representing at least a 3D configuration of movable parts of the object, the image processor comprising:
- one or more 2D pose detectors (44) for estimating a pose of each of the multiple objects in an image;
  
  a 2D tracking and viewpoint estimation computation part (50) for receiving the outputs of the 2D pose detectors and being adapted to apply 2D tracking by detection, the 2D tracking exploiting temporal coherency; and
  
  a 3D pose estimation computation part (60) for estimating and tracking the 3D poses of the multiple objects in the sequence of images from the output of the 2D tracking and viewpoint estimation computation part;
  
  being characterized in that the image processor further comprises a 2D viewpoint detector (46) for estimating a viewpoint of each of the multiple objects in the image; and
  
  the 2D tracking and viewpoint estimation computation part (50) further receiving the outputs of the 2D viewpoint detector and being adapted to at least improve the outputs of the 2D viewpoint detector;
  
  the 2D tracking and viewpoint estimation computation part (50) using the 2D tracking by detection for viewpoint tracking;
  
  and the 3D pose estimation computation part being adapted to lift 2D poses to recover 3D poses images by relying on the output of the 2D tracking and viewpoint estimation computation part.
- Dependent claims (17, 18, 19, 20, 21)
  - 17. The image processor of claim 16, further comprising one or more part based detectors (42) for detecting parts of the multiple objects for supply to the 2D pose detector.
  - 18. The image processor of claim 17, wherein the one or more part based detectors make use of a pictorial structure model of the object and/or wherein the one or more part based detectors are viewpoint specific detectors.
  - 19. The image processor of claim 17 further comprising an SVM detector, the output of the one or more part based detectors being fed to the SVM detector, or further comprising a classifier (48), the output of the one or more part based detectors being fed to the classifier.
  - 20. The image processor of claim 16, the 2D tracking and viewpoint estimation computation part comprising a tracklet extractor (52).
  - 21. The image processor of claim 20, further comprising a viewpoint estimator for estimating a sequence of viewpoints of each tracklet obtained from the tracklet extractor.
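Claims 20 and 21 recite a tracklet extractor (52) without fixing how detections are linked across frames. The following is a minimal sketch of one common approach, assuming greedy frame-to-frame association by bounding-box overlap (IoU). The names `Detection`, `Tracklet`, `extract_tracklets`, and the `min_iou` threshold are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    frame: int    # index of the frame the detection came from
    box: tuple    # (x1, y1, x2, y2) in pixels


@dataclass
class Tracklet:
    detections: list = field(default_factory=list)


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def extract_tracklets(frames, min_iou=0.5):
    """Greedily link detections across consecutive frames.

    frames: list of lists of Detection, one inner list per frame, in order.
    A tracklet ends as soon as no detection in the next frame overlaps
    its last box by at least min_iou (an assumed, illustrative rule).
    """
    finished, active = [], []
    for dets in frames:
        unmatched = list(dets)
        still_active = []
        for tracklet in active:
            last = tracklet.detections[-1]
            best = max(unmatched, key=lambda d: iou(last.box, d.box), default=None)
            if best is not None and iou(last.box, best.box) >= min_iou:
                tracklet.detections.append(best)
                unmatched = [d for d in unmatched if d is not best]
                still_active.append(tracklet)
            else:
                finished.append(tracklet)
        # Every detection left over starts a new tracklet.
        still_active.extend(Tracklet([d]) for d in unmatched)
        active = still_active
    return finished + active
```

Each resulting tracklet is a short, temporally coherent sequence of detections for one object, which is the unit over which claim 21 estimates a sequence of viewpoints.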

22. A method of using an image processor for detection of a 3D pose of each of multiple objects in a sequence of monocular images, the 3D pose representing at least a 3D configuration of movable parts of the object, the method having the steps of:
- estimating (104) a 2D pose of each of the multiple objects in an image;
  
  applying 2D tracking-by-detection (109) to the estimated 2D pose detector, the 2D tracking exploiting temporal coherency; and
  
  estimating (110) a 3D pose of each of the multiple objects using an output of the 2D tracking-by-detection and viewpoint estimation, the estimating being adapted to lift 2D poses to recover 3D poses images by relying on the output of the 2D tracking and viewpoint estimation computation part;
  
  the method further being characterized in that it comprises a step of;
  
  estimating (105) a 2D viewpoint of each of the multiple objects in the image; and
  
  in that the 2D tracking-by-detection (109) is applied to the estimated 2D viewpoint to at least improve the estimated 2D viewpoint.
- Dependent claims (23, 24, 25, 26, 27, 28, 29)
  - 23. The method of claim 22, wherein estimating the 2D pose comprises detecting parts of each of the multiple objects in the image.
  - 24. The method of claim 22, wherein detecting parts of the multiple objects makes use of a pictorial structure model of each of the multiple objects and/or wherein detecting parts of the multiple objects is viewpoint specific.
  - 25. The method of claim 22, wherein the part based detection step is followed by a classification step.
  - 26. The method of claim 22, wherein estimation of the 2D tracking and viewpoint comprises extracting (108) tracklets from the image.
  - 27. The method of claim 26, further comprising estimating a viewpoint of each tracklet.
  - 28. The method of claim 22, wherein the 3D pose estimation comprises lifting (112) 2D poses to 3D poses.
  - 29. A program on a computer readable medium and having instructions which when executed by a computer cause the computer to carry out the method of claim 22.
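Claim 22 requires that tracking be applied to the per-frame viewpoint estimates "to at least improve" them, and claim 27 estimates a viewpoint for each tracklet, but the claims do not say how. One way to exploit temporal coherency is sketched below, assuming discretized viewpoint bins, per-frame log-scores from a viewpoint classifier, and Viterbi decoding with a penalty on viewpoint changes between consecutive frames. The function name `smooth_viewpoints` and the `switch_penalty` parameter are hypothetical, and this is an illustration rather than the method the patent specifies.

```python
def smooth_viewpoints(scores, switch_penalty=1.0):
    """Return the most consistent viewpoint bin per frame of a tracklet.

    scores[t][v] is the log-score of viewpoint bin v in frame t
    (assumed to come from a per-frame viewpoint detector).
    Moving between bins costs switch_penalty per step around the circle,
    since viewpoints wrap (the last bin is adjacent to the first).
    """
    n_bins = len(scores[0])

    def circ_dist(a, b):
        d = abs(a - b)
        return min(d, n_bins - d)

    best = list(scores[0])   # best cumulative score ending in each bin
    back = []                # back-pointers, one list per transition
    for frame_scores in scores[1:]:
        prev_best, best, pointers = best, [], []
        for v in range(n_bins):
            cand = [prev_best[u] - switch_penalty * circ_dist(u, v)
                    for u in range(n_bins)]
            u_star = max(range(n_bins), key=cand.__getitem__)
            best.append(cand[u_star] + frame_scores[v])
            pointers.append(u_star)
        back.append(pointers)

    # Backtrack from the best final bin.
    v = max(range(n_bins), key=best.__getitem__)
    path = [v]
    for pointers in reversed(back):
        v = pointers[v]
        path.append(v)
    path.reverse()
    return path
```

With a positive penalty, a single frame whose classifier briefly prefers a distant viewpoint is overruled by its neighbours; with `switch_penalty=0` the function reduces to taking the per-frame maximum.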

Current Assignee
Technische Universität Darmstadt, Toyota Jidosha Kabushiki Kaisha (Toyota Motor Corporation)
Original Assignee
Technische Universität Darmstadt, Toyota Motor Europe SA (Toyota Motor Corporation)
Inventors
Othmezouri, Gabriel; Sakata, Ichiro; Roth, Stefan; Schiele, Bernt; Andriluka, Mykhaylo

Granted Patent

US 8,958,600 B2
Field of Search
US Class Current

382/103
CPC Class Codes

G06T 2207/10016   Video; Image sequence

G06T 2207/30196   Human being; Person

G06T 2207/30232   Surveillance

G06T 2207/30244   Camera pose

G06T 7/251   involving models

G06V 20/64   Three-dimensional objects

G06V 40/10   Human or animal bodies, e.g...

G06V 40/103   Static body considered as a...
