METHOD FOR ESTIMATING A POSE OF AN ARTICULATED OBJECT MODEL
Abstract
A computer-implemented method for estimating a pose of an articulated object model (4), wherein the articulated object model (4) is a computer based 3D model (1) of a real world object (14) observed by one or more source cameras (9), and wherein the pose of the articulated object model (4) is defined by the spatial location of joints (2) of the articulated object model (4), comprises the steps of
- obtaining a source image (10) from a video stream;
- processing the source image (10) to extract a source image segment (13);
- maintaining, in a database, a set of reference silhouettes, each being associated with an articulated object model (4) and a corresponding reference pose;
- comparing the source image segment (13) to the reference silhouettes and selecting reference silhouettes by taking into account, for each reference silhouette,
- a matching error that indicates how closely the reference silhouette matches the source image segment (13) and/or
- a coherence error that indicates how much the reference pose is consistent with the pose of the same real world object (14) as estimated from a preceding source image (10);
- retrieving the corresponding reference poses of the articulated object models (4); and
- computing an estimate of the pose of the articulated object model (4) from the reference poses of the selected reference silhouettes.
15 Claims
1. A computer-implemented method for estimating a pose of an articulated object model (4), wherein the articulated object model (4) is a computer based 3D model (1) of a real world object (14) observed by one or more source cameras (9), and the articulated object model (4) represents a plurality of joints (2) and of links (3) that link the joints (2), and wherein the pose of the articulated object model (4) is defined by the spatial location of the joints (2), the method comprising the steps of
- obtaining at least one source image (10) from a video stream comprising a view of the real world object (14) recorded by a source camera (9);
- processing the at least one source image (10) to extract a corresponding source image segment (13) comprising the view of the real world object (14) separated from the image background;
- maintaining, in a database in computer readable form, a set of reference silhouettes, each reference silhouette being associated with an articulated object model (4) and with a particular reference pose of this articulated object model (4);
- comparing the at least one source image segment (13) to the reference silhouettes and selecting a predetermined number of reference silhouettes by taking into account, for each reference silhouette,
- a matching error that indicates how closely the reference silhouette matches the source image segment (13) and/or
- a coherence error that indicates how much the reference pose is consistent with the pose of the same real world object (14) as estimated from at least one of preceding and following source images (10) of the video stream;
- retrieving the reference poses of the articulated object models (4) associated with the selected reference silhouettes; and
- computing an estimate of the pose of the articulated object model (4) from the reference poses of the selected reference silhouettes.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15
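The selection and estimation steps of claim 1 can be sketched in code. This is only an illustrative sketch, not the claimed implementation: the silhouette representation (binary masks), the specific error measures (pixel disagreement for the matching error, mean joint displacement from the preceding frame for the coherence error), the weighting, and all function and parameter names are assumptions.

```python
import numpy as np

def estimate_pose(source_segment, reference_silhouettes, reference_poses,
                  previous_pose=None, k=5, coherence_weight=0.5):
    """Select the k best-matching reference silhouettes and average their poses.

    source_segment        : binary (H, W) array, the extracted source image segment
    reference_silhouettes : list of binary (H, W) reference silhouettes
    reference_poses       : list of (J, 3) arrays of 3D joint positions
    previous_pose         : (J, 3) pose estimated from a preceding source image, or None
    """
    scores = []
    for silhouette, pose in zip(reference_silhouettes, reference_poses):
        # Matching error: fraction of pixels where the two silhouettes disagree.
        matching_error = np.mean(silhouette != source_segment)
        # Coherence error: mean joint displacement from the preceding estimate.
        coherence_error = 0.0
        if previous_pose is not None:
            coherence_error = np.mean(np.linalg.norm(pose - previous_pose, axis=1))
        scores.append(matching_error + coherence_weight * coherence_error)
    # Keep the predetermined number k of reference poses with the lowest
    # combined error, and estimate the pose as their (unweighted) mean.
    best = np.argsort(scores)[:k]
    return np.mean([reference_poses[i] for i in best], axis=0)
```

A real system would use a more discriminative silhouette distance and could weight the selected poses by their errors instead of taking a plain mean; the claim leaves both choices open.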
14. A computer-implemented method for rendering a virtual image (12) as seen from a virtual camera (11), given an articulated object model (4), wherein the articulated object model (4) is a computer based 3D model (1) of a real world object (14) observed by two or more source cameras (9), and the articulated object model (4) represents a plurality of joints (2) and of links (3) that link the joints (2), and wherein the pose of the articulated object model (4) is defined by the spatial location of the joints (2), the method comprising the steps of
- determining an estimate of the 3D pose, that is, the 3D joint positions of the articulated object model (4);
- associating each link (3) with one or more projection surfaces (5), wherein the projection surfaces (5) are surfaces defined in the 3D model, and the position and orientation of each projection surface (5) is determined by the position and orientation of the associated link (3); wherein the projection surfaces (5), for each link (3), comprise a fan (7) of billboards (6), each billboard (6) being associated with a source camera (9), and each billboard being a planar surface spanned by its associated link (3) and a vector that is normal to both this link (3) and to a line connecting a point of the link (3) to the source camera (9);
- for each source camera (9), projecting segments of the associated source image (10) onto the associated billboard (6), creating billboard images;
- for each link (3), projecting the billboard images into the virtual image (12) and blending the billboard images to form a corresponding part of the virtual image (12).
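The geometric construction of a single billboard in claim 14 can be sketched as follows: the plane is spanned by the link itself and a vector normal to both the link and the line from a point of the link to the source camera. This is an illustrative sketch under assumptions not stated in the claim (the midpoint is chosen as the point on the link; the function name and vector conventions are invented here), and it assumes the camera does not lie on the line through the link.

```python
import numpy as np

def billboard_basis(joint_a, joint_b, camera_pos):
    """Compute the two vectors spanning a link's billboard for one source camera.

    joint_a, joint_b : (3,) arrays, the 3D joint positions bounding the link
    camera_pos       : (3,) array, the source camera position
    Returns the link vector and a unit vector normal to both the link and the
    line connecting the link's midpoint to the camera; the billboard plane is
    spanned by these two vectors, so it contains the link while turning
    toward the camera.
    """
    link = joint_b - joint_a                  # link direction (not normalized)
    midpoint = 0.5 * (joint_a + joint_b)      # a point of the link
    to_camera = camera_pos - midpoint         # line from the link to the camera
    # Vector normal to both the link and the link-to-camera line.
    span = np.cross(link, to_camera)
    span /= np.linalg.norm(span)              # assumes camera is not collinear with the link
    return link, span
```

Because one such billboard is built per source camera, repeating this construction for all cameras yields the fan (7) of billboards (6) associated with the link.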
Specification