METHOD AND SYSTEM FOR GENERATING A 3D REPRESENTATION OF A DYNAMICALLY CHANGING 3D SCENE
Abstract
A method for generating a 3D representation of a dynamically changing 3D scene, which includes the steps of:
- acquiring at least two synchronised video streams (120) from at least two cameras located at different locations and observing the same 3D scene (102);
- determining camera parameters, which comprise the orientation and zoom setting, for the at least two cameras (103);
- tracking the movement of objects (310a,b, 312a,b; 330a,b, 331a,b, 332a,b; 410a,b, 411a,b; 430a,b, 431a,b; 420a,b, 421a,b) in the at least two video streams (104);
- determining the identity of the objects in the at least two video streams (105);
- determining the 3D position of the objects by combining the information from the at least two video streams (106);
- wherein the step of tracking (104) the movement of objects in the at least two video streams uses position information derived from the 3D position of the objects at one or more earlier instants in time.
As a result, the quality, speed and robustness of the 2D tracking in the video streams are improved.
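By way of illustration only, the feedback loop described in the abstract, where 2D tracking is seeded from earlier 3D positions, can be pictured with a short sketch. This is a minimal Python sketch under assumed conventions (a 3x4 projection matrix per camera, a constant-velocity motion model, and invented function names), not the patented implementation:

```python
import numpy as np

def project(P, X_world):
    """Project a 3D point to 2D pixel coordinates with a 3x4 matrix P."""
    x_h = P @ np.append(X_world, 1.0)   # homogeneous projection
    return x_h[:2] / x_h[2]             # perspective divide

def seed_search_window(P, track_3d, dt, half_size=40.0):
    """Predict the next 3D position by constant-velocity extrapolation
    from the two most recent 3D fixes, then back-project it into the
    camera to obtain a 2D search window for the per-camera tracker."""
    v = (track_3d[-1] - track_3d[-2]) / dt      # estimated 3D velocity
    cx, cy = project(P, track_3d[-1] + v * dt)  # predicted 2D position
    return (cx - half_size, cy - half_size, cx + half_size, cy + half_size)
```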
Claims (32)
1. A method for generating a 3D representation of a dynamically changing 3D scene, comprising the steps of:
a) acquiring (102) at least two video streams (120) from at least two cameras (702) located at different locations and observing the same 3D scene (701);
b) determining (103) camera parameters (122), which comprise the position, orientation and internal parameters, for said at least two cameras (702);
c) tracking the movement of objects (310a,b, 312a,b; 330a,b, 331a,b, 332a,b; 410a,b, 411a,b; 430a,b, 431a,b; 420a,b, 421a,b) in the at least two video streams (104);
d) determining the identity of said objects in the at least two video streams (105); and
e) determining the 3D position of the objects by combining the information from the at least two video streams (107);
wherein at least one of the steps listed above (103, 104, 105) relies on information derived from the at least two video streams by one of the subsequent steps (107).
(Dependent claims: 2-28 and 30-32.)
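Step e) amounts to multi-view triangulation. As an illustration only, a minimal sketch of linear (DLT) triangulation from two calibrated views, with assumed 3x4 projection matrices and invented function names:

```python
import numpy as np

def triangulate(P1, x1, P2, x2):
    """Linear (DLT) triangulation of one object's 3D position from two
    calibrated views. P1, P2 are 3x4 projection matrices; x1, x2 are the
    tracked 2D positions of the same (identified) object in each view."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # least-squares null vector of A
    X = Vt[-1]
    return X[:3] / X[3]           # de-homogenise
```

Given consistent object identities from step d), running this per object and per frame yields the 3D positions (128) that the later claims build on.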
pointing, with a graphical input device, to a particular reference feature as seen in the video still image (203a-d), and selecting said reference feature; and associating the identity of the representation of the reference feature with the reference feature seen in the still image.
9. The method of claim 8, wherein, when selecting said reference feature in the video still image (203a-d), the exact position of the reference feature in the video still image is determined by the steps of:
automatically performing, in the vicinity of the position selected by the user, a feature extraction, in particular an extraction of lines (203c), intersections and corners (203a, 203b); and
determining the position of the reference feature as being the position of one of the features extracted, in particular of a feature whose type is the same as that selected in the schematic representation of the playing field.
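As an illustration of claim 9's snap-to-feature behaviour, the sketch below refines a user's click to the nearest detected corner using OpenCV; the window size, quality threshold and function names are assumptions, and line/intersection handling would follow the same pattern with a line detector:

```python
import cv2
import numpy as np

def snap_to_feature(gray, click_xy, radius=15):
    """Refine a user's click to the nearest extracted corner within a
    small neighbourhood of the clicked position."""
    x, y = click_xy
    x0, y0 = max(0, x - radius), max(0, y - radius)
    roi = gray[y0:y0 + 2 * radius, x0:x0 + 2 * radius]
    corners = cv2.goodFeaturesToTrack(roi, maxCorners=5,
                                      qualityLevel=0.01, minDistance=5)
    if corners is None:
        return click_xy                       # fall back to the raw click
    # pick the detected corner closest to the click
    pts = corners.reshape(-1, 2) + np.array([x0, y0], dtype=np.float32)
    d = np.linalg.norm(pts - np.array(click_xy, dtype=np.float32), axis=1)
    return tuple(pts[np.argmin(d)])
```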
10. The method of claim 1, wherein the step of tracking the movement of objects (104) comprises the step of incorporating dynamically changing camera parameters (131) in the tracking function (104) such that the tracking function (104) compensates for changes in the camera parameters (131).
11. The method of claim 10, wherein the camera parameters taken into account in the tracking function are camera parameters (131) determined by a camera calibration step (103) performed for the same video frame for which the tracking is done.
12. The method of claim 10, wherein the camera parameters taken into account in the tracking function are camera parameters (131) determined by a camera calibration step (103) performed for one or more previous video frames and are optionally extrapolated.
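A minimal sketch of the extrapolation mentioned in claim 12, under the assumption that the dynamic parameters reduce to pan, tilt and focal length and are extrapolated linearly (the actual parameterisation is not specified here):

```python
def extrapolate_params(history, dt):
    """Linearly extrapolate dynamic camera parameters from the two most
    recent calibration results, for a frame not yet calibrated itself."""
    (t1, p1), (t2, p2) = history[-2], history[-1]
    rate = {k: (p2[k] - p1[k]) / (t2 - t1) for k in p2}
    return {k: p2[k] + rate[k] * dt for k in p2}

# usage: parameters for the next frame, 40 ms after the last calibrated one
history = [(0.00, {"pan": 1.0, "tilt": 0.2, "f": 1500.0}),
           (0.04, {"pan": 1.1, "tilt": 0.2, "f": 1520.0})]
print(extrapolate_params(history, 0.04))
```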
13. The method of claim 1, wherein, for initialising object identifications, the following steps are performed:
a user selecting, in a still image of one of the video streams, one object and assigning it a unique identifier; and
automatically determining, in a further still image of at least one further video stream, an object whose identity is the same.
14. The method of claim 13, wherein, in a situation in which an object that is not or cannot be identified appears in one of the video streams, performing the steps of:
alerting the user to the presence of an unidentified object; and
permitting the user to associate an identifier with the object.
15. The method of claim 1, wherein the objects are categorised as belonging to one of at least two categories, the categories preferably being based on a statistical model and comprising at least two of a first team, a second team, a ball and a referee.
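One way to picture the statistical model of claim 15 is a per-category Gaussian colour model; the category names, colours and variances below are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def categorise(mean_rgb, models):
    """Assign an object to the category (team, referee, ...) whose
    Gaussian colour model gives the highest log-likelihood."""
    def log_likelihood(x, mu, var):
        return -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var))
    return max(models, key=lambda c: log_likelihood(mean_rgb, *models[c]))

models = {  # (mean colour, per-channel variance): illustrative values only
    "team_a":  (np.array([200.0, 30.0, 30.0]), np.array([400.0] * 3)),
    "team_b":  (np.array([30.0, 30.0, 200.0]), np.array([400.0] * 3)),
    "referee": (np.array([30.0, 30.0, 30.0]),  np.array([400.0] * 3)),
}
print(categorise(np.array([190.0, 40.0, 35.0]), models))  # -> team_a
```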
16. The method of claim 1, comprising a segmentation step (106) in which objects are separated from the background, comprising the step of:
using alpha channel matting to assign, to each picture element, a value that expresses the probability with which the pixel is part of an object or part of the background.
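An illustrative reading of this step: each pixel receives a soft foreground probability rather than a hard label. The sketch below derives such an alpha value from the distance to a clean background plate; the Gaussian falloff and the `sigma` parameter are assumptions:

```python
import numpy as np

def soft_alpha(frame, bg, sigma=20.0):
    """Per-pixel foreground probability (an alpha matte): pixels close in
    colour to the background plate get alpha near 0, clearly foreground
    pixels get alpha near 1."""
    d = np.linalg.norm(frame.astype(np.float32) - bg.astype(np.float32), axis=2)
    return 1.0 - np.exp(-(d ** 2) / (2.0 * sigma ** 2))
```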
17. The method of claim 1, comprising a segmentation step (106) in which objects are separated from a background, comprising the step of:
after removing the objects, filling corresponding holes or unknown parts left in the background by image inpainting, and marking such inpainted picture elements as being synthetic image data.
18. The method of claim 17, wherein the segmentation step (106) comprises the step of refining the position and dimensions of bounding boxes (501, 601, 602, 603) around the objects.
19. The method of claim 17, wherein the step of filling holes in the background comprises the step of:
mapping image data that corresponds to source patches (804a, 804b, 804c, 804d) comprising real image data to destination patches (803a, 803b, 803c, 803d) comprising unknown parts of the image, thereby filling the holes (802);
wherein the mapping involves a transformation of the patches according to their spatial relationship.
20. The method of claim 17, comprising the step of:
mapping image data that corresponds to source patches (804c, 804d) comprising real image data to destination patches (803c, 803d) comprising unknown parts of the image, thereby filling the holes (802);
wherein, when the unknown part of the image is known to comprise a landmark feature (807), the mapping is done by choosing a destination patch (803c, 803d) to cover at least part of the landmark feature (807), and by searching the known image for a matching source patch (804c, 804d) along the landmark (806a).
21. The method of claim 20, comprising the step of:
for filling a hole comprising a section of a circular landmark (806b), mapping the destination patch (803e) to the source patch (804e) and vice versa by transforming, preferably rotating and scaling, these patches according to their location along the circular landmark (806b).
22. The method of claim 20, comprising the step of:
for filling a hole comprising a section of a straight line landmark, mapping the destination patch to the source patch and vice versa by transforming, preferably scaling, these patches according to their location along the straight line landmark.
23. The method of claim 20, further comprising the step of:
associating a line landmark with a line width;
classifying image elements in source and/or destination patches as being part of the landmark (812) or not (813), according to said line width; and,
when searching the known image for a matching source patch (804c, 804d) and when copying a source patch (804c, 804d) to a destination patch (803c, 803d), considering only image elements that are part of the landmark.
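Claims 20 to 23 describe patch-based inpainting constrained to a landmark. The sketch below illustrates the masked comparison of claim 23: only pixels classified as landmark contribute to the match cost. The candidate positions, patch sizes and function names are assumptions:

```python
import numpy as np

def masked_ssd(src, dst, mask):
    """Sum of squared differences between two patches, counting only
    pixels classified as part of the line landmark."""
    diff = (src.astype(np.float32) - dst.astype(np.float32)) ** 2
    return diff[mask].sum() / max(int(mask.sum()), 1)

def best_source_patch(image, dst_patch, dst_mask, candidates):
    """Scan candidate top-left positions along the landmark and return
    the known-image patch with the lowest masked SSD; copying that patch
    into the hole is the fill step."""
    h, w = dst_patch.shape[:2]
    best, best_cost = None, np.inf
    for r, c in candidates:
        src = image[r:r + h, c:c + w]
        cost = masked_ssd(src, dst_patch, dst_mask)
        if cost < best_cost:
            best, best_cost = src, cost
    return best
```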
24. The method of claim 1, further comprising providing (108) a synthesized view from a virtual viewpoint that is distinct from the camera positions by the steps of:
providing camera parameters of a virtual camera (703);
determining a background image as seen by the virtual camera (703) on a background model (901, 902);
determining a projection of each of the objects into the virtual camera (703) and superimposing it on the background image; and
outputting the combined image for storage or for further processing.
25. The method of claim 24, wherein the step of determining a background image as seen by the virtual camera (703) comprises the steps of:
blending, for each background picture element, image information from the different video streams that corresponds to the same background location;
giving priority to image information that is not marked as being synthetic image data; and
rendering the image information on a background model comprising one or more surfaces (901, 902) representing the background.
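A minimal sketch of the per-pixel blending rule of claim 25, under the assumption that each camera contributes a (colour, weight, synthetic-flag) sample for a background location; synthetic (inpainted) samples are used only when no real sample covers the pixel:

```python
import numpy as np

def blend_background(samples):
    """Blend per-pixel colour samples from several cameras, giving
    priority to samples not flagged as synthetic image data.
    Each sample is (color, weight, is_synthetic)."""
    real = [(c, w) for c, w, synth in samples if not synth]
    pool = real if real else [(c, w) for c, w, _ in samples]
    total = sum(w for _, w in pool)
    return sum(np.asarray(c, dtype=np.float32) * w for c, w in pool) / total
```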
26. The method of claim 25, wherein, in the background model, the surface representing the background is a surface (901) representing the playing field (701), and optionally also comprises surfaces (902) representing a 3D environment model.
27. The method of claim 24, wherein the step of determining a background image as seen by the virtual camera (703) further comprises:
rendering predetermined image data on the background model (901, 902), superimposing it over or replacing the image information provided by the video streams.
28. The method of claim 24, wherein the step of determining a projection of each of the objects into the virtual camera (703) comprises the step of:
rendering the image information from one or more video streams onto 3D rendering objects (903) placed in the 3D background model (901, 902).
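3D rendering objects (903) of this kind are commonly realised as billboards. As an illustrative sketch only (the quad sizing and the upright-axis convention are assumptions), the corners of a camera-facing quad at an object's 3D position can be computed as:

```python
import numpy as np

def billboard_corners(pos, cam_pos, width, height):
    """Corners of an upright, camera-facing quad at the object's 3D
    position; the object's texture and alpha mask from a real camera
    would be rendered onto this quad. Assumes z is the world up axis
    and that the camera is not directly above the object."""
    up = np.array([0.0, 0.0, 1.0])
    to_cam = cam_pos - pos
    to_cam[2] = 0.0                  # keep the quad upright
    right = np.cross(up, to_cam)
    right /= np.linalg.norm(right)
    half = 0.5 * width * right
    return [pos - half, pos + half,
            pos + half + height * up, pos - half + height * up]
```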
29. A data processing system for generating a 3D representation of a dynamically changing 3D scene, comprising:
a) a data acquisition module (102) for acquiring at least two video streams from at least two cameras located at different locations and observing the same 3D scene;
b) a camera calibration module (103) for determining camera parameters, which comprise the position, orientation and internal parameters, for said at least two cameras;
c) a 2D tracking module (104) for tracking the movement of objects in the at least two video streams;
d) an object identification module (105) for determining the identity of said objects in the at least two video streams; and
e) a 3D merging and 3D object position calculation module (107) for determining the 3D position (128) of the objects by combining the information determined from the at least two video streams;
wherein at least one of the modules listed above (103, 104, 105) is configured to rely on information derived from the at least two video streams by one of the subsequent modules (107).
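To summarise how the modules of claim 29 interlock, here is a toy wiring in Python; all module callables are caller-supplied stand-ins for illustration, not the patent's implementation:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Pipeline:
    """Toy wiring of the modules of claim 29 (reference signs in the
    comments)."""
    acquire: Callable    # (102) () -> list of frames, one per camera
    calibrate: Callable  # (103) frame -> camera parameters
    track2d: Callable    # (104) (frame, last 3D positions) -> 2D tracks
    identify: Callable   # (105) 2D tracks -> identified observations
    merge3d: Callable    # (107) per-camera observations -> 3D positions
    last_3d: dict = field(default_factory=dict)

    def step(self):
        frames = self.acquire()
        observations = []
        for frame in frames:
            params = self.calibrate(frame)
            # feedback loop: 2D tracking may use earlier 3D results
            tracks = self.track2d(frame, self.last_3d)
            observations.append((params, self.identify(tracks)))
        self.last_3d = self.merge3d(observations)
        return self.last_3d
```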
30. The data processing system of claim 29, wherein the 3D merging and 3D object position calculation module (107) is configured to provide the function of:
f) determining the position and orientation of 3D rendering objects (903) corresponding to the objects in the video stream, wherein the 3D rendering objects (903) serve to render image information from one or more video streams when generating the 3D representation of the scene.
31. The data processing system of claim 30, comprising an object cutout module (106) for determining:
filled-in background texture data (125) incorporating a flag that specifies whether a particular image patch or pixel is derived from real image data or was generated synthetically;
an object texture and alpha mask (126) for each video stream and each object being tracked; and,
for each object being tracked, an object 2D position and shape and a real-world object identification (127).
32. The data processing system of claim 31, comprising an image synthesis module (108) which provides, from the 3D position (128) of the objects, the filled-in background texture data (125) and the object texture and alpha mask (126), video data to a consumer (109).
Specification