METHOD AND SYSTEM FOR GENERATING A 3D REPRESENTATION OF A DYNAMICALLY CHANGING 3D SCENE
Abstract
A method for generating a 3D representation of a dynamically changing 3D scene, which includes the steps of:
- acquiring at least two synchronised video streams (120) from at least two cameras located at different locations and observing the same 3D scene (102);
- determining camera parameters, which comprise the orientation and zoom setting, for the at least two cameras (103);
- tracking the movement of objects (310a,b, 312a,b; 330a,b, 331a,b, 332a,b; 410a,b, 411a,b; 430a,b, 431a,b; 420a,b, 421a,b) in the at least two video streams (104);
- determining the identity of the objects in the at least two video streams (105);
- determining the 3D position of the objects by combining the information from the at least two video streams (106);
- wherein the step of tracking (104) the movement of objects in the at least two video streams uses position information derived from the 3D position of the objects at one or more earlier instants in time.
As a result, the quality, speed and robustness of the 2D tracking in the video streams are improved.
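By way of illustration only, the feedback loop described in the abstract, where 2D tracking is seeded from earlier 3D positions, can be pictured with a short sketch. This is a minimal Python sketch under assumed conventions (a 3x4 projection matrix per camera, a constant-velocity motion model, and invented function names), not the patented implementation:

```python
import numpy as np

def project(P, X_world):
    """Project a 3D point to 2D pixel coordinates with a 3x4 matrix P."""
    x_h = P @ np.append(X_world, 1.0)   # homogeneous projection
    return x_h[:2] / x_h[2]             # perspective divide

def seed_search_window(P, track_3d, dt, half_size=40.0):
    """Predict the next 3D position by constant-velocity extrapolation
    from the two most recent 3D fixes, then back-project it into the
    camera to obtain a 2D search window for the per-camera tracker."""
    v = (track_3d[-1] - track_3d[-2]) / dt      # estimated 3D velocity
    cx, cy = project(P, track_3d[-1] + v * dt)  # predicted 2D position
    return (cx - half_size, cy - half_size, cx + half_size, cy + half_size)
```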
Claims (32)
1. A method for generating a 3D representation of a dynamically changing 3D scene, comprising the steps of:
a) acquiring (102) at least two video streams (120) from at least two cameras (702) located at different locations and observing the same 3D scene (701);
b) determining (103) camera parameters (122), which comprise the position, orientation and internal parameters, for said at least two cameras (702);
c) tracking the movement of objects (310a,b, 312a,b; 330a,b, 331a,b, 332a,b; 410a,b, 411a,b; 430a,b, 431a,b; 420a,b, 421a,b) in the at least two video streams (104);
d) determining the identity of said objects in the at least two video streams (105); and
e) determining the 3D position of the objects by combining the information from the at least two video streams (107);
wherein at least one of the steps listed above (103, 104, 105) relies on information derived from the at least two video streams by one of the subsequent steps (107).
(Dependent claims: 2-28 and 30-32.)
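Step e) amounts to multi-view triangulation. As an illustration only, a minimal sketch of linear (DLT) triangulation from two calibrated views, with assumed 3x4 projection matrices and invented function names:

```python
import numpy as np

def triangulate(P1, x1, P2, x2):
    """Linear (DLT) triangulation of one object's 3D position from two
    calibrated views. P1, P2 are 3x4 projection matrices; x1, x2 are the
    tracked 2D positions of the same (identified) object in each view."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # least-squares null vector of A
    X = Vt[-1]
    return X[:3] / X[3]           # de-homogenise
```

Given consistent object identities from step d), running this per object and per frame yields the 3D positions (128) that the later claims build on.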
pointing, with a graphical input device, to a particular reference feature as seen in the video still image (203a-d), and selecting said reference feature; and associating the identity of the representation of the reference feature with the reference feature seen in the still image.
9. The method of claim 8, wherein, when selecting said reference feature in the video still image (203a-d), the exact position of the reference feature in the video still image is determined by the steps of:
automatically performing, in the vicinity of the position selected by the user, a feature extraction, in particular an extraction of lines (203c), intersections and corners (203a, 203b); and
determining the position of the reference feature as being the position of one of the features extracted, in particular of a feature whose type is the same as that selected in the schematic representation of the playing field.
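As an illustration of claim 9's snap-to-feature behaviour, the sketch below refines a user's click to the nearest detected corner using OpenCV; the window size, quality threshold and function names are assumptions, and line/intersection handling would follow the same pattern with a line detector:

```python
import cv2
import numpy as np

def snap_to_feature(gray, click_xy, radius=15):
    """Refine a user's click to the nearest extracted corner within a
    small neighbourhood of the clicked position."""
    x, y = click_xy
    x0, y0 = max(0, x - radius), max(0, y - radius)
    roi = gray[y0:y0 + 2 * radius, x0:x0 + 2 * radius]
    corners = cv2.goodFeaturesToTrack(roi, maxCorners=5,
                                      qualityLevel=0.01, minDistance=5)
    if corners is None:
        return click_xy                       # fall back to the raw click
    # pick the detected corner closest to the click
    pts = corners.reshape(-1, 2) + np.array([x0, y0], dtype=np.float32)
    d = np.linalg.norm(pts - np.array(click_xy, dtype=np.float32), axis=1)
    return tuple(pts[np.argmin(d)])
```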
10. The method of claim 1, wherein the step of tracking the movement of objects (104) comprises the step of incorporating dynamically changing camera parameters (131) in the tracking function (104) such that the tracking function (104) compensates for changes in the camera parameters (131).
11. The method of claim 10, wherein the camera parameters taken into account in the tracking function are camera parameters (131) determined by a camera calibration step (103) performed for the same video frame for which the tracking is done.
12. The method of claim 10, wherein the camera parameters taken into account in the tracking function are camera parameters (131) determined by a camera calibration step (103) performed for one or more previous video frames and are optionally extrapolated.
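A minimal sketch of the extrapolation mentioned in claim 12, under the assumption that the dynamic parameters reduce to pan, tilt and focal length and are extrapolated linearly (the actual parameterisation is not specified here):

```python
def extrapolate_params(history, dt):
    """Linearly extrapolate dynamic camera parameters from the two most
    recent calibration results, for a frame not yet calibrated itself."""
    (t1, p1), (t2, p2) = history[-2], history[-1]
    rate = {k: (p2[k] - p1[k]) / (t2 - t1) for k in p2}
    return {k: p2[k] + rate[k] * dt for k in p2}

# usage: parameters for the next frame, 40 ms after the last calibrated one
history = [(0.00, {"pan": 1.0, "tilt": 0.2, "f": 1500.0}),
           (0.04, {"pan": 1.1, "tilt": 0.2, "f": 1520.0})]
print(extrapolate_params(history, 0.04))
```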
13. The method of claim 1, wherein, for initialising object identifications, the following steps are performed:
a user selecting, in a still image of one of the video streams, one object and assigning it a unique identifier; and
automatically determining, in a further still image of at least one further video stream, an object whose identity is the same.
14. The method of claim 13, wherein, in a situation in which an object that is not or cannot be identified appears in one of the video streams, performing the steps of:
alerting the user to the presence of an unidentified object; and
permitting the user to associate an identifier with the object.
15. The method of claim 1, wherein the objects are categorised as belonging to one of at least two categories, the categories preferably being based on a statistical model and comprising at least two of a first team, a second team, a ball and a referee.
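One way to picture the statistical model of claim 15 is a per-category Gaussian colour model; the category names, colours and variances below are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def categorise(mean_rgb, models):
    """Assign an object to the category (team, referee, ...) whose
    Gaussian colour model gives the highest log-likelihood."""
    def log_likelihood(x, mu, var):
        return -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var))
    return max(models, key=lambda c: log_likelihood(mean_rgb, *models[c]))

models = {  # (mean colour, per-channel variance): illustrative values only
    "team_a":  (np.array([200.0, 30.0, 30.0]), np.array([400.0] * 3)),
    "team_b":  (np.array([30.0, 30.0, 200.0]), np.array([400.0] * 3)),
    "referee": (np.array([30.0, 30.0, 30.0]),  np.array([400.0] * 3)),
}
print(categorise(np.array([190.0, 40.0, 35.0]), models))  # -> team_a
```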
16. The method of claim 1, comprising a segmentation step (106) in which objects are separated from the background, comprising the step of:
using alpha channel matting to assign, to each picture element, a value that expresses the probability with which the pixel is part of an object or part of the background.
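An illustrative reading of this step: each pixel receives a soft foreground probability rather than a hard label. The sketch below derives such an alpha value from the distance to a clean background plate; the Gaussian falloff and the `sigma` parameter are assumptions:

```python
import numpy as np

def soft_alpha(frame, bg, sigma=20.0):
    """Per-pixel foreground probability (an alpha matte): pixels close in
    colour to the background plate get alpha near 0, clearly foreground
    pixels get alpha near 1."""
    d = np.linalg.norm(frame.astype(np.float32) - bg.astype(np.float32), axis=2)
    return 1.0 - np.exp(-(d ** 2) / (2.0 * sigma ** 2))
```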
17. The method of claim 1, comprising a segmentation step (106) in which objects are separated from a background, comprising the step of:
after removing the objects, filling corresponding holes or unknown parts left in the background by image inpainting, and marking such inpainted picture elements as being synthetic image data.
18. The method of claim 17, wherein the segmentation step (106) comprises the step of refining the position and dimensions of bounding boxes (501, 601, 602, 603) around the objects.
19. The method of claim 17, wherein the step of filling holes in the background comprises the step of:
mapping image data that corresponds to source patches (804a, 804b, 804c, 804d) comprising real image data to destination patches (803a, 803b, 803c, 803d) comprising unknown parts of the image, thereby filling the holes (802);
wherein the mapping involves a transformation of the patches according to their spatial relationship.
20. The method of claim 17, comprising the step of:
mapping image data that corresponds to source patches (804c, 804d) comprising real image data to destination patches (803c, 803d) comprising unknown parts of the image, thereby filling the holes (802);
wherein, when the unknown part of the image is known to comprise a landmark feature (807), the mapping is done by choosing a destination patch (803c, 803d) to cover at least part of the landmark feature (807), and by searching the known image for a matching source patch (804c, 804d) along the landmark (806a).
21. The method of claim 20, comprising the step of:
for filling a hole comprising a section of a circular landmark (806b), mapping the destination patch (803e) to the source patch (804e) and vice versa by transforming, preferably rotating and scaling, these patches according to their location along the circular landmark (806b).
22. The method of claim 20, comprising the step of:
for filling a hole comprising a section of a straight line landmark, mapping the destination patch to the source patch and vice versa by transforming, preferably scaling, these patches according to their location along the straight line landmark.
23. The method of claim 20, further comprising the step of:
associating a line landmark with a line width;
classifying image elements in source and/or destination patches as being part of the landmark (812) or not (813), according to said line width; and,
when searching the known image for a matching source patch (804c, 804d) and when copying a source patch (804c, 804d) to a destination patch (803c, 803d), considering only image elements that are part of the landmark.
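Claims 20 to 23 describe patch-based inpainting constrained to a landmark. The sketch below illustrates the masked comparison of claim 23: only pixels classified as landmark contribute to the match cost. The candidate positions, patch sizes and function names are assumptions:

```python
import numpy as np

def masked_ssd(src, dst, mask):
    """Sum of squared differences between two patches, counting only
    pixels classified as part of the line landmark."""
    diff = (src.astype(np.float32) - dst.astype(np.float32)) ** 2
    return diff[mask].sum() / max(int(mask.sum()), 1)

def best_source_patch(image, dst_patch, dst_mask, candidates):
    """Scan candidate top-left positions along the landmark and return
    the known-image patch with the lowest masked SSD; copying that patch
    into the hole is the fill step."""
    h, w = dst_patch.shape[:2]
    best, best_cost = None, np.inf
    for r, c in candidates:
        src = image[r:r + h, c:c + w]
        cost = masked_ssd(src, dst_patch, dst_mask)
        if cost < best_cost:
            best, best_cost = src, cost
    return best
```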
24. The method of claim 1, further comprising providing (108) a synthesized view from a virtual viewpoint that is distinct from the camera positions by the steps of:
providing camera parameters of a virtual camera (703);
determining a background image as seen by the virtual camera (703) on a background model (901, 902);
determining a projection of each of the objects into the virtual camera (703) and superimposing it on the background image; and
outputting the combined image for storage or for further processing.
25. The method of claim 24, wherein the step of determining a background image as seen by the virtual camera (703) comprises the steps of:
blending, for each background picture element, image information from the different video streams that corresponds to the same background location;
giving priority to image information that is not marked as being synthetic image data; and
rendering the image information on a background model comprising one or more surfaces (901, 902) representing the background.
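A minimal sketch of the per-pixel blending rule of claim 25, under the assumption that each camera contributes a (colour, weight, synthetic-flag) sample for a background location; synthetic (inpainted) samples are used only when no real sample covers the pixel:

```python
import numpy as np

def blend_background(samples):
    """Blend per-pixel colour samples from several cameras, giving
    priority to samples not flagged as synthetic image data.
    Each sample is (color, weight, is_synthetic)."""
    real = [(c, w) for c, w, synth in samples if not synth]
    pool = real if real else [(c, w) for c, w, _ in samples]
    total = sum(w for _, w in pool)
    return sum(np.asarray(c, dtype=np.float32) * w for c, w in pool) / total
```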
26. The method of claim 25, wherein, in the background model, the surface representing the background is a surface (901) representing the playing field (701), and optionally also comprises surfaces (902) representing a 3D environment model.
27. The method of claim 24, wherein the step of determining a background image as seen by the virtual camera (703) further comprises:
rendering predetermined image data on the background model (901, 902), superimposing it over or replacing the image information provided by the video streams.
28. The method of claim 24, wherein the step of determining a projection of each of the objects into the virtual camera (703) comprises the step of:
rendering the image information from one or more video streams onto 3D rendering objects (903) placed in the 3D background model (901, 902).
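3D rendering objects (903) of this kind are commonly realised as billboards. As an illustrative sketch only (the quad sizing and the upright-axis convention are assumptions), the corners of a camera-facing quad at an object's 3D position can be computed as:

```python
import numpy as np

def billboard_corners(pos, cam_pos, width, height):
    """Corners of an upright, camera-facing quad at the object's 3D
    position; the object's texture and alpha mask from a real camera
    would be rendered onto this quad. Assumes z is the world up axis
    and that the camera is not directly above the object."""
    up = np.array([0.0, 0.0, 1.0])
    to_cam = cam_pos - pos
    to_cam[2] = 0.0                  # keep the quad upright
    right = np.cross(up, to_cam)
    right /= np.linalg.norm(right)
    half = 0.5 * width * right
    return [pos - half, pos + half,
            pos + half + height * up, pos - half + height * up]
```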
29. A data processing system for generating a 3D representation of a dynamically changing 3D scene, comprising:
a) a data acquisition module (102) for acquiring at least two video streams from at least two cameras located at different locations and observing the same 3D scene;
b) a camera calibration module (103) for determining camera parameters, which comprise the position, orientation and internal parameters, for said at least two cameras;
c) a 2D tracking module (104) for tracking the movement of objects in the at least two video streams;
d) an object identification module (105) for determining the identity of said objects in the at least two video streams; and
e) a 3D merging and 3D object position calculation module (107) for determining the 3D position (128) of the objects by combining the information determined from the at least two video streams;
wherein at least one of the modules listed above (103, 104, 105) is configured to rely on information derived from the at least two video streams by one of the subsequent modules (107).
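To summarise how the modules of claim 29 interlock, here is a toy wiring in Python; all module callables are caller-supplied stand-ins for illustration, not the patent's implementation:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Pipeline:
    """Toy wiring of the modules of claim 29 (reference signs in the
    comments)."""
    acquire: Callable    # (102) () -> list of frames, one per camera
    calibrate: Callable  # (103) frame -> camera parameters
    track2d: Callable    # (104) (frame, last 3D positions) -> 2D tracks
    identify: Callable   # (105) 2D tracks -> identified observations
    merge3d: Callable    # (107) per-camera observations -> 3D positions
    last_3d: dict = field(default_factory=dict)

    def step(self):
        frames = self.acquire()
        observations = []
        for frame in frames:
            params = self.calibrate(frame)
            # feedback loop: 2D tracking may use earlier 3D results
            tracks = self.track2d(frame, self.last_3d)
            observations.append((params, self.identify(tracks)))
        self.last_3d = self.merge3d(observations)
        return self.last_3d
```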
30. The data processing system of claim 29, wherein the 3D merging and 3D object position calculation module (107) is configured to provide the function of:
f) determining the position and orientation of 3D rendering objects (903) corresponding to the objects in the video stream, wherein the 3D rendering objects (903) serve to render image information from one or more video streams when generating the 3D representation of the scene.
31. The data processing system of claim 30, comprising an object cutout module (106) for determining:
filled-in background texture data (125) incorporating a flag that specifies whether a particular image patch or pixel is derived from real image data or was generated synthetically;
an object texture and alpha mask (126) for each video stream and each object being tracked; and,
for each object being tracked, an object 2D position and shape and a real-world object identification (127).
32. The data processing system of claim 31, comprising an image synthesis module (108) which provides, from the 3D position (128) of the objects, the filled-in background texture data (125) and the object texture and alpha mask (126), video data to a consumer (109).
Specification