Systems and methods for the autonomous production of videos from multi-sensored data
First Claim
1. A computer based camerawork method for autonomous production of an edited video from multiple video streams captured by a plurality of fixed and/or motorized cameras distributed around a scene of interest, that selects, based on a known location of a set of objects-of-interest and as a function of time, sequences of optimal viewpoints to fit a display resolution and user preferences, and for smoothing these sequences of optimal viewpoints for a continuous and graceful story-telling, the camerawork method comprising:
- selecting, for each envisioned camera location and/or position, a field of view obtained by:
either cropping an image captured by a fixed camera, thereby defining image cropping parameters, or selecting pan-tilt-zoom parameters for a virtual or motorized camera,
wherein, as part of said field of view selection, objects-of-interest are included and the field of view is selected based on joint processing of the positions of the multiple objects-of-interest that have been detected, and
wherein the selection of the field of view is done in a way that balances completeness and closeness metrics as a function of individual user preferences, wherein completeness counts a number of objects-of-interest that are included and visible within the displayed viewpoint, and closeness measures a number of pixels that are available to describe the objects-of-interest, and wherein said user preferences define a set of parameters that are used to tune the trade-off between completeness and closeness, and
- autonomously building the edited video by selecting and concatenating video segments provided by one or more individual cameras, wherein the building is done in a way that balances completeness and closeness metrics over time, while smoothing out the sequence of said cropping and/or pan-tilt-zoom parameters associated with the concatenated segments, wherein the smoothing process is implemented based on a linear or non-linear low-pass temporal filter mechanism, and the relative importance of each camera location is tuned according to user preference.
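The completeness/closeness trade-off recited in the claim can be sketched as a scored search over candidate crop windows. This is only an illustration under simplifying assumptions: a one-dimensional crop, a linear weighting `alpha` as the user-preference parameter, and made-up function names; it is not the patented method.

```python
def select_crop(xs, frame_width, display_width, alpha=0.5):
    """Pick a horizontal crop (left, right) over detected object positions xs.

    completeness -- fraction of objects-of-interest inside the crop.
    closeness    -- display pixels per crop pixel once the crop is rescaled
                    to the display width (a narrower crop gives a closer view).
    alpha        -- hypothetical user-preference knob: 1.0 favours
                    completeness, 0.0 favours closeness.
    """
    xs = sorted(xs)
    best_score, best_crop = None, None
    # Candidate windows span each contiguous run of sorted object positions.
    for i in range(len(xs)):
        for j in range(i, len(xs)):
            # Never crop narrower than the display, never wider than the frame.
            width = min(max(xs[j] - xs[i], display_width), frame_width)
            left = min(xs[i], frame_width - width)  # keep window inside frame
            inside = sum(1 for x in xs if left <= x <= left + width)
            completeness = inside / len(xs)
            closeness = display_width / width  # <= 1.0, larger means closer
            score = alpha * completeness + (1 - alpha) * closeness
            if best_score is None or score > best_score:
                best_score, best_crop = score, (left, left + width)
    return best_crop
```

With three players clustered near one end of the field and one far away, a balanced preference keeps the tight view of the cluster, while a completeness-heavy preference widens the crop to include the outlier.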
2 Assignments
0 Petitions
Abstract
An autonomous computer based method and system is described for personalized production of videos, such as team sport videos (for example, basketball videos), from multi-sensored data under limited display resolution. Embodiments of the present invention relate to the selection of a view to display from among the multiple video streams captured by the camera network. Technical solutions are provided to ensure perceptual comfort as well as an efficient integration of contextual information, implemented, for example, by smoothing generated viewpoint/camera sequences to alleviate flickering visual artifacts and discontinuous story-telling artifacts. A design and implementation of the viewpoint selection process is disclosed and has been verified by experiments, which show that the method and system of the present invention efficiently distribute the processing load across cameras, and effectively select viewpoints that cover the team action at hand while avoiding major perceptual artifacts.
103 Citations
21 Claims
1. A computer based camerawork method for autonomous production of an edited video from multiple video streams captured by a plurality of fixed and/or motorized cameras distributed around a scene of interest, that selects, based on a known location of a set of objects-of-interest and as a function of time, sequences of optimal viewpoints to fit a display resolution and user preferences, and for smoothing these sequences of optimal viewpoints for a continuous and graceful story-telling, the camerawork method comprising:
- selecting, for each envisioned camera location and/or position, a field of view obtained by:
either cropping an image captured by a fixed camera, thereby defining image cropping parameters, or selecting pan-tilt-zoom parameters for a virtual or motorized camera,
wherein, as part of said field of view selection, objects-of-interest are included and the field of view is selected based on joint processing of the positions of the multiple objects-of-interest that have been detected, and
wherein the selection of the field of view is done in a way that balances completeness and closeness metrics as a function of individual user preferences, wherein completeness counts a number of objects-of-interest that are included and visible within the displayed viewpoint, and closeness measures a number of pixels that are available to describe the objects-of-interest, and wherein said user preferences define a set of parameters that are used to tune the trade-off between completeness and closeness, and
- autonomously building the edited video by selecting and concatenating video segments provided by one or more individual cameras, wherein the building is done in a way that balances completeness and closeness metrics over time, while smoothing out the sequence of said cropping and/or pan-tilt-zoom parameters associated with the concatenated segments, wherein the smoothing process is implemented based on a linear or non-linear low-pass temporal filter mechanism, and the relative importance of each camera location is tuned according to user preference.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 21)
10. A computer based camerawork system comprising a processing engine and memory for autonomous production of an edited video from multiple video streams captured by a plurality of fixed and/or motorized cameras distributed around a scene of interest, that selects, based on a known location of a set of objects-of-interest and as a function of time, sequences of optimal viewpoints to fit a display resolution and user preferences, and for smoothing these sequences of optimal viewpoints for a continuous and graceful story-telling, the camerawork system comprising:
- first means for selecting, for each envisioned camera location and/or position, a field of view obtained by:
either cropping an image captured by a fixed camera, thereby defining image cropping parameters, or selecting pan-tilt-zoom parameters of a virtual or motorized camera,
wherein, as part of said field of view selection, objects-of-interest are included and the field of view is selected based on joint processing of the positions of the multiple objects-of-interest that have been detected,
wherein the selection of the field of view is done in a way that balances completeness and closeness metrics as a function of individual user preferences, wherein completeness counts the number of objects-of-interest that are included and visible within the displayed viewpoint, and closeness measures the number of pixels that are available to describe the objects-of-interest, and wherein said user preferences define a set of parameters that are used to tune the trade-off between completeness and closeness, and
- second means for autonomously selecting rendering parameters that maximize and smooth out closeness and completeness metrics by concatenating segments in the video streams provided by one or more individual cameras, wherein the selection is done in a way that balances completeness and closeness metrics over time, while smoothing out the sequence of said cropping and/or pan-tilt-zoom parameters associated with the concatenated segments, wherein the smoothing process is implemented based on a linear or non-linear low-pass temporal filtering mechanism, and the relative importance of each camera location is tuned according to user preferences.
- View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
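The "linear or non-linear low-pass temporal filter mechanism" recited in both independent claims can be illustrated with a first-order recursive (exponential moving average) filter over a per-frame camera parameter track. The function name, the smoothing constant `alpha`, and the single-parameter track are illustrative assumptions, not the patented implementation.

```python
def smooth_track(raw, alpha=0.5):
    """Linear low-pass temporal filter (first-order IIR / exponential
    moving average) over a sequence of per-frame parameter values,
    e.g. pan angle, zoom factor, or crop centre.

    Smaller alpha means heavier smoothing and a steadier virtual camera;
    alpha=1.0 passes the raw, potentially flickering track through.
    """
    state = raw[0]  # initialize filter state at the first sample
    out = []
    for v in raw:
        state = alpha * v + (1 - alpha) * state  # recursive update
        out.append(state)
    return out
```

Applied to a crop centre that jumps abruptly between frames, the filter spreads the jump over several frames, which is one way to alleviate the flickering and discontinuous story-telling artifacts the abstract mentions.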
Specification