Real-time overlay placement in videos for augmented reality applications
Abstract
Textual overlays/labels add contextual information in Augmented Reality (AR) applications. The spatial placement of labels is a challenging task, particularly for real-time video. Embodiments of the present disclosure provide systems and methods for optimal placement of contextual information in AR applications, overcoming occlusion of the object/scene of interest by placing labels so as to aid interpretation of the scene. This is achieved by combining saliency maps computed for each frame of an input video with the Euclidean distance between the current and previous overlay positions for each frame, based on an initial overlay position of the label, to calculate an updated overlay position for label placement in the video. Overlay placement is formulated as an objective function that minimizes visual saliency around the object of interest and minimizes temporal jitter, facilitating coherence in real-time AR applications.
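The abstract formulates placement as an objective combining per-frame saliency occluded by the label with the Euclidean distance to the previous overlay position. A minimal sketch under assumed details (the disclosure does not specify its saliency model, search strategy, or weighting; gradient-magnitude saliency, an exhaustive window search, and the weight `lam` are stand-ins):

```python
import numpy as np

def saliency_map(frame):
    """Stand-in saliency: normalized gradient magnitude of a grayscale frame
    (the disclosure does not specify a saliency model)."""
    gy, gx = np.gradient(frame.astype(float))
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + 1e-9)

def best_overlay_position(frame, label_hw, prev_pos, lam=0.05):
    """Return the label's top-left (y, x) minimizing
    (saliency covered by the label) + lam * (distance from the previous position)."""
    sal = saliency_map(frame)
    h, w = label_hw
    H, W = sal.shape
    # Integral image: any window's saliency sum in O(1).
    ii = np.pad(sal.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    best, best_cost = prev_pos, float("inf")
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            occlusion = ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]
            jitter = np.hypot(y - prev_pos[0], x - prev_pos[1])
            cost = occlusion + lam * jitter
            if cost < best_cost:
                best, best_cost = (y, x), cost
    return best
```

On a frame whose left half is textured and whose right half is flat, the minimizer lands just inside the flat region at the smallest displacement from the previous position, illustrating the saliency/jitter trade-off.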
12 Claims
1. A processor implemented method, comprising:
receiving, in real time, (i) an input video comprising a plurality of frames and an object of interest in the plurality of frames, and (ii) a label for which an initial overlay position is pre-computed for placement on a center frame of the input video (202);
computing, in real time, a saliency map for each of the plurality of frames to obtain a plurality of saliency maps (204);
computing, in real time, for each of the plurality of frames, Euclidean distance between a current overlay position and a previous overlay position based on the initial overlay position of the label to obtain a plurality of Euclidean distances (206), wherein the Euclidean distance for each of the plurality of frames is computed for controlling, in real time, temporal jitter in a position of the label to be placed in the input video; and
calculating, in real time, an updated overlay position of the label for placement in the input video based on the plurality of saliency maps and the plurality of Euclidean distances (208).
(Dependent claims: 2, 3, 4)
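Step 206 uses the frame-to-frame Euclidean distance solely to control temporal jitter. One plausible realization, not taken from the claim, is a dead-band rule: ignore sub-threshold moves so the label does not tremble from frame to frame (the threshold `min_move` is a hypothetical parameter):

```python
import math

def stabilize(prev_pos, new_pos, min_move=3.0):
    """Dead-band jitter control (hypothetical rule): keep the previous overlay
    position unless the per-frame optimum moved at least `min_move` pixels."""
    d = math.hypot(new_pos[0] - prev_pos[0], new_pos[1] - prev_pos[1])
    return new_pos if d >= min_move else prev_pos

def label_path(initial_pos, per_frame_optima, min_move=3.0):
    """Apply the rule across frames, returning the label position per frame."""
    path, pos = [], initial_pos
    for p in per_frame_optima:
        pos = stabilize(pos, p, min_move)
        path.append(pos)
    return path
```

Small oscillations of the per-frame optimum are absorbed, while a genuine scene change still relocates the label.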
5. A system (100), comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
receive, in real time, (i) an input video comprising a plurality of frames and an object of interest in the plurality of frames, and (ii) a label for which an initial overlay position is pre-computed for placement on a center frame of the input video;
compute, in real time, a saliency map for each of the plurality of frames to obtain a plurality of saliency maps;
compute, in real time, for each of the plurality of frames, Euclidean distance between a current overlay position and a previous overlay position based on the initial overlay position of the label to obtain a plurality of Euclidean distances, wherein the Euclidean distance for each of the plurality of frames is computed for controlling, in real time, temporal jitter in a position of the label to be placed in the input video; and
calculate, in real time, an updated overlay position of the label for placement in the input video based on the plurality of saliency maps and the plurality of Euclidean distances.
(Dependent claims: 6, 7, 8)
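The system claim describes a per-frame loop: receive a frame, score overlay positions by saliency, penalize movement away from the previous position, and update. A discrete-candidate sketch of that loop (the fixed slot set, per-slot saliency scores, and weight `lam` are all illustrative assumptions, not disclosed by the claim):

```python
import math

def choose_slot(slots, saliency_of, prev_pos, lam=0.05):
    """One iteration: among fixed candidate slots, pick the slot minimizing
    saliency at the slot + lam * Euclidean distance from the previous slot."""
    return min(slots, key=lambda s: saliency_of(s) + lam * math.dist(s, prev_pos))

def run_pipeline(slots, saliency_per_frame, initial_pos, lam=0.05):
    """saliency_per_frame: one dict {slot: saliency score} per frame."""
    pos, path = initial_pos, []
    for frame_scores in saliency_per_frame:
        pos = choose_slot(slots, lambda s: frame_scores[s], pos, lam)
        path.append(pos)
    return path
```

The distance term keeps the label in place under small saliency fluctuations; only a clearly better slot, net of the movement penalty, triggers a jump.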
9. One or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause:
receiving, in real time, (i) an input video comprising a plurality of frames and an object of interest in the plurality of frames, and (ii) a label for which an initial overlay position is pre-computed for placement on a center frame of the input video;
computing, in real time, a saliency map for each of the plurality of frames to obtain a plurality of saliency maps;
computing, in real time, for each of the plurality of frames, Euclidean distance between a current overlay position and a previous overlay position based on the initial overlay position of the label to obtain a plurality of Euclidean distances, wherein the Euclidean distance for each of the plurality of frames is computed for controlling, in real time, temporal jitter in a position of the label to be placed in the input video; and
calculating, in real time, an updated overlay position of the label for placement in the input video based on the plurality of saliency maps and the plurality of Euclidean distances.
(Dependent claims: 10, 11, 12)
Specification