Annotating media content for automatic content understanding
First Claim
1. A system, comprising:
a pattern recognition system to generate, according to a set of input parameters, pattern recognition metadata associated with video frames of a media stream;
an encoder system to generate proposed annotation data associated with the video frames of the media stream by merging the pattern recognition metadata with ground-truth metadata associated with the video frames of the media stream; and
an optimization system to adjust the set of input parameters of the pattern recognition system to minimize a single distance metric including a combination of a plurality of distance metrics by type and to generate the plurality of distance metrics by type by comparing each type of a plurality of ground-truth metadata types to a corresponding type of a plurality of pattern recognition metadata types, wherein one type of the plurality of pattern recognition metadata types is spatial position.
Abstract
A system for annotating frames in a media stream 114 includes a pattern recognition system (PRS) 108 to generate PRS output metadata for a frame; an archive 106 for storing ground truth metadata (GTM); a device to merge the GTM and PRS output metadata and thereby generate proposed annotation data (PAD) 110; and a user interface 109 for use by a human annotator (HA) 118. The user interface 104 includes an editor 111 and an input device 107 used by the HA 118 to approve GTM for the frame. An optimization system 105 receives the approved GTM and metadata output by the PRS 108, and adjusts input parameters for the PRS to minimize a distance metric corresponding to a difference between the GTM and PRS output metadata.
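The merge step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Annotation` schema, the `merge_frame` function, the object keys, and in particular the conflict policy (approved ground truth overrides PRS output, and PRS output fills in objects the archive lacks). The abstract says only that the GTM and PRS output metadata are merged to produce the PAD; it does not say how conflicts are resolved.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Annotation:
    """One annotated object in a frame (hypothetical schema)."""
    label: str
    x: float
    y: float
    source: str  # "GTM" (ground truth) or "PRS" (pattern recognition output)


def merge_frame(
    gtm: Dict[str, Annotation],
    prs: Dict[str, Annotation],
) -> Dict[str, Annotation]:
    """Merge per-object metadata for one frame into proposed annotation data.

    Assumed policy, not taken from the source: start from the PRS output,
    then let ground-truth entries overwrite any object they also describe.
    """
    pad = dict(prs)   # copy so the PRS output itself is left untouched
    pad.update(gtm)   # ground truth wins on conflicting object keys
    return pad


# Hypothetical frame: the archive knows one object, the PRS detects two.
gtm_frame = {"player_7": Annotation("player", 10.0, 20.0, "GTM")}
prs_frame = {
    "player_7": Annotation("player", 12.0, 19.0, "PRS"),
    "ball": Annotation("ball", 50.0, 60.0, "PRS"),
}
pad_frame = merge_frame(gtm_frame, prs_frame)
```

In this sketch the human annotator would then review `pad_frame`, correcting or approving the PRS-sourced entries so that they become ground truth for later frames.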
75 Citations
19 Claims
1. A system, comprising:
a pattern recognition system to generate, according to a set of input parameters, pattern recognition metadata associated with video frames of a media stream;
an encoder system to generate proposed annotation data associated with the video frames of the media stream by merging the pattern recognition metadata with ground-truth metadata associated with the video frames of the media stream; and
an optimization system to adjust the set of input parameters of the pattern recognition system to minimize a single distance metric including a combination of a plurality of distance metrics by type and to generate the plurality of distance metrics by type by comparing each type of a plurality of ground-truth metadata types to a corresponding type of a plurality of pattern recognition metadata types, wherein one type of the plurality of pattern recognition metadata types is spatial position.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method, comprising:
generating, by a pattern recognition system, according to a set of input parameters, pattern recognition metadata associated with video frames of a media stream;
generating, by an encoder system, proposed annotation data associated with the video frames of the media stream by merging the pattern recognition metadata with ground-truth metadata that is associated with the video frames of the media stream; and
adjusting, by an optimization system, the set of input parameters of the pattern recognition system to minimize a single distance metric including a combination of a plurality of distance metrics by type and to generate the plurality of distance metrics by type by comparing each type of a plurality of ground-truth metadata types to a corresponding type of a plurality of pattern recognition metadata types.
Dependent claims: 11, 12, 13, 14, 15, 16
17. A non-transitory machine-readable storage medium, comprising executable instructions that, when executed by a processing system including a processor, facilitate performance of operations, comprising:
generating, via a pattern recognition system and according to a set of input parameters, pattern recognition metadata associated with video frames of a media stream;
generating proposed annotation data associated with the video frames of the media stream by merging the pattern recognition metadata with ground-truth metadata associated with the video frames of the media stream; and
adjusting the set of input parameters to minimize a single distance metric including a combination of a plurality of distance metrics by type and to generate the plurality of distance metrics by type by comparing each type of a plurality of ground-truth metadata types to a corresponding type of a plurality of pattern recognition metadata types.
Dependent claims: 18, 19
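The adjusting step recited in the independent claims (one distance per metadata type, combined into a single metric that drives parameter adjustment) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the claims do not say how the per-type distances are combined or how the minimum is found, so the weighted sum, the `WEIGHTS` values, the exhaustive grid search, and the `toy_prs` recognizer with its `offset` and `threshold` parameters are all hypothetical stand-ins.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

Metadata = Dict[str, object]   # metadata type name -> value
Params = Dict[str, float]      # PRS input parameter name -> value


def position_distance(gt: Tuple[float, float], pr: Tuple[float, float]) -> float:
    """Distance for the spatial-position type: Euclidean distance."""
    return math.dist(gt, pr)


def label_distance(gt: str, pr: str) -> float:
    """Distance for a categorical type: 0 on match, 1 on mismatch."""
    return 0.0 if gt == pr else 1.0


# One distance function per metadata type, plus assumed combination weights.
DISTANCES: Dict[str, Callable] = {"position": position_distance, "label": label_distance}
WEIGHTS: Dict[str, float] = {"position": 1.0, "label": 10.0}


def combined_distance(gt: Metadata, pr: Metadata) -> float:
    """Single metric: weighted sum of the per-type distances (an assumption)."""
    return sum(WEIGHTS[t] * DISTANCES[t](gt[t], pr[t]) for t in DISTANCES)


def toy_prs(params: Params, frame: Dict[str, object]) -> Metadata:
    """Hypothetical recognizer: shifts x by `offset`, labels by `threshold`."""
    x, y = frame["raw_position"]
    label = "ball" if frame["score"] >= params["threshold"] else "unknown"
    return {"position": (x + params["offset"], y), "label": label}


def optimize(
    prs: Callable[[Params, Dict[str, object]], Metadata],
    frames: Sequence[Dict[str, object]],
    ground_truth: Sequence[Metadata],
    grid: Dict[str, List[float]],
) -> Tuple[Params, float]:
    """Grid search for the parameter set with the lowest total combined distance."""
    names = sorted(grid)
    best_params: Params = {}
    best_cost = math.inf
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        cost = sum(
            combined_distance(gt, prs(params, frame))
            for frame, gt in zip(frames, ground_truth)
        )
        if cost < best_cost:
            best_params, best_cost = params, cost
    return best_params, best_cost


# Hypothetical data: one frame and its approved ground truth.
frames = [{"raw_position": (3.0, 5.0), "score": 0.6}]
truth = [{"position": (5.0, 5.0), "label": "ball"}]
grid = {"offset": [0.0, 1.0, 2.0], "threshold": [0.5, 0.7]}
best_params, best_cost = optimize(toy_prs, frames, truth, grid)
```

Grid search is used here only because it is easy to read; any optimizer that lowers the single combined metric would fit the same outline. The weights express how much a label mismatch should count relative to a positional error, and choosing them is a design decision the claims leave open.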
Specification