Annotating media content for automatic content understanding
Abstract
A system for annotating frames in a media stream 114 includes a pattern recognition system (PRS) 108 to generate PRS output metadata for a frame; an archive 106 for storing ground truth metadata (GTM); a device to merge the GTM and PRS output metadata and thereby generate proposed annotation data (PAD) 110; and a user interface 109 for use by a human annotator (HA) 118. The user interface 104 includes an editor 111 and an input device 107 used by the HA 118 to approve GTM for the frame. An optimization system 105 receives the approved GTM and the metadata output by the PRS 108, and adjusts input parameters for the PRS to minimize a distance metric corresponding to a difference between the GTM and PRS output metadata.
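The abstract states that GTM and PRS output metadata are merged into PAD but does not say how conflicts between them are resolved. The Python sketch below assumes one plausible policy: human-approved GTM takes precedence for any metadata type it covers, and the PRS output proposes values for the remaining types. The dictionary layout, the `merge_to_pad` function, and the `"source"` tag are hypothetical illustrations, not the patented implementation.

```python
from typing import Any, Dict

# Hypothetical representation: metadata for one frame maps a metadata
# type (e.g. "label", "position") to a value.
Metadata = Dict[str, Any]


def merge_to_pad(gtm: Metadata, prs_output: Metadata) -> Dict[str, Dict[str, Any]]:
    """Merge ground truth and PRS output into proposed annotation data.

    Assumed policy: existing GTM wins for any type it covers; the PRS
    output proposes values for the remaining types. Each entry records
    its source so an editor can show the HA what still needs review.
    """
    pad: Dict[str, Dict[str, Any]] = {}
    for key, value in prs_output.items():
        pad[key] = {"value": value, "source": "PRS"}
    for key, value in gtm.items():
        pad[key] = {"value": value, "source": "GTM"}  # GTM overrides PRS
    return pad


gtm = {"label": "player_7"}
prs = {"label": "player_1", "position": (412.0, 233.5)}
pad = merge_to_pad(gtm, prs)
print(pad)
# {'label': {'value': 'player_7', 'source': 'GTM'}, 'position': {'value': (412.0, 233.5), 'source': 'PRS'}}
```

Tagging each proposed value with its source is one way to let the editor highlight fields that come only from the PRS and therefore still need the HA's approval or correction.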
18 Claims
1. A system to annotate media content, comprising:
a pattern recognition system (PRS) having an initial set of input parameters that generates PRS output metadata associated with a frame of a media stream;
an archive for storing ground truth metadata (GTM) associated with the same frame of the media stream;
a device to merge the GTM and the PRS output metadata and thereby generate proposed annotation data (PAD); and
a user interface for use by a human annotator (HA) including an editor and an input device to approve or edit the PAD for the frame; and
an optimization system to adjust input parameters for the PRS to minimize a single distance metric corresponding to a difference between the GTM and PRS output metadata, wherein each type of GTM is compared to a corresponding type of the PRS output metadata to generate a plurality of distance metrics by type, wherein the single distance metric is computed by combining the plurality of distance metrics by type, and wherein one type of the PRS output metadata includes spatial position.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
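Claim 1 requires per-type distance metrics that are combined into a single distance metric, with spatial position as one of the types. A minimal sketch, assuming Euclidean distance for position, a 0/1 mismatch for a categorical label, and a weighted sum as the combining step; the type names, distance functions, and weights are all assumptions, since the claim does not specify them.

```python
import math
from typing import Any, Callable, Dict, Tuple

# Hypothetical representation: metadata for one frame maps a metadata
# type to a value.
Metadata = Dict[str, Any]


def position_distance(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    # Spatial position: Euclidean distance between two (x, y) points.
    return math.dist(a, b)


def label_distance(a: str, b: str) -> float:
    # Categorical label: 0.0 when the labels agree, 1.0 otherwise.
    return 0.0 if a == b else 1.0


# Assumed mapping from metadata type to its distance function.
DISTANCE_BY_TYPE: Dict[str, Callable[[Any, Any], float]] = {
    "position": position_distance,
    "label": label_distance,
}


def distances_by_type(gtm: Metadata, prs_output: Metadata) -> Dict[str, float]:
    # Compare each GTM type with the same type in the PRS output;
    # types missing from either side are skipped in this sketch.
    return {
        t: DISTANCE_BY_TYPE[t](gtm[t], prs_output[t])
        for t in gtm
        if t in prs_output and t in DISTANCE_BY_TYPE
    }


def single_distance(per_type: Dict[str, float], weights: Dict[str, float]) -> float:
    # Combine the per-type distances into one scalar with a weighted sum;
    # types without an explicit weight default to 1.0.
    return sum(weights.get(t, 1.0) * d for t, d in per_type.items())


gtm = {"position": (100.0, 50.0), "label": "ball"}
prs = {"position": (103.0, 54.0), "label": "ball"}
per_type = distances_by_type(gtm, prs)  # {'position': 5.0, 'label': 0.0}
total = single_distance(per_type, {"position": 0.1, "label": 10.0})
print(total)  # 0.5
```

A weighted sum is only one way to combine the distances. The weights matter because the per-type distances come in different units (pixels for position, a unitless mismatch for labels), so without them one type could dominate the single metric.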
9. A method comprising the steps of:
receiving data from a media stream, the data organized into frames;
processing the data using a pattern recognition system (PRS);
storing a state of the PRS;
generating metadata associated with the frame using the PRS;
receiving input characterized as ground truth metadata (GTM), into an optimization system; and
adjusting input parameters for the PRS to minimize a single distance metric corresponding to a difference between the GTM and PRS output metadata, wherein each type of GTM is compared to a corresponding type of the PRS output metadata to generate a plurality of distance metrics by type, wherein the single distance metric is computed by combining the plurality of distance metrics by type, and wherein one type of the PRS output metadata includes spatial position.
Dependent claims: 10, 11, 12.
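Claim 9 does not name an optimization method for adjusting the PRS input parameters. As a stand-in, the sketch below uses an exhaustive grid search over two input parameters of a toy PRS that shifts raw detections by a calibration offset, scoring each candidate by the summed spatial-position distance to the GTM. `toy_prs`, the `dx`/`dy` parameters, and the grid values are hypothetical. A derivative-free search like this is one option when a PRS's outputs are not differentiable with respect to its parameters.

```python
import itertools
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Params = Dict[str, float]


def toy_prs(raw_detection: Point, params: Params) -> Point:
    # Hypothetical PRS: shifts a raw detection by a calibration offset.
    return (raw_detection[0] + params["dx"], raw_detection[1] + params["dy"])


def total_distance(params: Params, frames: List[Point], gtm: List[Point]) -> float:
    # Single distance metric (spatial position only), summed over frames.
    return sum(math.dist(toy_prs(f, params), g) for f, g in zip(frames, gtm))


def grid_search(
    frames: List[Point], gtm: List[Point], grid: Sequence[float]
) -> Tuple[Optional[Params], float]:
    # Try every (dx, dy) pair on the grid and keep the lowest-cost one.
    best_params: Optional[Params] = None
    best_cost = math.inf
    for dx, dy in itertools.product(grid, repeat=2):
        params = {"dx": dx, "dy": dy}
        cost = total_distance(params, frames, gtm)
        if cost < best_cost:
            best_params, best_cost = params, cost
    return best_params, best_cost


frames = [(10.0, 20.0), (30.0, 40.0)]  # raw detections, one per frame
gtm = [(12.0, 19.0), (32.0, 39.0)]     # approved positions, one per frame
params, cost = grid_search(frames, gtm, grid=[-2.0, -1.0, 0.0, 1.0, 2.0])
print(params, cost)  # {'dx': 2.0, 'dy': -1.0} 0.0
```

Grid search cost grows exponentially with the number of parameters, so for a real PRS with many inputs a different optimizer would likely be needed; the claim leaves that choice open.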
13. A method comprising the steps of:
receiving from a human annotator (HA), via a human annotator user interface (HAUI), information regarding a time point selected by the HA on a timeline of a media stream;
merging existing ground truth metadata (GTM) relating to a media frame corresponding to the selected time point with pattern recognition system (PRS) output metadata relating to said media frame, thereby generating proposed annotation data (PAD) for the media frame;
displaying the media frame and the PAD to the HA;
receiving input from the HA including correction and/or approval of the PAD, where approved PAD is characterized as new GTM related to the selected time point;
storing the new GTM;
comparing the PRS output metadata and the new GTM related to the selected time point; and
adjusting PRS input parameters so that a single distance metric corresponding to a difference between the new GTM and PRS output metadata related to the selected time point is minimized, wherein each type of GTM is compared to a corresponding type of the PRS output metadata to generate a plurality of distance metrics by type, wherein the single distance metric is computed by combining the plurality of distance metrics by type, and wherein one type of the PRS output metadata includes spatial position.
Dependent claims: 14, 15.
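Claim 13 ties the GTM to a time point the HA selects on the timeline. A minimal sketch of the archive side of that loop, assuming a constant frame rate so that a time point maps to a frame index by flooring `time × fps`; the `GtmArchive` class, its methods, and the 25 fps rate are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict

Metadata = Dict[str, Any]


@dataclass
class GtmArchive:
    """Hypothetical GTM archive keyed by frame index."""

    fps: float  # assumed constant frame rate of the media stream
    records: Dict[int, Metadata] = field(default_factory=dict)

    def frame_index(self, time_point_s: float) -> int:
        # Map a timeline position in seconds to the frame shown at that time.
        return math.floor(time_point_s * self.fps)

    def lookup(self, time_point_s: float) -> Metadata:
        # Existing GTM for the selected frame (empty if none yet); a copy is
        # returned so callers cannot mutate the archive by accident.
        return dict(self.records.get(self.frame_index(time_point_s), {}))

    def store_approved(self, time_point_s: float, approved_pad: Metadata) -> int:
        # PAD approved by the HA is stored as new GTM for that frame.
        idx = self.frame_index(time_point_s)
        self.records[idx] = dict(approved_pad)
        return idx


archive = GtmArchive(fps=25.0)
idx = archive.store_approved(2.0, {"label": "ball", "position": (100.0, 50.0)})
print(idx)                   # 50
print(archive.lookup(2.01))  # {'label': 'ball', 'position': (100.0, 50.0)}
```

Because any time point inside a frame's 1/fps interval maps to the same index, the HA's click does not have to land exactly on a frame boundary for the existing GTM to be retrieved.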
16. A method comprising the steps of:
generating output metadata associated with a frame of a media stream, output by a pattern recognition system (PRS);
storing in an archive input from a human annotator (HA) related to the frame, characterized as ground truth metadata (GTM);
merging the GTM and the output metadata of the PRS to thereby generate proposed annotation data (PAD); and
displaying the PAD to the HA by a user interface;
receiving via the user interface an input from the HA indicating approval of the GTM for the frame; and
adjusting input parameters for the PRS using an optimization system, to minimize a single distance metric corresponding to a difference between the GTM and the output metadata of the PRS, wherein each type of GTM is compared to a corresponding type of the PRS output metadata to generate a plurality of distance metrics by type, wherein the single distance metric is computed by combining the plurality of distance metrics by type, and wherein the type of metadata includes spatial position.
Dependent claims: 17, 18.
Specification