Annotating media content for automatic content understanding

US 9,659,597 B2
Filed: 04/22/2013
Issued: 05/23/2017
Est. Priority Date: 04/24/2012
Status: Active Grant

Abstract

A system for annotating frames in a media stream 114 includes a pattern recognition system (PRS) 108 to generate PRS output metadata for a frame; an archive 106 for storing ground truth metadata (GTM); a device to merge the GTM and PRS output metadata and thereby generate proposed annotation data (PAD) 110; and a user interface 104 for use by a human annotator (HA) 118. The user interface 104 includes an editor 111 and an input device 107 used by the HA 118 to approve GTM for the frame. An optimization system 105 receives the approved GTM and metadata output by the PRS 108, and adjusts input parameters for the PRS to minimize a distance metric corresponding to a difference between the GTM and PRS output metadata.

18 Claims

1. A system to annotate media content, comprising:
- a pattern recognition system (PRS) having an initial set of input parameters that generates PRS output metadata associated with a frame of a media stream;
  
  an archive for storing ground truth metadata (GTM) associated with the same frame of the media stream;
  
  a device to merge the GTM and the PRS output metadata and thereby generate proposed annotation data (PAD); and
  
  a user interface for use by a human annotator (HA) including an editor and an input device to approve or edit the PAD for the frame; and
  
  an optimization system to adjust input parameters for the PRS to minimize a single distance metric corresponding to a difference between the GTM and PRS output metadata, wherein each type of GTM is compared to a corresponding type of the PRS output metadata to generate a plurality of distance metrics by type, wherein the single distance metric is computed by combining the plurality of distance metrics by type, and wherein one type of the PRS output metadata includes spatial position.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The system of claim 1, wherein the GTM is obtained from one or more of third party metadata, archived media stream and the HA.
  - 3. The system of claim 2, wherein a time delay between third party metadata and the media stream is corrected by alignment.
  - 4. The system of claim 2, further comprising a plurality of user interfaces and a communication network that enables a plurality of HAs to interface with the same media stream.
  - 5. The system of claim 2, wherein the system converts approved PAD to GTM.
  - 6. The system of claim 5, wherein the system graphically overlays the approved PAD on the media stream.
  - 7. The system of claim 1, wherein the optimization system adjusts the initial set of input parameters of the PRS to minimize the difference between the GTM and PRS output metadata, thereby increasing PRS accuracy.
  - 8. The system of claim 1, wherein the computing of the single distance metric includes weighting the plurality of distance metrics by type.
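
Claims 1 and 8 describe comparing each type of GTM with the matching type of PRS output, then combining the per-type distances (optionally weighted) into a single metric that the optimization system minimizes. The following is a minimal sketch of that idea, not the patented implementation. The claims do not specify the distance functions, the parameters, or the search strategy, so `spatial_distance`, `label_distance`, `toy_prs`, the `x_offset` and `threshold` parameters, and the exhaustive search are all assumptions made for illustration.

```python
import math

# Hypothetical per-type distance functions; the claims do not define them.
def spatial_distance(gtm_pos, prs_pos):
    """Euclidean distance between two (x, y) positions."""
    return math.dist(gtm_pos, prs_pos)

def label_distance(gtm_label, prs_label):
    """0 if the labels agree, 1 otherwise."""
    return 0.0 if gtm_label == prs_label else 1.0

DISTANCES = {"position": spatial_distance, "label": label_distance}

def single_distance(gtm, prs_out, weights):
    """Combine the per-type distances into one weighted sum."""
    return sum(weights[t] * DISTANCES[t](gtm[t], prs_out[t]) for t in DISTANCES)

def toy_prs(frame, params):
    """Stand-in PRS (assumption): offsets a raw detection and thresholds a score."""
    x, y = frame["raw_position"]
    label = "ball" if frame["score"] >= params["threshold"] else "none"
    return {"position": (x + params["x_offset"], y), "label": label}

def optimize(frame, gtm, weights, candidates):
    """Exhaustive search: return the candidate parameters with the smallest metric."""
    return min(candidates, key=lambda p: single_distance(gtm, toy_prs(frame, p), weights))

# Illustrative data only.
frame = {"raw_position": (10.0, 5.0), "score": 0.6}
gtm = {"position": (12.0, 5.0), "label": "ball"}
weights = {"position": 1.0, "label": 5.0}
candidates = [
    {"x_offset": dx, "threshold": th}
    for dx in (0.0, 1.0, 2.0)
    for th in (0.5, 0.7)
]
best = optimize(frame, gtm, weights, candidates)
```

With these made-up values, the search selects the candidate whose output matches the GTM in both position and label. A real system would likely replace the exhaustive search with a more scalable optimizer; the weighted sum is only one way to combine metrics by type.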

9. A method comprising the steps of:
- receiving data from a media stream, the data organized into frames;
  
  processing the data using a pattern recognition system (PRS);
  
  storing a state of the PRS;
  
  generating metadata associated with the frame using the PRS;
  
  receiving input characterized as ground truth metadata (GTM), into an optimization system; and
  
  adjusting input parameters for the PRS to minimize a single distance metric corresponding to a difference between the GTM and PRS output metadata, wherein each type of GTM is compared to a corresponding type of the PRS output metadata to generate a plurality of distance metrics by type, wherein the single distance metric is computed by combining the plurality of distance metrics by type, and wherein one type of the PRS output metadata includes spatial position.
- Dependent claims (10, 11, 12):
  - 10. The method of claim 9, wherein said input is obtained from one or more of archived media streams, third party metadata and one or more human annotators.
  - 11. The method of claim 10, wherein subsequent to receiving said input, said GTM and said metadata associated with said PRS are temporally aligned.
  - 12. The method of claim 10, wherein said GTM and said metadata associated with said PRS are continuously stored in memory and periodically stored to disk thereby enabling fast recovery from system failure.
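
Claims 3 and 11 refer to temporally aligning GTM (for example, delayed third party metadata) with PRS metadata, without stating how the alignment is computed. One possible approach, sketched below purely as an assumption, is to try a set of candidate delays and keep the one that brings GTM timestamps closest to PRS timestamps. The function names and the sample timestamps are hypothetical.

```python
from bisect import bisect_left

def nearest_gap(sorted_times, t):
    """Absolute gap between t and the closest value in a non-empty sorted list."""
    i = bisect_left(sorted_times, t)
    neighbours = sorted_times[max(i - 1, 0):i + 1]
    return min(abs(t - n) for n in neighbours)

def estimate_delay(gtm_times, prs_times, candidate_delays):
    """Pick the candidate delay that best lines up GTM events with PRS events."""
    prs_sorted = sorted(prs_times)

    def cost(delay):
        return sum(nearest_gap(prs_sorted, t - delay) for t in gtm_times)

    return min(candidate_delays, key=cost)

def align(gtm_events, delay):
    """Shift (timestamp, payload) pairs back by the estimated delay."""
    return [(t - delay, payload) for t, payload in gtm_events]

# Illustrative data only: third party events arrive 2.5 s after the stream.
prs_times = [1.0, 4.0, 9.0]
gtm_events = [(3.5, "pitch"), (6.5, "hit"), (11.5, "out")]
candidate_delays = [i * 0.5 for i in range(11)]  # 0.0 to 5.0 seconds

delay = estimate_delay([t for t, _ in gtm_events], prs_times, candidate_delays)
aligned = align(gtm_events, delay)
```

This sketch assumes a single constant delay; drifting or variable delays would need a different alignment method.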

13. A method comprising the steps of:
- receiving from a human annotator (HA), via a human annotator user interface (HAUI), information regarding a time point selected by the HA on a timeline of a media stream;
  
  merging existing ground truth metadata (GTM) relating to a media frame corresponding to the selected time point with pattern recognition system (PRS) output metadata relating to said media frame, thereby generating proposed annotation data (PAD) for the media frame;
  
  displaying the media frame and the PAD to the HA;
  
  receiving input from the HA including correction and/or approval of the PAD, where approved PAD is characterized as new GTM related to the selected time point;
  
  storing the new GTM;
  
  comparing the PRS output metadata and the new GTM related to the selected time point; and
  
  adjusting PRS input parameters so that a single distance metric corresponding to a difference between the new GTM and PRS output metadata related to the selected time point is minimized, wherein each type of GTM is compared to a corresponding type of the PRS output metadata to generate a plurality of distance metrics by type, wherein the single distance metric is computed by combining the plurality of distance metrics by type, and wherein one type of the PRS output metadata includes spatial position.
- Dependent claims (14, 15):
  - 14. The method of claim 13, wherein said GTM is obtained from one or more of archived media streams, third party metadata, said human annotators and other human annotators.
  - 15. The method of claim 14, wherein when said human annotator approves said PAD, said PAD is graphically overlaid on said media stream.
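
Claim 13 describes merging existing GTM with PRS output metadata to produce PAD, then treating the annotator's approved or corrected PAD as new GTM. The claims leave the merge rule open. The sketch below assumes one simple rule: existing GTM takes precedence and PRS output fills in missing fields, with each field tagged by source so the annotator can see what still needs review. The field names and both functions are hypothetical.

```python
def merge_to_pad(gtm, prs_output):
    """Build proposed annotation data: GTM wins, PRS fills the gaps (assumed rule)."""
    pad = {}
    for key in sorted(set(gtm) | set(prs_output)):
        if key in gtm:
            pad[key] = {"value": gtm[key], "source": "GTM"}
        else:
            pad[key] = {"value": prs_output[key], "source": "PRS"}
    return pad

def approve(pad, corrections=None):
    """Apply the annotator's corrections and return the result as new GTM."""
    new_gtm = {key: field["value"] for key, field in pad.items()}
    new_gtm.update(corrections or {})
    return new_gtm

# Illustrative data only.
existing_gtm = {"inning": 3}
prs_output = {"inning": 4, "position": (120, 64), "event": "strike"}

pad = merge_to_pad(existing_gtm, prs_output)
new_gtm = approve(pad, corrections={"event": "ball"})
```

Here the annotator overrides the PRS-proposed `event` and accepts everything else, so the stored GTM keeps the prior `inning` value and the PRS-proposed `position`.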

16. A method comprising the steps of:
- generating output metadata associated with a frame of a media stream, output by a pattern recognition system (PRS);
  
  storing in an archive input from a human annotator (HA) related to the frame, characterized as ground truth metadata (GTM);
  
  merging the GTM and the output metadata of the PRS to thereby generate proposed annotation data (PAD); and
  
  displaying the PAD to the HA by a user interface;
  
  receiving via the user interface an input from the HA indicating approval of the GTM for the frame; and
  
  adjusting input parameters for the PRS using an optimization system, to minimize a single distance metric corresponding to a difference between the GTM and the output metadata of the PRS, wherein each type of GTM is compared to a corresponding type of the PRS output metadata to generate a plurality of distance metrics by type, wherein the single distance metric is computed by combining the plurality of distance metrics by type, and wherein the type of metadata includes spatial position.
- Dependent claims (17, 18):
  - 17. The method of claim 16, wherein said GTM is obtained from one or more of archived media streams, third party metadata, said human annotators and other human annotators.
  - 18. The method of claim 17, wherein, when said human annotator approves said PAD, said PAD is graphically overlaid on said media stream.

Current Assignee: LiveClips, LLC
Original Assignee: LiveClips, LLC
Inventors: Petajan, Eric David; Weite, David Eugene; Vunic, Douglas W.
Primary Examiner(s): Tran, Thai
Assistant Examiner(s): Hunter, Mishawn N
Application Number: US14/385,989
Publication Number: US 20150071618A1
Time in Patent Office: 1,492 Days
Field of Search: 715230, 715716, 382173, 382181, 386278, 386282
CPC Class Codes

G06F 16/48   Retrieval characterised by ...

G06F 40/169   Annotation, e.g. comment da...

G11B 27/036   Insert-editing

G11B 27/19   by using information detect...

G11B 27/28   by using information signal...

H04N 21/23418   involving operations for an...

H04N 21/23424   involving splicing one cont...

H04N 21/84   Generation or processing of...

H04N 21/854   Content authoring
