Temporally consistent caption detection on videos using a 3D spatiotemporal method
Abstract
A caption detection system wherein all detected caption boxes over time for one caption area are identical, thereby reducing temporal instability and inconsistency. This is achieved by grouping candidate pixels in the 3D spatiotemporal space and generating a 3D bounding box for one caption area. 2D bounding boxes are obtained by slicing the 3D bounding boxes, thereby reducing temporal instability as all 2D bounding boxes corresponding to a caption area are sliced from one 3D bounding box and are therefore identical over time.
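The idea in the abstract can be illustrated with a minimal sketch. It assumes candidate caption pixels are already available as (t, y, x) triples, where t is the frame index, and it assumes grouping means 6-connected component labeling; the abstract does not say how grouping is actually performed. The function names group_voxels, bounding_box_3d and slice_box are hypothetical.

```python
from collections import deque

# Offsets to the six face-adjacent neighbours of a voxel in (t, y, x) space.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def group_voxels(candidates):
    """Group candidate (t, y, x) voxels into 6-connected components.

    Assumption: connected-component grouping stands in for whatever
    grouping rule the patent actually uses.
    """
    remaining = set(candidates)
    groups = []
    while remaining:
        seed = remaining.pop()
        queue = deque([seed])
        group = [seed]
        while queue:
            t, y, x = queue.popleft()
            for dt, dy, dx in NEIGHBOURS:
                neighbour = (t + dt, y + dy, x + dx)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    queue.append(neighbour)
                    group.append(neighbour)
        groups.append(group)
    return groups


def bounding_box_3d(group):
    """Return the inclusive 3D box (t0, t1, y0, y1, x0, x1) around a group."""
    ts, ys, xs = zip(*group)
    return (min(ts), max(ts), min(ys), max(ys), min(xs), max(xs))


def slice_box(box3d, t):
    """Slice the 3D box at frame t; return (y0, y1, x0, x1) or None."""
    t0, t1, y0, y1, x0, x1 = box3d
    if t0 <= t <= t1:
        return (y0, y1, x0, x1)
    return None
```

Because every 2D box is a slice of the same 3D box, per-frame jitter in the candidate pixels does not show up in the output: each frame covered by the caption gets the same rectangle.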
19 Claims
1. An apparatus comprising:
- an input for receiving a video signal comprising a first image and a second image wherein said second image is received after said first image;
- a processor operative to determine a first probable location of a first caption within a first image, analyze said first image to identify a first region of said first image comprising said probable location of said first caption, determine a second probable location of a second caption with a second image, analyze said second image to identify a second region of said second image comprising said probable location of said second caption, determine a spatial overlap between said first region and said second region, and generate a data representing said spatial overlap;
- an output for coupling said data to a video processor.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method for processing a video signal comprising the steps of:
- receiving a first image in said video signal;
- determining a first probable location of a first caption within a first image;
- analyzing said first image to identify a first region of said first image comprising said probable location of said first caption;
- receiving a second image in said video signal wherein said second image is received after said first image;
- determining a second probable location of a second caption with a second image;
- analyzing said second image to identify a second region of said second image comprising said probable location of said second caption;
- determining a spatial overlap between said first region and said second region; and
- generating a data representing said spatial overlap.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19
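The overlap step recited in claims 1 and 10 can be sketched as a rectangle intersection. This is only an illustration: the claims do not specify how regions or the overlap data are represented, so the Region type (axis-aligned, half-open pixel coordinates), the intersection rectangle, and the intersection-over-union ratio are all assumptions, as are the names spatial_overlap and overlap_ratio.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """Axis-aligned caption region; right and bottom are exclusive (assumed)."""
    left: int
    top: int
    right: int
    bottom: int

    def area(self) -> int:
        return max(0, self.right - self.left) * max(0, self.bottom - self.top)


def spatial_overlap(first: Region, second: Region) -> Optional[Region]:
    """Return the intersection of two regions, or None if they are disjoint."""
    left = max(first.left, second.left)
    top = max(first.top, second.top)
    right = min(first.right, second.right)
    bottom = min(first.bottom, second.bottom)
    if left >= right or top >= bottom:
        return None
    return Region(left, top, right, bottom)


def overlap_ratio(first: Region, second: Region) -> float:
    """Intersection over union, one possible 'data representing' the overlap."""
    inter = spatial_overlap(first, second)
    if inter is None:
        return 0.0
    union = first.area() + second.area() - inter.area()
    return inter.area() / union
```

For instance, a caption region detected at Region(100, 400, 500, 450) in the first image and at Region(110, 402, 510, 452) in the second would intersect in Region(110, 402, 500, 450). A downstream video processor could treat a high ratio as evidence that both detections belong to the same caption area.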
Specification