Temporally consistent caption detection on videos using a 3D spatiotemporal method
Abstract
A caption detection system wherein all detected caption boxes over time for one caption area are identical, thereby reducing temporal instability and inconsistency. This is achieved by grouping candidate pixels in the 3D spatiotemporal space and generating a 3D bounding box for one caption area. 2D bounding boxes are obtained by slicing the 3D bounding boxes, thereby reducing temporal instability as all 2D bounding boxes corresponding to a caption area are sliced from one 3D bounding box and are therefore identical over time.
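As a rough illustration of the slicing idea described in the abstract (not the patented implementation), the sketch below groups hypothetical candidate pixels in (x, y, t) space into one 3D bounding box and slices it into per-frame 2D boxes that are identical by construction. The single min/max grouping and all function names are assumptions for illustration only.

```python
# Illustrative sketch: one 3D bounding box over (x, y, t) candidate
# pixels, sliced into identical 2D boxes per frame. The simple min/max
# grouping is an assumption, not the patented grouping method.

def bounding_box_3d(pixels):
    """pixels: iterable of (x, y, t) candidate caption pixels."""
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    ts = [p[2] for p in pixels]
    return (min(xs), min(ys), min(ts), max(xs), max(ys), max(ts))

def slice_2d_boxes(box3d):
    """Slice a 3D box into one identical 2D box per frame index t."""
    x0, y0, t0, x1, y1, t1 = box3d
    return {t: (x0, y0, x1, y1) for t in range(t0, t1 + 1)}

# Candidate pixels for one caption spanning frames 3..5 (made-up data).
pixels = [(10, 40, 3), (90, 52, 3), (12, 41, 4), (88, 50, 5)]
boxes = slice_2d_boxes(bounding_box_3d(pixels))
# Every frame gets the same 2D box, so there is no frame-to-frame jitter.
assert boxes[3] == boxes[4] == boxes[5] == (10, 40, 90, 52)
```

Because every per-frame box is cut from the same 3D box, the detected caption box cannot jitter between frames, which is the temporal-consistency property the abstract claims.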
14 Citations
19 Claims
1. An apparatus comprising:

an input for receiving a video signal comprising a first image and a second image, wherein said second image is received after said first image;

a processor operative to determine a first probable location of a first caption within said first image, analyze said first image to identify a first region of said first image comprising said probable location of said first caption, determine a second probable location of a second caption within said second image, analyze said second image to identify a second region of said second image comprising said probable location of said second caption, determine a spatial overlap between said first region and said second region, and generate data representing said spatial overlap; and

an output for coupling said data to a video processor.

Dependent claims: 2, 3, 4, 5, 6
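One plausible reading of claim 1's "determine a spatial overlap" step, sketched here purely for illustration, is the intersection of two axis-aligned caption bounding boxes. The region coordinates and the function name are assumptions, not terms from the patent.

```python
# Hypothetical sketch of the spatial-overlap step: intersect two
# axis-aligned caption regions given as (x0, y0, x1, y1) boxes.

def spatial_overlap(a, b):
    """Return the overlap box of regions a and b, or None if disjoint."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)

first_region = (10, 200, 300, 240)   # caption region in the first image
second_region = (12, 198, 305, 238)  # caption region in the second image
assert spatial_overlap(first_region, second_region) == (12, 200, 300, 238)
```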
7. The apparatus of claim 1 wherein said processor generates a plurality of spatial overlap representations, wherein each of said plurality of spatial overlap representations is compared to a different threshold, and the combination of said comparisons is used to indicate a high probability of a time continuous caption being located in said spatial overlap.
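Claim 7 combines comparisons of several overlap representations against different thresholds. A minimal sketch of that idea, with metrics and threshold values that are entirely assumed (the claim does not specify them), might require the overlap to cover most of both regions:

```python
# Illustrative reading of claim 7: compare several overlap
# representations against per-representation thresholds and combine
# the results. Metrics and threshold values are assumptions.

def overlap_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def is_time_continuous(first, second, overlap, thresholds=(0.8, 0.8)):
    """Require the overlap to cover most of BOTH regions."""
    cover_first = overlap_area(overlap) / overlap_area(first)
    cover_second = overlap_area(overlap) / overlap_area(second)
    # Each representation has its own threshold; both must pass.
    return cover_first >= thresholds[0] and cover_second >= thresholds[1]

first = (0, 0, 100, 20)
second = (2, 0, 102, 20)
overlap = (2, 0, 100, 20)
assert is_time_continuous(first, second, overlap)  # 98% coverage of each
```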
8. The apparatus of claim 1 wherein said data is a bounding box representing said spatial overlap.
9. The apparatus of claim 1 wherein each of said first region and said second region is represented as a bounding box and said data represents the spatial overlap of said bounding boxes.
10. A method for processing a video signal comprising the steps of:
receiving a first image in said video signal;
determining a first probable location of a first caption within said first image;
analyzing said first image to identify a first region of said first image comprising said probable location of said first caption;
receiving a second image in said video signal, wherein said second image is received after said first image;
determining a second probable location of a second caption within said second image;
analyzing said second image to identify a second region of said second image comprising said probable location of said second caption;
determining a spatial overlap between said first region and said second region; and
generating data representing said spatial overlap.
11. The method of claim 10 wherein said data is stored in a memory and updated with the determination results of each subsequent image received via said video signal, the data representing a two dimensional analysis of the spatial overlap and a temporal representation of the spatial overlap.
12. The method of claim 11 wherein said data is coupled to a video processor when said temporal representation exceeds a threshold.
13. The method of claim 12 wherein exceeding said threshold indicates a high probability of a time continuous caption being located in said spatial overlap.
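Claims 11 through 13 describe keeping the overlap data in memory, updating it with each received image, and flagging a time continuous caption once a temporal threshold is exceeded. The sketch below is one assumed realization: a tracker that intersects the stored overlap with each new region and counts how many consecutive frames the overlap persists. The class name, the frame-count threshold, and the restart-on-loss rule are all assumptions.

```python
# Assumed sketch of claims 11-13: keep the running spatial overlap in
# memory (the 2D analysis), count the frames it persists (the temporal
# representation), and flag a caption once an arbitrary frame-count
# threshold is exceeded.

class CaptionTracker:
    def __init__(self, min_frames=5):
        self.overlap = None     # running spatial overlap (2D analysis)
        self.frames = 0         # temporal representation
        self.min_frames = min_frames

    def update(self, region):
        """Update with the caption region of the next image."""
        if self.overlap is None:
            self.overlap, self.frames = region, 1
        else:
            x0 = max(self.overlap[0], region[0])
            y0 = max(self.overlap[1], region[1])
            x1 = min(self.overlap[2], region[2])
            y1 = min(self.overlap[3], region[3])
            if x0 >= x1 or y0 >= y1:   # overlap lost: restart the count
                self.overlap, self.frames = region, 1
            else:
                self.overlap, self.frames = (x0, y0, x1, y1), self.frames + 1
        # High probability of a time continuous caption once persistent.
        return self.frames > self.min_frames

tracker = CaptionTracker(min_frames=3)
flags = [tracker.update((10, 200, 300, 240)) for _ in range(5)]
assert flags == [False, False, False, True, True]
```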
14. The method of claim 10 wherein a plurality of spatial overlap representations are generated, wherein each of said plurality of spatial overlap representations is compared to a different threshold, and the combination of said comparisons is used to indicate a high probability of a time continuous caption being located in said spatial overlap.
15. The method of claim 10 wherein said data is a bounding box representing said spatial overlap.
16. The method of claim 10 wherein each of said first region and said second region is represented as a bounding box and said data represents the spatial overlap of said bounding boxes.
17. The method of claim 10 further comprising the step of verifying the content of said spatial overlap using a projection profile of a text block image and a plurality of features extracted based on local minima of said projection profile.
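A projection profile of a text block, as in claim 17, is typically a column-wise (or row-wise) ink sum, whose local minima mark gaps between characters. The toy sketch below illustrates this; the binary test image and the choice of features are fabricated for illustration and are not taken from the patent.

```python
# Hypothetical sketch of claim 17: verify a candidate text block via
# its column-wise projection profile and features derived from local
# minima (inter-character gaps). Feature choices are assumed.

def projection_profile(block):
    """Column-wise ink sum of a binary text-block image (list of rows)."""
    return [sum(col) for col in zip(*block)]

def local_minima(profile):
    return [i for i in range(1, len(profile) - 1)
            if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]]

# Binary block: two "strokes" separated by a low-ink column (made up).
block = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
]
profile = projection_profile(block)
minima = local_minima(profile)
# A text-like block shows pronounced minima at character gaps.
assert profile == [3, 3, 1, 3, 3] and minima == [2]
```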
18. The method of claim 10 further comprising the step of verifying the content of said spatial overlap using a machine-learning-based classifier to classify a text block image as a text image or a non-text image.
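Claim 18 leaves the classifier unspecified, so any text/non-text model would fit. As a stand-in only, the toy sketch below uses a 1-nearest-neighbour rule over two made-up features (profile variance and minima count); the features, training data, and labels are all fabricated for illustration.

```python
# Toy stand-in for claim 18's classifier: 1-nearest-neighbour over
# two assumed features (profile variance, minima count). Not the
# patented classifier; any text/non-text model would fit the claim.

def classify(features, training):
    """training: list of (features, label); return the nearest label."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda ex: dist(ex[0], features))[1]

training = [
    ((8.0, 4), "text"),       # high profile variance, several gaps
    ((0.5, 0), "non-text"),   # flat profile, no gaps
]
assert classify((7.0, 3), training) == "text"
assert classify((0.4, 1), training) == "non-text"
```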
19. The method of claim 10 wherein said spatial overlap is represented as a plurality of grey scale values, wherein each grey scale value indicates a probability of a pixel within said spatial overlap being a part of a time continuous caption within said spatial overlap.
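One assumed way to produce the grey scale representation of claim 19 is to accumulate per-pixel detection counts over frames and map each count to 0..255, read as the probability that the pixel belongs to a persistent caption. The counts and the linear mapping below are illustrative assumptions.

```python
# Illustrative take on claim 19: map per-pixel detection counts,
# accumulated over frames, to grey scale values in 0..255. The linear
# count-to-grey mapping is an assumption.

def grey_scale_map(counts, total_frames):
    """counts: 2D list of per-pixel detection counts over the frames."""
    return [[round(255 * c / total_frames) for c in row] for row in counts]

counts = [[5, 5, 1],
          [5, 4, 0]]
gmap = grey_scale_map(counts, total_frames=5)
# 255 = detected in every frame (high caption probability), 0 = never.
assert gmap == [[255, 255, 51], [255, 204, 0]]
```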
Specification