System and method for extracting text captions from video and generating video summaries

US 8,488,682 B2
Filed: 12/19/2007
Issued: 07/16/2013
Est. Priority Date: 12/06/2001
Status: Expired due to Fees

Assignments: 3
Petitions: 0

Abstract

Caption boxes which are embedded in video content can be located and the text within the caption boxes decoded. Real-time processing is enhanced by locating caption box regions in the compressed video domain and performing pixel-based processing operations only within the region of the video frame in which a caption box is located. The caption boxes are further refined by identifying word regions within them and then applying character and word recognition processing to the identified word regions. Domain-based models are used to improve text recognition results. The extracted caption box text can be used to detect events of interest in the video content, and a semantic model can be applied to extract the video segment containing the event of interest.
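
The abstract (and claim 11) describes locating candidate caption regions in the compressed domain by combining low motion with high texture. The Python sketch below illustrates that idea under stated assumptions: per-macroblock motion-vector magnitudes and DCT coefficient blocks are assumed to be already parsed from the compressed stream, texture is approximated by the energy of the AC coefficients, and the helper names (`ac_energy`, `candidate_blocks`, `bounding_box`) and threshold values are hypothetical, not taken from the patent.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]


def ac_energy(dct_block: Grid) -> float:
    """Texture proxy (assumed): sum of squared AC coefficients of one DCT block.

    The DC coefficient at [0][0] carries average brightness, not texture, so it is skipped.
    """
    return sum(
        v * v
        for r, row in enumerate(dct_block)
        for c, v in enumerate(row)
        if (r, c) != (0, 0)
    )


def candidate_blocks(motion: Grid, texture: Grid,
                     max_motion: float = 1.0,
                     min_texture: float = 50.0) -> List[List[bool]]:
    """Flag macroblocks with low motion and high texture as caption box candidates.

    Thresholds are hypothetical placeholders, not values from the patent.
    """
    return [
        [m <= max_motion and t >= min_texture for m, t in zip(m_row, t_row)]
        for m_row, t_row in zip(motion, texture)
    ]


def bounding_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) block indices enclosing all candidates, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, flag in enumerate(row) if flag]
    return min(rows), min(cols), max(rows), max(cols)


motion = [[0.0, 3.0], [0.5, 0.2]]
texture = [[80.0, 90.0], [10.0, 70.0]]
mask = candidate_blocks(motion, texture)
print(mask)                # [[True, False], [False, True]]
print(bounding_box(mask))  # (0, 0, 1, 1)
```

Working on macroblock-level features keeps this step cheap because no full decode is needed, which matches the abstract's point that pixel-based processing is confined to the located caption region.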

15 Claims

1. A method of decoding a caption box in video content comprising:
- determining at least one expected location of a caption box in a frame of the video content;
- determining at least one caption box mask within the expected location;
- identifying frames in the video content as caption frames if the current frame exhibits substantial correlation to the at least one caption box mask within the expected caption box location;
- for at least a portion of the caption frames, identifying word regions within the confines of the expected location;
- for each word region, identifying text characters within the region; and
- processing the identified text characters.

2. The method of decoding a caption box according to claim 1 further comprising:
- comparing the text characters in the word region against a domain specific model to enhance word recognition, determining a word region type for at least one of the identified word regions;
- wherein the video content is of a sporting event and wherein the word region types are selected from the group consisting of data points that indicate the current state of the event.

3. The method of decoding a caption box according to claim 2 wherein the data point is a period.

4. The method of decoding a caption box according to claim 2 wherein the data point is a quarter.

5. The method of decoding a caption box according to claim 2 wherein the data point is field position.

6. The method of decoding a caption box according to claim 2 wherein the data point is the number of shots on goal.
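
Claim 1 recites classifying frames as caption frames by their correlation to a caption box mask, then finding word regions inside the expected location. A minimal Python sketch, assuming the mask is a grayscale template of the caption box at the expected location, that "substantial correlation" means a Pearson correlation at or above a hypothetical threshold of 0.8, and that word regions are separated by gaps in the column projection profile of a binarized caption region (a common segmentation technique swapped in here; the claim does not specify one). All function names are hypothetical.

```python
import math
from typing import List, Tuple


def normalized_correlation(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length pixel vectors; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def is_caption_frame(region: List[List[float]], mask: List[List[float]],
                     threshold: float = 0.8) -> bool:
    """Treat a frame as a caption frame if its expected-location pixels correlate
    with the caption box mask at or above a hypothetical threshold."""
    flat_region = [v for row in region for v in row]
    flat_mask = [v for row in mask for v in row]
    return normalized_correlation(flat_region, flat_mask) >= threshold


def word_regions(binary: List[List[int]], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Split a binarized caption region (1 = text pixel) into inclusive (start, end)
    column spans; a run of at least `min_gap` empty columns ends a word."""
    profile = [sum(row[c] for row in binary) for c in range(len(binary[0]))]
    spans: List[Tuple[int, int]] = []
    start, end, gap = -1, -1, 0
    for c, count in enumerate(profile):
        if count > 0:
            if start < 0:
                start = c
            end, gap = c, 0
        elif start >= 0:
            gap += 1
            if gap >= min_gap:
                spans.append((start, end))
                start, gap = -1, 0
    if start >= 0:
        spans.append((start, end))
    return spans


mask = [[0, 1], [0, 1]]
print(is_caption_frame([[10, 200], [10, 200]], mask))  # True
print(is_caption_frame([[5, 5], [5, 5]], mask))        # False
print(word_regions([[1, 1, 0, 1, 0, 0, 1, 1]]))        # [(0, 3), (6, 7)]
```

A single empty column (the gap between characters) does not split a word here; only a run of `min_gap` empty columns does, which is how the sketch separates inter-character spacing from inter-word spacing.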

7. A non-transitory computer-readable storage medium storing a program for causing a computer to implement a method of decoding a caption box in video content comprising:
- determining at least one expected location of a caption box in a frame of the video content;
- determining at least one caption box mask within the expected location;
- identifying frames in the video content as caption frames if the current frame exhibits substantial correlation to the at least one caption box mask within the expected caption box location;
- for at least a portion of the caption frames, identifying word regions within the confines of the expected location;
- for each word region, identifying text characters within the region; and
- processing the identified text characters.

8. The non-transitory computer-readable storage medium of claim 7 wherein the method includes the step of comparing the text characters in the word region against a domain specific model to enhance word recognition.
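
Claims 2 and 8 use a domain specific model to improve word recognition, and claims 3 to 6 list the kinds of data points involved. One plausible sketch, assuming the model is simply a hypothetical vocabulary of valid tokens per word region type and that a raw OCR string is snapped to the nearest token by Levenshtein edit distance (an assumed matching rule, not one stated in the claims). `DOMAIN_MODEL`, `correct_word`, and `max_dist` are illustrative names only.

```python
from typing import Dict, List, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # substitute (free if equal)
        prev = cur
    return prev[-1]


# Hypothetical domain model: the tokens allowed in each word region type.
DOMAIN_MODEL: Dict[str, List[str]] = {
    "period": ["1ST", "2ND", "3RD", "OT"],
    "quarter": ["1ST", "2ND", "3RD", "4TH", "OT"],
}


def correct_word(raw: str, region_type: str, max_dist: int = 1) -> Optional[str]:
    """Snap a raw OCR string to the closest allowed token, or return None if
    nothing in the model is within `max_dist` edits."""
    text = raw.upper()
    best = min(DOMAIN_MODEL[region_type], key=lambda tok: levenshtein(text, tok))
    return best if levenshtein(text, best) <= max_dist else None


print(correct_word("2N0", "period"))  # 2ND  (OCR misread D as 0)
print(correct_word("XYZ", "period"))  # None (too far from every allowed token)
```

Returning `None` rather than forcing a match lets a caller reject a misread outright instead of silently accepting an implausible state such as a nonexistent period.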

9. A system for decoding a caption box in video content comprising:
- location means for determining at least one expected location of a caption box in a frame of the video content;
- determining means, coupled to the location means and receiving the at least one expected location therefrom, for determining at least one caption box mask within the expected location;
- frame identifying means, coupled to the determining means and location means and receiving the at least one caption box mask and the at least one expected location therefrom, for identifying frames in the video content as caption frames if the current frame exhibits substantial correlation to the at least one caption box mask within the expected caption box location;
- word region identifying means, coupled to the frame identifying means and receiving the identified caption frames therefrom, for at least a portion of the caption frames, identifying word regions within the confines of the expected location;
- text character means, coupled to the word region identifying means and receiving the word regions therefrom, for each word region, identifying text characters within the region; and
- processing the identified text characters.

10. The system of claim 9 wherein the system includes a comparing means, coupled to the text character means and receiving the text characters therefrom, for comparing the text characters in the word region against a domain specific model to enhance word recognition.

11. The system of claim 9 wherein the location means includes a features means for evaluating motion features of the video frame in the compressed domain, evaluating texture features of the video frame in the compressed domain, and identifying regions having low motion features and high texture features as candidate caption box regions.

12. The system of claim 10 wherein the location means includes a features means for evaluating motion features of the video frame in the compressed domain, evaluating texture features of the video frame in the compressed domain, and identifying regions having low motion features and high texture features as candidate caption box regions.

13. The system of claim 9 wherein the system includes a removal means, coupled to the frame identifying means, the determining means and the location means and receiving the identified caption frames, the at least one caption box mask and the at least one expected location therefrom, for evaluating the identified caption frames, within the caption box location, for changes in content and removing caption frames from word region processing which do not exhibit a change in content.

14. The system of claim 9 wherein the system includes an interval means, coupled to the frame identifying means and receiving the identified caption frames therefrom, for selecting a subset of caption frames based on a predetermined time interval and sending that subset to the word region identifying means.

15. The system of claim 9 wherein the system includes a number means, coupled to the frame identifying means and receiving the identified caption frames therefrom, for determining a subset of caption frames by selecting caption frames based on a predetermined number of intervening caption frames and sending that subset to the word region identifying means.
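
Claims 13 to 15 reduce how many caption frames reach word region processing: by dropping frames whose caption content has not changed, by sampling on a time interval, or by skipping a fixed number of intervening caption frames. The Python sketch below illustrates all three selections; the mean-absolute-difference change test, the threshold, and the function names are assumptions for illustration, not the patent's implementation.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Average per-pixel absolute difference between two caption regions."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def drop_unchanged(regions: List[Sequence[float]], threshold: float = 5.0) -> List[int]:
    """Claim 13 style: keep a caption frame only if its caption region differs
    from the last kept frame by more than `threshold` (assumed change test)."""
    kept: List[int] = []
    for i, region in enumerate(regions):
        if not kept or mean_abs_diff(region, regions[kept[-1]]) > threshold:
            kept.append(i)
    return kept


def by_time_interval(timestamps: List[float], interval: float) -> List[int]:
    """Claim 14 style: keep caption frames spaced at least `interval` seconds apart."""
    kept: List[int] = []
    for i, t in enumerate(timestamps):
        if not kept or t - timestamps[kept[-1]] >= interval:
            kept.append(i)
    return kept


def by_frame_count(n_frames: int, intervening: int) -> List[int]:
    """Claim 15 style: keep one caption frame, skip `intervening` frames, repeat."""
    return list(range(0, n_frames, intervening + 1))


print(drop_unchanged([[0, 0], [1, 1], [20, 20], [21, 20]]))  # [0, 2]
print(by_time_interval([0.0, 0.5, 1.0, 1.6, 2.0], 1.0))       # [0, 2, 4]
print(by_frame_count(7, 2))                                  # [0, 3, 6]
```

Comparing against the last kept frame, rather than the immediately preceding one, prevents a slow drift of small changes from being ignored indefinitely.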

Current Assignee: Trustees Of Columbia University In The City Of New York (Columbia University)
Original Assignee: Trustees Of Columbia University In The City Of New York (Columbia University)
Inventors: Chang, Shih-Fu; Zhang, Dongqing
Primary Examiner(s): DIEP, NHON THANH
Application Number: US11/960,424
Publication Number: US 20080303942A1
Time in Patent Office: 2,036 Days
Field of Search: None
US Class Current: 375/240.25
CPC Class Codes

G06F 16/739   in form of a video summary,...

G06F 16/7844   using original textual cont...

G06F 16/7857   using texture G06F16/7837 t...

G06F 16/786   using motion, e.g. object m...

G06T 7/75   involving models

G06V 20/635   Overlay text, e.g. embedded...

G06V 30/10   Character recognition

G11B 27/031   Electronic editing of digit...

G11B 27/034   on discs G11B27/036, G11B27...

G11B 27/28   by using information signal...

H04N 21/434   Disassembling of a multiple...

H04N 21/4884   for displaying subtitles

H04N 5/93   Regeneration of the televis...

H04N 7/025   Systems for the transmissio...

H04N 7/0882   for the transmission of cha...

System and method for extracting text captions from video and generating video summaries

First Claim

3 Assignments

0 Petitions

Accused Products

Abstract

Citations

15 Claims

Specification

Solutions

Use Cases

Quick Links

System and method for extracting text captions from video and generating video summaries

First Claim

3 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

15 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links