Method and apparatus for caption detection

US 8,929,461 B2
Filed: 04/17/2007
Issued: 01/06/2015
Est. Priority Date: 04/17/2007
Status: Active Grant

Abstract

Machine-readable media, methods, apparatus and system for caption detection are described. In some embodiments, a plurality of text boxes may be detected from a plurality of frames. A first percentage of the plurality of text boxes whose locations on the plurality of frames fall into a location range may be obtained. A second percentage of the plurality of text boxes whose sizes fall into a size range may be obtained. Then, it may be determined if the first percentage and the location range are acceptable and if the second percentage and the size range are acceptable.
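
The acceptance test described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `TextBox` type, the use of the top edge `y` as the "location" and the box height as the "size", and all threshold defaults are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBox:
    # Hypothetical box representation: top-left corner plus size, in pixels.
    x: float
    y: float
    width: float
    height: float


def fraction_in_range(values, low, high):
    """Return the fraction of values that fall within [low, high]."""
    values = list(values)
    if not values:
        return 0.0
    return sum(low <= v <= high for v in values) / len(values)


def is_acceptable(percentage, range_width, min_percentage, max_range_width):
    """A percentage/range pair passes when the share is high enough
    and the range is narrow enough."""
    return percentage >= min_percentage and range_width <= max_range_width


def caption_ranges_acceptable(boxes, y_range, height_range,
                              min_loc_pct=0.5, max_loc_width=20.0,
                              min_size_pct=0.5, max_size_width=10.0):
    """Check both the location pair and the size pair (assumed thresholds)."""
    first_pct = fraction_in_range((b.y for b in boxes), *y_range)
    second_pct = fraction_in_range((b.height for b in boxes), *height_range)
    loc_ok = is_acceptable(first_pct, y_range[1] - y_range[0],
                           min_loc_pct, max_loc_width)
    size_ok = is_acceptable(second_pct, height_range[1] - height_range[0],
                            min_size_pct, max_size_width)
    return loc_ok and size_ok


# Three boxes share a position and height; one outlier does not.
boxes = [TextBox(10, 400, 200, 30)] * 3 + [TextBox(50, 100, 80, 60)]
result = caption_ranges_acceptable(boxes, (395, 405), (28, 32))
```

Here three of the four boxes fall inside both ranges, so both percentages are 0.75 and both ranges are narrower than the assumed limits; raising `min_loc_pct` above 0.75 would make the check fail.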

20 Claims

1. A method comprising:
- detecting a plurality of text boxes from a plurality of video frames, wherein detecting includes obtaining a first percentage of the plurality of text boxes whose locations associated with the plurality of the video frames fall within a location range, wherein the first percentage and the location range are regarded as acceptable if the first percentage is equal to or greater than a first determined value and if the location range is equal to or less than a second predetermined value, and obtaining a second percentage of the plurality of text boxes whose sizes fall within a size range, wherein the second percentage and the size range are regarded as acceptable if the second percentage is equal to or greater than a third predetermined value and if the size range is equal to or less than a fourth predetermined value;
  
  identifying a text box of the plurality of text boxes as a caption candidate if the first percentage, the location range, the second percentage, and the size range relating to the text box are acceptable; and
  
  selecting the identified text box as the caption candidate.
- - 2. The method of claim 1, further comprising:
    - comparing the first percentage with the first predetermined value;
      
      comparing the location range with the second predetermined value;
      
      comparing the second percentage with the third predetermined value; and
      
      comparing the size range with the fourth predetermined value, wherein the size range includes a height range or a width range.
  - 3. The method of claim 1, wherein obtaining the first percentage comprises:
    - obtaining a plurality of first percentages relating to the plurality of text boxes having a plurality of locations on the plurality of video frames, wherein each of the plurality of first percentages corresponds to each of the plurality of locations;
      
      determining a first percentage range based upon the plurality of first percentages, wherein the first percentage range covers a highest first percentage from the plurality of first percentages;
      
      determining a lowest first percentage from the first percentage range as the first percentage; and
      
      determining the location range based upon the plurality of locations, wherein the location range corresponds to the first percentage range.
  - 4. The method of claim 1, wherein detecting further comprises:
    - defining variable values corresponding to a plurality dimensions for each of the plurality of text boxes;
      
      determining distribution of the plurality of text boxes on each dimension based on its corresponding defined variable value;
      
      finding a peak percentage associated with one or more text boxes of the plurality of text boxes whose corresponding variable values fall within a plurality of dimension ranges, wherein the peak percentage is compared with a predetermined value, and wherein the peak percentage is regarded as acceptable if it is equal to or greater than the predetermined value;
      
      identifying a dimension range of the plurality of dimension ranges as a dedicated dimension range if the peak percentage associated with the dimension range is regarded as acceptable; and
      
      selecting the identified dimension range as the dedicated dimension range.
  - 5. The method of claim 4, further comprising:
    - dividing the variable values into a plurality of bins;
      
      placing each of the plurality of text boxes into one of the plurality of bins corresponding to its variable value, identifying a bin from the plurality of bins into which a highest percentage of text boxes was placed; and
      
      identifying a text box of the plurality of text boxes as a caption candidate if a percentage associated with the text box is a highest percentage and is regarded as acceptable for being equal to or greater than the first determined value or a predetermined threshold percentage.
  - 6. The method of claim 5, further comprising:
    - selecting a range of values within the bin, wherein the range straddles a value corresponding to the highest percentage and wherein ends of the range correspond to a predetermined second percentage of text boxes that is less than the highest percentage, and wherein determining that a text box placed in the bin is a caption is further in response to determining that a width of the range is less than a second threshold value.
  - 7. The method of claim 5, wherein a variable value comprises an x-coordinate of an upper left corner, a y-coordinate of the upper left corner, an x-coordinate of an upper right corner, a y-coordinate of the upper right corner, an x-coordinate of a center point, and a y-coordinate of the center point.
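
One possible reading of the binning in claims 5 and 6 is sketched below: values are placed into fixed-width bins, the bin with the highest percentage is located, and a range is grown around it while neighbouring bins still hold at least a lower "floor" percentage. The function name `peak_range`, the fixed bin width, and the way the range is expanded are assumptions for illustration, not details taken from the patent.

```python
from collections import Counter


def peak_range(values, bin_width, floor_pct):
    """Return (peak_pct, low, high) for one variable.

    peak_pct is the share of values in the fullest bin; [low, high) is the
    span of adjacent bins around it whose shares are at least floor_pct.
    """
    if not values:
        raise ValueError("values must not be empty")
    if bin_width <= 0 or floor_pct <= 0:
        raise ValueError("bin_width and floor_pct must be positive")

    # Place each value into the bin that covers it.
    counts = Counter(int(v // bin_width) for v in values)
    total = len(values)

    # Bin holding the highest percentage of values.
    peak_bin, peak_count = max(counts.items(), key=lambda kv: kv[1])

    # Grow the range on both sides while neighbours stay above the floor.
    lo = hi = peak_bin
    while counts.get(lo - 1, 0) / total >= floor_pct:
        lo -= 1
    while counts.get(hi + 1, 0) / total >= floor_pct:
        hi += 1

    return peak_count / total, lo * bin_width, (hi + 1) * bin_width


# Top-edge y-coordinates of detected boxes (made-up sample data).
ys = [400, 402, 405, 398, 401, 399, 120, 250, 404, 403]
peak_pct, low, high = peak_range(ys, bin_width=5, floor_pct=0.2)

# Assumed thresholds: peak share of at least 40%, range at most 20 pixels wide.
is_caption_range = peak_pct >= 0.4 and (high - low) <= 20
```

With this sample data, half of the values land in the bin covering 400 to 404, the neighbouring bin covering 395 to 399 holds 20% and is absorbed, and the resulting range is 10 pixels wide, so the assumed thresholds are met.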

8. An apparatus comprising:
- a computing device having a storage medium to store instructions, and a processing device to execute the instructions, the computing device further having a mechanism to, when the instructions are executed, perform one or more operations comprising;
  
  detecting a plurality of text boxes from a plurality of video frames, wherein detecting includes obtaining a first percentage of the plurality of text boxes whose locations associated with the plurality of the video frames fall within a location range, wherein the first percentage and the location range are regarded as acceptable if the first percentage is equal to or greater than a first determined value and if the location range is equal to or less than a second predetermined value, and obtaining a second percentage of the plurality of text boxes whose sizes fall within a size range, wherein the second percentage and the size range are regarded as acceptable if the second percentage is equal to or greater than a third predetermined value and if the size range is equal to or less than a fourth predetermined value;
  
  identifying a text box of the plurality of text boxes as a caption candidate if the first percentage, the location range, the second percentage, and the size range relating to the text box are acceptable; and
  
  selecting the identified text box as the caption candidate.
- - 9. The apparatus of claim 8, wherein the one or more operations further comprise:
    - comparing the first percentage with the first predetermined value;
      
      comparing the location range with the second predetermined value;
      
      comparing the second percentage with the third predetermined value; and
      
      comparing the size range with the fourth predetermined value, wherein the size range includes a height range or a width range.
  - 10. The apparatus of claim 8, wherein obtaining the first percentage comprises:
    - obtaining a plurality of first percentages relating to the plurality of text boxes having a plurality of locations on the plurality of video frames, wherein each of the plurality of first percentages corresponds to each of the plurality of locations;
      
      determining a first percentage range based upon the plurality of first percentages, wherein the first percentage range covers a highest first percentage from the plurality of first percentages;
      
      determining a lowest first percentage from the first percentage range as the first percentage; and
      
      determining the location range based upon the plurality of locations, wherein the location range corresponds to the first percentage range.
  - 11. The apparatus of claim 8, wherein when detecting, the one or more operations further comprise:
    - defining variable values corresponding to a plurality dimensions for each of the plurality of text boxes;
      
      determining distribution of the plurality of text boxes on each dimension based on its corresponding defined variable value;
      
      finding a peak percentage associated with one or more text boxes of the plurality of text boxes whose corresponding variable values fall within a plurality of dimension ranges, wherein the peak percentage is compared with a predetermined value, and wherein the peak percentage is regarded as acceptable if it is equal to or greater than the predetermined value;
      
      identifying a dimension range of the plurality of dimension ranges as a dedicated dimension range if the peak percentage associated with the dimension range is regarded as acceptable; and
      
      selecting the identified dimension range as the dedicated dimension range.
  - 12. The apparatus of claim 11, wherein the one or more operations further comprise:
    - dividing the variable values into a plurality of bins;
      
      placing each of the plurality of text boxes into one of the plurality of bins corresponding to its variable value, identifying a bin from the plurality of bins into which a highest percentage of text boxes was placed; and
      
      identifying a text box of the plurality of text boxes as a caption candidate if a percentage associated with the text box is a highest percentage and is regarded as acceptable for being equal to or greater than the first determined value or a predetermined threshold percentage.
  - 13. The apparatus of claim 12, wherein the one or more operations further comprise:
    - selecting a range of values within the bin, wherein the range straddles a value corresponding to the highest percentage and wherein ends of the range correspond to a predetermined second percentage of text boxes that is less than the highest percentage, and wherein determining that a text box placed in the bin is a caption is further in response to determining that a width of the range is less than a second threshold value, wherein a variable value comprises an x-coordinate of an upper left corner, a y-coordinate of the upper left corner, an x-coordinate of an upper right corner, a y-coordinate of the upper right corner, an x-coordinate of a center point, and a y-coordinate of the center point.

14. A machine-readable medium having stored thereon instructions, which when executed, cause a processing device to perform one or more operations comprising:
- detecting a plurality of text boxes from a plurality of video frames, wherein detecting includes obtaining a first percentage of the plurality of text boxes whose locations associated with the plurality of the video frames fall within a location range, wherein the first percentage and the location range are regarded as acceptable if the first percentage is equal to or greater than a first determined value and if the location range is equal to or less than a second predetermined value, and obtaining a second percentage of the plurality of text boxes whose sizes fall within a size range, wherein the second percentage and the size range are regarded as acceptable if the second percentage is equal to or greater than a third predetermined value and if the size range is equal to or less than a fourth predetermined value;
  
  identifying a text box of the plurality of text boxes as a caption candidate if the first percentage, the location range, the second percentage, and the size range relating to the text box are acceptable; and
  
  selecting the identified text box as the caption candidate.
- - 15. The machine-readable medium of claim 14, wherein the one or more operations further comprise:
    - comparing the first percentage with the first predetermined value;
      
      comparing the location range with the second predetermined value;
      
      comparing the second percentage with the third predetermined value; and
      
      comparing the size range with the fourth predetermined value, wherein the size range includes a height range or a width range.
  - 16. The machine-readable medium of claim 14, wherein obtaining the first percentage comprises:
    - obtaining a plurality of first percentages relating to the plurality of text boxes having a plurality of locations on the plurality of video frames, wherein each of the plurality of first percentages corresponds to each of the plurality of locations;
      
      determining a first percentage range based upon the plurality of first percentages, wherein the first percentage range covers a highest first percentage from the plurality of first percentages;
      
      determining a lowest first percentage from the first percentage range as the first percentage; and
      
      determining the location range based upon the plurality of locations, wherein the location range corresponds to the first percentage range.
  - 17. The machine-readable medium of claim 14, wherein when detecting, the one or more operations further comprise:
    - defining variable values corresponding to a plurality dimensions for each of the plurality of text boxes;
      
      determining distribution of the plurality of text boxes on each dimension based on its corresponding defined variable value;
      
      finding a peak percentage associated with one or more text boxes of the plurality of text boxes whose corresponding variable values fall within a plurality of dimension ranges, wherein the peak percentage is compared with a predetermined value, and wherein the peak percentage is regarded as acceptable if it is equal to or greater than the predetermined value;
      
      identifying a dimension range of the plurality of dimension ranges as a dedicated dimension range if the peak percentage associated with the dimension range is regarded as acceptable; and
      
      selecting the identified dimension range as the dedicated dimension range.
  - 18. The machine-readable medium of claim 17, wherein the one or more operations further comprise:
    - dividing the variable values into a plurality of bins;
      
      placing each of the plurality of text boxes into one of the plurality of bins corresponding to its variable value, identifying a bin from the plurality of bins into which a highest percentage of text boxes was placed; and
      
      identifying a text box of the plurality of text boxes as a caption candidate if a percentage associated with the text box is a highest percentage and is regarded as acceptable for being equal to or greater than the first determined value or a predetermined threshold percentage.
  - 19. The machine-readable medium of claim 18, wherein the one or more operations further comprise:
    - selecting a range of values within the bin, wherein the range straddles a value corresponding to the highest percentage and wherein ends of the range correspond to a predetermined second percentage of text boxes that is less than the highest percentage, and wherein determining that a text box placed in the bin is a caption is further in response to determining that a width of the range is less than a second threshold value.
  - 20. The machine-readable medium of claim 18, wherein a variable value comprises an x-coordinate of an upper left corner, a y-coordinate of the upper left corner, an x-coordinate of an upper right corner, a y-coordinate of the upper right corner, an x-coordinate of a center point, and a y-coordinate of the center point.
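
The per-dimension variable values listed in claims 7 and 20, and the per-dimension distribution of claims 4, 11 and 17, might be computed along the following lines. This is only a sketch: the `TextBox` fields, the dictionary keys, and the image-coordinate convention (origin at the top left, y growing downward) are assumptions, not details stated in the claims.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBox:
    # Hypothetical box representation: top-left corner plus size, in pixels.
    x: int
    y: int
    width: int
    height: int


def variable_values(box):
    """Map one text box to the six coordinate variables named in the claims."""
    return {
        "upper_left_x": box.x,
        "upper_left_y": box.y,
        "upper_right_x": box.x + box.width,
        "upper_right_y": box.y,
        "center_x": box.x + box.width / 2,
        "center_y": box.y + box.height / 2,
    }


def distributions(boxes):
    """For each dimension, return {value: share of boxes having that value}."""
    per_dim = {}
    for box in boxes:
        for name, value in variable_values(box).items():
            per_dim.setdefault(name, Counter())[value] += 1
    total = len(boxes)
    return {
        name: {value: count / total for value, count in counter.items()}
        for name, counter in per_dim.items()
    }


# Two boxes share a top-left corner; a third sits elsewhere (made-up data).
boxes = [
    TextBox(10, 400, 200, 30),
    TextBox(10, 400, 180, 30),
    TextBox(50, 100, 80, 60),
]
dist = distributions(boxes)
```

In this example `dist["upper_left_y"]` assigns two thirds of the boxes to the value 400, which is the kind of peak the thresholds in the claims are compared against. Exact values are used as histogram keys here for brevity; a practical version would likely group nearby values into bins first.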

Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Inventors: Hu, Wei; Ma, Rui
Primary Examiner(s): Hirl, Joseph P
Assistant Examiner(s): GYORFI, THOMAS A
Application Number: US11/736,225
Publication Number: US 20080260032A1
Time in Patent Office: 2,821 Days
Field of Search: 375/240.26, 382/176, 382/182, 382/229
US Class Current: 375/240.26
CPC Class Codes:
G06V 20/62   Text, e.g. of license plate...
G06V 20/635   Overlay text, e.g. embedded...
H04N 21/4884   for displaying subtitles
