ALIGNMENT OF CLOSED CAPTIONS

US 20150003797A1
Filed: 06/27/2013
Published: 01/01/2015
Est. Priority Date: 06/27/2013
Status: Active Grant



Abstract

In embodiments, apparatuses, methods and storage media are described that are associated with alignment of closed captions. Video content (along with associated audio) may be analyzed to determine various times associated with speech in the video content. The video content may also be analyzed to determine various times associated with closed captions and/or subtitles in the video content. Likelihood values may be associated with the determined times. An alignment may be generated based on these determined times. Multiple techniques may be used, including linear interpolation, non-linear curve fitting, and/or speech recognition matching. Quality metrics may be determined for each of these techniques and then compared. An alignment for the closed captions may be selected from the potential alignments based on the quality metrics. The closed captions and/or subtitles may then be modified based on the selected alignment. Other embodiments may be described and claimed.
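
The candidate-and-select flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. Speech onsets and caption onsets are assumed to be given in seconds. Each caption is paired with its nearest speech onset. Only two of the named techniques are shown: a constant shift, which is the degenerate case of a linear mapping, and a least-squares linear fit. The quality metric, mean distance from each re-timed caption to the nearest speech onset, is a hypothetical choice. All function names are illustrative.

```python
from statistics import median

# Assumptions (not from the patent): times are in seconds, captions are paired
# with the nearest speech onset, and a lower quality score is better.


def nearest(value, candidates):
    """Return the candidate time closest to value."""
    return min(candidates, key=lambda c: abs(c - value))


def pair_times(caption_times, speech_times):
    """Pair each caption onset with its nearest detected speech onset."""
    return [(c, nearest(c, speech_times)) for c in caption_times]


def constant_offset(pairs):
    """Candidate 1: shift every caption by the median speech-minus-caption gap."""
    shift = median(s - c for c, s in pairs)
    return lambda t: t + shift


def linear_fit(pairs):
    """Candidate 2: least-squares fit of speech = a * caption + b (handles drift)."""
    n = len(pairs)
    mean_c = sum(c for c, _ in pairs) / n
    mean_s = sum(s for _, s in pairs) / n
    var_c = sum((c - mean_c) ** 2 for c, _ in pairs)
    if var_c == 0:  # a single caption time cannot determine a slope
        return constant_offset(pairs)
    a = sum((c - mean_c) * (s - mean_s) for c, s in pairs) / var_c
    b = mean_s - a * mean_c
    return lambda t: a * t + b


def quality(mapping, caption_times, speech_times):
    """Hypothetical quality metric: mean gap between re-timed captions and speech."""
    gaps = [abs(mapping(c) - nearest(mapping(c), speech_times)) for c in caption_times]
    return sum(gaps) / len(gaps)


def best_alignment(caption_times, speech_times):
    """Generate candidate alignments, score each, and keep the best-scoring one."""
    pairs = pair_times(caption_times, speech_times)
    candidates = {
        "constant_offset": constant_offset(pairs),
        "linear_fit": linear_fit(pairs),
    }
    scores = {name: quality(m, caption_times, speech_times) for name, m in candidates.items()}
    name = min(scores, key=scores.get)
    return name, [round(candidates[name](c), 3) for c in caption_times]
```

For captions at 1, 5, 9 and 13 s whose speech actually begins at 1.5, 5.9, 10.3 and 14.7 s, the gap grows over time. Because of that drift, the linear fit scores better than the constant shift and is selected. Non-linear curve fitting and speech-recognition matching would simply be additional entries in `candidates`, scored the same way.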


33 Claims

1. A method comprising:
- receiving, by a computing device, a video content, wherein the video content comprises audio data, image data, and closed caption data, and wherein the closed caption data comprises a plurality of captions and a corresponding plurality of times during or at which the corresponding plurality of closed captions are to be displayed during presentation of the video content;
  
  prior to presenting video content:
  
  determining, by the computing device based on at least one of the audio data and the image data, one or more first times associated with speech in the video content;
  
  determining, by the computing device based on the closed caption data, one or more second times associated with closed captions in the video content, wherein the one or more second times are selected from the plurality of times included in the closed caption data; and
  
  re-aligning, by the computing device, relative presentation of the speech and the closed captions in the video content, based on the determined first and second times.
- Dependent claims (2, 3, 4, 6, 7, 9, 10, 11, 12, 26):
  - 2. The method of claim 1, wherein determining one or more first times associated with speech comprises determining speech likelihood values for the one or more first times of the video content.
  - 3. The method of claim 2, wherein determining speech likelihood values comprises measuring sound energy of the video content at the one or more first times in a frequency range of human speech.
  - 4. The method of claim 1, wherein determining one or more second times associated with closed captions comprises determining caption likelihood values for the one or more second times of the video content.
  - 6. The method of claim 1, wherein determining caption likelihood values for one or more frames of the video content further comprises:
    - identifying one or more first closed captions that are related to speech;
      
      identifying one or more second closed captions that are not related to speech; and
      
      generating higher caption likelihood values for times corresponding to the one or more first closed captions.
  - 7. The method of claim 6, wherein generating high caption likelihood values comprises:
    - generating high caption likelihood values for times corresponding to initial display of the one or more first closed captions; and
      
      attenuating the caption likelihood values over time periods of display of the one or more first closed captions.
  - 9. The method of claim 1, wherein re-aligning relative presentation of the speech and the closed captions comprises one or more of performing a linear interpolation on the one or more first times and the one or more second times, performing a non-linear curve fitting on the one or more first times and the one or more second times, or matching recognized speech in the video at the one or more first times with text from closed captions at the one or more second times.
  - 10. The method of claim 1, wherein re-aligning relative presentation of the speech and the closed captions comprises:
    - generating multiple potential closed caption alignments;
      
      determining, for each of the potential closed caption alignments, respective quality metrics for the generated potential closed caption alignments; and
      
      selecting a potential closed caption alignment from the generated potential closed caption alignments based at least in part on the determined quality metrics.
  - 11. The method of claim 1, wherein re-aligning relative presentation of the speech and the closed captions further comprises modifying the video content based on the generated alignment by modifying one or more of the plurality of times during or at which the closed captions are to be displayed during presentation of the video content.
  - 12. The method of claim 1, wherein the video content comprises audio content.
  - 26. The method of claim 1, wherein determining speech likelihood values comprises performing face recognition to determine when one or more mouths are moving in the video content.
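
Caption likelihood values as described in claims 4, 6 and 7 can be sketched as follows. The value is highest at the initial display of a speech-related caption, attenuates over the caption's display period, and is scaled down for captions not related to speech. Several details are assumptions made for illustration. Captions are `(start, end, text)` tuples in seconds. Any caption whose text is entirely bracketed (e.g. `[MUSIC]`) is treated as non-speech. The attenuation is an exponential decay controlled by hypothetical `half_life` and `non_speech_weight` parameters.

```python
from typing import List, Tuple

# A caption is (start_seconds, end_seconds, text); this layout is an assumption.
Caption = Tuple[float, float, str]


def is_speech_caption(text: str) -> bool:
    """Assumed heuristic: fully bracketed text such as "[MUSIC]" is non-speech."""
    stripped = text.strip()
    return not (stripped.startswith("[") and stripped.endswith("]"))


def caption_likelihood(captions: List[Caption], t: float,
                       half_life: float = 0.5,
                       non_speech_weight: float = 0.1) -> float:
    """Likelihood that speech-related caption content begins around time t."""
    best = 0.0
    for start, end, text in captions:
        if not (start <= t < end):
            continue  # caption is not on screen at time t
        elapsed = t - start
        # 1.0 at initial display, halving every half_life seconds afterwards.
        value = 0.5 ** (elapsed / half_life)
        if not is_speech_caption(text):
            value *= non_speech_weight  # lower values for non-speech captions
        best = max(best, value)
    return best


def likelihood_track(captions: List[Caption], duration: float,
                     step: float = 0.25) -> List[float]:
    """Sample caption likelihood on a regular time grid from 0 to duration."""
    n = int(round(duration / step))
    return [caption_likelihood(captions, i * step) for i in range(n)]
```

With the defaults, a caption shown at 1.0 s yields 1.0 at 1.0 s and 0.5 at 1.5 s, while a `[MUSIC]` caption yields only 0.1 at its onset. A track like this could then be compared against a speech likelihood track when generating alignments.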

5. (canceled)

8. (canceled)

13. (canceled)

14. An apparatus comprising:
- one or more computer processors;
  
  a decoder module configured to operate on the one or more computer processors to receive a video content, wherein the video content comprises audio data, image data, and closed caption data, and wherein the closed caption data comprises a plurality of captions and a corresponding plurality of times during or at which the corresponding plurality of closed captions are to be displayed during presentation of the video content;
  
  a speech identification module configured to operate on the one or more computer processors to determine, based on at least one of the audio data and the image data prior to presentation of the video content, one or more first times associated with speech in the video content;
  
  a caption identification module configured to operate on the one or more computer processors to determine, based on the closed caption data prior to presentation of the video content, one or more second times associated with closed captions in the video content, wherein the one or more second times are selected from the plurality of times included in the closed caption data; and
  
  an alignment module, operatively coupled to the speech identification module and the caption identification module, and configured to operate on the one or more computer processors to output, to re-align, prior to presentation of the content, relative presentation of the speech and the captioned content in the video content, based on the determined first and second times.
- Dependent claims (15, 16, 27, 28, 29):
  - 15. The apparatus of claim 14, wherein determine one or more first times associated with speech comprises determine speech likelihood values for the one or more first times of the video content.
  - 16. The apparatus of claim 15, wherein determine speech likelihood values comprises measure sound energy of the video content at the one or more first times in a frequency range of human speech.
  - 27. The apparatus of claim 14, wherein determine speech likelihood values comprises perform face recognition to determine when one or more mouths are moving in the video content.
  - 28. The apparatus of claim 14, wherein determine speech likelihood values comprises perform speech recognition on the video content.
  - 29. The apparatus of claim 14, wherein the caption identification module is further to:
    - identify one or more first closed captions that are related to speech; and
      
      identify one or more second closed captions that are not related to speech;
      
      wherein the alignment module is to not modify the one or more times in the closed caption data corresponding to the one or more second closed captions.
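
Measuring sound energy in a frequency range of human speech (claims 3, 16 and 20) can be sketched as follows. The example computes, for each audio frame, the fraction of spectral energy that falls in a speech band. The specific choices are assumptions, not taken from the patent. Input is mono PCM samples as floats. The band is 300 to 3400 Hz, the classic telephone band, chosen here only for illustration. Frames are non-overlapping. A naive DFT is used so the sketch needs only the standard library. A real system would use an FFT and would likely smooth the values over time.

```python
import cmath
import math
from typing import List, Sequence

# Assumed speech band (telephone band); the patent does not specify the range.
SPEECH_LOW_HZ = 300.0
SPEECH_HIGH_HZ = 3400.0


def band_energy_ratio(frame: Sequence[float], sample_rate: int,
                      low_hz: float = SPEECH_LOW_HZ,
                      high_hz: float = SPEECH_HIGH_HZ) -> float:
    """Fraction of the frame's spectral energy lying in [low_hz, high_hz]."""
    n = len(frame)
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2 + 1):  # skip DC; positive frequencies only
        freq = k * sample_rate / n
        # Naive DFT coefficient for bin k (O(n) per bin, fine for a sketch).
        coeff = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        energy = abs(coeff) ** 2
        total += energy
        if low_hz <= freq <= high_hz:
            in_band += energy
    return in_band / total if total > 0 else 0.0


def speech_likelihood(samples: Sequence[float], sample_rate: int,
                      frame_len: int = 256) -> List[float]:
    """One speech likelihood value per non-overlapping frame of frame_len samples."""
    return [band_energy_ratio(samples[i:i + frame_len], sample_rate)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]
```

At 8 kHz with 256-sample frames (32 ms), a 1 kHz tone scores close to 1.0, a 62.5 Hz hum scores close to 0.0, and silence scores 0.0. The face-recognition and speech-recognition variants in claims 27 and 28 would produce comparable per-time values from the image data or a recognizer instead.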

17. (canceled)

18. One or more non-transitory computer-readable media comprising instructions written thereon that, in response to execution by one or more processing devices of a computing device, cause the computing device to:
- receive a video content, wherein the video content comprises audio data, image data, and closed caption data, and wherein the closed caption data comprises a plurality of captions and a corresponding plurality of times during or at which the corresponding plurality of closed captions are to be displayed during presentation of the video content;
  
  prior to presenting video content:
  
  determine, for the video content based on at least one of the audio data and the image data, one or more first times associated with speech in the video content;
  
  determine, for the video content based on the closed caption data, one or more second times associated with closed captions in the video content, wherein the one or more second times are selected from the plurality of times included in the closed caption data; and
  
  re-align relative presentation of the speech and the closed captions in the video content, based on the determined first and second times.
- Dependent claims (19, 20, 30, 31, 32):
  - 19. The non-transitory computer-readable media of claim 18, wherein determine one or more first times associated with speech comprises determine speech likelihood values for the one or more first times of the video content.
  - 20. The non-transitory computer-readable media of claim 19, wherein determine speech likelihood values comprises one or more of measure sound energy of the video content at the one or more first times in a frequency range of human speech.
  - 30. The non-transitory computer-readable media of claim 18, wherein determine speech likelihood values comprises perform face recognition to determine when one or more mouths are moving in the video content.
  - 31. The non-transitory computer-readable media of claim 18, wherein determine speech likelihood values comprises perform speech recognition on the video content.
  - 32. The non-transitory computer-readable media of claim 18, further comprising instructions written thereon that, in response to execution by the one or more processing devices of the computing device, cause the computing device to:
    - identify one or more first closed captions that are related to speech; and
      
      identify one or more second closed captions that are not related to speech;
      
      wherein re-align relative presentation of the speech and the closed captions in the video content comprises not modify the one or more times in the closed caption data corresponding to the one or more second closed captions.
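
Re-timing captions while leaving the times of non-speech captions unmodified (claims 11, 29 and 32) can be sketched as follows. The code applies an already-selected alignment to speech-related captions and copies non-speech captions through with their original times. Three details are assumptions for illustration only. Captions are `(start, end, text)` tuples. Fully bracketed text such as `[DOOR SLAMS]` marks a non-speech caption. The alignment is passed in as a plain function from old time to new time.

```python
from typing import Callable, List, Tuple

# A caption is (start_seconds, end_seconds, text); this layout is an assumption.
Caption = Tuple[float, float, str]


def is_speech_caption(text: str) -> bool:
    """Assumed heuristic: fully bracketed text such as "[MUSIC]" is non-speech."""
    stripped = text.strip()
    return not (stripped.startswith("[") and stripped.endswith("]"))


def retime_captions(captions: List[Caption],
                    mapping: Callable[[float], float]) -> List[Caption]:
    """Apply mapping to speech-related captions; keep non-speech times as-is."""
    out: List[Caption] = []
    for start, end, text in captions:
        if is_speech_caption(text):
            out.append((mapping(start), mapping(end), text))
        else:
            out.append((start, end, text))  # non-speech caption: times not modified
    return out
```

For example, `retime_captions(caps, lambda t: t + 0.5)` shifts dialogue captions by half a second but leaves a `[MUSIC]` cue at its original position. That matches the intent that sound-effect captions track the sound itself rather than any speaker.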

21. (canceled)

22. An apparatus comprising:
- means for receiving a video content, wherein the video content comprises audio data, image data, and closed caption data, and wherein the closed caption data comprises a plurality of captions and a corresponding plurality of times during or at which the corresponding plurality of closed captions are to be displayed during presentation of the video content;
  
  means for determining, based on at least one of the audio data and the image data prior to presenting the video content, one or more first times associated with speech in the video content;
  
  means for determining, based on the closed caption data prior to presenting the video content, one or more second times associated with closed captions in the video content, wherein the one or more second times are selected from the plurality of times included in the closed caption data; and
  
  means for re-aligning relative presentation of the speech and the closed captions in the video content, based on the determined first and second times.
- Dependent claims (33):
  - 33. The apparatus of claim 22, further comprising:
    - means for displaying the video content after re-alignment of the relative presentation of the speech and the closed captions in the video content.

23-25. (canceled)


Current Assignee
Intel Corporation
Original Assignee
Intel Corporation
Inventors
Schmidt, Johannes P.

Granted Patent

US 8,947,596 B2
US Class Current

386/201
CPC Class Codes

G11B 27/031 Electronic editing of digit...

G11B 27/10 Indexing; Addressing; Timin...
