ALIGNMENT OF CLOSED CAPTIONS
Abstract
In embodiments, apparatuses, methods and storage media are described that are associated with alignment of closed captions. Video content (along with associated audio) may be analyzed to determine various times associated with speech in the video content. The video content may also be analyzed to determine various times associated with closed captions and/or subtitles in the video content. Likelihood values may be associated with the determined times. An alignment may be generated based on these determined times. Multiple techniques may be used, including linear interpolation, non-linear curve fitting, and/or speech recognition matching. Quality metrics may be determined for each of these techniques and then compared. An alignment for the closed captions may be selected from the potential alignments based on the quality metrics. The closed captions and/or subtitles may then be modified based on the selected alignment. Other embodiments may be described and claimed.
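The selection step described in the abstract (generate several candidate alignments, score each with a quality metric, keep the best) can be illustrated with a minimal sketch. It assumes the caption times and speech times have already been paired one-to-one, and it uses a constant offset and a least-squares line as the two candidates. The function names and the metric (negative mean absolute error) are hypothetical choices for illustration and are not taken from the patent.

```python
from statistics import mean


def fit_offset(caption_times, speech_times):
    """Candidate 1: shift every caption by the mean caption-to-speech difference."""
    shift = mean(s - c for c, s in zip(caption_times, speech_times))
    return lambda t: t + shift


def fit_linear(caption_times, speech_times):
    """Candidate 2: least-squares line speech = a * caption + b (handles drift)."""
    c_mean, s_mean = mean(caption_times), mean(speech_times)
    var = sum((c - c_mean) ** 2 for c in caption_times)
    cov = sum((c - c_mean) * (s - s_mean)
              for c, s in zip(caption_times, speech_times))
    a = cov / var  # assumes at least two distinct caption times
    b = s_mean - a * c_mean
    return lambda t: a * t + b


def quality(mapping, caption_times, speech_times):
    """Hypothetical quality metric: negative mean absolute error (higher is better)."""
    return -mean(abs(mapping(c) - s)
                 for c, s in zip(caption_times, speech_times))


def select_alignment(caption_times, speech_times):
    """Build each candidate alignment, score it, and return the best one."""
    candidates = {
        "offset": fit_offset(caption_times, speech_times),
        "linear": fit_linear(caption_times, speech_times),
    }
    scores = {name: quality(fn, caption_times, speech_times)
              for name, fn in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best]


# Captions that drift progressively later relative to the speech.
caption_times = [10.0, 20.0, 30.0, 40.0]
speech_times = [11.0, 22.0, 33.0, 44.0]
name, mapping = select_alignment(caption_times, speech_times)
```

With drifting captions like these, a single offset cannot match every pair, so the linear candidate scores higher. A non-linear curve fit or a speech recognition match could be added as further entries in the candidate table in the same way.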
33 Claims
1. A method comprising:
receiving, by a computing device, a video content, wherein the video content comprises audio data, image data, and closed caption data, and wherein the closed caption data comprises a plurality of captions and a corresponding plurality of times during or at which the corresponding plurality of closed captions are to be displayed during presentation of the video content;
prior to presenting video content:
determining, by the computing device based on at least one of the audio data and the image data, one or more first times associated with speech in the video content;
determining, by the computing device based on the closed caption data, one or more second times associated with closed captions in the video content, wherein the one or more second times are selected from the plurality of times included in the closed caption data; and
re-aligning, by the computing device, relative presentation of the speech and the closed captions in the video content, based on the determined first and second times.
Dependent claims: 2, 3, 4, 6, 7, 9, 10, 11, 12, 26
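The steps of claim 1 can be sketched as follows, purely as an illustration. The `Caption` record, the nearest-neighbor pairing, the `max_gap` tolerance, and the median-shift re-alignment are all assumptions introduced for this example. The claim itself does not prescribe how first and second times are paired or how the re-alignment is computed.

```python
from dataclasses import dataclass, replace
from statistics import median


@dataclass(frozen=True)
class Caption:
    text: str
    start: float  # display start time in seconds, from the closed caption data
    end: float    # display end time in seconds


def second_times(captions):
    """Select the 'second times' from the times included in the caption data."""
    return [c.start for c in captions]


def pair_nearest(first_times, caption_starts, max_gap=5.0):
    """Pair each caption time with the nearest speech time within max_gap seconds."""
    if not first_times:
        return []
    pairs = []
    for t2 in caption_starts:
        t1 = min(first_times, key=lambda t: abs(t - t2))
        if abs(t1 - t2) <= max_gap:
            pairs.append((t1, t2))
    return pairs


def realign(captions, first_times, max_gap=5.0):
    """Shift all captions by the median speech-minus-caption difference."""
    pairs = pair_nearest(first_times, second_times(captions), max_gap)
    if not pairs:
        return list(captions)  # nothing to align against; leave captions as-is
    shift = median(t1 - t2 for t1, t2 in pairs)
    return [replace(c, start=c.start + shift, end=c.end + shift)
            for c in captions]


# Captions that appear 1.5 seconds after the speech they transcribe.
captions = [Caption("Hello", 2.0, 4.0), Caption("World", 7.0, 9.0)]
first_times = [0.5, 5.5]  # speech times, assumed to come from audio analysis
aligned = realign(captions, first_times)
```

The median is used here only because it tolerates a few bad pairings. Everything runs before presentation, matching the claim's requirement that both sets of times are determined prior to presenting the video content.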
5. (canceled)
8. (canceled)
13. (canceled)
14. An apparatus comprising:
one or more computer processors;
a decoder module configured to operate on the one or more computer processors to receive a video content, wherein the video content comprises audio data, image data, and closed caption data, and wherein the closed caption data comprises a plurality of captions and a corresponding plurality of times during or at which the corresponding plurality of closed captions are to be displayed during presentation of the video content;
a speech identification module configured to operate on the one or more computer processors to determine, based on at least one of the audio data and the image data prior to presentation of the video content, one or more first times associated with speech in the video content;
a caption identification module configured to operate on the one or more computer processors to determine, based on the closed caption data prior to presentation of the video content, one or more second times associated with closed captions in the video content, wherein the one or more second times are selected from the plurality of times included in the closed caption data; and
an alignment module, operatively coupled to the speech identification module and the caption identification module, and configured to operate on the one or more computer processors to output, to re-align, prior to presentation of the content, relative presentation of the speech and the captioned content in the video content, based on the determined first and second times.
Dependent claims: 15, 16, 27, 28, 29
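As one illustration of what a speech identification module might compute from the audio data, the sketch below marks the times at which short-time signal energy rises above a threshold. This energy-threshold approach, the function name `speech_onsets`, and the `frame_size` and `threshold` parameters are assumptions made for the example. The claim does not specify a detection technique, and a real system would likely use a more robust voice activity detector.

```python
def speech_onsets(samples, sample_rate, frame_size=20, threshold=0.01):
    """Return times (seconds) where frame energy first rises above threshold.

    samples: sequence of floats in [-1.0, 1.0] (mono audio, hypothetical input)
    sample_rate: samples per second
    frame_size: samples per analysis frame
    threshold: mean-square energy above which a frame counts as active
    """
    onsets = []
    was_active = False
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(x * x for x in frame) / len(frame)
        active = energy > threshold
        if active and not was_active:
            # Transition from quiet to active: record a candidate "first time".
            onsets.append(start / sample_rate)
        was_active = active
    return onsets


# Synthetic signal at 1000 samples/s: quiet, loud, quiet, loud.
samples = [0.0] * 100 + [0.5] * 100 + [0.0] * 100 + [0.5] * 50
onsets = speech_onsets(samples, sample_rate=1000, frame_size=20)
```

Energy alone cannot distinguish speech from music or noise, so these onsets are only candidate first times. The abstract's idea of attaching likelihood values to the determined times is one way such uncertainty could be carried forward.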
17. (canceled)
18. One or more non-transitory computer-readable media comprising instructions written thereon that, in response to execution by one or more processing devices of a computing device, cause the computing device to:
receive a video content, wherein the video content comprises audio data, image data, and closed caption data, and wherein the closed caption data comprises a plurality of captions and a corresponding plurality of times during or at which the corresponding plurality of closed captions are to be displayed during presentation of the video content;
prior to presenting video content:
determine, for the video content based on at least one of the audio data and the image data, one or more first times associated with speech in the video content;
determine, for the video content based on the closed caption data, one or more second times associated with closed captions in the video content, wherein the one or more second times are selected from the plurality of times included in the closed caption data; and
re-align relative presentation of the speech and the closed captions in the video content, based on the determined first and second times.
Dependent claims: 19, 20, 30, 31, 32
21. (canceled)
22. An apparatus comprising:
means for receiving a video content, wherein the video content comprises audio data, image data, and closed caption data, and wherein the closed caption data comprises a plurality of captions and a corresponding plurality of times during or at which the corresponding plurality of closed captions are to be displayed during presentation of the video content;
means for determining, based on at least one of the audio data and the image data prior to presenting the video content, one or more first times associated with speech in the video content;
means for determining, based on the closed caption data prior to presenting the video content, one or more second times associated with closed captions in the video content, wherein the one or more second times are selected from the plurality of times included in the closed caption data; and
means for re-aligning relative presentation of the speech and the closed captions in the video content, based on the determined first and second times.
Dependent claims: 33
23-25. (canceled)
Specification