Method for refining time alignments of closed captions
Abstract
A method and apparatus are provided for refining time alignments of closed captions. The method automatically aligns closed caption data with associated audio data such that the closed caption data can be more precisely indexed to a requested keyword by a search engine. Further, with such a structure, the closed captions can be made to appear and disappear on a display screen in direct relation to the associated spoken words and phrases. Accordingly, hearing impaired viewers can more easily understand the program that is being displayed.
288 Citations
9 Claims
1. An apparatus for automatically aligning closed captions, comprising:
an audio classifier unit, for receiving audio data and identifying portions of the audio data that comprise speech data;
a speech rate control unit, coupled to the audio classifier unit for outputting the portions of the audio data that include speech, adjusts the speech data rate such that an operator can more easily perform transcription;
a time event tracker unit, coupled to the audio speed control unit, for receiving the speech portions of the audio data and for receiving a transcription of the speech portions of the audio data that is generated by the operator, the time event tracker unit also for inserting time stamps in the transcription that indicate the time when portions of the transcription were generated by the operator and the time stamped transcription being output as a roughly aligned closed caption stream; and
a re-aligner unit for precisely aligning the roughly aligned closed caption stream in a non-recursive manner, wherein the re-aligner unit comprises a captions re-aligner unit that receives the roughly aligned closed caption stream and the associated audio data stream and segments both streams into sections based upon a threshold duration between the time stamps in the roughly aligned closed caption stream, the captions re-aligner unit also breaking each section into a number of chunks and generating a language model using only the words contained in each chunk, the captions re-aligner unit using the language model to perform a speech recognition operation on the audio data stream and to generate a hypothesized word list, a plurality of time stamps in the roughly aligned closed caption stream being modified to align with a plurality of time stamps in the hypothesized word list.
a segmenter unit, coupled to the re-aligner unit for detecting acoustic clues for determining where to break the closed caption stream.
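The segmentation recited for the captions re-aligner unit in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the names `Caption`, `split_sections`, `split_chunks`, and `chunk_word_counts`, the fixed chunk size, and the use of plain word counts as a stand-in for the per-chunk language model are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Caption:
    time: float  # rough time stamp in seconds (hypothetical representation)
    word: str


def split_sections(captions: List[Caption], max_gap: float) -> List[List[Caption]]:
    """Start a new section whenever the gap between consecutive
    time stamps exceeds the threshold duration max_gap."""
    sections: List[List[Caption]] = []
    current: List[Caption] = []
    for cap in captions:
        if current and cap.time - current[-1].time > max_gap:
            sections.append(current)
            current = []
        current.append(cap)
    if current:
        sections.append(current)
    return sections


def split_chunks(section: List[Caption], chunk_size: int) -> List[List[Caption]]:
    """Break one section into consecutive chunks of at most chunk_size words."""
    return [section[i:i + chunk_size] for i in range(0, len(section), chunk_size)]


def chunk_word_counts(chunk: List[Caption]) -> Counter:
    """Assumed stand-in for a language model: counts of only the
    words that occur in this chunk."""
    return Counter(cap.word.lower() for cap in chunk)


captions = [
    Caption(0.0, "good"), Caption(1.0, "evening"), Caption(2.0, "everyone"),
    Caption(10.0, "tonight"), Caption(11.0, "tonight"),
]
sections = split_sections(captions, max_gap=3.0)
chunks = split_chunks(sections[0], chunk_size=2)
counts = chunk_word_counts(sections[1])
```

With a threshold of 3 seconds, the 8-second pause before "tonight" starts a second section; a real system would feed each chunk's restricted model to a speech recognizer rather than stop at word counts.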
6. A computer system, comprising:
a central processing unit connected to a memory system by a system bus;
an I/O controller, connected to the central processing unit and to the memory system by the system bus;
an audio classifier application, executed by the central processing unit, for receiving audio data and identifying portions of the audio data that comprise speech data;
a speech rate control application, executed by the central processing unit, for outputting the portions of the audio data that include speech at a predetermined rate;
a time event tracker application, executed by the central processing unit, for receiving the speech portions of the audio data from the speech rate control application and for receiving a transcription of the speech portions of the audio data, the time event tracker application also for inserting time stamps in the transcription that indicate the time when portions of the transcription were received and the time stamped transcription being output as a roughly aligned closed caption stream; and
a re-aligner application, executed by the central processing unit, for precisely aligning the roughly aligned closed caption stream, wherein the re-aligner application comprises a captions re-aligner portion that receives the roughly aligned closed caption stream and the associated audio data stream and segments both streams into sections based upon a threshold duration between time stamps in the roughly aligned closed caption stream, the captions re-aligner portion also breaking each section into a number of chunks and generating a language model using only the words contained in each chunk, the captions re-aligner portion using the language model to perform a speech recognition operation on the audio data stream and to generate a hypothesized word list, a plurality of time stamps in the roughly aligned closed caption stream being modified to align with a plurality of time stamps in the hypothesized word list.
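The final step of the re-aligner, modifying rough time stamps to align with those of the hypothesized word list, might be sketched as follows. The claims do not say how words in the two streams are matched; this example assumes a simple sequence match using Python's standard `difflib.SequenceMatcher`, and the function name `realign` and the `(word, time)` tuple layout are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

TimedWord = Tuple[str, float]  # (word, time stamp in seconds), assumed layout


def realign(rough: List[TimedWord], hypothesis: List[TimedWord]) -> List[TimedWord]:
    """Copy recognizer time stamps onto caption words that match the
    hypothesized word list; unmatched words keep their rough time stamps."""
    rough_words = [word.lower() for word, _ in rough]
    hyp_words = [word.lower() for word, _ in hypothesis]
    matcher = SequenceMatcher(a=rough_words, b=hyp_words, autojunk=False)
    aligned = list(rough)
    for block in matcher.get_matching_blocks():
        # Each block is a run of block.size identical words starting at
        # rough[block.a] and hypothesis[block.b]; the last block has size 0.
        for k in range(block.size):
            word, _ = rough[block.a + k]
            aligned[block.a + k] = (word, hypothesis[block.b + k][1])
    return aligned


rough = [("hello", 5.0), ("big", 5.5), ("world", 6.0)]
hypothesis = [("hello", 3.2), ("world", 3.9)]
aligned = realign(rough, hypothesis)
```

Here the recognizer missed "big", so that word keeps its operator-generated time stamp while its neighbors take the recognizer's times; a fuller system might interpolate a time for such unmatched words.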
Specification