Method, system, and apparatus for facilitating captioning of multi-media content
Abstract
A method, system and apparatus for facilitating transcription and captioning of multi-media content are presented. The method, system, and apparatus include automatic multi-media analysis operations that produce information which is presented to an operator as suggestions for spoken words, spoken word timing, caption segmentation, caption playback timing, caption mark-up such as non-spoken cues or speaker identification, caption formatting, and caption placement. Spoken word suggestions are primarily created through an automatic speech recognition operation, but may be enhanced by leveraging other elements of the multi-media content, such as correlated text and imagery by using text extracted with an optical character recognition operation. Also included is an operator interface that allows the operator to efficiently correct any of the aforementioned suggestions. In the case of word suggestions, in addition to best hypothesis word choices being presented to the operator, alternate word choices are presented for quick selection via the operator interface. Ongoing operator corrections can be leveraged to improve the remaining suggestions. Additionally, an automatic multi-media playback control capability further assists the operator during the correction process.
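As an illustration of the word suggestions described in the abstract, the following minimal sketch models a best hypothesis word together with its timing and alternate word choices. The class and field names (`WordSuggestion`, `Utterance`, `choose_alternate`) are assumptions made for this example and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordSuggestion:
    """One recognized word as it might be presented to the operator (hypothetical layout)."""
    best: str                     # best hypothesis word
    start: float                  # start time in seconds
    end: float                    # end time in seconds
    alternates: List[str] = field(default_factory=list)  # alternate word choices

    def choose_alternate(self, index: int) -> None:
        """Promote alternates[index] to best hypothesis; keep the old best as an alternate."""
        chosen = self.alternates[index]
        self.alternates[index] = self.best
        self.best = chosen


@dataclass
class Utterance:
    """A detected utterance: an ordered list of word suggestions."""
    words: List[WordSuggestion]

    def text(self) -> str:
        return " ".join(w.best for w in self.words)


utterance = Utterance([
    WordSuggestion("recognize", 0.0, 0.62),
    WordSuggestion("beach", 0.62, 1.05, alternates=["speech", "peach"]),
])
utterance.words[1].choose_alternate(0)  # operator picks "speech" with a single selection
print(utterance.text())
```

Keeping the rejected word among the alternates is a design choice of this sketch: it lets the operator undo a selection with the same single action.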
215 Citations
55 Claims
1. A method for creating captions of multi-media content, the method comprising:
performing an audio analysis operation on an audio signal to produce speech recognition data for each detected utterance, wherein the speech recognition data comprises a plurality of best hypothesis words and corresponding timing information;
displaying the speech recognition data using an operator interface as spoken word suggestions for review by an operator;
enabling the operator to edit the spoken word suggestions within the operator interface, wherein the enabling comprises estimating an appropriate audio portion to be played to the operator at a current moment, based on an indication obtained from the operator interface as to where the operator is currently editing.
Dependent claims: 2-37.
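The final element of claim 1 estimates which audio portion to play from the operator's current editing position. The claim does not say how the estimate is made; the sketch below assumes one simple approach, which is to play the word under the cursor plus a few neighboring words, using the timing information attached to each word. The function name, the `context_words` and `padding` parameters, and the tuple layout are all hypothetical.

```python
from typing import List, Tuple

# (word, start_seconds, end_seconds); a hypothetical layout for timed words
TimedWord = Tuple[str, float, float]


def estimate_playback_window(
    words: List[TimedWord],
    cursor_index: int,
    context_words: int = 2,
    padding: float = 0.25,
) -> Tuple[float, float]:
    """Return (start, end) in seconds of the audio to play around the word being edited."""
    if not words:
        raise ValueError("no timed words available")
    # Clamp the cursor to a valid word index.
    i = max(0, min(cursor_index, len(words) - 1))
    first = max(0, i - context_words)
    last = min(len(words) - 1, i + context_words)
    # Pad the window slightly so that word onsets and offsets are not clipped.
    start = max(0.0, words[first][1] - padding)
    end = words[last][2] + padding
    return start, end


words = [
    ("the", 0.0, 0.25), ("quick", 0.25, 0.75), ("brown", 0.75, 1.25),
    ("fox", 1.25, 1.5), ("jumps", 1.5, 2.0), ("over", 2.0, 2.5),
]
# The operator's cursor is on "fox" (index 3); include one word of context on each side.
print(estimate_playback_window(words, cursor_index=3, context_words=1))
```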

38. A system for creating captions of multi-media content, the system comprising:
means for performing an audio analysis operation on an audio signal to produce speech recognition data for each detected utterance, wherein the speech recognition data comprises a plurality of best hypothesis words and corresponding timing information;
means for displaying the speech recognition data using an operator interface as spoken word suggestions for review by an operator;
means for enabling the operator to edit the spoken word suggestions within the operator interface, wherein the enabling comprises estimating an appropriate audio portion to be played to the operator at a current moment, based on an indication obtained from the operator interface as to where the operator is currently editing.
Dependent claims: 39-42.

43. A computer program product for creating captions of multi-media content, the computer program product comprising:
computer code to perform an audio analysis operation on an audio signal to produce speech recognition data for each detected utterance, wherein the speech recognition data comprises a plurality of best hypothesis words and corresponding timing information;
computer code to display the speech recognition data using an operator interface as spoken word suggestions for review by an operator;
computer code to enable the operator to edit the spoken word suggestions within the operator interface, wherein the enabling comprises estimating an appropriate audio portion to be played to the operator at a current moment, based on an indication obtained from the operator interface as to where the operator is currently editing.
Dependent claims: 44-46.

47. A method for facilitating captioning, the method comprising:
performing an automatic captioning function on multi-media content, wherein the automatic captioning function creates a machine caption by utilizing speech recognition and optical character recognition on the multi-media content;
providing a caption editor, wherein the caption editor:
includes an operator interface for facilitating an edit of the machine caption by a human operator; and
distributes the edit throughout the machine caption; and
indexing a recognized word to create a searchable caption for use in a multi-media search tool, wherein the multi-media search tool includes a search interface that allows a user to locate relevant content within the multi-media content.
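Two steps of claim 47, distributing an operator edit throughout the machine caption and indexing recognized words for search, can be sketched as follows. The claim does not specify either mechanism. This example assumes that distributing an edit means replacing every occurrence of the same misrecognized word, and that the index maps each word to the times at which it is spoken. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (word, start_seconds); a hypothetical layout for a machine caption
CaptionWord = Tuple[str, float]


def distribute_edit(caption: List[CaptionWord], wrong: str, right: str) -> List[CaptionWord]:
    """Apply one operator correction to every matching word in the caption."""
    return [(right if word.lower() == wrong.lower() else word, start)
            for word, start in caption]


def build_index(caption: List[CaptionWord]) -> Dict[str, List[float]]:
    """Map each lowercased word to the start times at which it occurs."""
    index: Dict[str, List[float]] = defaultdict(list)
    for word, start in caption:
        index[word.lower()].append(start)
    return dict(index)


def search(index: Dict[str, List[float]], query: str) -> List[float]:
    """Return the start times at which the query word is spoken."""
    return index.get(query.lower(), [])


caption = [("Doctor", 0.0), ("Smyth", 0.5), ("thanked", 1.0), ("Smyth", 12.0)]
corrected = distribute_edit(caption, wrong="Smyth", right="Smith")
index = build_index(corrected)
print(search(index, "smith"))
```

A search interface could use the returned times to seek playback directly to each mention of the query word.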

48. A method for creating machine generated captions of multi-media, the method comprising:
performing an optical character recognition operation on a multi-media image, wherein the optical character recognition operation produces text correlated to an audio portion of the multi-media; and
utilizing the correlated text to perform an enhanced audio analysis operation on the multi-media.
Dependent claims: 49-52.
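Claim 48 uses text obtained by optical character recognition to enhance the audio analysis. The claim does not fix how the enhancement works. One plausible approach, shown here purely as an assumption, is to boost the score of any speech recognition candidate that also appears in the on-screen text. The `boost` value, the candidate layout, and the function name are hypothetical.

```python
from typing import List, Tuple

# Each word slot holds competing (word, score) candidates from the recognizer.
Slot = List[Tuple[str, float]]


def rescore_with_ocr(slots: List[Slot], ocr_text: str, boost: float = 0.2) -> List[str]:
    """Pick one word per slot, favoring candidates that also appear in the OCR text."""
    ocr_vocab = {token.strip(".,;:!?").lower() for token in ocr_text.split()}
    chosen = []
    for slot in slots:
        # Add the bonus only to candidates that are supported by the on-screen text.
        best_word, _ = max(
            slot,
            key=lambda cand: cand[1] + (boost if cand[0].lower() in ocr_vocab else 0.0),
        )
        chosen.append(best_word)
    return chosen


slots = [
    [("quarterly", 0.9)],
    [("prophets", 0.5), ("profits", 0.45)],  # acoustically similar candidates
]
# Text assumed to have been extracted from a slide shown while the words are spoken.
print(rescore_with_ocr(slots, "Quarterly Profits 2004"))
```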

53. A method for creating machine generated captions of multi-media, the method comprising:
performing an audio analysis operation on an audio portion of multi-media to produce speech recognition data for each detected utterance, wherein the speech recognition data is correlated to an image based portion of the multi-media;
utilizing the correlated speech recognition data to perform an enhanced optical character recognition operation on the image based portion of the multi-media.
Dependent claims: 54, 55.
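Claim 53 runs in the opposite direction and uses speech recognition data to enhance optical character recognition. As a rough sketch under assumptions, the example below replaces a noisy OCR token with the closest word spoken in the correlated audio, using fuzzy matching from the Python standard library. The claim does not describe this mechanism; the `cutoff` value and function name are hypothetical.

```python
import difflib
from typing import List


def enhance_ocr(ocr_tokens: List[str], spoken_words: List[str], cutoff: float = 0.8) -> List[str]:
    """Replace OCR tokens that closely resemble a spoken word with that word."""
    vocabulary = sorted({word.lower() for word in spoken_words})
    enhanced = []
    for token in ocr_tokens:
        # get_close_matches returns the best candidates whose similarity is >= cutoff.
        matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
        # Matched tokens take the lowercased spoken form; unmatched tokens are kept unchanged.
        enhanced.append(matches[0] if matches else token)
    return enhanced


spoken = ["quarterly", "profits", "rose"]
print(enhance_ocr(["Pr0fits", "rose", "2004"], spoken))
```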
Specification