Systems, computer-implemented methods, and tangible computer-readable storage media for transcription alignment
Abstract
Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for captioning a media presentation. The method includes receiving automatic speech recognition (ASR) output from a media presentation and a transcription of the media presentation. The method includes selecting, via a processor, a pair of anchor words in the media presentation based on the ASR output and the transcription, and generating captions by aligning the transcription with the ASR output between the selected pair of anchor words. The transcription can be human-generated. Selecting pairs of anchor words can be based on a similarity threshold between the ASR output and the transcription. In one variation, commonly used words on a stop list are ineligible as anchor words. The method includes outputting the media presentation with the generated captions. The presentation can be a recording of a live event.
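The anchor-word selection described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `anchor_candidates`, the contents of `STOP_LIST`, and the parameters `threshold` and `window` are hypothetical. The sketch assumes word-level ASR output as (word, start time in seconds) pairs, tokens that are already normalized, and it interprets the "similarity threshold" as a `difflib.SequenceMatcher` ratio computed over a small context window around each word that the transcription and ASR output share.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical stop list: common words that are never eligible as anchors.
STOP_LIST = {"the", "a", "an", "and", "of", "to", "is", "in", "it", "that"}


def anchor_candidates(
    transcript: List[str],
    asr: List[Tuple[str, float]],
    stop_list=STOP_LIST,
    threshold: float = 0.8,
    window: int = 1,
) -> List[Tuple[int, int, float]]:
    """Return (transcript_index, asr_index, asr_time) for each candidate anchor.

    Assumes both inputs are already normalized (lowercased, no punctuation)
    and that each ASR entry is (word, start_time_in_seconds).
    """
    asr_words = [word for word, _ in asr]
    matcher = SequenceMatcher(a=transcript, b=asr_words, autojunk=False)
    candidates = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            i, j = block.a + k, block.b + k
            if transcript[i] in stop_list:
                continue  # stop-list words are ineligible as anchors
            # Assumed similarity test: compare the surrounding context in both sources.
            ctx_t = transcript[max(0, i - window): i + window + 1]
            ctx_a = asr_words[max(0, j - window): j + window + 1]
            score = SequenceMatcher(a=ctx_t, b=ctx_a, autojunk=False).ratio()
            if score >= threshold:
                candidates.append((i, j, asr[j][1]))
    return candidates


if __name__ == "__main__":
    transcript = "the quick brown fox jumps over the lazy dog".split()
    asr = list(zip("the quick brown box jumps over the lazy dog".split(),
                   [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]))
    print(anchor_candidates(transcript, asr))
    # [(1, 1, 0.5), (5, 5, 2.5), (7, 7, 3.5), (8, 8, 4.0)]
```

In this example the ASR misrecognizes "fox" as "box". The words next to that error ("brown", "jumps") fall below the context threshold, and "the" is excluded by the stop list. Only words whose neighborhoods agree in both sources remain as candidate anchors.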
20 Claims
1. A method comprising:
identifying a pair of anchor words separated from one another within a media presentation by a time greater than an anchor word time duration requirement;
aligning a transcription of the media presentation with an automatic speech recognition output of the media presentation according to the pair of anchor words to yield an alignment;
generating, by a caption generation module, captions at respective timings within the media presentation based on the alignment to yield generated captions; and
outputting a modified media presentation based on the media presentation and the generated captions.
9. A system comprising:
a processor; and
a computer-readable storage device storing instructions which, when executed by the processor, cause the processor to perform operations comprising:
identifying a pair of anchor words separated from one another within a media presentation by a time greater than an anchor word time duration requirement;
aligning a transcription of the media presentation with an automatic speech recognition output of the media presentation according to the pair of anchor words to yield an alignment;
generating, by a caption generation module, captions at respective timings within the media presentation based on the alignment to yield generated captions; and
outputting a modified media presentation based on the media presentation and the generated captions.
17. A computer-readable storage device storing instructions which, when executed by a processor, cause the processor to perform operations comprising:
identifying a pair of anchor words separated from one another within a media presentation by a time greater than an anchor word time duration requirement;
aligning a transcription of the media presentation with an automatic speech recognition output of the media presentation according to the pair of anchor words to yield an alignment;
generating, by a caption generation module, captions at respective timings within the media presentation based on the alignment to yield generated captions; and
outputting a modified media presentation based on the media presentation and the generated captions.
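The identifying and aligning steps of the claims can be sketched as follows. This is a hedged illustration only. The names `select_anchors`, `align_between_anchors`, and `min_gap` are hypothetical. The sketch reads the "anchor word time duration requirement" as a minimum gap in seconds that consecutive anchors must strictly exceed. It aligns each anchor-bounded segment independently with `difflib.SequenceMatcher`, and it gives transcript words with no ASR match the time of the preceding aligned word. Each of these choices is an assumption, not something the claims specify.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Anchor = Tuple[int, int, float]  # (transcript_index, asr_index, asr_time)


def select_anchors(candidates: List[Anchor], min_gap: float) -> List[Anchor]:
    """Keep anchors so each consecutive pair is more than min_gap seconds apart."""
    kept: List[Anchor] = []
    for anchor in sorted(candidates, key=lambda a: a[2]):
        if not kept or anchor[2] - kept[-1][2] > min_gap:
            kept.append(anchor)
    return kept


def align_between_anchors(
    transcript: List[str], asr: List[Tuple[str, float]], anchors: List[Anchor]
) -> List[float]:
    """Assign a time to every transcript word, one anchor-bounded segment at a time.

    Assumes anchors increase in both transcript index and ASR index.
    """
    times: List[Optional[float]] = [None] * len(transcript)
    # Segment boundaries: start of both sequences, each anchor, end of both.
    bounds = [(0, 0)] + [(i, j) for i, j, _ in anchors] + [(len(transcript), len(asr))]
    for (t0, a0), (t1, a1) in zip(bounds, bounds[1:]):
        seg_asr = [word for word, _ in asr[a0:a1]]
        matcher = SequenceMatcher(a=transcript[t0:t1], b=seg_asr, autojunk=False)
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                times[t0 + block.a + k] = asr[a0 + block.b + k][1]
    # Assumed fallback: unmatched words inherit the previous aligned time.
    filled: List[float] = []
    last = 0.0
    for t in times:
        last = t if t is not None else last
        filled.append(last)
    return filled
```

Segmenting at anchors keeps each alignment problem small and local. A recognition error inside one segment cannot pull words in other segments out of place. With candidate anchors at 0.5 s, 2.5 s, 3.5 s, and 4.0 s and `min_gap=1.5`, only the anchors at 0.5 s and 2.5 s are kept, because the later gaps (1.0 s and 1.5 s) are not strictly greater than 1.5 s.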
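The generating and outputting steps can be sketched as follows, assuming every transcript word already has a start time. The `Caption` type, `generate_captions`, the fixed `max_words` grouping, the `tail` duration for the last caption, and the use of a SubRip (.srt) sidecar file as the "modified media presentation" are all illustrative assumptions. The claims do not specify a caption format. Burning captions into the video or muxing them as a subtitle track would be alternatives, and neither is shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Caption:
    start: float  # seconds
    end: float    # seconds
    text: str


def generate_captions(words: List[str], times: List[float],
                      max_words: int = 7, tail: float = 2.0) -> List[Caption]:
    """Group timed words into captions; each caption ends where the next begins."""
    captions: List[Caption] = []
    for k in range(0, len(words), max_words):
        nxt = k + max_words
        # The last caption stays on screen for a hypothetical fixed tail.
        end = times[nxt] if nxt < len(words) else times[-1] + tail
        captions.append(Caption(times[k], max(end, times[k]), " ".join(words[k:nxt])))
    return captions


def srt_timestamp(seconds: float) -> str:
    """Format seconds as a SubRip timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(captions: List[Caption]) -> str:
    """Serialize captions as numbered SubRip cues separated by blank lines."""
    cues = [
        f"{n}\n{srt_timestamp(c.start)} --> {srt_timestamp(c.end)}\n{c.text}\n"
        for n, c in enumerate(captions, start=1)
    ]
    return "\n".join(cues)
```

For example, nine words timed at 0.0, 0.5, 1.0, 1.0, 2.0, 2.5, 3.0, 3.5, and 4.0 seconds, grouped with `max_words=4` and `tail=1.0`, yield three captions spanning 0.0 to 2.0 s, 2.0 to 4.0 s, and 4.0 to 5.0 s.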
Specification