Systems, computer-implemented methods, and tangible computer-readable storage media for transcription alignment

US 10,002,612 B2
Filed: 11/14/2016
Issued: 06/19/2018
Est. Priority Date: 08/17/2009
Status: Active Grant

Abstract

Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for captioning a media presentation. The method includes receiving automatic speech recognition (ASR) output from a media presentation and a transcription of the media presentation. The method includes selecting, via a processor, a pair of anchor words in the media presentation based on the ASR output and the transcription, and generating captions by aligning the transcription with the ASR output between the selected pair of anchor words. The transcription can be human-generated. Selecting pairs of anchor words can be based on a similarity threshold between the ASR output and the transcription. In one variation, commonly used words on a stop list are ineligible as anchor words. The method includes outputting the media presentation with the generated captions. The presentation can be a recording of a live event.
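The anchor-word selection and alignment described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `AsrWord`, `STOP_WORDS`, `find_anchor_candidates`, `pick_anchor_pair`, and `interpolate_times` are hypothetical, exact word matching stands in for the similarity threshold, and linear interpolation between the two anchors is an assumed way to assign timings to the transcript words.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class AsrWord:
    text: str     # word as recognized by ASR
    start: float  # start time in seconds within the media presentation


# Illustrative stop list; the source does not enumerate one.
STOP_WORDS = {"the", "a", "and", "of", "to", "in", "is"}


def find_anchor_candidates(transcript, asr, stop_words=STOP_WORDS):
    """Return (transcript_index, asr_index) pairs where both sources
    agree on a word that is not on the stop list."""
    ref_tokens = [t.lower() for t in transcript]
    asr_tokens = [w.text.lower() for w in asr]
    matcher = SequenceMatcher(a=ref_tokens, b=asr_tokens, autojunk=False)
    anchors = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            i, j = block.a + k, block.b + k
            if ref_tokens[i] not in stop_words:
                anchors.append((i, j))
    return anchors


def pick_anchor_pair(anchors, asr, min_gap):
    """Return the first pair of anchors separated by more than
    min_gap seconds, or None if no such pair exists."""
    for x in range(len(anchors)):
        for y in range(x + 1, len(anchors)):
            gap = asr[anchors[y][1]].start - asr[anchors[x][1]].start
            if gap > min_gap:
                return anchors[x], anchors[y]
    return None


def interpolate_times(transcript, asr, pair):
    """Assign a time to each transcript word between the two anchors
    by linear interpolation (an assumption made for this sketch)."""
    (i0, j0), (i1, j1) = pair
    t0, t1 = asr[j0].start, asr[j1].start
    step = (t1 - t0) / (i1 - i0)
    return [(transcript[i], t0 + (i - i0) * step) for i in range(i0, i1 + 1)]
```

With a transcript that reads "shareholder" where the ASR output reads "cheer holder", the surrounding words that both sources agree on can still serve as anchors, and the disputed word receives a time from its position between them.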

20 Claims

1. A method comprising:
- identifying a pair of anchor words separated from one another within a media presentation by a time greater than an anchor word time duration requirement;
- aligning a transcription of the media presentation with an automatic speech recognition output of the media presentation according to the pair of anchor words to yield an alignment;
- generating, by a caption generation module, captions at respective timings within the media presentation based on the alignment to yield generated captions; and
- outputting a modified media presentation based on the media presentation and the generated captions.
- Dependent claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The method of claim 1, wherein a list of anchor word candidates comprises a stop word list of words not to be considered as the pair of anchor words.
  - 3. The method of claim 1, wherein the anchor word time duration requirement represents a minimal time duration requirement between words in the pair of anchor words.
  - 4. The method of claim 1, wherein identifying the pair of anchor words is achieved by comparing the transcription of the media presentation to a list of anchor word candidates.
  - 5. The method of claim 4, wherein generating the captions further comprises aligning the transcription between the pair of anchor words.
  - 6. The method of claim 1, further comprising:
    - outputting the modified media presentation with the captions.
  - 7. The method of claim 1, wherein the media presentation is in real-time, the method further comprising:
    - buffering the captions based on the media presentation and an aligning of the transcription of the media presentation to yield buffered captions; and
    - outputting a delayed media presentation and the buffered captions together.
  - 8. The method of claim 1, wherein the media presentation is in real-time, the method further comprising:
    - buffering the media presentation to yield a delayed media presentation; and
    - outputting the delayed media presentation and the captions together.
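
The real-time variants in claims 7 and 8 buffer either the captions or the media so that both can be output together after a delay. A minimal sketch of the media-buffering case follows, assuming a fixed delay in seconds and frames tagged with presentation timestamps. The class `DelayedPresenter` and its methods are hypothetical names chosen for illustration, not part of the claims.

```python
from collections import deque


class DelayedPresenter:
    """Hold media frames for a fixed delay so captions that arrive late
    can still be output together with the frames they belong to."""

    def __init__(self, delay):
        self.delay = delay     # seconds to hold each frame
        self.frames = deque()  # (timestamp, frame) in arrival order
        self.captions = []     # (timestamp, text)

    def push_frame(self, timestamp, frame):
        self.frames.append((timestamp, frame))

    def push_caption(self, timestamp, text):
        self.captions.append((timestamp, text))

    def pop_ready(self, now):
        """Emit every frame that is at least `delay` seconds old, paired
        with the most recent caption timed at or before that frame."""
        out = []
        while self.frames and now - self.frames[0][0] >= self.delay:
            ts, frame = self.frames.popleft()
            active = max((c for c in self.captions if c[0] <= ts), default=None)
            out.append((ts, frame, active[1] if active else None))
        return out
```

The delay gives the alignment step time to produce a caption before the matching frame is released. Buffering the captions instead, as in claim 7, would follow the same pattern with the roles of the two queues adjusted.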

9. A system comprising:
- a processor; and
- a computer-readable storage device storing instructions which, when executed by the processor, cause the processor to perform operations comprising:
  - identifying a pair of anchor words separated from one another within a media presentation by a time greater than an anchor word time duration requirement;
  - aligning a transcription of the media presentation with an automatic speech recognition output of the media presentation according to the pair of anchor words to yield an alignment;
  - generating, by a caption generation module, captions at respective timings within the media presentation based on the alignment to yield generated captions; and
  - outputting a modified media presentation based on the media presentation and the generated captions.
- Dependent claims (10, 11, 12, 13, 14, 15, 16)
  - 10. The system of claim 9, wherein a list of anchor word candidates comprises a stop word list of words not to be considered as the pair of anchor words.
  - 11. The system of claim 9, wherein the anchor word time duration requirement represents a minimal time duration requirement between words in the pair of anchor words.
  - 12. The system of claim 9, wherein identifying the pair of anchor words is achieved by comparing the transcription of the media presentation to a list of anchor word candidates.
  - 13. The system of claim 12, wherein generating the captions further comprises aligning the transcription between the pair of anchor words.
  - 14. The system of claim 9, the computer-readable storage device storing further instructions which, when executed by the processor, cause the processor to perform operations comprising:
    - outputting the modified media presentation with the captions.
  - 15. The system of claim 9, wherein the media presentation is in real-time, and wherein the computer-readable storage device stores further instructions which, when executed by the processor, cause the processor to perform operations comprising:
    - buffering the captions based on the media presentation and an aligning of the transcription of the media presentation to yield buffered captions; and
    - outputting a delayed media presentation and the buffered captions together.
  - 16. The system of claim 9, wherein the media presentation is in real-time, and wherein the computer-readable storage device stores further instructions which, when executed by the processor, cause the processor to perform operations comprising:
    - buffering the media presentation to yield a delayed media presentation; and
    - outputting the delayed media presentation and the captions together.

17. A computer-readable storage device storing instructions which, when executed by a processor, cause the processor to perform operations comprising:
- identifying a pair of anchor words separated from one another within a media presentation by a time greater than an anchor word time duration requirement;
- aligning a transcription of the media presentation with an automatic speech recognition output of the media presentation according to the pair of anchor words to yield an alignment;
- generating, by a caption generation module, captions at respective timings within the media presentation based on the alignment to yield generated captions; and
- outputting a modified media presentation based on the media presentation and the generated captions.
- Dependent claims (18, 19, 20)
  - 18. The computer-readable storage device of claim 17, wherein a list of anchor word candidates further comprises a stop word list of words not to be considered as the pair of anchor words.
  - 19. The computer-readable storage device of claim 17, wherein the anchor word time duration requirement represents a minimal time duration requirement between words in the pair of anchor words.
  - 20. The computer-readable storage device of claim 19, wherein generating the captions further comprises aligning the transcription between the pair of anchor words.
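
The step of generating captions "at respective timings" from an alignment, common to claims 1, 9, and 17, might look like the sketch below. It assumes the alignment has already produced a list of `(word, time)` pairs. The function `build_captions` and its parameters `max_words` and `tail` are hypothetical; the source does not specify how words are grouped into captions or how long the last caption stays on screen.

```python
def build_captions(timed_words, max_words=4, tail=1.0):
    """Group aligned (word, time) pairs into captions.

    Each caption starts at the time of its first word and ends when the
    next caption starts. The final caption ends `tail` seconds after the
    time of its last word (an assumed convention for this sketch).
    """
    captions = []
    for k in range(0, len(timed_words), max_words):
        chunk = timed_words[k:k + max_words]
        start = chunk[0][1]
        nxt = k + max_words
        if nxt < len(timed_words):
            end = timed_words[nxt][1]
        else:
            end = chunk[-1][1] + tail
        captions.append((start, end, " ".join(word for word, _ in chunk)))
    return captions
```

Fixed-size grouping keeps the example short; a production captioner would more likely break on punctuation or pauses.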

Current Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Kim, Yeon-Jun; Gibbon, David C.; Schroeter, Horst J.
Primary Examiner(s): ROBERTS, SHAUN A
Application Number: US15/350,339
Publication Number: US 20170061986A1
Time in Patent Office: 582 Days
Field of Search: 704231, 704235, 704251, 704271
CPC Class Codes

G10L 13/08   Text analysis or generation...

G10L 15/26   Speech to text systems G10L...

G10L 21/055   for synchronising with othe...

G10L 21/06   Transformation of speech in...

G10L 25/51   for comparison or discrimin...

G11B 27/10   Indexing; Addressing; Timin...

H04M 2201/14   Delay circuits; Timers

H04M 2201/22   Synchronisation circuits

H04M 2201/40   using speech recognition sp...

H04M 2203/305   Recording playback features...

H04M 3/42391   where the subscribers are h...

H04N 21/44004   involving video buffer mana...

H04N 21/4884   for displaying subtitles
