Efficient method for producing off-line closed captions

US 6,505,153 B1
Filed: 05/22/2000
Issued: 01/07/2003
Est. Priority Date: 05/22/2000
Status: Expired due to Term

Abstract

Disclosed is a five-step process for producing closed captions for a television program, subtitles for a movie or other uses for time-aligned transcripts. An operator transcribes the audio track while listening to the recorded material. The system helps him/her to work efficiently and produce precisely aligned captions. The first step consists of identifying the portions of the input audio that contain spoken text. Only the spoken parts are further processed by the invention system. The other parts may be used to generate non-spoken captions. The second step controls the rate of speech depending on how fast the operator types. While the operator types, the third module records the time the words were typed in. This provides a rough time alignment for the transcribed text. Then the fourth module realigns precisely the transcribed text on the audio track. A final module segments the transcribed text into captions, based on acoustic clues and natural language constraints. Further, the speech rate-control component of the system may be used in other systems where transcripts are required to be generated from spoken audio.
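The third step of the abstract (recording when words were typed to get a rough time alignment) can be illustrated with a minimal sketch. It assumes each keystroke arrives as a (character, audio position in seconds) pair and that a word's time mark is the position of the keystroke that closed it. The names `TimedWord` and `rough_time_marks` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TimedWord:
    text: str
    time_s: float  # audio position (seconds) when the word was completed


def rough_time_marks(keystrokes: List[Tuple[str, float]]) -> List[TimedWord]:
    """Group (character, audio_time_s) keystrokes into time-marked words.

    Assumption: a word is closed by a whitespace keystroke, and its time
    mark is the audio position of that closing keystroke.
    """
    words: List[TimedWord] = []
    current: List[str] = []
    last_time = 0.0
    for char, time_s in keystrokes:
        last_time = time_s
        if char.isspace():
            if current:
                words.append(TimedWord("".join(current), time_s))
                current = []
        else:
            current.append(char)
    if current:
        # Flush a trailing word that was never followed by whitespace.
        words.append(TimedWord("".join(current), last_time))
    return words


keys = [("h", 1.0), ("i", 1.2), (" ", 1.4), ("y", 2.0), ("o", 2.1), ("u", 2.3)]
marks = rough_time_marks(keys)
```

These marks lag the audio by the operator's reaction and typing time, which is why the abstract describes a separate realignment step afterwards.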

15 Claims

1. A method for producing time-aligned transcripts of an audio track, comprising the steps of:
- in response to an input audio stream, determining spoken parts of the audio stream;
- transcribing the determined spoken parts of the audio stream by using an audio rate control routine, said transcribing producing transcription text;
- adding time marks to the transcription text by detecting trigger events based on time of event keystrokes by an operator performing the transcribing; and
- re-aligning precisely the transcription text on the input audio stream.
2. A method as claimed in claim 1 further comprising the step of segmenting the realigned transcription text into closed captions.
3. A method as claimed in claim 2 further comprising the step of generating non-speech captions from parts of the input audio stream which were determined to be other than the spoken parts.
4. A method as claimed in claim 2 wherein the segmenting includes:
5. A method as claimed in claim 1 wherein the audio rate control routine includes:
- counting speech units in the spoken parts and producing a count of speech units for a given unit of time;
- estimating a speech rate from the count of speech units; and
- using the estimated speech rate, controlling playback of the spoken parts of the audio stream to match a target rate.
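The three steps of claim 5 can be sketched as follows. This is only an illustration under stated assumptions: speech units are given as a list of onset times in seconds, the rate is a simple count per second over a fixed window, and playback is controlled through a clamped speed multiplier. The function names, the window scheme, and the clamp limits are hypothetical and are not taken from the patent.

```python
from typing import Sequence


def estimate_speech_rate(unit_times_s: Sequence[float],
                         window_start_s: float,
                         window_len_s: float) -> float:
    """Count speech units in a window and return units per second."""
    window_end_s = window_start_s + window_len_s
    count = sum(1 for t in unit_times_s if window_start_s <= t < window_end_s)
    return count / window_len_s


def playback_factor(speech_rate: float,
                    target_rate: float,
                    min_factor: float = 0.5,
                    max_factor: float = 2.0) -> float:
    """Speed multiplier that moves speech_rate toward target_rate.

    Assumption: the factor is clamped to [min_factor, max_factor], and
    playback is left unchanged when no speech units were counted.
    """
    if speech_rate <= 0.0:
        return 1.0
    return max(min_factor, min(max_factor, target_rate / speech_rate))


onsets = [0.1, 0.4, 0.7, 1.0, 1.3, 1.6, 1.9, 2.5]
rate = estimate_speech_rate(onsets, window_start_s=0.0, window_len_s=2.0)
factor = playback_factor(rate, target_rate=1.75)
```

Here seven units fall inside the two-second window, giving 3.5 units per second, so a target of 1.75 units per second yields a factor of 0.5 (half-speed playback).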
6. A method as claimed in claim 5 wherein the target rate is about equal to rate of operator operating a keyboard to effect the transcribing.
7. A method as claimed in claim 1 wherein the steps of determining spoken parts, adding time marks and realigning are performed in a digital processor; and the audio rate control routine is performed by a digital processor.
8. A method as claimed in claim 1 wherein the realigning produces time marks correlating character strings of the transcription text to corresponding parts of the input audio stream; and further comprising the step of using the produced time marks for indexing respective position in time of the audio stream of various character strings in the transcription text, the indexing enabling a search on a desired character string to produce location in the audio stream where the corresponding audio part for the desired character string exists.
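The indexing described in claim 8 can be illustrated with a small sketch. It assumes the realigner's output is a list of (word, start time in seconds) pairs and that a search is a case-insensitive match on consecutive words. `build_index` and `find_phrase` are hypothetical names, and the patent does not prescribe this data structure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

AlignedWord = Tuple[str, float]  # (word, start time in seconds)


def build_index(aligned_words: List[AlignedWord]) -> Dict[str, List[float]]:
    """Map each lower-cased word to every audio position where it starts."""
    index: Dict[str, List[float]] = defaultdict(list)
    for word, start_s in aligned_words:
        index[word.lower()].append(start_s)
    return dict(index)


def find_phrase(aligned_words: List[AlignedWord], phrase: str) -> List[float]:
    """Return start times where the words of `phrase` occur consecutively."""
    target = phrase.lower().split()
    if not target:
        return []
    words = [w.lower() for w, _ in aligned_words]
    hits: List[float] = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            hits.append(aligned_words[i][1])
    return hits


aligned = [("the", 0.0), ("quick", 0.4), ("fox", 0.9), ("the", 1.5), ("fox", 1.8)]
index = build_index(aligned)
positions = find_phrase(aligned, "the fox")
```

A player could then seek to each returned position to play the audio for the matched string.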

9. Apparatus for producing time aligned transcripts of an audio track comprising:
- an audio classifier, in response to an input audio stream, the audio classifier determining spoken parts of the audio stream;
- audio rate controller coupled to receive the determined spoken parts of the audio stream from the audio classifier, the audio rate controller controlling rate of playback of the determined spoken parts of the audio stream to a transcriber transcribing the determined spoken parts and producing transcription text;
- a time event tracker for adding time marks to the transcription text by detecting trigger events based on time of event keystrokes by the transcriber performing the transcribing;
- a realigner responsive to output by the time event tracker, for precisely realigning the transcription text on the input audio stream; and
- a segmenter coupled to receive from the realigner the realigned transcription text, the segmenter segmenting the realigned transcription text to form closed captions.
10. Apparatus as claimed in claim 9 wherein the audio classifier further generates non-speech captions from parts of the input audio stream determined to be other than the spoken parts.
11. Apparatus as claimed in claim 9 wherein the audio rate controller:
12. Apparatus as claimed in claim 11 wherein the target rate is about equal to rate of transcriber operating a keyboard to effect the transcribing.
13. Apparatus as claimed in claim 9 wherein the audio classifier, audio rate controller, time event tracker, realigner and segmenter are executed in a digital processor.
14. Apparatus as claimed in claim 9 wherein the realigner produces time marks correlating character strings of the transcription text to corresponding parts of the input audio stream; and the apparatus further comprises an indexer, said indexer using the produced time marks to index respective position in time of the audio stream of various character strings in the transcription text, such that in response to a search on a desired character string, the indexer produces location in the audio stream where the corresponding audio part for the desired character string exists.
15. Apparatus as claimed in claim 9 wherein the segmenter further:
- detects pauses acoustically;
- determines from the detected pauses potential ends of sentences; and
- accounts for natural language constraints in the potential ends of sentences to determine legitimate end of sentences, said segmenter segmenting the realigned transcription text according to the determined legitimate end of sentences, to form closed captions.
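The segmenter of claim 15 can be sketched roughly as below. The sketch assumes realigned words arrive as (text, start, end) triples in seconds, treats any gap of at least `min_pause_s` as a detected pause, and stands in for the "natural language constraints" with a small hand-written list of words that rarely end a sentence. Both the threshold and the `NON_FINAL_WORDS` list are hypothetical simplifications, not the patent's method.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_s, end_s)

# Hypothetical stand-in for natural language constraints: words that
# are unlikely to be the last word of a sentence.
NON_FINAL_WORDS = {"the", "a", "an", "of", "to", "and", "or", "in", "on"}


def segment_captions(words: List[Word], min_pause_s: float = 0.5) -> List[str]:
    """Split realigned words into captions at plausible sentence ends.

    A pause after a word marks a potential sentence end; the break is
    accepted only if the word is not in NON_FINAL_WORDS.
    """
    captions: List[str] = []
    current: List[str] = []
    for i, (text, _start_s, end_s) in enumerate(words):
        current.append(text)
        if i + 1 < len(words):
            pause_s = words[i + 1][1] - end_s
            if pause_s >= min_pause_s and text.lower() not in NON_FINAL_WORDS:
                captions.append(" ".join(current))
                current = []
    if current:
        # The last word always closes the final caption.
        captions.append(" ".join(current))
    return captions


timed = [
    ("we", 0.0, 0.2), ("went", 0.25, 0.5), ("to", 0.55, 0.7),
    ("the", 1.4, 1.5), ("store", 1.55, 2.0),
    ("it", 2.8, 2.9), ("rained", 2.95, 3.4),
]
captions = segment_captions(timed)
```

In this example the pause after "to" is rejected because "to" rarely ends a sentence, while the pause after "store" is accepted, giving two captions.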

Current Assignee: Data Quill Limited
Original Assignee: Compaq Information Technologies Group LP (HP Inc.)
Inventors: Van Thong, Jean-Manuel; Logan, Beth; Swain, Michael
Primary Examiner(s): Chawan, Vijay
Assistant Examiner(s): Lerner, Martin
Application Number: US09/577,054
Time in Patent Office: 960 Days
Field of Search: 704/211, 704/215, 704/234, 704/235, 704/241, 704/253, 704/254, 704/257, 704/503, 704/270, 348/434.1, 348/462, 348/465, 348/468, 348/473, 348/563, 348/564
US Class Current: 704/211
CPC Class Codes:
G10L 15/26 Speech to text systems G10L...
H04N 5/278 Subtitling
