Generation of subtitles or captions for moving pictures

US 20040093220A1
Filed: 12/06/2002
Published: 05/13/2004
Est. Priority Date: 06/09/2000
Status: Active Grant


Abstract

A method for generating subtitles for audiovisual material receives and analyses a text file containing dialogue spoken in the audiovisual material and provides a signal representative of the text. The text information and audio signal are aligned in time using time alignment speech recognition, and the text and timing information are then output to a subtitle file. Colours can be assigned to different speakers or groups of speakers. Subtitles are derived by receiving and analysing a text file containing the spoken dialogue, considering each word in turn in the text information signal, and assigning a score to each subtitle in a plurality of different possible subtitle formatting options which lead to that word. The steps are then repeated until all the words in the text information signal have been used, and the subtitle formatting option which gives the best overall score is then derived.

31 Claims

1. A method of generating subtitles for audiovisual material, comprising the steps of:
- receiving and analysing a text file containing dialogue spoken in the audiovisual material to provide text information signal representative of the text;
  
  aligning the text information and the audio signal from the audiovisual material in time using time alignment speech recognition to provide timing information for the spoken text; and
  
  forming the text information and the timing information into an output subtitle file.
- View Dependent Claims (2, 3, 16, 17, 18, 20)
- - 2. A method according to claim 1, in which the step of analysing the text file comprises calculating with the use of Bayes' theorem probabilities that each of a plurality of blocks of text is one of a plurality of text component types.
  - 3. A method according to claim 1, in which the step of analysing the text provides a text information signal representative of the text and of the person speaking the text.
  - 16. Apparatus adapted to carry out the method of any one of claims 1 to 15.
  - 17. A computer program arranged when operated to carry out the steps of any one of claims 1 to 15.
  - 18. A subtitle file, or a subtitled audiovisual file, generated by the method of any one of claims 1 to 15.
  - 20. A method according to claim 1, in which the text file is generated by:
    - playing the audio signal from the audiovisual material, the audio signal containing speech;
      
      having a person listen to the speech and speak it into a microphone; and
      
      applying the microphone output signal to a speech recogniser to provide an electronic text file.
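
The last step of claim 1, forming text and timing information into an output subtitle file, can be illustrated with a minimal sketch. It assumes the word timings have already been produced by an aligner (not implemented here) and writes them in the SubRip (SRT) layout, which is an assumption for illustration; the claims do not name a file format. `TimedWord`, `format_timestamp`, `words_to_srt` and the `max_chars` limit are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str
    start: float  # seconds from the start of the programme
    end: float


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_chars=37):
    """Group timed words into cues of at most max_chars characters
    (assumed limit) and render them as SRT text."""
    cues, current = [], []
    for word in words:
        candidate = " ".join(w.text for w in current + [word])
        if current and len(candidate) > max_chars:
            cues.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        timing = f"{format_timestamp(cue[0].start)} --> {format_timestamp(cue[-1].end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    aligned = [TimedWord("Hello", 0.0, 0.4), TimedWord("world", 0.5, 1.0)]
    print(words_to_srt(aligned))
```

Each cue takes its start time from its first word and its end time from its last word, so the quality of the output depends entirely on the alignment step.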

4. A method of assigning colour representative of different speakers to subtitles, the method comprising the steps of:
- forming a plurality of groups of speakers, where each group contains speakers who can be represented by the same colour; and
  
  assigning the available colours to a corresponding number of the plurality of groups, the groups being selected such that all the speakers are allocated a colour.
- View Dependent Claims (5, 6, 7, 8)
- - 5. A method according to claim 4, in which the step of forming groups comprises an iterative method in the first step of which each speaker is identified to form a group, in the second step one speaker is taken and allowable combinations with the other groups are formed into additional groups, and in subsequent steps the second step is repeated for each of the groups including all additional groups.
  - 6. A method according to claim 4, in which there is at least one group which contains speakers which are manually assigned one colour, and in which in the assigning step that group is assigned that colour.
  - 7. A method according to claim 4, in which any speaker with interactions with fewer other speakers than there are colours is ignored in the step of forming groups, and is assigned a colour after colours are assigned to the thus-formed groups.
  - 8. A method according to claim 4, substantially as herein described with reference to FIG. 2.
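
One way to picture the colour assignment of claims 4 to 8 is as a graph colouring problem. The sketch below is not the iterative group-forming procedure of claim 5; it substitutes a plain backtracking search and assumes that two speakers may share a colour only if they never interact. The `interactions` mapping (assumed symmetric), `assign_colours` and `groups_by_colour` are hypothetical names.

```python
def assign_colours(interactions, colours):
    """Return {speaker: colour} such that interacting speakers differ,
    or None if the available colours are not sufficient.

    interactions: {speaker: set of speakers they interact with},
    assumed symmetric.
    """
    # Try the most constrained speakers first.
    speakers = sorted(interactions, key=lambda s: -len(interactions[s]))
    assignment = {}

    def backtrack(i):
        if i == len(speakers):
            return True
        speaker = speakers[i]
        used = {assignment[n] for n in interactions[speaker] if n in assignment}
        for colour in colours:
            if colour not in used:
                assignment[speaker] = colour
                if backtrack(i + 1):
                    return True
                del assignment[speaker]
        return False

    return assignment if backtrack(0) else None


def groups_by_colour(assignment):
    """Invert the assignment: each colour maps to its group of speakers."""
    groups = {}
    for speaker, colour in assignment.items():
        groups.setdefault(colour, []).append(speaker)
    return groups


if __name__ == "__main__":
    interactions = {
        "A": {"B", "C", "D"},
        "B": {"A", "C"},
        "C": {"A", "B"},
        "D": {"A"},
    }
    result = assign_colours(interactions, ["white", "yellow", "cyan"])
    print(groups_by_colour(result))
```

Speakers that end up with the same colour form one group in the sense of claim 4. The observation in claim 7 also holds here: a speaker who interacts with fewer speakers than there are colours can always be coloured last, because at least one colour remains free.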

9. A method of detecting scene changes in audiovisual material, comprising the steps of:
- receiving signal representative of the spoken dialogue in the audiovisual material;
  
  identifying the times when speakers are active in the spoken dialogue; and
  
  detecting points in time in the spoken dialogue where the group of active speakers changes.
- View Dependent Claims (10)
- - 10. A method according to claim 9, in which the detecting step includes the step of filtering in time with an averaging function.
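
The idea in claims 9 and 10 can be sketched as follows, under assumptions of my own: the dialogue is given as a time-ordered list of `(start_time, speaker)` pairs, the averaging function is a simple moving window over a fixed number of utterances, and a scene change is flagged when the averaged speaker activity before and after a point differs by at least a threshold. The names `speaker_profile`, `detect_scene_changes`, `window` and `threshold` are hypothetical.

```python
from collections import Counter


def speaker_profile(speakers):
    """Average speaker activity over a window: share of utterances per speaker."""
    counts = Counter(speakers)
    total = len(speakers)
    return {speaker: n / total for speaker, n in counts.items()}


def detect_scene_changes(utterances, window=3, threshold=0.9):
    """Return start times at which the group of active speakers changes.

    utterances: list of (start_time, speaker) in time order.
    The distance between the averaged profiles before and after each
    candidate point is 0 for identical groups and 1 for disjoint groups.
    """
    names = [speaker for _, speaker in utterances]
    changes = []
    for i in range(window, len(names) - window + 1):
        before = speaker_profile(names[i - window:i])
        after = speaker_profile(names[i:i + window])
        keys = set(before) | set(after)
        distance = 0.5 * sum(abs(before.get(k, 0.0) - after.get(k, 0.0)) for k in keys)
        if distance >= threshold:
            changes.append(utterances[i][0])
    return changes


if __name__ == "__main__":
    dialogue = [(0, "A"), (2, "B"), (4, "A"), (6, "B"),
                (8, "C"), (10, "D"), (12, "C"), (14, "D")]
    print(detect_scene_changes(dialogue))
```

Averaging over a window keeps a single interjection from a new speaker from being reported as a scene change; the window size and threshold here are illustrative and would need tuning on real material.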

11. A method of parsing an electronic text file to identify different components thereof, comprising the steps of:
- identifying blocks of text in an input electronic text file;
  
  providing a plurality of possible script format properties for the blocks;
  
  providing a definition of each of the possible components of the text file;
  
  in relation to each block, determining the value of each script format property;
  
  for each block, determining from the script format properties of the block and the component definitions a probability value that that block is each of the component types;
  
  selecting the component type for each block on the basis of the probabilities that it is each of the component types; and
  
  generating therefrom an output file.
- View Dependent Claims (12, 13, 14)
- - 12. A method according to claim 11, in which the step of determining probability values is undertaken using Bayes' theorem.
  - 13. A method according to claim 11, in which the output file is input as a new input file and the processing repeated.
  - 14. A method according to claim 11, in which the component definitions are adaptively changeable.
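
A minimal sketch of the parsing approach in claims 11 and 12, assuming a naive Bayes model in which the script format properties are treated as independent. The three properties, the three component types and every probability in `DEFINITIONS` are invented for illustration and do not come from the patent.

```python
# Hypothetical component definitions: a prior for each component type and
# P(property is true | component type) for each script format property.
DEFINITIONS = {
    "speaker_name": {"prior": 0.3, "all_caps": 0.90, "ends_with_colon": 0.60, "in_brackets": 0.05},
    "dialogue": {"prior": 0.5, "all_caps": 0.05, "ends_with_colon": 0.05, "in_brackets": 0.05},
    "stage_direction": {"prior": 0.2, "all_caps": 0.10, "ends_with_colon": 0.05, "in_brackets": 0.90},
}


def format_properties(block):
    """Determine the value of each script format property for one block."""
    text = block.strip()
    letters = [c for c in text if c.isalpha()]
    return {
        "all_caps": bool(letters) and all(c.isupper() for c in letters),
        "ends_with_colon": text.endswith(":"),
        "in_brackets": text.startswith(("(", "[")) and text.endswith((")", "]")),
    }


def classify(block):
    """Return (most probable component type, posterior probabilities)."""
    properties = format_properties(block)
    joint = {}
    for component, definition in DEFINITIONS.items():
        p = definition["prior"]
        for name, value in properties.items():
            p *= definition[name] if value else 1.0 - definition[name]
        joint[component] = p
    total = sum(joint.values())  # Bayes' theorem: normalise by the evidence
    posterior = {component: p / total for component, p in joint.items()}
    return max(posterior, key=posterior.get), posterior


if __name__ == "__main__":
    for block in ["JOHN:", "I never said that.", "(He leaves the room.)"]:
        print(block, "->", classify(block)[0])
```

Because the definitions are plain data, they could be updated between passes, which is one way to read the adaptive definitions of claim 14 and the repeated processing of claim 13.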

15. A method of placing subtitles related to speech from two speakers in a picture, comprising the steps of:
- generating separate subtitles for the two speakers;
  
  determining from left and right stereo audio signals which of the two speakers is nearer the left and which nearer the right in the picture; and
  
  placing the subtitles for the two speakers in accordance with the determination.
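
The determination in claim 15 could be approximated by comparing left and right channel energy while each speaker is talking. The sketch below assumes the audio has already been cut into per-speaker segments of raw samples; `stereo_balance` and `place_two_speakers` are hypothetical names, and the energy ratio is only one possible measure.

```python
def stereo_balance(left, right):
    """Return a value in [-1, 1]; negative means more energy in the left channel."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    total = energy_left + energy_right
    return 0.0 if total == 0 else (energy_right - energy_left) / total


def place_two_speakers(segments):
    """segments: {speaker: (left_samples, right_samples)} for exactly two speakers.

    Returns {speaker: "left" | "right"} according to which speaker's
    audio sits further towards the left channel.
    """
    (a, (left_a, right_a)), (b, (left_b, right_b)) = segments.items()
    if stereo_balance(left_a, right_a) <= stereo_balance(left_b, right_b):
        return {a: "left", b: "right"}
    return {a: "right", b: "left"}


if __name__ == "__main__":
    segments = {
        "Anna": ([0.1, -0.1, 0.1], [0.9, -0.8, 0.9]),
        "Ben": ([0.7, -0.8, 0.7], [0.1, -0.2, 0.1]),
    }
    print(place_two_speakers(segments))
```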

19. A method of generating subtitles for audiovisual material, comprising the steps of:
- playing the audio signal from the audiovisual material, the audio signal containing speech;
  
  having a person listen to the speech and speak it into a microphone;
  
  applying the microphone output signal to a speech recogniser to provide an electronic text signal;
  
  comparing the timings of the audio signal from the audiovisual material and the microphone output signal; and
  
  adjusting the timing of the output of the speech recogniser in dependence upon the comparison so as to tend to align the output of the speech recogniser with the audio signal from the audiovisual material.
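
The comparing and adjusting steps of claim 19 can be sketched with a cross-correlation of energy envelopes. This assumes a single constant delay between the programme audio and the respoken audio, which is a simplification, since a real respeaker's lag varies over time. `estimate_delay`, `realign`, the envelope lists and the frame duration are hypothetical.

```python
def estimate_delay(reference, respoken, max_lag):
    """Return the lag, in frames, by which `respoken` trails `reference`,
    chosen to maximise the cross-correlation of the two energy envelopes."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            reference[i] * respoken[i + lag]
            for i in range(len(reference))
            if i + lag < len(respoken)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def realign(words, delay_seconds):
    """Shift recogniser output (text, start, end) earlier by the estimated delay."""
    return [
        (text, max(0.0, start - delay_seconds), max(0.0, end - delay_seconds))
        for text, start, end in words
    ]


if __name__ == "__main__":
    frame_seconds = 0.5  # assumed envelope frame duration
    programme = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
    microphone = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0]
    lag = estimate_delay(programme, microphone, max_lag=5)
    print(realign([("hello", 2.5, 3.5)], lag * frame_seconds))
```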

21. A method of placing subtitles related to speech from speakers in a moving picture, comprising the steps of:
- receiving a video signal representative of the picture;
  
  analysing the video signal to identify areas of the picture which indicate the presence of a speaker in a location on the picture;
  
  generating therefrom a signal which indicates a desired location for a subtitle relating to speech spoken by that speaker; and
  
  placing the subtitle for that speaker in accordance therewith.
- View Dependent Claims (22)
- - 22. A method according to claim 21, in which the analysing step comprises identifying faces and/or lip movements.

23. A method of generating subtitles for audiovisual material, comprising the steps of:
- receiving a text signal containing text corresponding to speech in the audiovisual material;
  
  identifying from the audio signal from the audiovisual material predetermined characteristics of the speakers voice;
  
  determining when the characteristics change and, in response thereto, providing an output signal indicating a change of speaker; and
  
  generating from the text signal and the output signal indicating a change of speaker subtitles related to the speech and to the speaker.
- View Dependent Claims (24, 25, 26, 27, 28)
- - 24. A method according to claim 23, further comprising the step of aligning the text signal and the audio signal in time using time alignment speech recognition.
  - 25. A method according to claims 23 or 24, further comprising the step of storing the predetermined characteristics for each speaker.
  - 26. A method according to claim 25, further comprising, when a change of speaker is detected, comparing the characteristics for the new speaker with the stored characteristics to determine whether the new speaker has previously spoken.
  - 27. A method according to claim 26, including the step of generating an output indicative of the number of speakers.
  - 28. A method according to any of claims 23 to 27, in which the step of determining when the characteristics change makes use of punctuation in the text signal.
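
Claims 23 and 25 to 27 describe detecting a change of speaker from voice characteristics, storing those characteristics, and checking whether a new speaker has spoken before. A minimal sketch, assuming each speech segment has already been reduced to a numeric feature vector (for example mean pitch and some spectral measure; feature extraction is not shown) and that a Euclidean distance threshold separates speakers. `track_speakers` and `threshold` are hypothetical.

```python
import math


def track_speakers(segment_features, threshold):
    """Label each segment with a speaker index and list the change points.

    segment_features: feature vectors, one per speech segment, in time order.
    Returns (labels, changes), where changes holds the indices of segments
    at which the speaker differs from the previous segment.
    """
    stored = []  # stored characteristics, one reference vector per speaker
    labels = []
    for features in segment_features:
        distances = [math.dist(features, reference) for reference in stored]
        if distances and min(distances) <= threshold:
            label = distances.index(min(distances))  # speaker has spoken before
        else:
            stored.append(features)  # previously unheard speaker
            label = len(stored) - 1
        labels.append(label)
    changes = [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
    return labels, changes


if __name__ == "__main__":
    features = [(120.0, 0.30), (122.0, 0.31), (210.0, 0.60), (119.0, 0.29), (212.0, 0.61)]
    labels, changes = track_speakers(features, threshold=10.0)
    print(labels, changes, "speakers:", len(set(labels)))
```

The number of distinct labels gives the speaker count mentioned in claim 27. Restricting change points to sentence boundaries, as claim 28 suggests with punctuation, would be a natural refinement that this sketch omits.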

29. A method of generating subtitles for audiovisual material comprising the steps of:
- receiving and analysing a text file containing dialogue spoken in the audiovisual material to provide a text information signal representative of the text;
  
  deriving a set of subtitles from the text information signal;
  
  characterised in that the deriving step comprises;
  
  a) considering each word in turn in the text information signal;
  
  b) assigning a score to each subtitle in a plurality of different possible subtitle formatting options leading to that word;
  
  c) repeating steps a) and b) until all the words in the text information signal have been used; and
  
  d) deriving the subtitle formatting option that gives the best overall score for the text information signal.
- View Dependent Claims (30, 31)
- - 30. A method according to claim 29, including the step of storing the subtitle formatting option giving the best overall score to at least one selected point in the text and performing step b) only on words added from that at least one selected point.
  - 31. A method according to claim 30 in which the position of the at least one selected point changes position as words are added, thereby reducing the number of subtitle formatting options for which scores must be derived.
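
Steps a) to d) of claim 29, together with the stored best options of claim 30, read naturally as dynamic programming over word positions. The sketch below assumes a made-up scoring function (a quadratic penalty for unused width plus a bonus for ending on sentence punctuation) and a single-line subtitle; `subtitle_score`, `best_formatting` and `max_chars` are hypothetical, and the patent's actual scoring is not reproduced here.

```python
def subtitle_score(words, max_chars):
    """Hypothetical score for one subtitle; higher is better.
    Returns None when a multi-word subtitle exceeds the width limit."""
    text = " ".join(words)
    if len(text) > max_chars and len(words) > 1:
        return None
    score = -float((max_chars - len(text)) ** 2)  # penalise unused (or excess) width
    if text[-1] in ".?!":
        score += 50.0  # reward breaking at the end of a sentence
    return score


def best_formatting(words, max_chars=37):
    """Return (subtitles, total score) for the best-scoring segmentation."""
    n = len(words)
    best = [float("-inf")] * (n + 1)  # best[k]: best score for the first k words
    back = [0] * (n + 1)              # back[k]: where the last subtitle starts
    best[0] = 0.0
    for end in range(1, n + 1):                  # a) consider each word in turn
        for start in range(end - 1, -1, -1):     # b) score each subtitle ending here
            score = subtitle_score(words[start:end], max_chars)
            if score is None:
                break  # starting even earlier would only be longer
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    subtitles, end = [], n                       # d) recover the best option
    while end > 0:
        start = back[end]
        subtitles.append(" ".join(words[start:end]))
        end = start
    subtitles.reverse()
    return subtitles, best[n]


if __name__ == "__main__":
    text = "I never said that. You did, and everyone heard you."
    print(best_formatting(text.split(), max_chars=20))
```

Keeping only the best score up to each word position is what avoids re-scoring every formatting option from the start of the text each time a word is added.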

Current Assignee: British Broadcasting Corporation
Original Assignee: British Broadcasting Corporation
Inventors: Wiewiorka, Adam; Lahr, William Oscar; Kirby, David Graham; Poole, Christopher Edward

Granted Patent

US 7,191,117 B2
US Class Current

704/278
CPC Class Codes

G06F 40/20   Natural language analysis s...

G10L 15/26   Speech to text systems G10L...

H04N 5/278   Subtitling
