Speech recognition based captioning system

US 20020143531A1
Filed: 03/29/2001
Published: 10/03/2002
Est. Priority Date: 03/29/2001
Status: Active Grant

Abstract

A system and associated method of converting audio data from a television signal into textual data for display as a closed caption on a display device is provided. The audio data is decoded and audio speech signals are filtered from the audio data. The audio speech signals are parsed into phonemes by a speech recognition module. The parsed phonemes are grouped into words and sentences responsive to a database of words corresponding to the grouped phonemes. The words are converted into text data, which is formatted for presentation on the display device as closed captioned textual data.
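
The grouping step described in the abstract can be illustrated with a minimal sketch. It assumes a greedy longest-match lookup, which the patent does not specify; the `WORD_DB` table, the phoneme symbols, and the function name `match_words` are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical word database: each word is keyed by its phoneme sequence.
WORD_DB: Dict[Tuple[str, ...], str] = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
    ("AH",): "a",
}


def match_words(phonemes: List[str], db: Dict[Tuple[str, ...], str]) -> List[str]:
    """Group a phoneme stream into words by greedy longest match (an assumption)."""
    max_len = max((len(key) for key in db), default=0)
    words: List[str] = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate group first, then shrink it.
        for n in range(min(max_len, len(phonemes) - i), 0, -1):
            candidate = tuple(phonemes[i:i + n])
            if candidate in db:
                words.append(db[candidate])
                i += n
                break
        else:
            # No database entry starts here: skip one phoneme and continue.
            i += 1
    return words


def to_caption_text(words: List[str]) -> str:
    """Convert identified words into text data for display."""
    return " ".join(words)
```

A real recognizer would score competing groupings probabilistically rather than take the first longest match; the greedy rule is used here only to keep the sketch short.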

20 Claims

1. A method of displaying text information corresponding to a speech portion of audio signals of a television program as a closed caption on a video display device, the method comprising the steps of:
- decoding the audio signals of the television program;
  
  filtering the audio signals to extract the speech portion;
  
  parsing the speech portion into discrete speech components in accordance with a speech model and grouping the parsed speech components;
  
  identifying words in a database corresponding to the grouped speech components; and
  
  converting the identified words into text data for display on the display device as the closed caption.
- Dependent claims (2, 3, 4, 5, 6)
  - 2. A method according to claim 1, wherein the step of filtering the audio signals is performed concurrently with the step of decoding later-occurring audio signals of the television program and the step of parsing earlier-occurring speech signals of the television program.
  - 3. A method according to claim 1, wherein the step of parsing the speech portion into discrete speech components includes the step of employing a speaker independent model to provide individual words as the parsed speech components.
  - 4. A method according to claim 1 further including the step of formatting the text data into lines of text data for display in a closed caption area of the display device.
  - 5. A method according to claim 1, wherein the step of parsing the speech portion into discrete speech components includes the step of employing a speaker dependent model to provide phonemes as the parsed speech components.
  - 6. A method according to claim 5, wherein the speaker dependent model employs a hidden Markov model and the method further comprises the steps of:
    - receiving a training text as a part of the television signal, the training text corresponding to a part of the speech portion of the audio signals;
      
      updating the hidden Markov model based on the training text and the part of the speech portion of the audio signals corresponding to the training text; and
      
      applying the updated hidden Markov model to parse the speech portion of the audio signals to provide the phonemes.
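
The line-formatting step of claim 4 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 32-character row width and the two-row caption area are assumptions, and the function name `format_caption_lines` is hypothetical.

```python
import textwrap
from typing import List


def format_caption_lines(text: str, width: int = 32, max_rows: int = 2) -> List[List[str]]:
    """Wrap text data into caption rows and group the rows into screens.

    width and max_rows are assumed values for the closed caption area.
    """
    # Break the text at word boundaries so no row exceeds the width.
    lines = textwrap.wrap(text, width=width)
    # Group consecutive rows into screens of at most max_rows rows each.
    return [lines[i:i + max_rows] for i in range(0, len(lines), max_rows)]
```

Each inner list is one screenful of caption rows; a display loop would present the screens in order as the corresponding speech is recognized.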

7. A method of displaying text information corresponding to a speech portion of audio signals of a television program as a closed caption on a video display device, the method comprising the steps of:
- decoding the audio signals of the television program;
  
  filtering the audio signals to extract the speech portion;
  
  receiving a training text as a part of the television signal, the training text corresponding to a part of the speech portion of the audio signals;
  
  generating a hidden Markov model from the training text and the part of the speech portion of the audio signals;
  
  parsing the audio speech signals into phonemes based on the generated hidden Markov model;
  
  identifying words in a database corresponding to grouped phonemes; and
  
  converting the identified words into text data for presentation on the display of the audio-visual device as closed captioned textual data.
- Dependent claims (8, 9, 10)
  - 8. A method according to claim 7, wherein the step of filtering the audio signals is performed concurrently with the step of decoding later-occurring audio signals of the television program and the step of parsing earlier-occurring speech signals of the television program.
  - 9. A method according to claim 7 further including the step of formatting the text data into lines of text data for display in a closed caption area of the display device.
  - 10. A method according to claim 7, further comprising the step of providing respective audio speech signals and training texts for each speaker of a plurality of speakers on the television program.
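
The concurrency recited in claims 2 and 8, where filtering runs while later audio is being decoded and earlier speech is being parsed, resembles a staged pipeline. The sketch below is one way to arrange that, assuming one thread per stage connected by queues; the patent does not prescribe this design, and `run_pipeline` and the stage functions passed to it are hypothetical placeholders.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_DONE = object()  # sentinel marking the end of the stream


def _stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply fn to each item from inbox and pass the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # forward the sentinel so later stages stop too
            return
        outbox.put(fn(item))


def run_pipeline(chunks: Iterable[Any], stages: List[Callable[[Any], Any]]) -> List[Any]:
    """Run each stage in its own thread so the stages overlap in time."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for chunk in chunks:
        queues[0].put(chunk)
    queues[0].put(_DONE)
    results: List[Any] = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline(audio_chunks, [decode, filter_speech, parse])` would let `filter_speech` work on one chunk while `decode` handles the next chunk and `parse` handles the previous one. Because each stage is a single thread reading a first-in, first-out queue, results come out in the order the chunks went in.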

11. Apparatus for displaying text information corresponding to a speech portion of audio signals of a television program as a closed caption on a video display device, the apparatus comprising:
- a decoder which separates the audio signals from the television program signals;
  
  a speech filter which identifies portions of the audio signals that include speech components and separates the identified speech component signals from the audio signals;
  
  a phoneme generator which parses the speech portion into phonemes in accordance with a speech model;
  
  a database of words, each word being identified as corresponding to a discrete set of phonemes;
  
  a word matcher which groups the phonemes provided by the phoneme generator and identifies words in the database corresponding to the grouped phonemes; and
  
  a formatting processor that converts the identified words into text data for display on the display device as the closed caption.
- Dependent claims (12, 13, 14, 15)
  - 12. Apparatus according to claim 11, wherein the speech filter, the decoder and the phoneme generator are configured to operate in parallel.
  - 13. Apparatus according to claim 11, wherein the phoneme generator includes a speaker independent speech recognition system.
  - 14. Apparatus according to claim 11, wherein the phoneme generator includes a speaker dependent speech recognition system.
  - 15. Apparatus according to claim 14, wherein the speech model includes a hidden Markov model and the phoneme generator further comprises:
    - means for receiving a training text as a part of the television signal, the training text corresponding to a part of the speech portion of the audio signals;
      
      means for updating the hidden Markov model based on the training text and the part of the speech portion of the audio signals corresponding to the training text; and
      
      means for applying the updated hidden Markov model to parse the speech portion of the audio signals to provide the phonemes.

16. A computer readable carrier including computer program instructions that cause a computer to implement a method for displaying text information corresponding to a speech portion of audio signals of a television program as a closed caption on a video display device, the method comprising the steps of:
- decoding the audio signals of the television program;
  
  filtering the audio signals to extract the speech portion;
  
  parsing the speech portion into discrete speech components in accordance with a speech model and grouping the parsed speech components;
  
  identifying words in a database corresponding to the grouped speech components; and
  
  converting the identified words into text data for display on the display device as the closed caption.
- Dependent claims (17, 18, 19, 20)
  - 17. A computer readable carrier according to claim 16, wherein the computer program instructions that cause the computer to perform the step of filtering the audio signals are configured to control the computer concurrently with the computer program instructions that cause the computer to perform the step of decoding the audio signals of the television program and with the computer program instructions that cause the computer to perform the step of parsing the speech signals of the television program.
  - 18. A computer readable carrier according to claim 16, wherein the computer program instructions that cause the computer to perform the step of parsing the speech portion into discrete speech components include computer program instructions that cause the computer to use a speaker independent model to provide individual words as the parsed speech components.
  - 19. A computer readable carrier according to claim 16 further including computer program instructions that cause the computer to format the text data into lines of text data for display in a closed caption area of the display device.
  - 20. A computer readable carrier according to claim 16, wherein the computer program instructions that cause the computer to perform the step of parsing the speech portion into discrete speech components include computer program instructions that cause the computer to use a speaker dependent model to provide phonemes as the parsed speech components.

Current Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Kahn, Michael
Granted Patent: US 7,013,273 B2
US Class Current: 704/235
CPC Class Codes:
G10L 15/26 Speech to text systems G10L...
G10L 2015/025 Phonemes, fenemes or fenone...
