Speech recognition based captioning system
Abstract
A system and associated method are provided for converting audio data from a television signal into textual data for display as a closed caption on a display device. The audio data is decoded, and audio speech signals are filtered from the audio data. A speech recognition module parses the audio speech signals into phonemes. The parsed phonemes are grouped into words and sentences responsive to a database of words corresponding to the grouped phonemes. The words are converted into text data, which is formatted for presentation on the display device as closed-caption text.
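The grouping of phonemes into words "responsive to a database of words" can be illustrated with a small dynamic-programming segmenter. This is a minimal sketch under stated assumptions, not the patented method: the lexicon `LEXICON`, its ARPAbet-style symbols, and the function `group_phonemes` are hypothetical, and the rule of preferring the segmentation with the fewest words is a simplification. A practical recognizer would score alternatives with a language model.

```python
# Hypothetical toy word database: each word is keyed by the discrete phoneme
# sequence it corresponds to (ARPAbet-style symbols, for illustration only).
LEXICON = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
    ("DH", "AH"): "the",
    ("N", "UW", "Z"): "news",
}


def group_phonemes(phonemes):
    """Segment a phoneme sequence into lexicon words, preferring the fewest words.

    Returns a list of words, or None if the sequence cannot be fully covered
    by entries in LEXICON.
    """
    phonemes = tuple(phonemes)
    max_len = max(len(key) for key in LEXICON)
    n = len(phonemes)
    # best[i] holds the shortest word list covering phonemes[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        # Only look back as far as the longest lexicon entry.
        for start in range(max(0, end - max_len), end):
            if best[start] is None:
                continue
            word = LEXICON.get(phonemes[start:end])
            if word is None:
                continue
            candidate = best[start] + [word]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate
    return best[n]
```

For example, `group_phonemes(["DH", "AH", "N", "UW", "Z"])` returns `["the", "news"]`, while a sequence with no matching entry, such as `["Z"]`, returns `None`.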
20 Claims
1. A method of displaying text information corresponding to a speech portion of audio signals of a television program as a closed caption on a video display device, the method comprising the steps of:
decoding the audio signals of the television program;
filtering the audio signals to extract the speech portion;
parsing the speech portion into discrete speech components in accordance with a speech model and grouping the parsed speech components;
identifying words in a database corresponding to the grouped speech components; and
converting the identified words into text data for display on the display device as the closed caption.
(Dependent claims: 2, 3, 4, 5, 6)
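The claim does not say how the speech portion is filtered from the decoded audio. One simple, commonly used stand-in is short-time energy thresholding: split the audio into frames and keep frames whose RMS level exceeds a threshold. The sketch below assumes mono samples normalized to [-1.0, 1.0]; the function names, the 20 ms frame length, and the -35 dBFS threshold are hypothetical choices, not values from the patent. Note that energy alone cannot separate speech from music or effects, so a band-pass filter or trained speech/non-speech classifier would be more robust in practice.

```python
import math


def speech_frames(samples, sample_rate, frame_ms=20, threshold_db=-35.0):
    """Return (start, end) sample ranges whose RMS level exceeds a threshold.

    samples: floats in [-1.0, 1.0]. threshold_db is relative to full scale
    (dBFS). Adjacent above-threshold frames are merged into one segment.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    segments = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db > threshold_db:
            end = start + len(frame)
            if segments and segments[-1][1] == start:
                # Contiguous with the previous speech frame: extend that segment.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


def extract_speech(samples, sample_rate, **kwargs):
    """Concatenate only the samples that fall inside detected speech segments."""
    out = []
    for start, end in speech_frames(samples, sample_rate, **kwargs):
        out.extend(samples[start:end])
    return out
```

With a 1 kHz sample rate, 40 silent samples followed by 40 samples at amplitude 0.5 and 20 more silent samples yield the single segment `(40, 80)`.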
7. A method of displaying text information corresponding to a speech portion of audio signals of a television program as a closed caption on a video display device, the method comprising the steps of:
decoding the audio signals of the television program;
filtering the audio signals to extract the speech portion;
receiving a training text as a part of the television signal, the training text corresponding to a part of the speech portion of the audio signals;
generating a hidden Markov model from the training text and the part of the speech portion of the audio signals;
parsing the audio speech signals into phonemes based on the generated hidden Markov model;
identifying words in a database corresponding to grouped phonemes; and
converting the identified words into text data for presentation on the display of the audio-visual device as closed captioned textual data.
(Dependent claims: 8, 9, 10)
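Claim 7 generates a hidden Markov model from training text sent with the broadcast and the matching stretch of speech, then uses it to parse speech into phonemes. The claim does not specify the training procedure. The sketch below assumes the training text has already been converted to phonemes and aligned frame by frame with discrete (for example, vector-quantized) acoustic symbols, so that generating the model reduces to supervised maximum-likelihood counting with add-one smoothing; parsing then uses the standard Viterbi algorithm. The function names and the discrete-symbol representation are hypothetical. Production recognizers typically use continuous emission densities and several states per phoneme.

```python
import math
from collections import Counter, defaultdict


def train_hmm(labeled_sequences, states, symbols, smoothing=1.0):
    """Estimate a discrete HMM by counting over frame-aligned training data.

    labeled_sequences: list of sequences, each a list of
    (observed_symbol, phoneme_state) pairs. Returns log-probability tables
    (log_start, log_trans, log_emit) with additive smoothing.
    """
    start_c = Counter()
    trans_c = defaultdict(Counter)
    emit_c = defaultdict(Counter)
    for seq in labeled_sequences:
        prev = None
        for symbol, state in seq:
            if prev is None:
                start_c[state] += 1
            else:
                trans_c[prev][state] += 1
            emit_c[state][symbol] += 1
            prev = state

    def log_norm(counter, keys):
        # Smoothed relative frequencies over `keys`, returned as log-probabilities.
        total = sum(counter[k] for k in keys) + smoothing * len(keys)
        return {k: math.log((counter[k] + smoothing) / total) for k in keys}

    log_start = log_norm(start_c, states)
    log_trans = {s: log_norm(trans_c[s], states) for s in states}
    log_emit = {s: log_norm(emit_c[s], symbols) for s in states}
    return log_start, log_trans, log_emit


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely phoneme state for each observed symbol.

    Every observation must be one of the symbols the model was trained with.
    """
    if not observations:
        return []
    scores = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backptrs = []
    for obs in observations[1:]:
        new_scores, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: scores[p] + log_trans[p][s])
            new_scores[s] = scores[best_prev] + log_trans[best_prev][s] + log_emit[s][obs]
            ptr[s] = best_prev
        scores = new_scores
        backptrs.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Trained on two short aligned sequences in which symbol `"a"` mostly co-occurs with phoneme `AH` and `"s"` with `S`, the model decodes `["a", "a", "s", "s"]` as `["AH", "AH", "S", "S"]`.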
11. Apparatus for displaying text information corresponding to a speech portion of audio signals of a television program as a closed caption on a video display device, the apparatus comprising:
a decoder which separates the audio signals from the television program signals;
a speech filter which identifies portions of the audio signals that include speech components and separates the identified speech component signals from the audio signals;
a phoneme generator which parses the speech portion into phonemes in accordance with a speech model;
a database of words, each word being identified as corresponding to a discrete set of phonemes;
a word matcher which groups the phonemes provided by the phoneme generator and identifies words in the database corresponding to the grouped phonemes; and
a formatting processor that converts the identified words into text data for display on the display device as the closed caption.
(Dependent claims: 12, 13, 14, 15, 17, 18, 19, 20)
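The claim leaves the formatting processor's layout rules open. As a minimal sketch, assume the identified words are wrapped into rows of at most 32 characters (the row width used by CEA-608 line-21 captions) and the rows are grouped into on-screen blocks. The function name `format_caption` and the default of two rows per block are hypothetical choices for illustration, not details from the patent.

```python
def format_caption(words, max_chars=32, max_rows=2):
    """Wrap words into caption rows and group rows into on-screen blocks.

    Words longer than max_chars are hard-split so no row exceeds the limit.
    Returns a list of blocks, each a list of row strings.
    """
    rows, current = [], ""
    for word in words:
        if not word:
            continue  # skip empty tokens
        while len(word) > max_chars:
            # Flush the row in progress, then emit full-width pieces of the word.
            if current:
                rows.append(current)
                current = ""
            rows.append(word[:max_chars])
            word = word[max_chars:]
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= max_chars:
            current += " " + word
        else:
            rows.append(current)
            current = word
    if current:
        rows.append(current)
    # Group consecutive rows into blocks of at most max_rows rows.
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]
```

With `max_chars=16`, the words of "the quick brown fox jumps over the lazy dog" become two blocks: `["the quick brown", "fox jumps over"]` and `["the lazy dog"]`.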
16. A computer readable carrier including computer program instructions that cause a computer to implement a method for displaying text information corresponding to a speech portion of audio signals of a television program as a closed caption on a video display device, the method comprising the steps of:
decoding the audio signals of the television program;
filtering the audio signals to extract the speech portion;
parsing the speech portion into discrete speech components in accordance with a speech model and grouping the parsed speech components;
identifying words in a database corresponding to the grouped speech components; and
converting the identified words into text data for display on the display device as the closed caption.
Specification