Method and apparatus for improving speech recognition and identifying video program material or content
Abstract
A system for identification of video content in a video signal is provided via a sound track audio signal. The audio signal is processed with filtering, frequency translation, and/or nonlinear transformations to extract voice signals from the sound track channel. The extracted voice signals are coupled to a speech recognition system to provide, in text form, the words of the video content, which are later compared with a reference library of words or dialog from known video programs or movies. Other attributes of the video signal or transport stream may be combined with closed caption data or closed caption text for identification purposes. Example attributes include DVS/SAP information, time code information, histograms, and/or rendered video or pictures.
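The audio chain named in the abstract (filtering, frequency translation, nonlinear transformation) can be sketched in a few lines. The following is only an illustrative sketch, not the patented implementation: it assumes DFT-domain processing on a short block of samples, uses `tanh` soft clipping as a stand-in for the distortion generator, and every function name (`band_pass`, `translate`, `distort`, `dominant_hz`) is hypothetical. The speech to text converter itself is not modeled.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    """Inverse DFT, keeping only the real part of each output sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def _from_positive_half(pos, n):
    """Rebuild a conjugate-symmetric spectrum from bins 0..n//2."""
    spec = [0j] * n
    for k in range(n // 2 + 1):
        spec[k] = pos[k]
    for k in range(1, (n + 1) // 2):
        spec[n - k] = pos[k].conjugate()
    return spec


def band_pass(x, fs, lo_hz, hi_hz):
    """Keep only the bins whose frequency lies in [lo_hz, hi_hz]."""
    n = len(x)
    spec = dft(x)
    pos = [spec[k] if lo_hz <= k * fs / n <= hi_hz else 0j
           for k in range(n // 2 + 1)]
    return idft_real(_from_positive_half(pos, n))


def translate(x, fs, shift_hz):
    """Shift the spectrum up (positive shift_hz) or down (negative) by whole bins."""
    n = len(x)
    spec = dft(x)
    shift = round(shift_hz * n / fs)
    pos = [0j] * (n // 2 + 1)
    for k in range(n // 2 + 1):
        j = k + shift
        if 0 <= j <= n // 2:  # bins shifted out of range are dropped
            pos[j] = spec[k]
    return idft_real(_from_positive_half(pos, n))


def distort(x, gain=2.0):
    """Assumed nonlinearity: tanh soft clipping (the source does not name one)."""
    return [math.tanh(gain * v) for v in x]


def dominant_hz(x, fs):
    """Frequency of the strongest bin in the positive half of the spectrum."""
    n = len(x)
    mags = [abs(v) for v in dft(x)[: n // 2 + 1]]
    return mags.index(max(mags)) * fs / n


# Usage: a 1000 Hz "voice" tone plus a 3750 Hz interferer, sampled at 8 kHz.
fs, n = 8000, 256
signal = [math.sin(2 * math.pi * 1000 * t / fs) + 0.5 * math.sin(2 * math.pi * 3750 * t / fs)
          for t in range(n)]
processed = distort(translate(band_pass(signal, fs, 300, 3400), fs, 500))
```

The order used here (band pass, then translation, then distortion) is one of several orderings the claims describe; swapping the first two calls gives the other arrangement.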
6 Claims
1. A system for improving speech recognition of a speech to text converter comprising:
a frequency translator for receiving an audio signal, wherein an output of the frequency translator shifts a spectrum of the audio signal up or down;
a band pass filter coupled to the output of the frequency translator, wherein the band pass filter provides one or more band limited spectrums of the audio signal;
a distortion generator coupled to an output of the band pass filter;
a speech to text converter coupled to an output of the distortion generator to provide improved speech recognition of the audio signal;
means for receiving a video signal and providing a frequency transformation of DVS, SAP, and or LFE audio signals; and
means for comparing the frequency transformation of the DVS, SAP, and or LFE audio signals to a reference data base of DVS, SAP, and LFE audio signals for identifying the video signal.
2. A system for improving speech recognition of a speech to text converter comprising:
a frequency translator for receiving an audio signal, wherein an output of the frequency translator shifts a spectrum of the audio signal up or down;
a band pass filter coupled to the output of the frequency translator, wherein the band pass filter provides one or more band limited spectrums of the audio signal;
a distortion generator coupled to an output of the band pass filter;
a speech to text converter coupled to an output of the distortion generator to provide improved speech recognition of the audio signal;
a database of rendered movies or video programs which are compared to a received video program material that is rendered for identifying the video program material; and
wherein a gradient or Laplacian transform provides the function of rendering.
3. A system for improving speech recognition of a speech to text converter comprising:
a band pass filter receiving an audio signal, wherein the band pass filter provides a band limited spectrum of the audio signal;
a frequency translator coupled to an output of the band pass filter, wherein an output of the frequency translator shifts the band limited spectrum of the audio signal up or down;
a speech to text converter coupled to the output of the frequency translator to provide improved speech recognition of the band limited spectrum of the audio signal;
means for receiving a video program and providing a frequency transformation of DVS, SAP, and or LFE audio signals; and
means for comparing the frequency transformation of the DVS, SAP, and or LFE audio signals to a reference data base of DVS, SAP, and LFE audio signals for identifying the video program.
Dependent claims: 4, 5
6. A system for improving speech recognition of a speech to text converter comprising:
a band pass filter receiving an audio signal, wherein the band pass filter provides a band limited spectrum of the audio signal;
a frequency translator coupled to an output of the band pass filter, wherein an output of the frequency translator shifts the band limited spectrum of the audio signal up or down;
a speech to text converter coupled to the output of the frequency translator to provide improved speech recognition of the band limited spectrum of the audio signal;
a database of rendered movies or video programs which are compared to a received video program material that is rendered for identifying the video program material; and
wherein a gradient or Laplacian transform provides the function of rendering.
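Claims 2 and 6 recite rendering by a gradient or Laplacian transform and comparing the rendered material against a database of rendered programs. A minimal sketch of that idea follows, under stated assumptions: frames are small grayscale grids held as nested lists, the transform is the common 4-neighbour discrete Laplacian, matching uses a sum of absolute differences, and the names `laplacian`, `distance`, `identify`, and the program titles are hypothetical. The claims do not specify any of these choices.

```python
def laplacian(img):
    """4-neighbour discrete Laplacian of a 2-D grid; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (img[y - 1][x] + img[y + 1][x]
                         + img[y][x - 1] + img[y][x + 1]
                         - 4 * img[y][x])
    return out


def distance(a, b):
    """Sum of absolute differences between two equally sized grids."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def identify(frame, reference_db):
    """Render the received frame and return the title of the closest reference."""
    rendered = laplacian(frame)
    return min(reference_db, key=lambda title: distance(rendered, reference_db[title]))


# Hypothetical reference frames: a vertical edge and a horizontal edge.
vertical = [[0, 0, 9, 9, 9] for _ in range(5)]
horizontal = [[0] * 5 if y < 2 else [9] * 5 for y in range(5)]

# The database stores rendered (Laplacian) versions, as the claims describe.
reference_db = {
    "program_a": laplacian(vertical),
    "program_b": laplacian(horizontal),
}

# A received frame: the vertical edge with a uniform brightness offset.
received = [[v + 10 for v in row] for row in vertical]
match = identify(received, reference_db)
```

Because the Laplacian of a constant is zero, the uniform brightness offset in `received` disappears after rendering, which suggests why an edge-like rendering can be a more stable basis for comparison than raw pixel values.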
Specification