Method and Apparatus for Identifying Video Program Material or Content via Frequency Translation or Modulation Schemes

US 20120005701A1
Filed: 06/30/2010
Published: 01/05/2012
Est. Priority Date: 06/30/2010
Status: Active Grant

Abstract

A system is provided for identifying video content in a video signal via its sound track audio signal. The audio signal is processed with filtering, frequency translation, and/or nonlinear transformations to extract voice signals from the sound track channel. The extracted voice signals are coupled to a speech recognition system that provides the words of the video content in text form; this text is later compared with a reference library of words or dialog from known video programs or movies. Other attributes of the video signal or transport stream may be combined with closed caption data or closed caption text for identification purposes. Example attributes include DVS/SAP information, time code information, histograms, and/or rendered video or pictures.
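
The comparison step described in the abstract (recognized words checked against a reference library of known dialog) can be illustrated with a minimal sketch. The abstract does not say how the comparison is performed, so the word-shingle Jaccard similarity used here is an assumption, as are the function names and the contents of `REFERENCE_LIBRARY`.

```python
def shingles(text, n=3):
    """Return the set of n-word shingles in text (lowercased, edge punctuation stripped)."""
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def identify(recognized_text, reference_library, n=3):
    """Return (best_title, score) for the library entry most similar to the text."""
    query = shingles(recognized_text, n)
    scores = {
        title: jaccard(query, shingles(dialog, n))
        for title, dialog in reference_library.items()
    }
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical reference library: program title -> known dialog text.
REFERENCE_LIBRARY = {
    "Program A": "we will meet at the old bridge at dawn and bring the map",
    "Program B": "the quarterly numbers are in and the board wants answers today",
}

if __name__ == "__main__":
    # Text as it might come out of a speech-to-text converter (made up for illustration).
    print(identify("Meet at the old bridge at dawn!", REFERENCE_LIBRARY))
```

Shingles of several words are used rather than single words because common words appear in almost every program; a real system would likely also need to tolerate recognition errors, which this sketch does not attempt.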

20 Claims

1. A system for improving speech recognition of a speech to text converter comprising:
- a frequency translator for receiving an audio signal, wherein an output of the frequency translator shifts a spectrum of the audio signal up or down;
  
  a band pass filter coupled to the output of the frequency translator, wherein the band pass filter provides one or more band limited spectrums of the audio signal;
  
  a distortion generator coupled to an output of the band pass filter; and
  
  a speech to text converter coupled to an output of the distortion generator to provide improved speech recognition of the audio signal.
- Dependent claims (2, 3, 4, 5, 6, 7, 19, 20):
  - 2. The system of claim 1 further comprising:
    - a time code reader for providing time code from a video program, wherein the time code is associated with the words from a sound track channel, and wherein a comparing module further compares time code and text from the video program with a reference database for identification of the video program.
  - 3. The system of claim 1 further comprising:
    - a processor to provide Closed Caption information from a video program, wherein the Closed Caption information from the video program is compared to a reference database of Closed Caption information for identifying the video program.
  - 4. The system of claim 1 further comprising:
    - a DVS/SAP audio channel for:
      
      providing text information from an audio portion of a video signal; and
      
      comparing the text information from the DVS/SAP audio channel with a reference database of text information of DVS/SAP audio signals for identifying the video program.
  - 5. The system of claim 1 further comprising:
    - means for receiving a video signal and providing a frequency transformation of DVS, SAP, and/or LFE audio signals; and
      
      means for comparing the frequency transformation of the DVS, SAP, and/or LFE audio signals to a reference database of DVS, SAP, and LFE audio signals for identifying the video program.
  - 6. The system of claim 1 further comprising:
    - a database of rendered movies or video programs which are compared to a received video program material that is rendered for identifying the video program material.
  - 7. The system of claim 6 wherein a gradient or Laplacian transform provides the function of rendering.
  - 19. The system of claim 1 further comprising:
    - a database of rendered movies or video programs which are compared to a received video program material that is rendered for identifying the video program material.
  - 20. The system of claim 19 wherein a gradient or Laplacian transform provides the function of rendering.
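
The signal chain recited in claim 1 (frequency translator, then band pass filter, then distortion generator) can be sketched as follows. The claim does not specify how any stage is built, so every choice here is an assumption made for illustration: translation by multiplying with a cosine carrier (heterodyning), a second-order biquad band pass filter, a `tanh` soft clipper as the distortion generator, and a 16 kHz sample rate. The speech to text converter is omitted. All function names are hypothetical.

```python
import math

SAMPLE_RATE = 16000  # Hz; assumed for this sketch


def frequency_translate(samples, shift_hz, sample_rate=SAMPLE_RATE):
    """Multiply by a cosine carrier. A component at f appears at f + shift and |f - shift|."""
    return [
        x * math.cos(2 * math.pi * shift_hz * n / sample_rate)
        for n, x in enumerate(samples)
    ]


def band_pass(samples, center_hz, q=5.0, sample_rate=SAMPLE_RATE):
    """Second-order band pass biquad with 0 dB gain at center_hz."""
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def distort(samples, drive=3.0):
    """Soft clipper: a simple memoryless nonlinearity bounded to [-1, 1]."""
    return [math.tanh(drive * x) for x in samples]


def tone_power(samples, freq_hz, sample_rate=SAMPLE_RATE):
    """Relative power at one frequency, computed as a single-bin DFT."""
    re = im = 0.0
    for n, x in enumerate(samples):
        phase = 2 * math.pi * freq_hz * n / sample_rate
        re += x * math.cos(phase)
        im += x * math.sin(phase)
    return (re * re + im * im) / (len(samples) ** 2)


if __name__ == "__main__":
    # A 1000 Hz test tone shifted by 500 Hz yields components at 1500 Hz and 500 Hz;
    # the band pass filter centered at 1500 Hz keeps the upward-shifted one.
    tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(4000)]
    shifted = band_pass(frequency_translate(tone, 500.0), 1500.0)
    print(tone_power(shifted, 1500.0) > tone_power(shifted, 500.0))
    processed = distort(shifted)  # would feed the speech to text converter
```

In this sketch the carrier multiplication produces both an upward and a downward shifted copy of the spectrum, and the band pass filter that follows selects one of them. That is one plausible reading of why the filter comes after the translator in claim 1, not something the claim states.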

8. A method of identifying video program material in a video signal comprising:
- providing a database of DVS/SAP information or text of DVS/SAP information and providing a database of words from sound tracks;
  
  supplying the video signal to a processor/reader, wherein the reader provides processed DVS/SAP information or text of DVS/SAP information, and wherein the reader coupled to the video signal provides words from sound tracks via a filter bank, distortion generators, and a speech to text processor; and
  
  comparing the processed DVS/SAP information or the text of DVS/SAP information to the DVS/SAP information or the text of DVS/SAP information from the database, and comparing with the database of words from sound tracks to the words of the sound track provided by the video signal to provide identification of the video program material.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The method of claim 8 further comprising:
    - reading time code from the video signal via a time code database linked to the database of the DVS/SAP information or the text of the DVS/SAP information; and
      
      comparing the time code linked to a portion of the DVS/SAP information or text of DVS/SAP information from the database, with the time code linked to a portion of the DVS/SAP information or text of the DVS/SAP information from the processed/read video signal.
  - 10. The method of claim 8 further comprising:
    - providing histogram information of one or more video field or frame which is linked to the DVS/SAP information or text of the DVS/SAP information of the video signal.
  - 11. The method of claim 10 wherein the histogram information includes luminance and/or subcarrier phase values.
  - 12. The method of claim 10 wherein the histogram information includes coefficients of Wavelet, Fourier, Cosine, DCT, and/or Radon transforms.
  - 13. The method of claim 8 further comprising:
    - providing rendered movies or video programs; and
      
      comparing the rendered movies or video programs with the received video program material that is rendered, for identifying the video program material.
  - 14. The method of claim 13 wherein a gradient or Laplacian transform provides the function of rendering.
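
Claims 10 through 14 refer to histogram information of a video field or frame and to rendering by a gradient or Laplacian transform. A minimal sketch of both ideas on a grayscale frame follows. The frame representation (a list of rows of integers from 0 to 255), the bin count, the 4-neighbour Laplacian kernel, and the L1 histogram distance are all assumptions made for illustration; the claims do not specify them, and the function names are hypothetical.

```python
def luminance_histogram(frame, bins=8, max_value=255):
    """Count pixels per luminance bin; frame is a list of rows of ints in 0..max_value."""
    hist = [0] * bins
    for row in frame:
        for v in row:
            idx = min(v * bins // (max_value + 1), bins - 1)
            hist[idx] += 1
    return hist


def laplacian(frame):
    """4-neighbour discrete Laplacian on interior pixels; border pixels are left at 0."""
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (
                frame[y - 1][x] + frame[y + 1][x]
                + frame[y][x - 1] + frame[y][x + 1]
                - 4 * frame[y][x]
            )
    return out


def histogram_distance(h1, h2):
    """L1 distance between the two histograms after normalizing each to sum to 1."""
    t1, t2 = sum(h1), sum(h2)
    if t1 == 0 or t2 == 0:
        raise ValueError("histograms must not be empty")
    return sum(abs(a / t1 - b / t2) for a, b in zip(h1, h2))


if __name__ == "__main__":
    # A tiny made-up frame: one bright pixel on a dark background.
    frame = [
        [0, 0, 0],
        [0, 255, 0],
        [0, 0, 0],
    ]
    print(luminance_histogram(frame, bins=2))
    print(laplacian(frame)[1][1])
```

The Laplacian responds only where luminance changes, so the output resembles an outline of the picture. A received frame processed this way could be compared against reference frames processed the same way, for example with a distance measure such as the one above.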

15. A system for improving speech recognition of a speech to text converter comprising:
- a band pass filter for receiving an audio signal, wherein the band pass filter provides a band limited spectrum of the audio signal;
  
  a frequency translator coupled to an output of the band pass filter, wherein an output of the frequency translator shifts the band limited spectrum of the audio up or down; and
  
  a speech to text converter coupled to the output of the frequency translator to provide improved speech recognition of the band limited spectrum of the audio signal.
- Dependent claims (16, 17, 18):
  - 16. The system of claim 15 further comprising:
    - a time code reader for providing time code from a video program, wherein the time code is associated with the words from a sound track channel, and wherein a comparing module further compares time code and text from the video program with a reference database for identification of the video program.
  - 17. The system of claim 15 further comprising:
    - a processor to provide Closed Caption information from a video program, wherein the Closed Caption information from the video program is compared to a reference database of Closed Caption information for identifying the video program.
  - 18. The system of claim 15 further comprising:
    - means for receiving a video program and providing a frequency transformation of DVS, SAP, and/or LFE audio signals; and
      
      means for comparing the frequency transformation of the DVS, SAP, and/or LFE audio signals to a reference database of DVS, SAP, and LFE audio signals for identifying the video program.
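
Claims 2 and 16 describe a comparing module that compares time code and text from the video program with a reference database. One way such a comparison might work is sketched below. The data layout (lists of `(seconds, word)` pairs), the matching tolerance, the scoring rule, and the contents of `REFERENCE_DB` are all hypothetical; the claims do not define them.

```python
def match_score(observed, reference, tolerance_s=1.0):
    """Fraction of observed (time, word) pairs whose word occurs in the
    reference within tolerance_s seconds of the same time code."""
    if not observed:
        return 0.0
    times_by_word = {}
    for t, word in reference:
        times_by_word.setdefault(word.lower(), []).append(t)
    hits = sum(
        1
        for t, word in observed
        if any(abs(t - rt) <= tolerance_s for rt in times_by_word.get(word.lower(), []))
    )
    return hits / len(observed)


def identify_by_timecode(observed, reference_db, tolerance_s=1.0):
    """Return (best_title, score) over all programs in the reference database."""
    scores = {
        title: match_score(observed, ref, tolerance_s)
        for title, ref in reference_db.items()
    }
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical reference database: title -> list of (time code in seconds, word).
REFERENCE_DB = {
    "Program A": [(12.0, "bridge"), (12.6, "at"), (13.1, "dawn")],
    "Program B": [(12.0, "board"), (40.0, "dawn")],
}

if __name__ == "__main__":
    # Words and time codes as they might be read from a received program (made up).
    observed = [(12.2, "bridge"), (13.0, "dawn")]
    print(identify_by_timecode(observed, REFERENCE_DB))
```

Requiring the time codes to agree as well as the words separates programs that share vocabulary but speak it at different points, which is presumably the benefit of linking time code to text.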

Current Assignee: Rovi Technologies Corporation (Adeia Inc.)
Original Assignee: Rovi Technologies Corporation (Adeia Inc.)
Inventors: Quan, Ronald
Granted Patent: US 8,527,268 B2
US Class Current: 725/9

CPC Class Codes

G06F 16/7834   using audio features

G06F 16/7844   using original textual cont...

G10L 15/20   Speech recognition techniqu...

G10L 15/26   Speech to text systems G10L...

H04H 60/37   for identifying segments of...

H04H 60/58   of audio determination or d...
