Feature optimization and reliability for audio and video signature generation and detection
First Claim
1. A method for calculating synchronization errors between destination video content and destination audio content, wherein the method comprises:
- receiving reference video signatures representing one or more video features of a reference video signal, reference audio signatures representing one or more audio features of a reference audio signal, and indications of relative temporal alignment of the video and audio features;
- generating one or more destination video signatures in response to one or more video features extracted from the destination video content;
- generating one or more destination audio signatures in response to one or more audio features extracted from the destination audio content;
- comparing a sequence of destination video signatures with a sequence of reference video signatures to find a match between the destination video content and the reference video content used to generate the reference video signatures;
- comparing a sequence of destination audio signatures with a sequence of reference audio signatures to find a match between the destination audio content and the reference audio content used to generate the reference audio signatures;
- calculating a synchronization error between the destination video content and the destination audio content by calculating a temporal misalignment between the identified destination video content and the identified destination audio content as compared with the relative temporal alignment of the video and audio features of the reference video signal and the reference audio signal;
- analyzing past synchronization errors and selecting a model of past synchronization errors that best represents the past synchronization errors, wherein the model may be selected from models that represent a sequence of synchronization errors that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past synchronization errors;
- storing the choice of the selected model and its parameters in a buffer;
- deriving a measure of reliability in the synchronization error from a difference between the calculated temporal misalignment and a predicted misalignment obtained from a sequence of previously calculated temporal misalignments;
- using the buffer of stored models to predict the synchronization error for intervals where the measures of reliability are below a threshold; and
- displaying the synchronization error, or correcting the synchronization error by delaying one or both of the destination video and the destination audio to bring them into proper temporal alignment.
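The arithmetic behind the "calculating a synchronization error" step can be sketched in a few lines. The function name, the millisecond units, and the sign convention below are illustrative assumptions for the sketch, not terms taken from the patent.

```python
def sync_error_ms(video_offset_ms: float, audio_offset_ms: float,
                  reference_av_skew_ms: float = 0.0) -> float:
    """Synchronization error of the destination A/V pair, relative to the
    known alignment of the reference pair.

    video_offset_ms: timing difference, destination video vs. reference video
    audio_offset_ms: timing difference, destination audio vs. reference audio
    reference_av_skew_ms: the indicated relative temporal alignment of the
        reference video and audio features (0 if they were aligned)

    Sign convention (an assumption): a positive result means the destination
    audio lags the destination video.
    """
    return (audio_offset_ms - video_offset_ms) - reference_av_skew_ms
```

For example, if the destination video is found 40 ms after its reference position and the destination audio 100 ms after its reference position, the audio lags the video by 60 ms when the reference pair was aligned.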
1 Assignment
0 Petitions
Abstract
Features are extracted from video and audio content that have a known temporal relationship with one another. The extracted features are used to generate video and audio signatures, which are assembled with an indication of the temporal relationship into a synchronization signature construct. The construct may be used to calculate synchronization errors between video and audio content received at a remote destination. Measures of confidence are generated at the remote destination to optimize processing and to provide an indication of reliability of the calculated synchronization error.
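As a rough illustration of the abstract's synchronization signature construct, the sketch below bundles a video signature, an audio signature, and their known temporal relationship into one record. The class and field names, the millisecond field, and the use of a truncated SHA-256 digest in place of a real perceptual feature hash are all assumptions of this sketch.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class SyncSignature:
    """Hypothetical synchronization signature construct."""
    video_sig: bytes        # signature of extracted video features
    audio_sig: bytes        # signature of extracted audio features
    av_alignment_ms: float  # known temporal relationship at the reference

def make_signature(features: bytes) -> bytes:
    # A real system would use a perceptual hash robust to coding artifacts;
    # a truncated cryptographic digest merely stands in for one here.
    return hashlib.sha256(features).digest()[:8]

construct = SyncSignature(make_signature(b"video-frame-features"),
                          make_signature(b"audio-segment-features"),
                          av_alignment_ms=0.0)
```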
46 Citations
27 Claims
1. A method for calculating synchronization errors between destination video content and destination audio content, wherein the method comprises the steps recited in the First Claim above. - View Dependent Claims (3, 4, 5, 6, 7, 8)
2. A method for calculating synchronization errors between destination video content and destination audio content, wherein the method comprises:
- receiving reference video signatures representing one or more video features of a reference video signal, reference audio signatures representing one or more audio features of a reference audio signal, and indications of relative temporal alignment of the video and audio features;
- generating one or more destination video signatures in response to one or more video features extracted from the destination video content;
- generating one or more destination audio signatures in response to one or more audio features extracted from the destination audio content;
- comparing a sequence of destination video signatures with a sequence of reference video signatures to find a match between the destination video content and the reference video content used to generate the reference video signatures, obtaining a relative video timing difference between the destination video signal and the reference video signal;
- comparing a sequence of destination audio signatures with a sequence of reference audio signatures to find a match between the destination audio content and the reference audio content used to generate the reference audio signatures, obtaining a relative audio timing difference between the destination audio signal and the reference audio signal;
- calculating a synchronization error between the destination video content and the destination audio content from the relative timing difference between the destination video signal and the reference video signal and from the relative timing difference between the destination audio signal and the reference audio signal as compared with the relative temporal alignment of the video and audio features of the reference video signal and the reference audio signal;
- calculating a video-match confidence measure that represents a degree of certainty in the found match between the destination video content and the reference video content by analyzing past relative video timing differences and selecting a prediction model that best represents the past relative video timing differences, wherein the model may be selected from models that represent a sequence of timing differences that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past relative video timing differences;
- calculating an audio-match confidence measure that represents a degree of certainty in the found match between the destination audio content and the reference audio content by analyzing past relative audio timing differences and selecting a prediction model that best represents the past relative audio timing differences, wherein the model may be selected from models that represent a sequence of timing differences that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past relative audio timing differences;
- storing the choice of the selected models and their parameters in a buffer;
- deriving a measure of reliability in the synchronization error from the video-match confidence measure and the audio-match confidence measure;
- using the buffer of stored models to predict the synchronization error for intervals where the measures of reliability are below a threshold; and
- displaying the synchronization error, or correcting the synchronization error by delaying one or both of the destination video and the destination audio to bring them into proper temporal alignment.
- View Dependent Claims (9)
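The model-selection step recited in the claims (choosing among constant, linear, and abrupt-change models of past timing differences, with parameters derived to minimize the residual) can be sketched as a least-squares fit of each candidate followed by selection of the smallest residual. The function name, the sum-of-squared-errors criterion, and the model labels are assumptions of this sketch; it needs at least two samples.

```python
def fit_models(history):
    """Fit constant, linear, and single-step models to a sequence of past
    timing differences; return (model_name, predictor) for the best fit."""
    n = len(history)
    xs = list(range(n))
    # Constant model: the mean minimizes the squared residual.
    mean = sum(history) / n
    sse_const = sum((y - mean) ** 2 for y in history)
    # Linear model: ordinary least squares for slope and intercept.
    mx = (n - 1) / 2
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - mean) for x, y in zip(xs, history))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean - slope * mx
    sse_lin = sum((intercept + slope * x - y) ** 2
                  for x, y in zip(xs, history))
    # Abrupt-change model: two constant segments with one step at index k.
    best_step = (float("inf"), 0, 0.0, 0.0)
    for k in range(1, n):
        a = sum(history[:k]) / k
        b = sum(history[k:]) / (n - k)
        sse = (sum((y - a) ** 2 for y in history[:k])
               + sum((y - b) ** 2 for y in history[k:]))
        if sse < best_step[0]:
            best_step = (sse, k, a, b)
    candidates = {
        "constant": (sse_const, lambda x: mean),
        "linear": (sse_lin, lambda x: intercept + slope * x),
        "step": (best_step[0],
                 lambda x, s=best_step: s[2] if x < s[1] else s[3]),
    }
    name = min(candidates, key=lambda m: candidates[m][0])
    return name, candidates[name][1]
```

The returned predictor is what a receiver could store in the buffer and evaluate at future indices when fresh measurements are judged unreliable.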
10. An apparatus for calculating synchronization errors between destination video content and destination audio content, wherein the apparatus comprises:
- a receiver for receiving reference video signatures representing one or more video features of a reference video signal, reference audio signatures representing one or more audio features of a reference audio signal, and indications of relative temporal alignment of the video and audio features;
- a generator for generating one or more destination video signatures in response to one or more video features extracted from the destination video content;
- a generator for generating one or more destination audio signatures in response to one or more audio features extracted from the destination audio content;
- a comparator for comparing a sequence of destination video signatures with a sequence of reference video signatures to find a match between the destination video content and the reference video content used to generate the reference video signatures;
- a comparator for comparing a sequence of destination audio signatures with a sequence of reference audio signatures to find a match between the destination audio content and the reference audio content used to generate the reference audio signatures;
- a calculator for calculating a synchronization error between the destination video content and the destination audio content by calculating a temporal misalignment between the identified destination video content and the identified destination audio content as compared with the relative temporal alignment of the video and audio features of the reference video signal and the reference audio signal;
- an analyzer for analyzing past synchronization errors and for selecting a model of past synchronization errors that best represents the past synchronization errors, wherein the model may be selected from models that represent a sequence of synchronization errors that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past synchronization errors;
- storage for storing the choice of the selected model and its parameters in a buffer;
- a deriver for deriving a measure of reliability in the synchronization error from a difference between the calculated temporal misalignment and a predicted misalignment obtained from a sequence of previously calculated temporal misalignments;
- a predictor for using the buffer of stored models to predict the synchronization error for intervals where the measures of reliability are below a threshold; and
- a display for displaying the synchronization error, or for correcting the synchronization error by delaying one or both of the destination video and the destination audio to bring them into proper temporal alignment.
- View Dependent Claims (12, 13, 14, 15, 16, 17)
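One way to realize the two comparator elements is to slide the destination signature sequence across the reference sequence and keep the offset with the smallest total bit-level (Hamming) distance. Representing each signature as a small integer, and using Hamming distance as the match metric, are assumptions of this sketch rather than details taken from the claims.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-valued signatures."""
    return bin(a ^ b).count("1")

def best_offset(dest_sigs, ref_sigs):
    """Slide dest_sigs over ref_sigs; return (offset, score) for the
    alignment with the smallest total Hamming distance. The offset is the
    relative timing difference, in signature periods, between the
    destination sequence and the reference sequence."""
    best = (None, float("inf"))
    for off in range(len(ref_sigs) - len(dest_sigs) + 1):
        score = sum(hamming(d, r)
                    for d, r in zip(dest_sigs, ref_sigs[off:]))
        if score < best[1]:
            best = (off, score)
    return best
```

An exact match scores 0; transmission or coding artifacts that flip a few signature bits raise the score but usually leave the same offset as the minimum.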
11. An apparatus for calculating synchronization errors between destination video content and destination audio content, wherein the apparatus comprises:
- a receiver for receiving reference video signatures representing one or more video features of a reference video signal, reference audio signatures representing one or more audio features of a reference audio signal, and indications of relative temporal alignment of the video and audio features;
- a generator for generating one or more destination video signatures in response to one or more video features extracted from the destination video content;
- a generator for generating one or more destination audio signatures in response to one or more audio features extracted from the destination audio content;
- a comparator for comparing a sequence of destination video signatures with a sequence of reference video signatures to find a match between the destination video content and the reference video content used to generate the reference video signatures, obtaining a relative video timing difference between the destination video signal and the reference video signal;
- a comparator for comparing a sequence of destination audio signatures with a sequence of reference audio signatures to find a match between the destination audio content and the reference audio content used to generate the reference audio signatures, obtaining a relative audio timing difference between the destination audio signal and the reference audio signal;
- a calculator for calculating a synchronization error between the destination video content and the destination audio content from the relative timing difference between the destination video signal and the reference video signal and from the relative timing difference between the destination audio signal and the reference audio signal as compared with the relative temporal alignment of the video and audio features of the reference video signal and the reference audio signal;
- a calculator for calculating a video-match confidence measure that represents a degree of certainty in the found match between the destination video content and the reference video content by analyzing past relative video timing differences and selecting a prediction model that best represents the past relative video timing differences, wherein the model may be selected from models that represent a sequence of timing differences that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past relative video timing differences;
- a calculator for calculating an audio-match confidence measure that represents a degree of certainty in the found match between the destination audio content and the reference audio content by analyzing past relative audio timing differences and selecting a prediction model that best represents the past relative audio timing differences, wherein the model may be selected from models that represent a sequence of timing differences that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past relative audio timing differences;
- storage for storing the choice of the selected models and their parameters in a buffer;
- a deriver for deriving a measure of reliability in the synchronization error from the video-match confidence measure and the audio-match confidence measure;
- a predictor for using the buffer of stored models to predict the synchronization error for intervals where the measures of reliability are below a threshold; and
- a display for displaying the synchronization error, or for correcting the synchronization error by delaying one or both of the destination video and the destination audio to bring them into proper temporal alignment.
- View Dependent Claims (18)
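Claims 2, 11, 20, and 27 derive the reliability of the synchronization error from both the video-match and audio-match confidence measures. Taking the minimum of the two, as below, is one conservative combination chosen for this sketch, not the patent's specified rule: the error is only as trustworthy as the weaker of the two matches it depends on.

```python
def combined_reliability(video_conf: float, audio_conf: float) -> float:
    """Reliability of a sync error computed from two independent matches,
    each confidence in [0, 1]. The weaker match bounds the result."""
    return min(video_conf, audio_conf)

def use_fresh_measurement(video_conf: float, audio_conf: float,
                          threshold: float = 0.5) -> bool:
    # Below the threshold, the claims fall back to the buffered model's
    # prediction rather than the fresh measurement. The threshold value
    # here is arbitrary.
    return combined_reliability(video_conf, audio_conf) >= threshold
```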
19. A non-transitory medium that stores a program of instructions and is readable by a computer for executing the program of instructions to perform a method for calculating synchronization errors between destination video content and destination audio content, wherein the method comprises:
- receiving reference video signatures representing one or more video features of a reference video signal, reference audio signatures representing one or more audio features of a reference audio signal, and indications of relative temporal alignment of the video and audio features;
- generating one or more destination video signatures in response to one or more video features extracted from the destination video content;
- generating one or more destination audio signatures in response to one or more audio features extracted from the destination audio content;
- comparing a sequence of destination video signatures with a sequence of reference video signatures to find a match between the destination video content and the reference video content used to generate the reference video signatures;
- comparing a sequence of destination audio signatures with a sequence of reference audio signatures to find a match between the destination audio content and the reference audio content used to generate the reference audio signatures;
- calculating a synchronization error between the destination video content and the destination audio content by calculating a temporal misalignment between the identified destination video content and the identified destination audio content as compared with the relative temporal alignment of the video and audio features of the reference video signal and the reference audio signal;
- analyzing past synchronization errors and selecting a model of past synchronization errors that best represents the past synchronization errors, wherein the model may be selected from models that represent a sequence of synchronization errors that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past synchronization errors;
- storing the choice of the selected model and its parameters in a buffer;
- deriving a measure of reliability in the synchronization error from a difference between the calculated temporal misalignment and a predicted misalignment obtained from a sequence of previously calculated temporal misalignments;
- using the buffer of stored models to predict the synchronization error for intervals where the measures of reliability are below a threshold; and
- displaying the synchronization error, or correcting the synchronization error by delaying one or both of the destination video and the destination audio to bring them into proper temporal alignment.
- View Dependent Claims (21, 22, 23, 24, 25, 26)
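Claims 1, 10, and 19 instead derive reliability from the disagreement between the newly calculated misalignment and the prediction from previously calculated misalignments, and fall back to the prediction when reliability drops below a threshold. The mapping of disagreement into a score, the scale, and the threshold below are all assumptions of this sketch.

```python
def reliability(calculated_ms: float, predicted_ms: float,
                scale_ms: float = 50.0) -> float:
    """Map the disagreement between the fresh measurement and the model's
    prediction into (0, 1]; 1.0 means perfect agreement. The 50 ms scale
    is an arbitrary choice for the sketch."""
    return 1.0 / (1.0 + abs(calculated_ms - predicted_ms) / scale_ms)

def choose_error(calculated_ms: float, predicted_ms: float,
                 threshold: float = 0.5) -> float:
    """Report the fresh measurement when it is reliable; otherwise trust
    the prediction from the buffered model of past misalignments."""
    if reliability(calculated_ms, predicted_ms) < threshold:
        return predicted_ms
    return calculated_ms
```

A measurement close to the model's prediction passes through unchanged; an outlier (for example, a spurious match during a scene with few usable features) is replaced by the predicted value.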
20. A non-transitory medium that stores a program of instructions and is readable by a computer for executing the program of instructions to perform a method for calculating synchronization errors between destination video content and destination audio content, wherein the method comprises:
- receiving reference video signatures representing one or more video features of a reference video signal, reference audio signatures representing one or more audio features of a reference audio signal, and indications of relative temporal alignment of the video and audio features;
- generating one or more destination video signatures in response to one or more video features extracted from the destination video content;
- generating one or more destination audio signatures in response to one or more audio features extracted from the destination audio content;
- comparing a sequence of destination video signatures with a sequence of reference video signatures to find a match between the destination video content and the reference video content used to generate the reference video signatures, obtaining a relative video timing difference between the destination video signal and the reference video signal;
- comparing a sequence of destination audio signatures with a sequence of reference audio signatures to find a match between the destination audio content and the reference audio content used to generate the reference audio signatures, obtaining a relative audio timing difference between the destination audio signal and the reference audio signal;
- calculating a synchronization error between the destination video content and the destination audio content from the relative timing difference between the destination video signal and the reference video signal and from the relative timing difference between the destination audio signal and the reference audio signal as compared with the relative temporal alignment of the video and audio features of the reference video signal and the reference audio signal;
- calculating a video-match confidence measure that represents a degree of certainty in the found match between the destination video content and the reference video content by analyzing past relative video timing differences and selecting a prediction model that best represents the past relative video timing differences, wherein the model may be selected from models that represent a sequence of timing differences that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past relative video timing differences;
- calculating an audio-match confidence measure that represents a degree of certainty in the found match between the destination audio content and the reference audio content by analyzing past relative audio timing differences and selecting a prediction model that best represents the past relative audio timing differences, wherein the model may be selected from models that represent a sequence of timing differences that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past relative audio timing differences;
- storing the choice of the selected models and their parameters in a buffer;
- deriving a measure of reliability in the synchronization error from the video-match confidence measure and the audio-match confidence measure;
- using the buffer of stored models to predict the synchronization error for intervals where the measures of reliability are below a threshold; and
- displaying the synchronization error, or correcting the synchronization error by delaying one or both of the destination video and the destination audio to bring them into proper temporal alignment.
- View Dependent Claims (27)
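The final correcting step of each independent claim delays one or both streams to restore alignment. Since a stream cannot be advanced, a simple policy, sketched below under an assumed sign convention (positive error means the audio lags the video), is to delay only the stream that is ahead.

```python
def correction_delays_ms(sync_error_ms: float):
    """Return (video_delay_ms, audio_delay_ms) that restores alignment.
    Only the leading stream is delayed; a negative delay is impossible.
    Sign convention (an assumption): positive error = audio lags video."""
    if sync_error_ms > 0:
        # Audio is late: hold the video back until the audio catches up.
        return (sync_error_ms, 0.0)
    # Video is late (or the streams are aligned): hold the audio back.
    return (0.0, -sync_error_ms)
```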
Specification