Feature optimization and reliability for audio and video signature generation and detection
First Claim
1. A method for calculating synchronization errors between destination video content and destination audio content, wherein the method comprises:
- receiving reference video signatures representing one or more video features of a reference video signal, reference audio signatures representing one or more audio features of a reference audio signal, and indications of relative temporal alignment of the video and audio features;
- generating one or more destination video signatures in response to one or more video features extracted from the destination video content;
- generating one or more destination audio signatures in response to one or more audio features extracted from the destination audio content;
- comparing a sequence of destination video signatures with a sequence of reference video signatures to find a match between the destination video content and the reference video content used to generate the reference video signatures;
- comparing a sequence of destination audio signatures with a sequence of reference audio signatures to find a match between the destination audio content and the reference audio content used to generate the reference audio signatures;
- calculating a synchronization error between the destination video content and the destination audio content by calculating a temporal misalignment between the identified destination video content and the identified destination audio content as compared with the relative temporal alignment of the video and audio features of the reference video signal and the reference audio signal;
- analyzing past synchronization errors and selecting a model of past synchronization errors that best represents the past synchronization errors, wherein the model may be selected from models that represent a sequence of synchronization errors that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past synchronization errors;
- storing the choice of the selected model and its parameters in a buffer;
- deriving a measure of reliability in the synchronization error from a difference between the calculated temporal misalignment and a predicted misalignment obtained from a sequence of previously calculated temporal misalignments;
- using the buffer of stored models to predict the synchronization error for intervals where the measures of reliability are below a threshold; and
- displaying the synchronization error, or correcting the synchronization error by delaying one or both of the destination video and the destination audio to bring them into proper temporal alignment.
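The arithmetic behind the "calculating a synchronization error" step can be sketched in a few lines. The function name, the millisecond units, and the sign convention below are illustrative assumptions for the sketch, not terms taken from the patent.

```python
def sync_error_ms(video_offset_ms: float, audio_offset_ms: float,
                  reference_av_skew_ms: float = 0.0) -> float:
    """Synchronization error of the destination A/V pair, relative to the
    known alignment of the reference pair.

    video_offset_ms: timing difference, destination video vs. reference video
    audio_offset_ms: timing difference, destination audio vs. reference audio
    reference_av_skew_ms: the indicated relative temporal alignment of the
        reference video and audio features (0 if they were aligned)

    Sign convention (an assumption): a positive result means the destination
    audio lags the destination video.
    """
    return (audio_offset_ms - video_offset_ms) - reference_av_skew_ms
```

For example, if the destination video is found 40 ms after its reference position and the destination audio 100 ms after its reference position, the audio lags the video by 60 ms when the reference pair was aligned.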
1 Assignment
0 Petitions
Abstract
Features are extracted from video and audio content that have a known temporal relationship with one another. The extracted features are used to generate video and audio signatures, which are assembled with an indication of the temporal relationship into a synchronization signature construct. The construct may be used to calculate synchronization errors between video and audio content received at a remote destination. Measures of confidence are generated at the remote destination to optimize processing and to provide an indication of reliability of the calculated synchronization error.
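As a rough illustration of the abstract's synchronization signature construct, the sketch below bundles a video signature, an audio signature, and their known temporal relationship into one record. The class and field names, the millisecond field, and the use of a truncated SHA-256 digest in place of a real perceptual feature hash are all assumptions of this sketch.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class SyncSignature:
    """Hypothetical synchronization signature construct."""
    video_sig: bytes        # signature of extracted video features
    audio_sig: bytes        # signature of extracted audio features
    av_alignment_ms: float  # known temporal relationship at the reference

def make_signature(features: bytes) -> bytes:
    # A real system would use a perceptual hash robust to coding artifacts;
    # a truncated cryptographic digest merely stands in for one here.
    return hashlib.sha256(features).digest()[:8]

construct = SyncSignature(make_signature(b"video-frame-features"),
                          make_signature(b"audio-segment-features"),
                          av_alignment_ms=0.0)
```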
46 Citations
27 Claims
1. A method for calculating synchronization errors between destination video content and destination audio content, wherein the method comprises the steps recited in the First Claim above. - View Dependent Claims (3, 4, 5, 6, 7, 8)
2. A method for calculating synchronization errors between destination video content and destination audio content, wherein the method comprises:
- receiving reference video signatures representing one or more video features of a reference video signal, reference audio signatures representing one or more audio features of a reference audio signal, and indications of relative temporal alignment of the video and audio features;
- generating one or more destination video signatures in response to one or more video features extracted from the destination video content;
- generating one or more destination audio signatures in response to one or more audio features extracted from the destination audio content;
- comparing a sequence of destination video signatures with a sequence of reference video signatures to find a match between the destination video content and the reference video content used to generate the reference video signatures, obtaining a relative video timing difference between the destination video signal and the reference video signal;
- comparing a sequence of destination audio signatures with a sequence of reference audio signatures to find a match between the destination audio content and the reference audio content used to generate the reference audio signatures, obtaining a relative audio timing difference between the destination audio signal and the reference audio signal;
- calculating a synchronization error between the destination video content and the destination audio content from the relative timing difference between the destination video signal and the reference video signal and from the relative timing difference between the destination audio signal and the reference audio signal as compared with the relative temporal alignment of the video and audio features of the reference video signal and the reference audio signal;
- calculating a video-match confidence measure that represents a degree of certainty in the found match between the destination video content and the reference video content by analyzing past relative video timing differences and selecting a prediction model that best represents the past relative video timing differences, wherein the model may be selected from models that represent a sequence of timing differences that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past relative video timing differences;
- calculating an audio-match confidence measure that represents a degree of certainty in the found match between the destination audio content and the reference audio content by analyzing past relative audio timing differences and selecting a prediction model that best represents the past relative audio timing differences, wherein the model may be selected from models that represent a sequence of timing differences that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past relative audio timing differences;
- storing the choice of the selected models and their parameters in a buffer;
- deriving a measure of reliability in the synchronization error from the video-match confidence measure and the audio-match confidence measure;
- using the buffer of stored models to predict the synchronization error for intervals where the measures of reliability are below a threshold; and
- displaying the synchronization error, or correcting the synchronization error by delaying one or both of the destination video and the destination audio to bring them into proper temporal alignment.
- View Dependent Claims (9)
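The model-selection step recited in the claims (choosing among constant, linear, and abrupt-change models of past timing differences, with parameters derived to minimize the residual) can be sketched as a least-squares fit of each candidate followed by selection of the smallest residual. The function name, the sum-of-squared-errors criterion, and the model labels are assumptions of this sketch; it needs at least two samples.

```python
def fit_models(history):
    """Fit constant, linear, and single-step models to a sequence of past
    timing differences; return (model_name, predictor) for the best fit."""
    n = len(history)
    xs = list(range(n))
    # Constant model: the mean minimizes the squared residual.
    mean = sum(history) / n
    sse_const = sum((y - mean) ** 2 for y in history)
    # Linear model: ordinary least squares for slope and intercept.
    mx = (n - 1) / 2
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - mean) for x, y in zip(xs, history))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean - slope * mx
    sse_lin = sum((intercept + slope * x - y) ** 2
                  for x, y in zip(xs, history))
    # Abrupt-change model: two constant segments with one step at index k.
    best_step = (float("inf"), 0, 0.0, 0.0)
    for k in range(1, n):
        a = sum(history[:k]) / k
        b = sum(history[k:]) / (n - k)
        sse = (sum((y - a) ** 2 for y in history[:k])
               + sum((y - b) ** 2 for y in history[k:]))
        if sse < best_step[0]:
            best_step = (sse, k, a, b)
    candidates = {
        "constant": (sse_const, lambda x: mean),
        "linear": (sse_lin, lambda x: intercept + slope * x),
        "step": (best_step[0],
                 lambda x, s=best_step: s[2] if x < s[1] else s[3]),
    }
    name = min(candidates, key=lambda m: candidates[m][0])
    return name, candidates[name][1]
```

The returned predictor is what a receiver could store in the buffer and evaluate at future indices when fresh measurements are judged unreliable.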
10. An apparatus for calculating synchronization errors between destination video content and destination audio content, wherein the apparatus comprises:
- a receiver for receiving reference video signatures representing one or more video features of a reference video signal, reference audio signatures representing one or more audio features of a reference audio signal, and indications of relative temporal alignment of the video and audio features;
- a generator for generating one or more destination video signatures in response to one or more video features extracted from the destination video content;
- a generator for generating one or more destination audio signatures in response to one or more audio features extracted from the destination audio content;
- a comparator for comparing a sequence of destination video signatures with a sequence of reference video signatures to find a match between the destination video content and the reference video content used to generate the reference video signatures;
- a comparator for comparing a sequence of destination audio signatures with a sequence of reference audio signatures to find a match between the destination audio content and the reference audio content used to generate the reference audio signatures;
- a calculator for calculating a synchronization error between the destination video content and the destination audio content by calculating a temporal misalignment between the identified destination video content and the identified destination audio content as compared with the relative temporal alignment of the video and audio features of the reference video signal and the reference audio signal;
- an analyzer for analyzing past synchronization errors and for selecting a model of past synchronization errors that best represents the past synchronization errors, wherein the model may be selected from models that represent a sequence of synchronization errors that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past synchronization errors;
- storage for storing the choice of the selected model and its parameters in a buffer;
- a deriver for deriving a measure of reliability in the synchronization error from a difference between the calculated temporal misalignment and a predicted misalignment obtained from a sequence of previously calculated temporal misalignments;
- a predictor for using the buffer of stored models to predict the synchronization error for intervals where the measures of reliability are below a threshold; and
- a display for displaying the synchronization error, or for correcting the synchronization error by delaying one or both of the destination video and the destination audio to bring them into proper temporal alignment.
- View Dependent Claims (12, 13, 14, 15, 16, 17)
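One way to realize the two comparator elements is to slide the destination signature sequence across the reference sequence and keep the offset with the smallest total bit-level (Hamming) distance. Representing each signature as a small integer, and using Hamming distance as the match metric, are assumptions of this sketch rather than details taken from the claims.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-valued signatures."""
    return bin(a ^ b).count("1")

def best_offset(dest_sigs, ref_sigs):
    """Slide dest_sigs over ref_sigs; return (offset, score) for the
    alignment with the smallest total Hamming distance. The offset is the
    relative timing difference, in signature periods, between the
    destination sequence and the reference sequence."""
    best = (None, float("inf"))
    for off in range(len(ref_sigs) - len(dest_sigs) + 1):
        score = sum(hamming(d, r)
                    for d, r in zip(dest_sigs, ref_sigs[off:]))
        if score < best[1]:
            best = (off, score)
    return best
```

An exact match scores 0; transmission or coding artifacts that flip a few signature bits raise the score but usually leave the same offset as the minimum.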
11. An apparatus for calculating synchronization errors between destination video content and destination audio content, wherein the apparatus comprises:
- a receiver for receiving reference video signatures representing one or more video features of a reference video signal, reference audio signatures representing one or more audio features of a reference audio signal, and indications of relative temporal alignment of the video and audio features;
- a generator for generating one or more destination video signatures in response to one or more video features extracted from the destination video content;
- a generator for generating one or more destination audio signatures in response to one or more audio features extracted from the destination audio content;
- a comparator for comparing a sequence of destination video signatures with a sequence of reference video signatures to find a match between the destination video content and the reference video content used to generate the reference video signatures, obtaining a relative video timing difference between the destination video signal and the reference video signal;
- a comparator for comparing a sequence of destination audio signatures with a sequence of reference audio signatures to find a match between the destination audio content and the reference audio content used to generate the reference audio signatures, obtaining a relative audio timing difference between the destination audio signal and the reference audio signal;
- a calculator for calculating a synchronization error between the destination video content and the destination audio content from the relative timing difference between the destination video signal and the reference video signal and from the relative timing difference between the destination audio signal and the reference audio signal as compared with the relative temporal alignment of the video and audio features of the reference video signal and the reference audio signal;
- a calculator for calculating a video-match confidence measure that represents a degree of certainty in the found match between the destination video content and the reference video content by analyzing past relative video timing differences and selecting a prediction model that best represents the past relative video timing differences, wherein the model may be selected from models that represent a sequence of timing differences that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past relative video timing differences;
- a calculator for calculating an audio-match confidence measure that represents a degree of certainty in the found match between the destination audio content and the reference audio content by analyzing past relative audio timing differences and selecting a prediction model that best represents the past relative audio timing differences, wherein the model may be selected from models that represent a sequence of timing differences that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past relative audio timing differences;
- storage for storing the choice of the selected models and their parameters in a buffer;
- a deriver for deriving a measure of reliability in the synchronization error from the video-match confidence measure and the audio-match confidence measure;
- a predictor for using the buffer of stored models to predict the synchronization error for intervals where the measures of reliability are below a threshold; and
- a display for displaying the synchronization error, or for correcting the synchronization error by delaying one or both of the destination video and the destination audio to bring them into proper temporal alignment.
- View Dependent Claims (18)
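Claims 2, 11, 20, and 27 derive the reliability of the synchronization error from both the video-match and audio-match confidence measures. Taking the minimum of the two, as below, is one conservative combination chosen for this sketch, not the patent's specified rule: the error is only as trustworthy as the weaker of the two matches it depends on.

```python
def combined_reliability(video_conf: float, audio_conf: float) -> float:
    """Reliability of a sync error computed from two independent matches,
    each confidence in [0, 1]. The weaker match bounds the result."""
    return min(video_conf, audio_conf)

def use_fresh_measurement(video_conf: float, audio_conf: float,
                          threshold: float = 0.5) -> bool:
    # Below the threshold, the claims fall back to the buffered model's
    # prediction rather than the fresh measurement. The threshold value
    # here is arbitrary.
    return combined_reliability(video_conf, audio_conf) >= threshold
```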
19. A non-transitory medium that stores a program of instructions and is readable by a computer for executing the program of instructions to perform a method for calculating synchronization errors between destination video content and destination audio content, wherein the method comprises:
- receiving reference video signatures representing one or more video features of a reference video signal, reference audio signatures representing one or more audio features of a reference audio signal, and indications of relative temporal alignment of the video and audio features;
- generating one or more destination video signatures in response to one or more video features extracted from the destination video content;
- generating one or more destination audio signatures in response to one or more audio features extracted from the destination audio content;
- comparing a sequence of destination video signatures with a sequence of reference video signatures to find a match between the destination video content and the reference video content used to generate the reference video signatures;
- comparing a sequence of destination audio signatures with a sequence of reference audio signatures to find a match between the destination audio content and the reference audio content used to generate the reference audio signatures;
- calculating a synchronization error between the destination video content and the destination audio content by calculating a temporal misalignment between the identified destination video content and the identified destination audio content as compared with the relative temporal alignment of the video and audio features of the reference video signal and the reference audio signal;
- analyzing past synchronization errors and selecting a model of past synchronization errors that best represents the past synchronization errors, wherein the model may be selected from models that represent a sequence of synchronization errors that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past synchronization errors;
- storing the choice of the selected model and its parameters in a buffer;
- deriving a measure of reliability in the synchronization error from a difference between the calculated temporal misalignment and a predicted misalignment obtained from a sequence of previously calculated temporal misalignments;
- using the buffer of stored models to predict the synchronization error for intervals where the measures of reliability are below a threshold; and
- displaying the synchronization error, or correcting the synchronization error by delaying one or both of the destination video and the destination audio to bring them into proper temporal alignment.
- View Dependent Claims (21, 22, 23, 24, 25, 26)
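Claims 1, 10, and 19 instead derive reliability from the disagreement between the newly calculated misalignment and the prediction from previously calculated misalignments, and fall back to the prediction when reliability drops below a threshold. The mapping of disagreement into a score, the scale, and the threshold below are all assumptions of this sketch.

```python
def reliability(calculated_ms: float, predicted_ms: float,
                scale_ms: float = 50.0) -> float:
    """Map the disagreement between the fresh measurement and the model's
    prediction into (0, 1]; 1.0 means perfect agreement. The 50 ms scale
    is an arbitrary choice for the sketch."""
    return 1.0 / (1.0 + abs(calculated_ms - predicted_ms) / scale_ms)

def choose_error(calculated_ms: float, predicted_ms: float,
                 threshold: float = 0.5) -> float:
    """Report the fresh measurement when it is reliable; otherwise trust
    the prediction from the buffered model of past misalignments."""
    if reliability(calculated_ms, predicted_ms) < threshold:
        return predicted_ms
    return calculated_ms
```

A measurement close to the model's prediction passes through unchanged; an outlier (for example, a spurious match during a scene with few usable features) is replaced by the predicted value.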
20. A non-transitory medium that stores a program of instructions and is readable by a computer for executing the program of instructions to perform a method for calculating synchronization errors between destination video content and destination audio content, wherein the method comprises:
- receiving reference video signatures representing one or more video features of a reference video signal, reference audio signatures representing one or more audio features of a reference audio signal, and indications of relative temporal alignment of the video and audio features;
- generating one or more destination video signatures in response to one or more video features extracted from the destination video content;
- generating one or more destination audio signatures in response to one or more audio features extracted from the destination audio content;
- comparing a sequence of destination video signatures with a sequence of reference video signatures to find a match between the destination video content and the reference video content used to generate the reference video signatures, obtaining a relative video timing difference between the destination video signal and the reference video signal;
- comparing a sequence of destination audio signatures with a sequence of reference audio signatures to find a match between the destination audio content and the reference audio content used to generate the reference audio signatures, obtaining a relative audio timing difference between the destination audio signal and the reference audio signal;
- calculating a synchronization error between the destination video content and the destination audio content from the relative timing difference between the destination video signal and the reference video signal and from the relative timing difference between the destination audio signal and the reference audio signal as compared with the relative temporal alignment of the video and audio features of the reference video signal and the reference audio signal;
- calculating a video-match confidence measure that represents a degree of certainty in the found match between the destination video content and the reference video content by analyzing past relative video timing differences and selecting a prediction model that best represents the past relative video timing differences, wherein the model may be selected from models that represent a sequence of timing differences that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past relative video timing differences;
- calculating an audio-match confidence measure that represents a degree of certainty in the found match between the destination audio content and the reference audio content by analyzing past relative audio timing differences and selecting a prediction model that best represents the past relative audio timing differences, wherein the model may be selected from models that represent a sequence of timing differences that are constant, that increase or decrease at a linear rate, or that include an abrupt change in value, and wherein parameters for the selected model are derived to minimize differences between the selected model output and the past relative audio timing differences;
- storing the choice of the selected models and their parameters in a buffer;
- deriving a measure of reliability in the synchronization error from the video-match confidence measure and the audio-match confidence measure;
- using the buffer of stored models to predict the synchronization error for intervals where the measures of reliability are below a threshold; and
- displaying the synchronization error, or correcting the synchronization error by delaying one or both of the destination video and the destination audio to bring them into proper temporal alignment.
- View Dependent Claims (27)
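The final correcting step of each independent claim delays one or both streams to restore alignment. Since a stream cannot be advanced, a simple policy, sketched below under an assumed sign convention (positive error means the audio lags the video), is to delay only the stream that is ahead.

```python
def correction_delays_ms(sync_error_ms: float):
    """Return (video_delay_ms, audio_delay_ms) that restores alignment.
    Only the leading stream is delayed; a negative delay is impossible.
    Sign convention (an assumption): positive error = audio lags video."""
    if sync_error_ms > 0:
        # Audio is late: hold the video back until the audio catches up.
        return (sync_error_ms, 0.0)
    # Video is late (or the streams are aligned): hold the audio back.
    return (0.0, -sync_error_ms)
```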
Specification