Methods and systems for representation and matching of video content
Abstract
The described methods and systems provide for the representation and matching of video content, including spatio-temporal matching of different video sequences. A particular method of determining temporal correspondence between different sets of video data inputs the sets of video data and represents the video data as ordered sequences of visual nucleotides. Temporally corresponding subsets of video data are determined by aligning the sequences of visual nucleotides.
Claims (20)
1. A method of determining temporal correspondence between different sets of video data, the method comprising:
inputting the sets of video data;
representing the video data as ordered sequences of visual nucleotides; and
determining temporally corresponding subsets of video data by aligning the sequences of visual nucleotides;
wherein the visual nucleotides are computed by:
representing a temporal interval of the video data as a collection of features and feature descriptors;
discarding the spatial coordinates of the features, and grouping similar feature descriptors into bins according to a grouping function; and
creating visual nucleotides that correspond to the coefficients of the various feature descriptor bins; and
wherein the features are chosen so as to be substantially invariant with respect to video resolution, orientation, or lighting.
Dependent claims: 2-13.
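The visual-nucleotide computation in claim 1 can be sketched as a bag-of-features histogram. The claim leaves the grouping function abstract; the sketch below assumes nearest-center assignment against a precomputed codebook of bin centers, and takes the bin coefficients to be normalized counts. Both choices are illustrative assumptions, not the patent's definition.

```python
import numpy as np

def visual_nucleotide(descriptors, codebook):
    """Compute a visual nucleotide for one temporal interval.

    descriptors : (n, d) array of feature descriptors from the interval
                  (spatial coordinates already discarded, per the claim).
    codebook    : (k, d) array of bin centers; the grouping function is
                  assumed to be nearest-center assignment here.
    Returns a length-k vector of bin coefficients (normalized counts).
    """
    # Squared distance from every descriptor to every bin center.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # group similar descriptors into bins
    counts = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return counts / counts.sum()  # coefficients of the feature descriptor bins
```

Because the spatial coordinates are discarded before binning, the resulting vector depends only on which descriptors occur in the interval, not where they appear in the frame.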
14. An apparatus comprising:
a source of video data;
a video segmenter coupled to the source of video data and configured to segment video data into temporal intervals, wherein the temporal intervals include a plurality of time-consecutive video image frames; and
a video processor coupled to the source of video data and configured to compute a visual nucleotide for each temporal interval;
wherein the video processor computes the visual nucleotides by:
representing a temporal interval of the video data as a collection of features and feature descriptors;
discarding spatial coordinates of the features and grouping similar feature descriptors into bins according to a grouping function; and
creating visual nucleotides that correspond to the coefficients of the feature descriptor bins;
wherein the features are chosen by the video processor to be substantially invariant with respect to video resolution, video orientation, and video lighting.
Dependent claims: 15-16.
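The segmenter in claim 14 partitions the video into intervals of time-consecutive frames, with one nucleotide computed per interval. A minimal sketch, assuming fixed-length intervals over frame indices (the claim does not require a fixed length; the function name is illustrative):

```python
def segment_into_intervals(num_frames, frames_per_interval):
    """Split frame indices into time-consecutive temporal intervals.

    Returns a list of (start, end) half-open index pairs; the last
    interval may be shorter than frames_per_interval.
    """
    return [(s, min(s + frames_per_interval, num_frames))
            for s in range(0, num_frames, frames_per_interval)]
```

Each returned interval would then be passed to the video processor, which computes one visual nucleotide from the features extracted across that interval's frames.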
17. A method of determining temporal correspondence between different sets of video data, the method comprising:
inputting the sets of video data;
representing the video data as ordered sequences of visual nucleotides; and
determining temporally corresponding subsets of video data by aligning the sequences of visual nucleotides;
further comprising computing a spatial correspondence between the temporally corresponding subsets of video data;
wherein computing spatial correspondence is performed by:
inputting temporally corresponding subsets of video data;
providing feature points in subsets of video data;
finding correspondence between feature points; and
finding correspondence between spatial coordinates;
wherein finding correspondence between feature points is performed by finding parameters of a model describing the transformation between two sets of feature points, wherein finding parameters of a model is performed by solving the following optimization problem
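The optimization problem referenced at the end of claim 17 is not reproduced in this excerpt. One common instance of fitting a transformation model between two point sets is a least-squares affine fit, sketched below; this is an assumed example, not the patent's stated objective function.

```python
import numpy as np

def fit_affine(pts_a, pts_b):
    """Least-squares affine transform mapping pts_a onto pts_b.

    Minimizes sum_i || A x_i + t - y_i ||^2 over the matrix A and
    translation t, where x_i are rows of pts_a and y_i rows of pts_b.
    """
    n = len(pts_a)
    # Homogeneous coordinates: append a column of ones for the translation.
    X = np.hstack([pts_a, np.ones((n, 1))])
    M, *_ = np.linalg.lstsq(X, pts_b, rcond=None)
    A, t = M[:2].T, M[2]
    return A, t
```

With the model parameters in hand, correspondence between spatial coordinates follows by mapping each point of one subset through the fitted transform and matching it to the nearest point of the other.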
18. A method of determining temporal correspondence between different sets of video data, the method comprising:
inputting the sets of video data;
representing the video data as ordered sequences of visual nucleotides; and
determining temporally corresponding subsets of video data by aligning the sequences of visual nucleotides;
wherein the video data is segmented into temporal intervals including a plurality of time-consecutive video image frames, and wherein one visual nucleotide is computed for each interval;
wherein the visual nucleotide is computed by:
representing a temporal interval of the video data as a collection of features and feature descriptors;
discarding the spatial coordinates of the features, and grouping similar feature descriptors into bins according to a grouping function; and
creating visual nucleotides that correspond to the coefficients of the various feature descriptor bins;
wherein the features are chosen so as to be invariant with respect to video resolution, orientation, or lighting;
wherein computing a collection of feature descriptors is performed by:
tracking of corresponding invariant feature points in the temporal interval of the video data;
computing a single descriptor as a function of the descriptors of the invariant feature points belonging to a track; and
assigning the descriptor to all features belonging to the track; and
wherein a function of the descriptors of the invariant feature points belonging to a track is the average of the invariant feature points descriptors, or the median of the invariant feature points descriptors.
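The track-aggregation step in claim 18 is explicit: the per-track descriptor is the average or the median of the descriptors along the track. A minimal sketch (the function name and `how` parameter are illustrative):

```python
import numpy as np

def track_descriptor(track_descriptors, how="average"):
    """Single descriptor for a feature track, per claim 18: the average
    or the median (taken component-wise) of the descriptors of the
    tracked invariant feature points."""
    stacked = np.asarray(track_descriptors, dtype=float)
    if how == "average":
        return stacked.mean(axis=0)
    if how == "median":
        return np.median(stacked, axis=0)
    raise ValueError("how must be 'average' or 'median'")
```

The resulting descriptor is then assigned to every feature on the track, so all frames in the interval contribute the same, noise-reduced descriptor to the binning step.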
19. A method of determining temporal correspondence between different sets of video data, the method comprising:
inputting the sets of video data;
representing the video data as ordered sequences of visual nucleotides; and
determining temporally corresponding subsets of video data by aligning the sequences of visual nucleotides;
wherein aligning sequences of visual nucleotides includes:
receiving two sequences of visual nucleotides s = {s1, . . . , sM} and q = {q1, . . . , qM} as the input;
receiving a score function σ(si, qj) and a gap penalty function γ(i, j, n) as the parameters; and
finding the partial correspondence C = {(i1, j1), . . . , (iK, jK)} and the collection of gaps G = {(l1, m1, n1), . . . , (lL, mL, nL)} maximizing the F(C, G) function.
Dependent claim: 20.
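Claim 19 does not define F(C, G) in this excerpt; a classic instance of maximizing a match score minus gap penalties is Needleman-Wunsch-style global alignment. The sketch below assumes a constant per-element gap penalty, which is a simplification of the claim's general γ(i, j, n).

```python
import numpy as np

def align(s, q, sigma, gap=1.0):
    """Global alignment of two nucleotide sequences maximizing total score.

    sigma(si, qj) scores matching element si against qj; gap is an
    assumed constant per-element penalty. Returns the partial
    correspondence C as 1-based index pairs, plus the total score.
    """
    M, N = len(s), len(q)
    F = np.zeros((M + 1, N + 1))
    F[:, 0] = -gap * np.arange(M + 1)
    F[0, :] = -gap * np.arange(N + 1)
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            F[i, j] = max(F[i - 1, j - 1] + sigma(s[i - 1], q[j - 1]),
                          F[i - 1, j] - gap,   # gap in q
                          F[i, j - 1] - gap)   # gap in s
    # Trace back to recover the matched index pairs C.
    C, i, j = [], M, N
    while i > 0 and j > 0:
        if np.isclose(F[i, j], F[i - 1, j - 1] + sigma(s[i - 1], q[j - 1])):
            C.append((i, j))
            i, j = i - 1, j - 1
        elif np.isclose(F[i, j], F[i - 1, j] - gap):
            i -= 1
        else:
            j -= 1
    return C[::-1], F[M, N]
```

In the patent's setting, s and q would be sequences of nucleotide vectors and σ a similarity between two histograms; here the elements are left generic so any score function can be plugged in.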
Specification