Digital video content fingerprinting based on scale invariant interest region detection with an array of anisotropic filters

US 8,189,945 B2
Filed: 11/05/2009
Issued: 05/29/2012
Est. Priority Date: 05/27/2009
Status: Active Grant

Abstract

Video sequence processing is described with various filtering rules applied to extract dominant features for content based video sequence identification. Active regions are determined in video frames of a video sequence. Video frames are selected in response to temporal statistical characteristics of the determined active regions. A two pass analysis is used to detect a set of initial interest points and interest regions in the selected video frames to reduce the effective area of images that are refined by complex filters that provide accurate region characterizations resistant to image distortion for identification of the video frames in the video sequence. Extracted features and descriptors are robust with respect to image scaling, aspect ratio change, rotation, camera viewpoint change, illumination and contrast change, video compression/decompression artifacts and noise. Compact, representative signatures are generated for video sequences to provide effective query video matching and retrieval in a large video database.
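
The first pass of the two pass analysis can be sketched in Python with NumPy. This is a minimal illustration and not the patented implementation: the square centre-surround boxes stand in for the octagonal or star-shaped bi-level filters named in the claims, and the function names (`box_mean`, `bilevel_response`, `first_pass`) and the parameters `radii`, `tile`, and `keep` are assumptions made for the example. Each tile of the frame, taken across all scales, forms one 3-dimensional scale-space volume; the strongest response in each volume is kept, and the peaks are then sorted by magnitude.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, computed from an integral image."""
    p = np.pad(img, r, mode="edge")
    ii = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    n = 2 * r + 1
    h, w = img.shape
    s = ii[n:n + h, n:n + w] - ii[:h, n:n + w] - ii[n:n + h, :w] + ii[:h, :w]
    return s / n**2

def bilevel_response(img, r):
    """Two-level, zero-sum centre-surround filter (assumed square, not star-shaped)."""
    return box_mean(img, r) - box_mean(img, 2 * r)

def first_pass(img, radii=(1, 2, 3, 4), tile=16, keep=10):
    """Return up to `keep` initial interest points as (x, y, radius), strongest first."""
    # One response image per scale, each the same size as the frame: [scale, y, x].
    stack = np.stack([np.abs(bilevel_response(img, r)) for r in radii])
    h, w = img.shape
    peaks = []
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            # Contiguous rectangular area across all scales = one 3-D volume.
            vol = stack[:, y0:y0 + tile, x0:x0 + tile]
            s, dy, dx = np.unravel_index(np.argmax(vol), vol.shape)
            peaks.append((vol[s, dy, dx], x0 + dx, y0 + dy, radii[s]))
    peaks.sort(key=lambda p: p[0], reverse=True)
    return [(x, y, r) for _, x, y, r in peaks[:keep]]
```

Only the small regions around the returned points would then be handed to the more expensive second pass, which is how the first pass reduces the effective image area.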

27 Claims

1. A method for content based video sequence identification comprising:
- applying a bi-level filter to images in a first pass analysis to detect a set of initial interest points in a plurality of selected video frames, wherein the first pass analysis reduces the effective area of the images in each selected video frame to multiple smaller images; and
  
  applying an array of anisotropic filters to regions of pixels around each initial interest point of the set of initial interest points in a second pass analysis to refine a spatial position for each initial interest point and determine a first scale parameter in the x direction (s_x) and a second scale parameter in the y direction (s_y), wherein the s_x and the s_y scale parameters are separately varied to provide accurate region characterizations that are resistant to image distortion for identification of the plurality of selected video frames in a video sequence.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
  - 2. The method of claim 1, wherein the array of anisotropic filters is an array of sampled elliptic-shaped anisotropic filters.
  - 3. The method of claim 1 further comprises:
    - applying bi-level symmetric filters with multiple different spatial support on a selected video frame to produce multiple filter response images with the same size selected video frame;
      
      constructing a set of contiguous rectangular spatial areas on the multiple filter response images derived for the selected video frame;
      
      constructing a corresponding set of 3-dimensional scale-space pixel volumes for each of the contiguous rectangular spatial areas;
      
      determining a set of local filter response maxima at spatial-scale 3-dimensional pixel volumes; and
      
      sorting local filter response maxima at spatial-scale 3-dimensional pixel volumes, and selecting a set of local filter response maxima with their spatial (x, y) coordinates to represent the initial interest points of the first pass analysis of the selected video frame.
  - 4. The method of claim 1 further comprises:
    - generating Laplacian of Gaussian second order partial derivative bi-level filters.
  - 5. The method of claim 1, wherein Laplacian of Gaussian second order partial derivative bi-level filters are applied as bi-level star-shaped filters that approximate circular filters.
  - 6. The method of claim 1 further comprises:
    - convolving Laplacian of Gaussian second order partial derivative bi-level filters of various sizes with one of the selected video frames to form bi-level filter response images for a specified set of scalar scale values.
  - 7. The method of claim 6, wherein a local maximum value is determined for each 3-dimensional image volume in a set of contiguous 3-dimensional image volumes associated with pixels of bi-level octagonal-shaped or star-shaped filter response images.
  - 8. The method of claim 7, wherein local maximum values are sorted according to their magnitude, and a subset of maxima are selected to represent the set of initial interest points.
  - 9. The method of claim 1 further comprises:
    - computing a set of Hessian determinant response images for each interest region formed around an initial interest point to determine a (s_x, s_y) scale pair for each initial interest point, wherein each initial interest point is detected in the first pass analysis.
  - 10. The method of claim 1 further comprises:
    - convolving each finite spatial support anisotropic filter from the array of anisotropic filters of finite spatial support, with rectangular regions around each of the initial interest points in the set of initial interest points determined in the first pass analysis of the plurality of selected video frames to determine (s_x, s_y) scale values for the s_x and the s_y scale parameters for each initial interest point.
  - 11. The method of claim 10, wherein the array of anisotropic filters comprise elliptic-shaped Gaussian second order partial derivative filters with finite rectangular spatial support directly proportional to the (s_x, s_y) scale values.
  - 12. The method of claim 9 further comprises:
    - generating a set of second order partial derivative images L_xx, L_yy, L_xy, L_yx computed along x and y coordinates and for each of the anisotropic filters of finite spatial support from the array of anisotropic filters, to determine the set of Hessian determinant response images.
  - 13. The method of claim 9 further comprises:
    - convolving an image I(p, q), representing a region around an initial interest point of the set of initial interest points determined in the first pass, with anisotropic Gaussian second order partial derivative filters of finite rectangular spatial support from the array of anisotropic filters to determine refined interest points at maxima of the Hessian determinant response images with refined spatial coordinates (x,y) and scales (s_x, s_y).
  - 14. The method of claim 1 further comprises:
    - forming a Hessian matrix and a Hessian determinant response image with pixels representing a determinant of the Hessian matrix for each anisotropic filter from the array of anisotropic filters with spatial support corresponding to (s_x, s_y) scale values.
  - 15. The method of claim 14 further comprises:
    - determining non-interpolated refined interest points based on non-interpolated local maxima computed for each combined spatial-scale 4-dimensional pixel volume constructed at equidistant spatial locations in the Hessian determinant response images.
  - 16. The method of claim 15, wherein the non-interpolated local maxima are sorted and a subset of the non-interpolated local maxima that exceed a specified magnitude threshold are selected to represent the non-interpolated refined interest points.
  - 17. The method of claim 16 further comprises:
    - interpolating the subset of non-interpolated local maxima in 2-dimensional scales and image space domains to generate interpolated local maxima values.
  - 18. The method of claim 17 further comprises:
    - generating interest point parameter vectors with (s_x, s_y, x, y, peak polarity) components based on the interpolated local maxima values; and
      
      generating a descriptor in a region centered at the (x,y) position that is a refined interest point spatial position and with a rectangular spatial extent proportional to the (s_x, s_y) scale values.
  - 19. The method of claim 1 further comprising:
    - computing a set of Hessian determinant response images for each interest region formed around an initial interest point to refine an (x,y) position of the initial interest point, wherein each initial interest point is detected in the first pass analysis.
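
The second pass described in claims 9 to 14 can be sketched as follows in Python with NumPy. This is an illustrative sketch under stated assumptions, not the patented implementation: the filters are axis-aligned sampled Gaussian derivatives applied separably, their support is truncated at four times the scale, and the responses are scale-normalised by s_x and s_y so that values are comparable across scale pairs. The names `gauss_derivs`, `sep_filter`, `hessian_det_stack`, and `refine` are hypothetical.

```python
import numpy as np

def gauss_derivs(s, support=4.0):
    """Sampled 1-D Gaussian and its 1st and 2nd derivatives, truncated at support*s."""
    half = int(np.ceil(support * s))
    t = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-t**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)
    return g, -t / s**2 * g, (t**2 / s**4 - 1 / s**2) * g

def sep_filter(img, kx, ky):
    """Filter with kx along x (each row), then ky along y (each column)."""
    tmp = np.apply_along_axis(np.convolve, 1, img, kx, mode="same")
    return np.apply_along_axis(np.convolve, 0, tmp, ky, mode="same")

def hessian_det_stack(region, scales):
    """Hessian determinant response images, indexed [i_sx, i_sy, y, x]."""
    n = len(scales)
    out = np.zeros((n, n) + region.shape)
    for i, sx in enumerate(scales):
        gx, dgx, ddgx = gauss_derivs(sx)
        for j, sy in enumerate(scales):
            gy, dgy, ddgy = gauss_derivs(sy)
            # Scale normalisation (assumed): s^2 per second derivative, s_x*s_y for the mixed term.
            lxx = sx**2 * sep_filter(region, ddgx, gy)
            lyy = sy**2 * sep_filter(region, gx, ddgy)
            lxy = sx * sy * sep_filter(region, dgx, dgy)
            out[i, j] = lxx * lyy - lxy**2
    return out

def refine(region, scales):
    """Return (x, y, s_x, s_y) at the non-interpolated maximum of the determinant stack."""
    det = hessian_det_stack(region, scales)
    i, j, y, x = np.unravel_index(np.argmax(det), det.shape)
    return x, y, scales[i], scales[j]
```

Because s_x and s_y are varied separately, an elongated blob is matched by an elongated filter, which a single isotropic scale cannot do. The region must be larger than the widest kernel for `mode="same"` to keep its size.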

20. A method for content based video sequence identification, the method comprising:
- applying a bi-level filter in a first pass analysis to detect a set of initial interest points in selected video frames, wherein the first pass analysis reduces the effective area of images in each selected video frame to multiple smaller images;
  
  applying an array of anisotropic filters to regions of pixels around the set of initial interest points in a second pass analysis to form a 4-dimensional (4D) space of determinant images with coordinate (x, y, s_x, s_y) values; and
  
  interpolating the determinant images to identify refined interest points with coordinate (x, y, s_x, s_y) values that provide accurate region characterizations that are resistant to image distortion for identification of the video frames in the video sequence.
- Dependent Claims (21, 22, 23, 24)
  - 21. The method of claim 20 further comprising:
    - identifying a refined interest point by a first scale parameter in the x direction (s_x) and a second scale parameter in the y direction (s_y), wherein the s_x and s_y scale parameters define a spatial extent in the x direction and a spatial extent in the y direction of an elliptic-shaped image for each applied anisotropic filter.
  - 22. The method of claim 21 further comprising:
    - generating an interest point descriptor for a rectangular region around the identified refined interest point with rectangular vertices that are proportional to the s_x and s_y values of the identified refined interest point having the coordinate (x, y, s_x, s_y) values.
  - 23. The method of claim 21 further comprising:
    - generating a k by k grid in the Ns_x by Ms_y region centered around the identified refined interest point and a j by j re-sampled sub-region containing j² interpolated pixels for each cell of the k by k grid, wherein N and M are multiplication factors which determine a neighborhood size around the refined interest point;
      
      generating a horizontal gradient Gx and a vertical gradient Gy based on a partial derivative of each pixel in the j by j re-sampled sub-region; and
      
      generating a plurality of computed gradient values for each sub-region to be concatenated providing a descriptor for the identified refined interest point.
  - 24. The method of claim 23, wherein the plurality of computed gradient values comprises:
    - generating for each re-sampled sub-region a gradient magnitude that is a sum of pixel gradient magnitudes for the pixels in each of the sub-regions;
      
      generating a resultant gradient in the x direction that is a sum of the horizontal gradients Gx for the pixels in each of the sub-regions;
      
      generating a resultant gradient in the y direction that is a sum of the vertical gradients Gy for the pixels in each of the sub-regions; and
      
      generating a resultant sum of gradients in both the x direction and the y direction.
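
The descriptor of claims 23 and 24 can be illustrated with a short Python sketch using NumPy. It is a sketch under assumptions rather than the patented implementation: bilinear interpolation is assumed for the re-sampling, `np.gradient` finite differences stand in for the partial derivatives, and the defaults N = M = 4, k = 4, j = 5 as well as the names `bilinear` and `descriptor` are chosen only for the example.

```python
import numpy as np

def bilinear(img, xs, ys):
    """Sample img at float coordinates with bilinear interpolation, clamped at the borders."""
    h, w = img.shape
    xs = np.clip(xs, 0, w - 1)
    ys = np.clip(ys, 0, h - 1)
    x0 = np.minimum(np.floor(xs).astype(int), w - 2)
    y0 = np.minimum(np.floor(ys).astype(int), h - 2)
    fx, fy = xs - x0, ys - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bot = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy

def descriptor(img, x, y, sx, sy, N=4.0, M=4.0, k=4, j=5):
    """Four gradient sums per cell of a k-by-k grid over an N*sx by M*sy region."""
    width, height = N * sx, M * sy
    # k*j sample positions per axis, centred on (x, y) and spanning the region.
    u = (np.arange(k * j) + 0.5) / (k * j) - 0.5
    xs, ys = np.meshgrid(x + u * width, y + u * height)
    patch = bilinear(img, xs, ys)
    values = []
    for r in range(k):
        for c in range(k):
            cell = patch[r * j:(r + 1) * j, c * j:(c + 1) * j]  # j*j interpolated pixels
            gy, gx = np.gradient(cell)  # vertical (Gy) and horizontal (Gx) gradients
            values += [np.hypot(gx, gy).sum(), gx.sum(), gy.sum(), (gx + gy).sum()]
    return np.array(values)
```

The result has 4·k² components. Because the sampled region scales with (s_x, s_y), the same content seen at a different size or aspect ratio is re-sampled onto the same k·j by k·j grid before the gradients are taken.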

25. A computer readable non-transitory medium having embodied thereon a program for content based video sequence identification, the program being executable by a computer to perform the steps of:
- applying a bi-level filter in a first pass analysis to detect a set of initial interest points in selected video frames, wherein the first pass analysis reduces the effective area of images in each selected video frame to multiple smaller images;
  
  applying an array of anisotropic filters to regions of pixels around the set of initial interest points in a second pass analysis to form a 4-dimensional (4D) space of determinant images with coordinate (x, y, s_x, s_y) values; and
  
  interpolating the determinant images to identify refined interest points with coordinate (x, y, s_x, s_y) values that provide accurate region characterizations that are resistant to image distortion for identification of the video frames in the video sequence.
- Dependent Claims (26, 27)
  - 26. The computer readable non-transitory medium of claim 25 further comprising:
    - identifying a refined interest point by a first scale parameter in the x direction (s_x) and a second scale parameter in the y direction (s_y), wherein the s_x and s_y scale parameters define a spatial extent in the x direction and a spatial extent in the y direction of an elliptic-shaped image for each applied anisotropic filter.
  - 27. The computer readable non-transitory medium of claim 25 further comprising:
    - generating the multidimensional descriptor and the multi-dimensional signature by combining k by k sets of four computed values comprising a resultant gradient vector magnitude, a resultant gradient vector in spatial x direction, a resultant gradient vector in spatial y direction, and a resultant sum of gradients in both x and y directions.
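
Claims 20 and 25 locate refined interest points in a 4-dimensional (x, y, s_x, s_y) space of determinant images and then interpolate them. A minimal Python sketch of that search is given here. The claims do not state an interpolation formula, so the independent 1-D parabola fit along each axis is an assumption, as are the function names `local_maxima_4d` and `interpolate_peak` and the array layout `[i_sx, i_sy, y, x]`.

```python
import numpy as np
from itertools import product

def local_maxima_4d(det, threshold):
    """Interior samples above threshold and strictly greater than all 80 neighbours, strongest first."""
    core = det[1:-1, 1:-1, 1:-1, 1:-1]
    is_max = core > threshold
    for off in product((-1, 0, 1), repeat=4):
        if any(off):
            # Shifted view of det aligned with `core` for this neighbour offset.
            sl = tuple(slice(1 + o, det.shape[d] - 1 + o) for d, o in enumerate(off))
            is_max &= core > det[sl]
    idx = np.argwhere(is_max) + 1  # back to indices of the full array
    order = np.argsort(-det[tuple(idx.T)])
    return idx[order]

def interpolate_peak(det, idx):
    """Refine one peak with a 1-D parabola fit along each of the four axes (assumed method)."""
    idx = tuple(int(v) for v in idx)
    refined = []
    for d in range(4):
        lo, hi = list(idx), list(idx)
        lo[d] -= 1
        hi[d] += 1
        f0, fm, fp = det[idx], det[tuple(lo)], det[tuple(hi)]
        denom = fm - 2 * f0 + fp
        shift = 0.0 if denom == 0 else 0.5 * (fm - fp) / denom
        refined.append(idx[d] + shift)
    return tuple(refined)  # fractional (i_sx, i_sy, y, x) in index units
```

The refined values are fractional indices. Converting the first two components to actual s_x and s_y values requires interpolating over whatever scale grid produced the determinant images.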

Current Assignee: Roku, Inc.
Original Assignee: Zeitera LLC
Inventors: Stojancic, Mihailo; Ramanathan, Prashant; Wendt, Peter; Pereira, Jose Pio
Primary Examiner(s): Koziol, Stephen R
Application Number: US12/612,729
Publication Number: US 20100303338A1
Time in Patent Office: 936 Days
Field of Search: 382/103, 382/154, 382/176, 382/178, 382/199, 382/261, 382/264
US Class Current: 382/264
CPC Class Codes:
G06F 16/7847   using low-level visual feat...
G06F 16/7854   using shape G06F16/7837 tak...
G06F 16/7864   using domain-transform feat...
G06F 2218/02   Preprocessing
G06F 2218/10   by analysing the shape of a...
G06V 10/464   using a plurality of salien...
G06V 20/46   Extracting features or char...
H04N 5/14   Picture signal circuitry fo...
