CONTENT-BASED MATCHING OF VIDEOS USING LOCAL SPATIO-TEMPORAL FINGERPRINTS

US 20100049711A1
Filed: 10/31/2008
Published: 02/25/2010
Est. Priority Date: 08/20/2008
Status: Active Grant


Abstract

A computer implemented method for deriving a fingerprint from video data is disclosed, comprising the steps of receiving a plurality of frames from the video data; selecting at least one key frame from the plurality of frames, the at least one key frame being selected from two consecutive frames of the plurality of frames that exhibit a maximal cumulative difference in at least one spatial feature of the two consecutive frames; detecting at least one 3D spatio-temporal feature within the at least one key frame; and encoding a spatio-temporal fingerprint based on mean luminance of the at least one 3D spatio-temporal feature. The at least one spatial feature can be intensity. The at least one 3D spatio-temporal feature can be at least one Maximally Stable Volume (MSV). Also disclosed is a method for matching video data to a database containing a plurality of video fingerprints of the type described above, comprising the steps of calculating at least one fingerprint representing at least one query frame from the video data; indexing into the database using the at least one calculated fingerprint to find a set of candidate fingerprints; applying a score to each of the candidate fingerprints; selecting a subset of candidate fingerprints as proposed frames by rank ordering the candidate fingerprints; and attempting to match at least one fingerprint of at least one proposed frame based on a comparison of gradient-based descriptors associated with the at least one query frame and the at least one proposed frame.
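
The key frame selection step described in the abstract can be illustrated with a minimal sketch. It assumes that the spatial feature is grey-scale intensity, that "cumulative difference" means the sum of absolute per-pixel differences, and that the later frame of the winning pair is taken as the key frame. None of these choices is stated in the text above, and the names `cumulative_difference` and `select_key_frame` are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # grey-scale intensities, rows x columns


def cumulative_difference(a: Frame, b: Frame) -> float:
    """Sum of absolute per-pixel intensity differences (an assumed metric)."""
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def select_key_frame(frames: List[Frame]) -> int:
    """Index of the later frame of the consecutive pair that differs most.

    Taking the later frame of the pair is an assumption of this sketch.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    diffs = [
        cumulative_difference(frames[i], frames[i + 1])
        for i in range(len(frames) - 1)
    ]
    best_pair = max(range(len(diffs)), key=diffs.__getitem__)
    return best_pair + 1


frames = [
    [[10, 10], [10, 10]],
    [[12, 10], [10, 11]],      # small change from frame 0
    [[200, 180], [190, 0]],    # large change from frame 1
    [[198, 181], [190, 2]],    # small change from frame 2
]
key_index = select_key_frame(frames)
print(key_index)
```

In this toy sequence the largest intensity change occurs between frames 1 and 2, so frame 2 is reported as the key frame.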

34 Claims

1. A computer implemented method for deriving a fingerprint from video data, comprising the steps of:
- receiving a plurality of frames from the video data;
  
  selecting at least one key frame from the plurality of frames, the at least one key frame being selected from two consecutive frames of the plurality of frames that exhibit a maximal cumulative difference in at least one spatial feature of the two consecutive frames;
  
  detecting at least one 3D spatio-temporal feature within the at least one key frame; and
  
  encoding a spatio-temporal fingerprint based on mean luminance of the at least one 3D spatio-temporal feature.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13):
  - 2. The method of claim 1, wherein the at least one spatial feature is intensity.
  - 3. The method of claim 1, wherein the at least one 3D spatio-temporal feature is at least one Maximally Stable Volume (MSV).
  - 4. The method of claim 3, wherein the at least one MSV is based on two dimensions of length and width of at least one key frame and the third dimension is resolution.
  - 5. The method of claim 3, wherein the at least one MSV is based on two dimensions of length and width of at least one key frame and the third dimension is time.
  - 6. The method of claim 3, wherein the MSV is a volume that exhibits about a zero change in intensity for an incremental change in volume.
  - 7. The method of claim 1, further comprising the step of preprocessing the plurality of frames, wherein the preprocessing step further comprises the steps of:
    - changing the source frame rate to a predefined resampling rate;
      
      converting each of the plurality of frames to a grey scale; and
      
      resizing the plurality of frames to a fixed width and height.
  - 8. The method of claim 3, wherein the encoding step further comprises projecting an ellipse representing the at least one MSV onto a circle whose center is the origin of a local reference axis of the MSV.
  - 9. The method of claim 8, wherein the encoding step further comprises the steps of:
    - enclosing the projected ellipse by a rectangular region;
      
      dividing the rectangular region into a plurality of rectangular blocks; and
      
      calculating the fingerprint based on spatial and temporal differences among the blocks arranged in columns of blocks using spatial and temporal filters, respectively.
  - 10. The method of claim 1, further comprising the step of storing the spatio-temporal fingerprint in a lookup table (LUT).
  - 11. The method of claim 10, wherein the LUT associates with the spatio-temporal fingerprint:
    - at least one pointer to at least one video clip with at least one region having the same fingerprint value;
      
      geometric and shape information associated with an ellipse representing the MSV projected onto the at least one key frame;
      
      the coordinate of the center and three reference points of the key frame; and
      
      a descriptor based on the center and the three reference points of the key frame.
  - 12. The method of claim 11, wherein the three points are located at the corners of a prefixed square enclosing the center of the at least one key frame transformed into the frame of reference of the MSV.
  - 13. The method of claim 12, wherein the descriptor is a gradient-based descriptor.
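
Claims 9 to 11 describe dividing a rectangular region into blocks, deriving the fingerprint from spatial and temporal differences among the blocks, and storing it in a lookup table. The sketch below is one possible reading, not the patented encoding: it assumes a 2 x 2 block grid, uses the sign of block-mean luminance differences as the spatial and temporal filters, and stores only a clip identifier and key frame index in the LUT. The names `block_means`, `encode_fingerprint`, and `"clip_a"` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Region = Sequence[Sequence[float]]  # luminance values, rows x columns


def block_means(region: Region, rows: int, cols: int) -> List[List[float]]:
    """Mean luminance of each block in a rows x cols grid over the region.

    Pixels left over when the size is not divisible by the grid are ignored.
    """
    height, width = len(region), len(region[0])
    bh, bw = height // rows, width // cols
    if bh == 0 or bw == 0:
        raise ValueError("region is smaller than the block grid")
    means = []
    for r in range(rows):
        row = []
        for c in range(cols):
            pixels = [region[y][x]
                      for y in range(r * bh, (r + 1) * bh)
                      for x in range(c * bw, (c + 1) * bw)]
            row.append(sum(pixels) / len(pixels))
        means.append(row)
    return means


def encode_fingerprint(prev: Region, curr: Region,
                       rows: int = 2, cols: int = 2) -> int:
    """Pack sign bits of spatial and temporal block-mean differences."""
    m_prev = block_means(prev, rows, cols)
    m_curr = block_means(curr, rows, cols)
    bits = []
    for r in range(rows):
        for c in range(cols - 1):
            # Spatial filter (assumed): compare horizontally adjacent blocks.
            bits.append(m_curr[r][c] > m_curr[r][c + 1])
    for r in range(rows):
        for c in range(cols):
            # Temporal filter (assumed): compare a block with the same block
            # in the previous frame.
            bits.append(m_curr[r][c] > m_prev[r][c])
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value


prev_region = [[50] * 4 for _ in range(4)]
curr_region = [
    [90, 90, 10, 10],
    [90, 90, 10, 10],
    [20, 20, 70, 70],
    [20, 20, 70, 70],
]
fingerprint = encode_fingerprint(prev_region, curr_region)

# LUT keyed by fingerprint value; each entry is (clip id, key frame index).
lut: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
lut[fingerprint].append(("clip_a", 12))
print(fingerprint, lut[fingerprint])
```

A fuller LUT entry along the lines of claim 11 would also carry the ellipse geometry, the center and reference point coordinates, and the descriptor. They are omitted here to keep the sketch short.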

14. A method for matching a video data to a database containing a plurality of video fingerprints, comprising the steps of:
- calculating at least one fingerprint representing at least one query frame from the video data;
  
  indexing into the database using the at least one calculated fingerprint to find a set of candidate fingerprints;
  
  applying a score to each of the candidate fingerprints;
  
  selecting a subset of candidate fingerprints as proposed frames by rank ordering the candidate fingerprints; and
  
  attempting to match at least one fingerprint of at least one proposed frame based on a comparison of gradient-based descriptors associated with the at least one query frame and the at least one proposed frame.
- Dependent claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24):
  - 15. The method of claim 14, further comprising the step of merging the candidate fingerprints into a plurality of bins.
  - 16. The method of claim 14, wherein the step of merging the candidate fingerprints into a plurality of bins further comprises the step of placing candidate fingerprints into divisions of volumes of a 3D space constructed from the length and width of an area covered by the proposed frames, the third dimension of the 3D space being the frame number in a sequence of the proposed frames.
  - 17. The method of claim 14, wherein the score is inversely proportional to the number of frames in the database having a matching fingerprint and directly proportional to the area of a frame represented by the fingerprint.
  - 18. The method of claim 15, wherein the at least one fingerprint representing at least one query frame and the plurality of video fingerprints are based on at least one Maximally Stable Volume (MSV) determined from at least one of the at least one query frame and the proposed frames and the mean luminance of the at least one MSV.
  - 19. The method of claim 18, wherein the candidate fingerprints are represented in the database as:
    - geometric and shape information associated with an ellipse representing the MSV projected onto a frame in the frame of reference of the MSV;
      
      the coordinates of the center of a proposed frame from which a database frame originates and three points at the corners of a prefixed square enclosing the center (prefix square corner points), the center and the three points having been transformed into the frame of reference of the MSV; and
      
      a gradient-based descriptor based on the prefixed square.
  - 20. The method of claim 19, wherein said step of selecting a subset of candidate fingerprints further comprises the steps of:
    - inverse transforming the transformed three points to frame of reference of the proposed frame for each of the matching candidate fingerprints;
      
      computing the average inverse transformation of the bins that have the highest N accumulated scores; and
      
      rotating and translating a predetermined number of query frames (siftnum) to produce a series of frames that are aligned to the top ranked proposed frames that polled to the bins that have the highest N accumulated scores.
  - 21. The method of claim 20, wherein the step of attempting to match at least one fingerprint further comprises the steps of:
    - calculating the Bhattacharyya distance between gradient-based descriptors of the aligned query frames and the top ranked proposed frames for all proposed frames, and declaring a match to a proposed frame p if the Bhattacharyya distance is less than an empirically chosen predetermined threshold T, otherwise, declaring that no match is found.
  - 22. The method of claim 14, further comprising the steps of retrieving the video associated with a matched proposed frame from one of the database containing a plurality of video fingerprints and a remote database.
  - 23. The method of claim 22, wherein the remote database is distributed over the Internet.
  - 24. The method of claim 14, wherein each of the gradient-based descriptors is based on a scale invariant feature transformation (SIFT).
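
The scoring rule of claim 17 and the threshold test of claim 21 can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the proportionality constant of the score is taken to be 1, descriptors are treated as non-negative histograms that are normalized to sum to 1, and the distance uses the standard definition D = -ln(sum_i sqrt(p_i * q_i)). The function names are hypothetical.

```python
import math
from typing import Sequence


def candidate_score(area: float, matching_frames: int) -> float:
    """Score proportional to area, inversely proportional to match count."""
    if matching_frames < 1:
        raise ValueError("a candidate must match at least one database frame")
    return area / matching_frames


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Bhattacharyya distance between two non-negative histograms."""
    if len(p) != len(q):
        raise ValueError("descriptors must have the same length")
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("descriptors must have positive total mass")
    # Bhattacharyya coefficient of the normalized histograms.
    bc = sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))
    bc = min(bc, 1.0)  # guard against rounding slightly above 1
    return math.inf if bc == 0.0 else -math.log(bc)


def is_match(query_desc: Sequence[float], proposed_desc: Sequence[float],
             threshold: float) -> bool:
    """Declare a match when the distance is below the threshold T."""
    return bhattacharyya_distance(query_desc, proposed_desc) < threshold


query = [4.0, 3.0, 2.0, 1.0]
similar = [4.0, 3.0, 2.0, 1.5]
unrelated = [0.0, 0.0, 0.0, 5.0]

print(candidate_score(area=100.0, matching_frames=4))
print(is_match(query, similar, threshold=0.05))
print(is_match(query, unrelated, threshold=0.05))
```

The threshold value 0.05 is arbitrary. The claim only states that T is chosen empirically.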

25. An apparatus for deriving a fingerprint from video data, comprising:
- a processor configured for;
  
  receiving a plurality of frames from the video data;
  
  selecting at least one key frame from the plurality of frames, the at least one key frame being selected from two consecutive frames of the plurality of frames that exhibit a maximal cumulative difference in at least one spatial feature of the two consecutive frames;
  
  detecting at least one 3D spatio-temporal feature within the at least one key frame; and
  
  encoding a spatio-temporal fingerprint based on mean luminance of the at least one 3D spatio-temporal feature.
- Dependent claims (26, 27, 28, 29):
  - 26. The apparatus of claim 25, further comprising a video capturing device for generating the plurality of frames.
  - 27. The apparatus of claim 26, further comprising a database for storing at least the spatio-temporal fingerprint in a lookup table (LUT).
  - 28. The apparatus of claim 27, wherein the database is further configured for storing, in association with the at least the spatio-temporal fingerprint:
    - at least one pointer to at least one video clip with at least one region having the same fingerprint value;
      
      geometric and shape information associated with an ellipse representing the MSV projected onto the at least one key frame;
      
      the coordinate of the center and three reference points of the key frame; and
      
      a descriptor based on the center and the three reference points of the key frame.
  - 29. The apparatus of claim 27, further comprising a Web crawler for locating at least one video located on the Internet having a fingerprint which matches the at least the spatio-temporal fingerprint.

30. A computer-readable medium carrying one or more sequences for deriving a fingerprint from video data, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
- receiving a plurality of frames from the video data;
  
  selecting at least one key frame from the plurality of frames, the at least one key frame being selected from two consecutive frames of the plurality of frames that exhibit a maximal cumulative difference in at least one spatial feature of the two consecutive frames;
  
  detecting at least one 3D spatio-temporal feature within the at least one key frame; and
  
  encoding a spatio-temporal fingerprint based on mean luminance of the at least one 3D spatio-temporal feature.

31. An apparatus for matching a video data to a database containing a plurality of video fingerprints, comprising:
- a database containing video fingerprints; and
  
  a processor configured for;
  
  calculating at least one fingerprint representing at least one query frame from the video data;
  
  indexing into the database using the at least one calculated fingerprint to find a set of candidate fingerprints;
  
  applying a score to each of the candidate fingerprints;
  
  selecting a subset of candidate fingerprints as proposed frames by rank ordering the candidate fingerprints; and
  
  attempting to match at least one fingerprint of at least one proposed frame based on a comparison of gradient-based descriptors associated with the at least one query frame and the at least one proposed frame.
- Dependent claims (32):
  - 32. The apparatus of claim 31, wherein the at least one fingerprint representing at least one query frame and the plurality of video fingerprints are based on at least one Maximally Stable Volume (MSV) determined from at least one of the at least one query frame and the proposed frames and the mean luminance of the at least one MSV.

33. A computer-readable medium carrying one or more sequences for matching a video data to a database containing a plurality of video fingerprints, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
- calculating at least one fingerprint representing at least one query frame from the video data;
  
  indexing into the database using the at least one calculated fingerprint to find a set of candidate fingerprints;
  
  applying a score to each of the candidate fingerprints;
  
  selecting a subset of candidate fingerprints as proposed frames by rank ordering the candidate fingerprints; and
  
  attempting to match at least one fingerprint of at least one proposed frame based on a comparison of gradient-based descriptors associated with the at least one query frame and the at least one proposed frame.
- Dependent claims (34):
  - 34. The computer readable medium of claim 33, wherein the at least one fingerprint representing at least one query frame and the plurality of video fingerprints are based on at least one Maximally Stable Volume (MSV) determined from at least one of the at least one query frame and the proposed frames and the mean luminance of the at least one MSV.

Current Assignee
SRI International, Inc.
Original Assignee
SRI International, Inc.
Inventors
Sawhney, Harpreet Singh; Lubin, Jeffrey; Singh, Gajinder; Puri, Manika

Granted Patent

US 8,498,487 B2
Field of Search
US Class Current

N/A
CPC Class Codes

G06F 16/70   of video data

G06F 18/22   Matching criteria, e.g. pro...

G06V 20/46   Extracting features or char...

G06V 20/48   Matching video sequences

G11B 27/28   by using information signal...
