Methods and systems for representation and matching of video content
First Claim
1. A method of determining spatio-temporal correspondence between different sets of video data, the method comprising:
inputting the sets of video data;
representing the video data as ordered sequences of visual nucleotides;
determining temporally corresponding subsets of video data by aligning the sequences of visual nucleotides;
computing a spatial correspondence between the temporally corresponding subsets of video data (spatio-temporal correspondence); and
outputting the spatio-temporal correspondence between subsets of the video data.
4 Assignments
0 Petitions
Abstract
The described methods and systems provide for the representation and matching of video content, including spatio-temporal matching of different video sequences. A particular method of determining temporal correspondence between different sets of video data inputs the sets of video data and represents the video data as ordered sequences of visual nucleotides. Temporally corresponding subsets of video data are determined by aligning the sequences of visual nucleotides.
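The abstract does not name the alignment procedure. As an illustration only, the sketch below aligns two sequences of visual nucleotides with a Needleman-Wunsch-style global alignment, a technique borrowed from biological sequence analysis and assumed here because of the nucleotide analogy. The `Nucleotide` type (a list of numbers per temporal interval), the `similarity` score, and the `gap` penalty are all hypothetical choices, not taken from the patent.

```python
from typing import List, Optional, Tuple

# Assumption: each visual nucleotide is a fixed-length list of numbers
# (for example, feature counts) describing one temporal interval.
Nucleotide = List[float]


def similarity(a: Nucleotide, b: Nucleotide) -> float:
    """Hypothetical score: 1 minus the L1 distance, so identical nucleotides score 1."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b))


def align(s: List[Nucleotide], t: List[Nucleotide],
          gap: float = -0.5) -> List[Tuple[Optional[int], Optional[int]]]:
    """Globally align two nucleotide sequences by dynamic programming.

    Returns (i, j) pairs of temporally corresponding interval indices;
    None marks an interval with no counterpart in the other sequence.
    """
    n, m = len(s), len(t)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(s[i - 1], t[j - 1]),  # match
                score[i - 1][j] + gap,                                 # skip s[i-1]
                score[i][j - 1] + gap,                                 # skip t[j-1]
            )

    # Trace back from the bottom-right cell to recover the alignment.
    pairs: List[Tuple[Optional[int], Optional[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + similarity(s[i - 1], t[j - 1])):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


# Usage: the second sequence lacks the first interval of the first sequence.
s = [[1, 0], [0, 1], [1, 1]]
t = [[0, 1], [1, 1]]
result = align(s, t)
```

In this toy input, interval 0 of `s` is left unmatched and the remaining intervals pair up in order. A local alignment variant would be a natural alternative when only a fragment of one video appears inside another.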
186 Citations
61 Claims
1. A method of determining spatio-temporal correspondence between different sets of video data, the method comprising:
inputting the sets of video data;
representing the video data as ordered sequences of visual nucleotides;
determining temporally corresponding subsets of video data by aligning the sequences of visual nucleotides;
computing a spatial correspondence between the temporally corresponding subsets of video data (spatio-temporal correspondence); and
outputting the spatio-temporal correspondence between subsets of the video data.
Dependent claims: 2-51.
52. A method of determining spatio-temporal correspondence between different sets of video data, the method comprising:
inputting the sets of video data;
representing the video data as ordered sequences of visual nucleotides, wherein the video data are segmented into temporal intervals;
computing at least one visual nucleotide for each temporal interval, wherein each of the visual nucleotides is a grouping function collection of a plurality of visual atoms from a different temporal interval of the video data, and wherein each of the visual atoms describe the visual content of a local spatio-temporal region of the video data;
constructing the visual atoms by: detecting a collection of invariant feature points in the temporal interval; computing a collection of descriptors of the local spatio-temporal region of the video data around each invariant feature point; removing a subset of invariant feature points and their descriptors; constructing a collection of visual atoms as a function of the remaining invariant feature point locations and descriptors;
determining temporally corresponding subsets of video data by aligning sequences of visual nucleotides;
computing spatial correspondence between temporally corresponding subsets of video data (spatio-temporal correspondence); and
outputting the spatio-temporal correspondence between subsets of the video data.
Dependent claims: 53-59.
60. A method of determining spatio-temporal correspondence between different sets of video data, the method comprising:
creating a sequence of visual nucleotides by the steps of: analyzing a series of time successive video images from the video data for features; pruning the features to remove features that are only present on one video image; time averaging the remaining video features and discarding outlier features from the average; using a nearest neighbor fit to assign the remaining features to a standardized array of different features; counting the number of each type of assigned feature in the series of time successive video images, thus creating coefficients for the standardized array of different features, where each visual nucleotide consists of this array of coefficients, and the sequence of visual nucleotides consists of sequential time successive visual nucleotides;
determining temporally corresponding subsets of video data by aligning sequences of the visual nucleotides;
computing spatial correspondence between temporally corresponding subsets of video data (spatio-temporal correspondence); and
outputting the spatio-temporal correspondence between subsets of the video data.
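The nearest-neighbor assignment and counting steps recited in claim 60 can be sketched as follows. This is a minimal illustration, not the patented implementation: the feature descriptors are assumed to be short numeric tuples, the "standardized array of different features" is modeled as a hypothetical `vocabulary` of reference descriptors, squared Euclidean distance is an assumed choice of metric, and the pruning and time-averaging steps are taken to have run already.

```python
from typing import List, Sequence

Descriptor = Sequence[float]  # assumption: one numeric vector per detected feature


def nearest_index(feature: Descriptor, vocabulary: List[Descriptor]) -> int:
    """Return the index of the vocabulary entry closest to the feature
    (squared Euclidean distance, an assumed metric)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((f - v) ** 2 for f, v in zip(feature, vocabulary[k])),
    )


def visual_nucleotide(features: List[Descriptor],
                      vocabulary: List[Descriptor]) -> List[int]:
    """Count how many features fall on each vocabulary entry.

    The resulting array of coefficients plays the role of one visual nucleotide.
    """
    counts = [0] * len(vocabulary)
    for feature in features:
        counts[nearest_index(feature, vocabulary)] += 1
    return counts


def nucleotide_sequence(intervals: List[List[Descriptor]],
                        vocabulary: List[Descriptor]) -> List[List[int]]:
    """Build one nucleotide per series of time successive images, in time order."""
    return [visual_nucleotide(features, vocabulary) for features in intervals]


# Usage with a two-entry hypothetical vocabulary and two temporal intervals.
vocabulary = [(0.0, 0.0), (1.0, 1.0)]
intervals = [
    [(0.1, 0.0), (0.9, 1.0), (1.0, 0.8)],  # one feature near entry 0, two near entry 1
    [(0.0, 0.2)],                          # one feature near entry 0
]
sequence = nucleotide_sequence(intervals, vocabulary)
```

Because each nucleotide is a fixed-length array of counts, two nucleotides can be compared with any vector distance, which is what makes sequence alignment over them practical.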
61. An apparatus comprising:
a source of video data;
a video segmenter coupled to the source of video data and configured to segment video data into temporal intervals;
a video processor coupled to the source of video data and configured to detect feature locations within the video data, generate feature descriptors associated with the feature locations, and prune the detected feature locations to generate a subset of feature locations; and
a video aggregator coupled to the video segmenter and the video processor, the video aggregator configured to generate a video DNA associated with the video data, wherein the video DNA includes video data ordered as sequences of visual nucleotides.
Specification