Methods and systems for representation and matching of video content
Abstract
The described methods and systems provide for the representation and matching of video content, including spatio-temporal matching of different video sequences. A particular method of determining temporal correspondence between different sets of video data inputs the sets of video data and represents the video data as ordered sequences of visual nucleotides. Temporally corresponding subsets of video data are determined by aligning the sequences of visual nucleotides.
Claims (20)
1. A method of determining temporal correspondence between different sets of video data, the method comprising:
inputting the sets of video data;
representing the video data as ordered sequences of visual nucleotides; and
determining temporally corresponding subsets of video data by aligning the sequences of visual nucleotides;
wherein the visual nucleotides are computed by:
representing a temporal interval of the video data as a collection of features and feature descriptors;
discarding the spatial coordinates of the features, and grouping similar feature descriptors into bins according to a grouping function; and
creating visual nucleotides that correspond to the coefficients of the various feature descriptor bins; and
wherein the features are chosen so as to be substantially invariant with respect to video resolution, orientation, or lighting.
Dependent claims: 2-13.
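The visual-nucleotide computation in claim 1 can be sketched as a bag-of-features histogram. The claim leaves the grouping function abstract; the sketch below assumes nearest-center assignment against a precomputed codebook of bin centers, and takes the bin coefficients to be normalized counts. Both choices are illustrative assumptions, not the patent's definition.

```python
import numpy as np

def visual_nucleotide(descriptors, codebook):
    """Compute a visual nucleotide for one temporal interval.

    descriptors : (n, d) array of feature descriptors from the interval
                  (spatial coordinates already discarded, per the claim).
    codebook    : (k, d) array of bin centers; the grouping function is
                  assumed to be nearest-center assignment here.
    Returns a length-k vector of bin coefficients (normalized counts).
    """
    # Squared distance from every descriptor to every bin center.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # group similar descriptors into bins
    counts = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return counts / counts.sum()  # coefficients of the feature descriptor bins
```

Because the spatial coordinates are discarded before binning, the resulting vector depends only on which descriptors occur in the interval, not where they appear in the frame.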
14. An apparatus comprising:
a source of video data;
a video segmenter coupled to the source of video data and configured to segment video data into temporal intervals, wherein the temporal intervals include a plurality of time-consecutive video image frames; and
a video processor coupled to the source of video data and configured to compute a visual nucleotide for each temporal interval;
wherein the video processor computes the visual nucleotides by:
representing a temporal interval of the video data as a collection of features and feature descriptors;
discarding spatial coordinates of the features and grouping similar feature descriptors into bins according to a grouping function; and
creating visual nucleotides that correspond to the coefficients of the feature descriptor bins;
wherein the features are chosen by the video processor to be substantially invariant with respect to video resolution, video orientation, and video lighting.
Dependent claims: 15-16.
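The segmenter in claim 14 partitions the video into intervals of time-consecutive frames, with one nucleotide computed per interval. A minimal sketch, assuming fixed-length intervals over frame indices (the claim does not require a fixed length; the function name is illustrative):

```python
def segment_into_intervals(num_frames, frames_per_interval):
    """Split frame indices into time-consecutive temporal intervals.

    Returns a list of (start, end) half-open index pairs; the last
    interval may be shorter than frames_per_interval.
    """
    return [(s, min(s + frames_per_interval, num_frames))
            for s in range(0, num_frames, frames_per_interval)]
```

Each returned interval would then be passed to the video processor, which computes one visual nucleotide from the features extracted across that interval's frames.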
17. A method of determining temporal correspondence between different sets of video data, the method comprising:
inputting the sets of video data;
representing the video data as ordered sequences of visual nucleotides; and
determining temporally corresponding subsets of video data by aligning the sequences of visual nucleotides;
further comprising computing a spatial correspondence between the temporally corresponding subsets of video data;
wherein computing spatial correspondence is performed by:
inputting temporally corresponding subsets of video data;
providing feature points in subsets of video data;
finding correspondence between feature points; and
finding correspondence between spatial coordinates;
wherein finding correspondence between feature points is performed by finding parameters of a model describing the transformation between two sets of feature points, wherein finding parameters of a model is performed by solving the following optimization problem
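The optimization problem referenced at the end of claim 17 is not reproduced in this excerpt. One common instance of fitting a transformation model between two point sets is a least-squares affine fit, sketched below; this is an assumed example, not the patent's stated objective function.

```python
import numpy as np

def fit_affine(pts_a, pts_b):
    """Least-squares affine transform mapping pts_a onto pts_b.

    Minimizes sum_i || A x_i + t - y_i ||^2 over the matrix A and
    translation t, where x_i are rows of pts_a and y_i rows of pts_b.
    """
    n = len(pts_a)
    # Homogeneous coordinates: append a column of ones for the translation.
    X = np.hstack([pts_a, np.ones((n, 1))])
    M, *_ = np.linalg.lstsq(X, pts_b, rcond=None)
    A, t = M[:2].T, M[2]
    return A, t
```

With the model parameters in hand, correspondence between spatial coordinates follows by mapping each point of one subset through the fitted transform and matching it to the nearest point of the other.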
18. A method of determining temporal correspondence between different sets of video data, the method comprising:
inputting the sets of video data;
representing the video data as ordered sequences of visual nucleotides; and
determining temporally corresponding subsets of video data by aligning the sequences of visual nucleotides;
wherein the video data is segmented into temporal intervals including a plurality of time-consecutive video image frames, and wherein one visual nucleotide is computed for each interval;
wherein the visual nucleotide is computed by:
representing a temporal interval of the video data as a collection of features and feature descriptors;
discarding the spatial coordinates of the features, and grouping similar feature descriptors into bins according to a grouping function; and
creating visual nucleotides that correspond to the coefficients of the various feature descriptor bins;
wherein the features are chosen so as to be invariant with respect to video resolution, orientation, or lighting;
wherein computing a collection of feature descriptors is performed by:
tracking of corresponding invariant feature points in the temporal interval of the video data;
computing a single descriptor as a function of the descriptors of the invariant feature points belonging to a track; and
assigning the descriptor to all features belonging to the track; and
wherein a function of the descriptors of the invariant feature points belonging to a track is the average of the invariant feature points descriptors, or the median of the invariant feature points descriptors.
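The track-aggregation step in claim 18 is explicit: the per-track descriptor is the average or the median of the descriptors along the track. A minimal sketch (the function name and `how` parameter are illustrative):

```python
import numpy as np

def track_descriptor(track_descriptors, how="average"):
    """Single descriptor for a feature track, per claim 18: the average
    or the median (taken component-wise) of the descriptors of the
    tracked invariant feature points."""
    stacked = np.asarray(track_descriptors, dtype=float)
    if how == "average":
        return stacked.mean(axis=0)
    if how == "median":
        return np.median(stacked, axis=0)
    raise ValueError("how must be 'average' or 'median'")
```

The resulting descriptor is then assigned to every feature on the track, so all frames in the interval contribute the same, noise-reduced descriptor to the binning step.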
19. A method of determining temporal correspondence between different sets of video data, the method comprising:
inputting the sets of video data;
representing the video data as ordered sequences of visual nucleotides; and
determining temporally corresponding subsets of video data by aligning the sequences of visual nucleotides;
wherein aligning sequences of visual nucleotides includes:
receiving two sequences of visual nucleotides s = {s1, . . . , sM} and q = {q1, . . . , qM} as the input;
receiving a score function σ(si, qj) and a gap penalty function γ(i, j, n) as the parameters; and
finding the partial correspondence C = {(i1, j1), . . . , (iK, jK)} and the collection of gaps G = {(l1, m1, n1), . . . , (lL, mL, nL)} maximizing the F(C, G) function.
Dependent claim: 20.
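Claim 19 does not define F(C, G) in this excerpt; a classic instance of maximizing a match score minus gap penalties is Needleman-Wunsch-style global alignment. The sketch below assumes a constant per-element gap penalty, which is a simplification of the claim's general γ(i, j, n).

```python
import numpy as np

def align(s, q, sigma, gap=1.0):
    """Global alignment of two nucleotide sequences maximizing total score.

    sigma(si, qj) scores matching element si against qj; gap is an
    assumed constant per-element penalty. Returns the partial
    correspondence C as 1-based index pairs, plus the total score.
    """
    M, N = len(s), len(q)
    F = np.zeros((M + 1, N + 1))
    F[:, 0] = -gap * np.arange(M + 1)
    F[0, :] = -gap * np.arange(N + 1)
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            F[i, j] = max(F[i - 1, j - 1] + sigma(s[i - 1], q[j - 1]),
                          F[i - 1, j] - gap,   # gap in q
                          F[i, j - 1] - gap)   # gap in s
    # Trace back to recover the matched index pairs C.
    C, i, j = [], M, N
    while i > 0 and j > 0:
        if np.isclose(F[i, j], F[i - 1, j - 1] + sigma(s[i - 1], q[j - 1])):
            C.append((i, j))
            i, j = i - 1, j - 1
        elif np.isclose(F[i, j], F[i - 1, j] - gap):
            i -= 1
        else:
            j -= 1
    return C[::-1], F[M, N]
```

In the patent's setting, s and q would be sequences of nucleotide vectors and σ a similarity between two histograms; here the elements are left generic so any score function can be plugged in.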
Specification