Method and device for bounding an object in a video
Abstract
The invention relates to a method for bounding an object in a video sequence F_{x,y,t}. The method includes obtaining a subset of pixels located in the object to annotate, in each frame of the video sequence. Spatio-temporal slicing is performed on the video sequence F_{x,y,t}, centered on the obtained subsets of pixels, resulting in a first image F_{y,t} obtained by a horizontal concatenation of first slices comprising the obtained subsets of pixels, and in a second image F_{x,t} obtained by a vertical concatenation of second slices. A trajectory of the obtained subsets of pixels is displayed on both the first image F_{y,t} and the second image F_{x,t}. A bounding form around the object to annotate is obtained from four points in each frame of the video sequence, wherein the coordinates of the four points of a frame t are obtained from the coordinates of the points located in the first and second boundary of the first and second image for that frame t.
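The interpolation and slicing steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the patent: the video is assumed to be a nested list indexed as `video[t][y][x]`, the "subset of pixels" is reduced to a single point per frame, the interpolation is assumed to be linear between user-selected key frames, and the first slice is assumed to be the pixel column through that point (so the second, orthogonal slice is the pixel row). The function names `interpolate_positions` and `slice_images` are hypothetical.

```python
def interpolate_positions(keyframes, num_frames):
    """Linearly interpolate one (x, y) position per frame.

    keyframes: dict mapping a frame index to the (x, y) point selected by
    the user in that frame. Frames before the first or after the last key
    frame reuse the nearest key position (an assumption of this sketch).
    """
    times = sorted(keyframes)
    positions = []
    for t in range(num_frames):
        if t <= times[0]:
            positions.append(keyframes[times[0]])
        elif t >= times[-1]:
            positions.append(keyframes[times[-1]])
        else:
            # Find the pair of key frames that brackets frame t.
            for t0, t1 in zip(times, times[1:]):
                if t0 <= t <= t1:
                    a = (t - t0) / (t1 - t0)
                    x0, y0 = keyframes[t0]
                    x1, y1 = keyframes[t1]
                    positions.append((round(x0 + a * (x1 - x0)),
                                      round(y0 + a * (y1 - y0))))
                    break
    return positions


def slice_images(video, positions):
    """Build the two spatio-temporal images from per-frame positions.

    first_image has one row per y and one column per frame: column t is
    the pixel column x = x_t of frame t (horizontal concatenation).
    second_image has one row per frame and one column per x: row t is
    the pixel row y = y_t of frame t (vertical concatenation).
    """
    height = len(video[0])
    columns = [[frame[y][x] for y in range(height)]
               for frame, (x, _) in zip(video, positions)]
    first_image = [[column[y] for column in columns] for y in range(height)]
    second_image = [list(frame[y]) for frame, (_, y) in zip(video, positions)]
    return first_image, second_image
```

With this layout, the trajectory of the interpolated points appears at row y_t of column t in the first image and at column x_t of row t in the second image, which is where a user would see and correct it.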
14 Claims
1. A method, comprising:
obtaining a position of a first subset of pixels in an object in at least one frame of a video sequence according to selection data received from a user interface;
obtaining a subset of pixels per frame of the video sequence, resulting in a plurality of subsets of pixels, by interpolating the position of the first subset of pixels to the video sequence;
obtaining a first image from a first spatio-temporal slicing, wherein said first image is a horizontal concatenation of first slices comprising the subset of pixels for frames along said video sequence;
obtaining a second image from a second spatio-temporal slicing, wherein said second image is a vertical concatenation of second slices comprising the subset of pixels for said frames along said video sequence, each of said second slices being orthogonal to the first slice of a same frame;
obtaining on each of said first and second images a first and a second boundary around the plurality of subsets of pixels per frame by means of a contour detection method;
wherein the coordinates of said four points in a frame t are obtained from the coordinates of the points located in the first and second boundary of the first and second image for that frame t.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A device configured to:
obtain a position of a first subset of pixels in an object in at least one frame of a video sequence according to selection data received from a user interface;
obtaining a subset of pixels per frame of the video sequence, resulting in a plurality of subsets of pixels, by interpolating the position of the first subset of pixels to the video sequence;
obtain a first image from a first spatio-temporal slicing, wherein said first image is a horizontal concatenation of first slices comprising the subset of pixels for frames along said video sequence;
obtain a second image from a second spatio-temporal slicing, wherein said second image is a vertical concatenation of second slices comprising the subset of pixels for said frames along said video sequence, each of said second slices being orthogonal to the first slice of a same frame;
obtain on each of said first and second images a first and second boundary around the plurality of subsets of pixels by means of a contour detection method;
obtain a bounding form out of four points, around said object in each frame of the video sequence, wherein the coordinates of said four points in a frame t are obtained from the coordinates of the points located in the first and second boundary of the first and second image for that frame t.
Dependent claims: 12, 13
14. A non-transitory computer program product stored on a non-transitory computer readable medium, and comprising program code instructions executable by a processor for:
obtaining a position of a first subset of pixels in an object in at least one frame of a video sequence according to selection data received from a user interface;
obtaining a subset of pixels per frame of the video sequence, resulting in a plurality of subsets of pixels, by interpolating the position of the first subset of pixels to the video sequence;
obtaining a first image from a first spatio-temporal slicing, wherein said first image is a horizontal concatenation of first slices comprising the subset of pixels for frames along said video sequence;
obtaining a second image from a second spatio-temporal slicing, wherein said second image is a vertical concatenation of second slices comprising the subset of pixels for said frames along said video sequence, each of said second slices being orthogonal to the first slice of a same frame;
obtaining on each of said first and second images a first and second boundary around the plurality of subsets of pixels by means of a contour detection method;
obtaining a bounding form out of four points, around said object in each frame of the video sequence, wherein the coordinates of said four points in a frame t are obtained from the coordinates of the points located in the first and second boundary of the first and second image for that frame t.
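The last two claim elements (boundaries on the two images, then a bounding form from four points per frame) can be illustrated with a small sketch. The claims only require "a contour detection method" and do not name one, so the outward threshold scan below is a stand-in chosen for brevity and is not the method of the patent. It is also an assumption of this sketch that the first image is indexed `first_image[y][t]`, the second image `second_image[t][x]`, the bounding form is an axis-aligned rectangle, and pixels are grayscale integers. The names `edge_around` and `bounding_boxes` are hypothetical.

```python
def edge_around(profile, center, threshold=30):
    """Scan a 1-D pixel profile outward from `center`.

    Returns the (low, high) indices of the outermost pixels that are
    contiguous with the centre and differ from it by at most `threshold`.
    This is a simple stand-in for a real contour detector.
    """
    ref = profile[center]
    low = center
    while low > 0 and abs(profile[low - 1] - ref) <= threshold:
        low -= 1
    high = center
    while high < len(profile) - 1 and abs(profile[high + 1] - ref) <= threshold:
        high += 1
    return low, high


def bounding_boxes(first_image, second_image, positions, threshold=30):
    """Derive one rectangle (x_min, y_min, x_max, y_max) per frame.

    For frame t, the two boundary points of the first image give the top
    and bottom of the object along the column x = x_t, and the two
    boundary points of the second image give its left and right along
    the row y = y_t.
    """
    boxes = []
    for t, (x_t, y_t) in enumerate(positions):
        column = [row[t] for row in first_image]  # profile over y for frame t
        top, bottom = edge_around(column, y_t, threshold)
        left, right = edge_around(second_image[t], x_t, threshold)
        four_points = [(x_t, top), (x_t, bottom), (left, y_t), (right, y_t)]
        xs = [p[0] for p in four_points]
        ys = [p[1] for p in four_points]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

Because each frame's rectangle depends only on column t of the first image and row t of the second, correcting a boundary on either image updates the box of exactly that frame.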
Specification