Tracking semantic objects in vector image sequences
Abstract
A semantic object tracking method tracks general semantic objects with multiple non-rigid motion, disconnected components and multiple colors throughout a vector image sequence. The method accurately tracks these general semantic objects by spatially segmenting image regions from a current frame and then classifying these regions as to which semantic object they originated from in the previous frame. To classify each region, the method performs a region based motion estimation between each spatially segmented region and the previous frame to compute the position of a predicted region in the previous frame. The method then classifies each region in the current frame as being part of a semantic object based on which semantic object in the previous frame contains the most overlapping points of the predicted region. Using this method, each region in the current image is tracked to one semantic object from the previous frame, with no gaps or overlaps. The method propagates few or no errors because it projects regions into a frame where the semantic object boundaries are previously computed rather than trying to project and adjust a boundary in a frame where the object's boundary is unknown.
31 Claims
-
1. A method for tracking video objects in video frames, the method comprising:
-
performing spatial segmentation on a current video frame to identify plural regions of pixels with homogenous intensity values;
performing motion estimation between each of the plural regions in the current video frame and a previous video frame in which a boundary of a video object was previously computed;
using the motion estimation for each of the plural regions to warp pixel locations in the region to locations in the previous video frame;
determining whether the warped pixel locations are within the previously computed boundary of the video object in the previous video frame to identify a set of the plural regions that are likely to be part of the video object in the current video frame; and
forming a boundary of the video object in the current video frame as a combination of each of the plural regions in the current video frame that are in the set.
-
2. The method of claim 1 further including:
repeating claim 1 for a subsequent video frame using the boundary of the video object in the current video frame as a reference boundary for the subsequent video frame.
-
-
3. The method of claim 1 further including:
filtering the current video frame to remove noise from the current video frame before performing the spatial segmentation.
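Claim 3 does not say which noise filter is used. As one possible illustration, the sketch below applies a 3x3 median filter to a grayscale frame stored as a list of rows. The choice of a median filter, the window size, and the name `median_filter` are assumptions made for this example only.

```python
from statistics import median_low


def median_filter(frame):
    """Return a copy of `frame` with each pixel replaced by the median
    of its 3x3 neighborhood (clipped at the frame edges).

    Assumption: the claim only says "filtering ... to remove noise";
    a median filter is one common choice, not necessarily the one used.
    """
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for y in range(height):
        for x in range(width):
            window = [
                frame[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            # median_low always returns a value that occurs in the window,
            # so no new intensity levels are introduced.
            out[y][x] = median_low(window)
    return out
```

A single outlier pixel surrounded by uniform neighbors is replaced by the neighbors' value, which keeps it from seeding a spurious one-pixel region during segmentation.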
-
4. The method of claim 1 wherein the spatial segmentation includes, for each of the plural regions, ensuring that a difference between a maximum intensity value in the region and a minimum intensity value in the region is below a threshold.
-
5. The method of claim 1 wherein the spatial segmentation is a sequential region growing comprising:
-
starting with a first pixel location in the current video frame, growing a first region of connected pixels around the first pixel location by adding pixel locations to the first region such that a homogeneity criteria is satisfied;
when no boundary pixels around the first region satisfy the homogeneity criteria, repeating the growing for another region with a pixel location outside the first region; and
continuing the growing until each of the pixel locations in the current video frame is identified as being part of one of the plural regions.
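The sequential region growing of claim 5 can be sketched as follows, using the maximum-minus-minimum intensity test of claim 4 as the homogeneity criterion. This is a minimal sketch under stated assumptions: the frame is a 2-D list of scalar intensities, neighbors are 4-connected, seeds are taken in raster order, and the name `grow_regions` is hypothetical.

```python
from collections import deque


def grow_regions(frame, threshold):
    """Assign every pixel of `frame` a region id.

    A pixel joins the region being grown only if the region's
    (max - min) intensity stays below `threshold` after adding it.
    """
    height, width = len(frame), len(frame[0])
    labels = [[-1] * width for _ in range(height)]
    next_label = 0
    for seed_y in range(height):
        for seed_x in range(width):
            if labels[seed_y][seed_x] != -1:
                continue  # already part of an earlier region
            lo = hi = frame[seed_y][seed_x]
            labels[seed_y][seed_x] = next_label
            queue = deque([(seed_y, seed_x)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue
                    if labels[ny][nx] != -1:
                        continue
                    value = frame[ny][nx]
                    new_lo, new_hi = min(lo, value), max(hi, value)
                    if new_hi - new_lo < threshold:
                        lo, hi = new_lo, new_hi
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            # No remaining border pixel satisfies the criterion:
            # move on to the next unlabeled pixel as a new seed.
            next_label += 1
    return labels
```

Because the tracked minimum and maximum only move apart as a region grows, a neighbor rejected once would be rejected again later, so the growth of a region ends exactly when its queue is empty.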
-
-
6. The method of claim 1 wherein the motion estimation comprises:
-
for each of the plural regions identified through spatial segmentation in the current video frame, performing a region-based motion estimation including matching only pixels within the region with pixels in the previous video frame; and
applying a motion model to approximate motion of the pixels within the region to corresponding locations in the previous frame.
-
-
7. The method of claim 6 wherein the motion model is used to find a motion vector for each of the plural regions.
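Claims 6 and 7 describe matching only the pixels of a region against the previous frame and summarizing the result as a motion vector. The sketch below illustrates this with a purely translational motion model and an exhaustive search that minimizes the sum of absolute differences. The translational model, the search range, the cost function, and the name `estimate_region_motion` are all assumptions for illustration; the claims allow other motion models.

```python
def estimate_region_motion(current, previous, region_pixels, search=2):
    """Return the (dy, dx) displacement that best maps the region's
    pixels in `current` onto `previous`.

    Only pixels listed in `region_pixels` (as (y, x) pairs) contribute
    to the matching cost. Displacements that would move any region
    pixel outside the previous frame are skipped.
    """
    height, width = len(previous), len(previous[0])
    best_vector, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cost = 0
            valid = True
            for y, x in region_pixels:
                py, px = y + dy, x + dx
                if not (0 <= py < height and 0 <= px < width):
                    valid = False
                    break
                cost += abs(current[y][x] - previous[py][px])
            if valid and (best_cost is None or cost < best_cost):
                best_vector, best_cost = (dy, dx), cost
    return best_vector
```

Note that the vector points from the current frame back to the previous frame, which matches the backward direction of the warping step in claim 1.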
-
8. The method of claim 1 wherein the determining includes for each of the plural regions:
-
counting the warped pixel locations that are within the boundary of the video object in the previous video frame;
when a majority of the warped pixel locations are within the boundary of the video object, classifying the region as being in the set that are likely to be part of the video object in the current video frame.
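The counting and majority test of claim 8 can be sketched as below. The example assumes the previously computed boundary is available as a binary mask of the previous frame (truthy inside the object) and that each region carries a single translational motion vector; the name `region_in_object` and both representations are assumptions of this sketch.

```python
def region_in_object(region_pixels, motion_vector, object_mask):
    """Warp a region's pixels into the previous frame and report whether
    a strict majority of them land inside the object.

    `region_pixels` is a list of (y, x) pairs in the current frame,
    `motion_vector` is a (dy, dx) displacement toward the previous frame,
    and `object_mask` marks the object's interior in the previous frame.
    Warped locations outside the frame count as not inside the object.
    """
    dy, dx = motion_vector
    height, width = len(object_mask), len(object_mask[0])
    inside = 0
    for y, x in region_pixels:
        py, px = y + dy, x + dx
        if 0 <= py < height and 0 <= px < width and object_mask[py][px]:
            inside += 1
    return inside * 2 > len(region_pixels)
```

In this sketch an exact tie (half of the pixels inside) is treated as not being part of the object; the claim does not address that case.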
-
-
9. A computer-readable medium having instructions stored thereon for causing a computer system programmed thereby to perform the method of claim 1.
-
10. A computer system for tracking semantic objects in a vector image sequence of image frames, the system comprising:
-
a spatial segmentation module for segmenting a current image frame in the vector image sequence into plural regions, each of the plural regions comprising image points having image values that satisfy a homogeneity criterion;
a motion estimator module for estimating the motion between each of the plural regions in the current image frame and a reference image frame, and for determining a motion parameter for each of the plural regions that approximates the motion of the region between the current image frame and the reference image frame, wherein the reference image frame includes a semantic object for which a semantic object boundary was previously computed; and
a region classifier module for applying the motion parameter of each of the plural regions to the region to compute a predicted region in the reference image frame, and for classifying each of the plural regions as being part of or not being part of the semantic object depending on the extent to which the predicted region for the region falls within the previously computed semantic object boundary in the reference image frame;
wherein a corresponding semantic object boundary in the current image frame is formed from each region classified as being part of the semantic object.
-
the reference image frame includes two or more semantic objects, each of the two or more semantic objects occupying a non-overlapping area of the reference image frame, the region classifier module classifies each of the plural regions as being part of one of the two or more semantic objects depending on which of the two or more semantic objects in the reference image frame has maximum overlap with the predicted region for the region, and wherein corresponding semantic object boundaries in the current image frame are formed from the plural classified regions.
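When the reference frame holds several non-overlapping semantic objects, the classification described above amounts to picking the object with the largest overlap. The sketch below assumes the reference frame's objects are stored as a label map (one object id per image point) and that the predicted region is obtained by a translational shift; `classify_region` and these representations are hypothetical.

```python
from collections import Counter


def classify_region(region_pixels, motion_vector, object_labels):
    """Return the id of the object in the reference frame that contains
    the most points of the predicted (warped) region.

    Returns None if no warped point falls inside the reference frame.
    Ties go to the object whose points were encountered first, which
    is a choice made by this sketch.
    """
    dy, dx = motion_vector
    height, width = len(object_labels), len(object_labels[0])
    overlap = Counter()
    for y, x in region_pixels:
        py, px = y + dy, x + dx
        if 0 <= py < height and 0 <= px < width:
            overlap[object_labels[py][px]] += 1
    if not overlap:
        return None
    return overlap.most_common(1)[0][0]
```

Because every region receives exactly one object id, the resulting object areas in the current frame cannot overlap and leave no unassigned points, as long as every region has at least one warped point inside the reference frame.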
-
-
14. The computer system of claim 13 further including:
a majority operator for defining a structure of points around each image point in the current image frame.
-
15. A method for tracking semantic objects in vector image sequences, the method comprising:
-
performing spatial segmentation on a current image frame to identify plural regions of discrete image points with homogenous image values;
performing motion estimation between each of the plural regions in the current image frame and a target image frame in which a boundary of a semantic object is known;
using the motion estimation for each of the plural regions to warp the image points in the region to locations in the target image frame;
for each region of the plural regions, when at least a threshold amount of the warped image points for the region overlap the semantic object with the known boundary in the target image frame, classifying the region as originating from the semantic object; and
forming a boundary of the semantic object in the current image frame based upon each of the plural regions in the current image frame that are classified as originating from the semantic object.
-
16. The method of claim 15 further including:
for each of one or more additional image frames, designating the current image frame as the target image frame;
designating the additional image frame as the current image frame; and
repeating claim 15.
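The frame-to-frame iteration described here (the current frame becomes the target frame for the next one) can be sketched as a simple loop. The per-frame tracking step is passed in as a callable so the example stays self-contained; `track_sequence`, `track_frame`, and the use of a mask to represent the object boundary are assumptions of this sketch.

```python
def track_sequence(frames, initial_mask, track_frame):
    """Propagate an object mask through `frames`.

    `initial_mask` is the known object mask for frames[0].
    `track_frame(current, target, target_mask)` is a caller-supplied
    function (hypothetical here) that returns the object mask for
    `current` given the target frame and its known mask.
    """
    masks = [initial_mask]
    target, target_mask = frames[0], initial_mask
    for current in frames[1:]:
        current_mask = track_frame(current, target, target_mask)
        masks.append(current_mask)
        # The frame just processed becomes the target (reference)
        # frame for the next additional frame.
        target, target_mask = current, current_mask
    return masks
```

Each frame is compared only against the frame immediately before it, whose mask has already been computed, so the loop never projects into a frame with an unknown boundary.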
-
-
17. The method of claim 15 wherein each of the plural regions is determined to be homogenous by only adding neighboring image points to the region where the difference between a maximum image value and a minimum image value in the region after adding each neighboring image point is below a threshold.
-
18. The method of claim 15 wherein:
-
the target image frame is the previous image frame of the current image frame, each of the plural regions segmented from the current image frame is classified as originating from exactly one semantic object, boundaries for semantic objects in the current image frame are computed based upon boundaries of regions classified as originating from the respective semantic objects, and claim 15 is repeated for successive image frames in the vector image sequence.
-
-
19. A computer-readable medium having instructions stored thereon for causing a computer system programmed thereby to perform the method of claim 15.
-
20. A method for tracking semantic objects in vector image sequences, the method comprising:
-
performing spatial segmentation on a current image frame to identify plural regions of discrete image points with homogenous image values, where each of the plural regions comprises image points, and where each of the plural regions is determined to be homogenous by only adding neighboring image points to the region where the difference between a maximum image value and a minimum image value in the region after adding each neighboring image point is below a threshold;
performing region-based motion estimation between each of the plural regions in the current image frame and an immediate previous image frame in the vector image sequence in which a boundary of a semantic object is known;
using the motion estimated for each of the plural regions to warp the image points in the region to locations in the immediate previous image frame;
for each of the plural regions, when at least a threshold amount of the warped image points for the region overlap the semantic object with the known boundary in the immediate previous image frame, classifying the region as originating from the semantic object; and
forming a boundary for the semantic object in the current image frame based upon which of the plural regions in the current image frame are classified as originating from the semantic object.
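The final step, forming the object's boundary from the regions classified as originating from it, can be illustrated as a union of those regions followed by extraction of the border points. The sketch assumes the segmentation is available as a map of region ids and that a border point is any object point with a 4-connected neighbor outside the object or outside the frame; `object_boundary` is a hypothetical name.

```python
def object_boundary(region_labels, object_region_ids):
    """Return (mask, boundary) for the union of the given regions.

    `region_labels` holds a region id per image point and
    `object_region_ids` is the set of region ids classified as part
    of the object. `boundary` is the set of (y, x) object points that
    touch a non-object point or the frame edge.
    """
    height, width = len(region_labels), len(region_labels[0])
    mask = [
        [region_labels[y][x] in object_region_ids for x in range(width)]
        for y in range(height)
    ]
    boundary = set()
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not mask[ny][nx]:
                    boundary.add((y, x))
                    break
    return mask, boundary
```

Borders between two regions that both belong to the object disappear in the union, so only the outer outline of the combined regions remains.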
-
-
21. A method of tracking an object in a vector image sequence using backward region-based classification, the method comprising:
-
computing a boundary of an object in a first image frame of a vector image sequence;
segmenting a second image frame of the vector image sequence into plural regions;
based upon motion estimates, warping each of the plural regions backward into the first image frame;
for each of the plural regions of the second image frame, if a threshold portion of the warped region lies within the previously computed boundary in the first image frame, classifying the region of the second image frame as part of the object.
-
22. The method of claim 21 further comprising:
forming a new boundary of the object in the second image frame from one or more of the plural regions classified as part of the object.
-
-
23. The method of claim 22 further comprising:
repeating the segmenting, warping, and classifying for a third image frame, wherein the warping proceeds from the third image frame backward into the second image frame, and wherein the classifying occurs by comparison to the new boundary.
-
24. The method of claim 21 wherein the segmenting comprises:
-
growing a first region of the plural regions from a first image point by adding neighboring image points that satisfy a homogeneity criterion for the first region, wherein the homogeneity criterion constrains the difference between a maximum image point value and a minimum image point value in the first region; and
repeating the growing for other regions of the plural regions until each image point of the second image frame is part of one of the plural regions.
-
-
25. The method of claim 21 wherein the first image frame includes plural objects having boundaries, and wherein the classifying indicates one of the plural objects for each of the plural regions of the second image frame.
-
26. A computer-readable medium having instructions stored thereon for causing a computer system programmed thereby to perform the method of claim 21.
-
27. A computer-readable medium having instructions stored thereon for causing a computer system programmed thereby to perform a method of tracking an object in a vector image sequence using backward region-based classification, the method comprising:
-
computing a boundary of an object in a previous image frame of a vector image sequence;
segmenting a current image frame of a vector image sequence into plural regions;
based upon motion estimates for the plural regions of the current image frame, warping each of the plural regions backward into the previous image frame;
for each of the plural regions of the current image frame, if a threshold portion of the warped region lies within the previously computed boundary of the object in the previous image frame, classifying the region of the current image frame as part of the object.
-
-
28. A method of tracking an object in a vector image sequence using backward region-based classification, the method comprising:
-
computing a boundary of an object in a first image frame of a vector image sequence;
segmenting a second image frame of the vector image sequence into plural regions;
for each of the plural regions of the second image frame, associating the region with a corresponding region of the first image frame based upon motion estimation from the second image frame back to the first image frame; and
if a threshold portion of the associated corresponding region of the first image frame lies within the previously computed boundary in the first image frame, classifying the region of the second image frame as part of the object.
-
29. The method of claim 28 further comprising:
forming a new boundary of the object in the second image frame from one or more of the plural regions classified as part of the object.
-
-
30. The method of claim 29 further comprising:
repeating the segmenting, associating, and classifying for a third image frame, wherein the associating is based upon motion estimation from third image frame back to the second image frame, and wherein the classifying occurs by comparison to the new boundary.
-
31. A computer-readable medium having instructions stored thereon for causing a computer system programmed thereby to perform the method of claim 28.
Specification