Tracking semantic objects in vector image sequences

US 20040189863A1
Filed: 01/28/2004
Published: 09/30/2004
Est. Priority Date: 09/10/1998
Status: Active Grant


Abstract

A semantic object tracking method tracks general semantic objects with multiple non-rigid motions, disconnected components, and multiple colors throughout a vector image sequence. The method accurately tracks these general semantic objects by spatially segmenting image regions from a current frame and then classifying each region according to the semantic object in the previous frame from which it originated. To classify each region, the method performs a region-based motion estimation between each spatially segmented region and the previous frame to compute the position of a predicted region in the previous frame. The method then classifies each region in the current frame as being part of a semantic object based on which semantic object in the previous frame contains the most overlapping points of the predicted region. Using this method, each region in the current image is tracked to one semantic object from the previous frame, with no gaps or overlaps. The method propagates few or no errors because it projects regions into a frame where the semantic object boundaries have already been computed, rather than trying to project and adjust a boundary in a frame where the object's boundary is unknown.
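
The classification step described in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: it assumes a purely translational motion model, a previous-frame label map stored as a nested list, and that warped points landing outside the frame are ignored. The names `classify_region`, `points`, `motion`, and `prev_labels` are hypothetical.

```python
from collections import Counter


def classify_region(points, motion, prev_labels):
    """Return the previous-frame object label that the warped region overlaps most.

    points      -- list of (y, x) locations of one region in the current frame
    motion      -- (dy, dx) translation from the current to the previous frame (assumed model)
    prev_labels -- 2D list of object labels already computed for the previous frame
    """
    dy, dx = motion
    height, width = len(prev_labels), len(prev_labels[0])
    votes = Counter()
    for y, x in points:
        py, px = y + dy, x + dx
        # Assumption: points warped outside the previous frame cast no vote.
        if 0 <= py < height and 0 <= px < width:
            votes[prev_labels[py][px]] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Because every region receives exactly one label, the labeled regions tile the current frame without gaps or overlaps, which is the property the abstract highlights.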

111 Citations

20 Claims

1. A method for tracking video objects in video frames, the method comprising:
- performing spatial segmentation on a video frame to identify regions of pixels with homogenous intensity values;
  
  performing motion estimation between each of the regions in the video frame and a previous video frame;
  
  using the motion estimation for each region to warp pixel locations in each region to locations in the previous frame;
  
  determining whether the warped pixel locations are within a boundary of a segmented video object in the previous frame to identify a set of the regions that are likely to be part of the video object; and
  
  forming a boundary of the video object in the video frame as a combination of each of the regions in the video frame that are in the set.
- - 2. The method of claim 1 further including:
    - repeating the steps of claim 1 for subsequent frames using the boundary of the video object as a reference boundary for the next frame.
  - 3. The method of claim 1 further including:
    - filtering the video frame to remove noise from the video frame before performing the spatial segmentation.
  - 4. The method of claim 1 wherein each of the regions is a connected group of pixels, and wherein each region is determined to be homogenous by ensuring that the difference in intensity values between a pixel location with a maximum intensity value in the region and another pixel location with a minimum intensity value in the region is below a threshold.
  - 5. The method of claim 5 wherein the segmentation is a sequential region growing method comprising:
    - starting with a first pixel location in the video image frame, growing a first region of connected pixels around the first pixel by adding pixels to the region such that the homogeneity criteria is satisfied;
      
      when no boundary pixels satisfy the homogeneity criteria, repeating the growing step with a pixel location outside the first region; and
      
      continuing the growing step until each of the pixels in a frame is identified as being part of a homogenous region.
  - 6. The method of claim 1 wherein the motion estimation comprises:
    - for each region identified through spatial segmentation in the video frame, performing a region based motion estimation including matching only pixels within the region with pixels in the previous frame to find a corresponding location for each of the pixels in the previous frame; and
      
      applying a motion model to approximate motion of the pixels in the region to the corresponding locations in the previous frame.
  - 7. The method of claim 6 wherein the motion model is used to find a motion vector for each region that minimizes prediction error between warped pixel values from the video frame and pixel values at the corresponding pixel locations in the previous video frame.
  - 8. The method of claim 1 wherein the determining step includes:
    - finding the number of warped pixels that are inside the boundary of the segmented video object of the previous frame;
      
      when a majority of the warped pixels lie inside the boundary of the segmented video object, classifying the region as being part of the video object in the video frame.
  - 9. A computer readable medium having instructions for performing the steps of claim 1.
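
The sequential region growing of claims 4 and 5 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: it assumes a grayscale image stored as a nested list, 4-connectivity, raster-order seed selection, and a strict "below threshold" test on the region's max-minus-min intensity. The name `grow_regions` is hypothetical.

```python
from collections import deque


def grow_regions(image, threshold):
    """Label every pixel with a region id so that (max - min) inside each region stays below threshold."""
    height, width = len(image), len(image[0])
    labels = [[-1] * width for _ in range(height)]  # -1 marks an unassigned pixel
    next_id = 0
    for sy in range(height):
        for sx in range(width):
            if labels[sy][sx] != -1:
                continue  # already part of a region
            # Seed a new region at the first unassigned pixel in raster order.
            lo = hi = image[sy][sx]
            labels[sy][sx] = next_id
            frontier = deque([(sy, sx)])
            while frontier:
                y, x = frontier.popleft()
                # Assumption: 4-connected neighborhood.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and labels[ny][nx] == -1:
                        value = image[ny][nx]
                        new_lo, new_hi = min(lo, value), max(hi, value)
                        # Add the neighbor only if the homogeneity criterion still holds.
                        if new_hi - new_lo < threshold:
                            lo, hi = new_lo, new_hi
                            labels[ny][nx] = next_id
                            frontier.append((ny, nx))
            next_id += 1
    return labels
```

Since the intensity range of a region only widens as pixels are added, a neighbor rejected once can never qualify later, so a single breadth-first pass per seed is sufficient in this sketch.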

10. A computer readable medium having instructions for tracking semantic objects in a vector image sequence of image frames, the medium comprising:
- a spatial segmentation module for segmenting a vector image frame in the image sequence into regions, each region comprising connected groups of image points having image values that satisfy a homogeneity criterion;
  
  a motion estimator module for estimating the motion between each of the regions in the input image frame and a reference frame and for determining a motion parameter that approximates the motion of each region between the image frame and the target frame; and
  
  a region classifier for applying the motion parameter of each region to the region to compute a predicted region in the reference frame, for evaluating whether a boundary of each predicted region falls at least partially within a boundary of a semantic object of the reference frame, and classifying each region as being part of a semantic object in the reference frame based on the extent to which the predicted region falls within the boundary of a semantic object of the reference frame;
  
  wherein a boundary of a semantic object in the image frame is formed from each region classified as being part of a corresponding semantic object in the reference frame.
- - 11. The method of claim 10 wherein the homogeneity criteria of the spatial segmentation module comprises a maximum difference value between a first image point in a connected group of pixels with a maximum image value and a second image point in the connected group with a minimum image value, and wherein the segmentation module selectively adds neighboring image points to the connected region to create a new connected region so long as the new connected region satisfies the homogeneity criteria.
  - 12. The method of claim 10 wherein the motion parameter of each region is a motion vector, which, when used to project each image point in a region into the target frame, minimizes a sum of differences between image values of the projected points and image values at corresponding image points in the target frame.
  - 13. The method of claim 10 wherein:
    - the target frame includes two or more semantic objects, each object occupying a non-overlapping area of the target frame, the region classifier identifies for each predicted region, a semantic object in the target frame having a maximum number of overlapping image points of the predicted region, the classifier classifies each region as being associated with a semantic video object in the target frame having the maximum number of overlapping image points, and the classifier computes boundaries of each semantic object in the image frame as a combination of regions classified as being associated with the corresponding semantic object in the target frame.
  - 14. The medium of claim 13 further including:
    - a majority operator for defining a structure of points around each image point in the image frame, for determining a semantic object in the image frame that has a maximal overlapped area of the structure, and for assigning a value of the semantic video object to the image point.
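
One way to picture the majority operator of claim 14 is as a label-smoothing filter. The sketch below assumes a square window as the "structure of points", clips the window at the frame border, and breaks ties by first occurrence in scan order. None of these choices is stated in the claim, and the name `majority_filter` is hypothetical.

```python
from collections import Counter


def majority_filter(labels, radius=1):
    """Assign each point the object label that covers most of a square window around it."""
    height, width = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for y in range(height):
        for x in range(width):
            votes = Counter()
            # Assumption: the structure is a (2*radius+1)^2 window clipped to the frame.
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    votes[labels[ny][nx]] += 1
            out[y][x] = votes.most_common(1)[0][0]
    return out
```

Applied to a label map, this kind of filter tends to remove isolated mislabeled points and smooth jagged object boundaries.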

15. A method for tracking semantic objects in vector image sequences, the method comprising:
- performing spatial segmentation on an image frame to identify regions of discrete image points with homogenous image values;
  
  performing motion estimation between each of the regions in the image frame and a target image frame in which a boundary of a semantic object is known;
  
  using the motion estimation for each region to warp the image points in each region to locations in the target frame;
  
  determining whether the warped pixel locations of each region are within a boundary of a semantic object in the target frame and when at least a threshold amount of the region overlaps a semantic object in the target frame, classifying the region as originating from the semantic object in the target frame; and
  
  forming a boundary of the semantic object in the image frame as a combination of each of the regions in the image frame that are classified as originating from the semantic object of the target frame.
- - 16. The method of claim 15 further including:
    - repeating the steps of claim 15 for subsequent frames using computed boundaries of semantic objects of a previous frame to classify regions segmented in a current frame as originating from one of the semantic objects of the previous frame.
  - 17. The method of claim 15 wherein each of the regions is a connected group of image points, and wherein each region is determined to be homogenous by only adding neighboring image points to the region where the difference in intensity values between a maximum and minimum image value in the region after adding each neighboring image point is below a threshold.
  - 18. The method of claim 15 wherein:
    - the target frame is the previous frame of the current frame, each region segmented from the current frame is classified as originating from exactly one semantic object previously computed for the previous frame using the steps of claim 15, boundaries for semantic objects in the current frame are computed by combining boundaries of regions classified as originating from the same semantic object in the previous frame, and the steps of claim 15 are repeated for successive frames in the vector image sequence.
  - 19. A computer readable medium having instructions for performing the steps of claim 15.
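
The last two steps of claim 15 (threshold-based classification and forming the object boundary as a combination of regions) might be sketched as below. The sketch assumes the regions have already been segmented and warped, represents the object in the target frame as a boolean mask, and uses a default threshold of half the region's points. The names `track_object_mask`, `warped_points`, and `min_fraction` are hypothetical.

```python
def track_object_mask(region_labels, warped_points, prev_mask, min_fraction=0.5):
    """Build the current-frame object mask from regions that overlap the target-frame object.

    region_labels -- 2D list of region ids for the current frame
    warped_points -- dict: region id -> list of (y, x) locations in the target frame
    prev_mask     -- 2D list of bools, True inside the object in the target frame
    min_fraction  -- assumed threshold: fraction of warped points that must fall inside
    """
    height, width = len(prev_mask), len(prev_mask[0])
    selected = set()
    for region_id, points in warped_points.items():
        inside = sum(
            1 for y, x in points
            if 0 <= y < height and 0 <= x < width and prev_mask[y][x]
        )
        if points and inside / len(points) >= min_fraction:
            selected.add(region_id)
    # The object in the current frame is the union of all selected regions.
    return [[rid in selected for rid in row] for row in region_labels]
```

The boundary of the tracked object is then simply the outline of the `True` area in the returned mask.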

20. A method for tracking semantic objects in vector image sequences, the method comprising:
- performing spatial segmentation on an image frame to identify regions of discrete image points with homogenous image values, where each of the regions is a connected group of image points, and where each region is determined to be homogenous by only adding neighboring image points to the region where the difference in intensity values between a maximum and minimum image value in the region after adding each neighboring image point is below a threshold;
  
  performing region based motion estimation between each of the regions in the image frame and an immediate previous image frame in the vector image sequence;
  
  using the motion estimated for each region to warp the image points in each region to locations in the immediate previous frame;
  
  determining whether the warped pixel locations of each region are within a boundary of a semantic object in the target frame and when at least a threshold amount of the region overlaps a semantic object in the target frame, classifying the region as originating from the semantic object in the target frame; and
  
  forming a boundary for each semantic object in the image frame as a combination of each of the regions in the image frame that are classified as originating from the semantic object of the immediate previous frame.
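
The region-based motion estimation step can be illustrated with an exhaustive search over integer translations. This is only a sketch: it assumes a translational motion model, a sum of absolute differences as the prediction error, a small fixed search range, and that candidate vectors moving any region point outside the previous frame are skipped. The name `estimate_region_motion` and its parameters are hypothetical.

```python
def estimate_region_motion(current, previous, points, search=2):
    """Find the (dy, dx) that best maps one region of `current` onto `previous`.

    Only the pixels listed in `points` (the region) take part in the match.
    Returns ((dy, dx), error); error is None if no candidate was valid.
    """
    height, width = len(previous), len(previous[0])
    best, best_error = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            error = 0
            valid = True
            for y, x in points:
                py, px = y + dy, x + dx
                if not (0 <= py < height and 0 <= px < width):
                    valid = False  # assumption: skip vectors that leave the frame
                    break
                # Sum of absolute differences between region pixels and their warped positions.
                error += abs(current[y][x] - previous[py][px])
            if valid and (best_error is None or error < best_error):
                best, best_error = (dy, dx), error
    return best, best_error
```

The returned vector points from the current frame back into the immediate previous frame, matching the direction in which the claim warps the image points.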

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Lee, Ming-Chieh; Gu, Chuang

Granted Patent: US 7,088,845 B2
US Class Current: 348/416.1

CPC Class Codes:
G06T 7/11   Region-based segmentation
G06T 7/174   involving the use of two or...
G06T 7/215   Motion-based segmentation
G06T 7/246   using feature-based methods...
G06V 10/24   Aligning, centring, orienta...
H04N 19/543   using regions
H04N 19/80   Details of filtering operat...
