Semantic video object segmentation and tracking
Abstract
A semantic video object extraction system using mathematical morphology and perspective motion modeling. A user indicates a rough outline around an image feature of interest for a first frame in a video sequence. Without further user assistance, the rough outline is processed by a morphological segmentation tool to snap the rough outline into a precise boundary surrounding the image feature. Motion modeling is performed on the image feature to track its movement into a subsequent video frame. The motion model is applied to the precise boundary to warp the precise outline into a new rough outline for the image feature in the subsequent video frame. This new rough outline is then snapped to locate a new precise boundary. Automatic processing is repeated for subsequent video frames.
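The pipeline the abstract describes (rough outline, morphological snap, motion model, warp, snap again, repeated per frame) can be sketched as a loop. This is an illustrative skeleton only: `snap_boundary` and `estimate_motion` are stubs standing in for the patented segmentation and motion-modeling stages, and a pure translation is assumed for the warp.

```python
import numpy as np

def snap_boundary(rough_mask):
    # Stand-in for the morphological snapping stage that refines a
    # rough outline into a precise object border (identity stub here).
    return rough_mask

def estimate_motion(prev_mask, prev_frame, next_frame):
    # Stand-in for motion modeling; returns a translation (dy, dx).
    return (0, 0)

def warp(mask, motion):
    # Apply the motion model to carry a boundary into the next frame.
    dy, dx = motion
    return np.roll(mask, shift=(dy, dx), axis=(0, 1))

def track(first_rough_mask, frames):
    """Snap -> estimate motion -> warp, repeated for each frame pair."""
    masks = [snap_boundary(first_rough_mask)]
    for prev_frame, next_frame in zip(frames, frames[1:]):
        motion = estimate_motion(masks[-1], prev_frame, next_frame)
        rough = warp(masks[-1], motion)      # new rough outline
        masks.append(snap_boundary(rough))   # snapped precise boundary
    return masks
```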
26 Claims
1. A method of semantic object tracking of an object in a sequence of video frames, the method comprising:
for an approximate boundary near a border of an object in a first video frame, defining an inner boundary inside of the approximate boundary;
defining an outer boundary outside of the approximate boundary; and
expanding the inner boundary while contracting the outer boundary to converge upon the border of the object in the first video frame;
determining a motion transformation function representing the transformation of the object between the first video frame and a second video frame, wherein the determining comprises matching the object in the first video frame to the object in the second video frame; and
defining a new approximate boundary for the object in the second video frame based upon the motion transformation function.

2. The method of claim 1 further comprising:
repeating the method of claim 1 for the new approximate boundary so as to automatically track the object between the second video frame and a third video frame.
3. A computer readable medium having stored therein computer-executable instructions for causing a computer programmed thereby to perform the method of claim 1.
4. The method of claim 1 wherein the defining the new approximate boundary includes, for each of plural pixels of the object in the second video frame:
applying the inverse of the motion transformation function to the pixel to determine a corresponding pixel in the first video frame;
based upon proximity of the corresponding pixel to the border in the first video frame, determining whether the pixel of the object in the second video frame is part of the new approximate boundary for the object in the second video frame.
5. A computer readable medium having stored therein computer-executable instructions for causing a computer programmed thereby to perform a method of identifying an object in a sequence of video frames, the method comprising:
based upon input received from a user, defining an approximate boundary near a border of an object in a first video frame;
automatically defining an inner boundary inside of the approximate boundary;
automatically defining an outer boundary outside of the approximate boundary; and
expanding the inner boundary and contracting the outer boundary to identify the border of the object in the first video frame, the identified border at the convergence of the inner boundary and the outer boundary.

6. The computer readable medium of claim 5 wherein the method further comprises:
determining a transformation of the object between the first video frame and a second video frame; and
based upon the transformation, defining a second approximate boundary for the object in the second video frame.
7. The computer readable medium of claim 6, wherein the method further comprises:
repeating the method of claim 6 for the second approximate boundary, including:
determining a second transformation between the second video frame and a third video frame; and
based upon the second transformation, defining a third approximate boundary for the object in the third video frame.
8. The computer readable medium of claim 5 wherein the expanding the inner boundary and contracting the outer boundary includes:
sampling pixels within the object to define at least one inside cluster-center pixel represented in a multi-valued format;
sampling pixels outside of the object to define at least one outside cluster-center pixel represented in the multi-valued format; and
for each of plural pixels between the inner boundary and the outer boundary, classifying the pixel according to similarity to one of the at least one inside cluster-center and at least one outside cluster-center.
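The cluster-center classification of claim 8 can be illustrated pixel-wise: pixels inside the inner boundary stay object, pixels outside the outer boundary stay background, and only the band in between is classified. Using the mean RGB color as the cluster center and Euclidean distance as the similarity measure are assumptions of this sketch, not requirements of the claim.

```python
import numpy as np

def classify_uncertain_pixels(frame, inner_mask, outer_mask):
    """Pixel-wise sketch of claim 8. `inner_mask` is True inside the
    inner boundary; `outer_mask` is True inside the outer boundary."""
    # Cluster centers: mean color of known-object and known-background pixels.
    inside_center = frame[inner_mask].mean(axis=0)
    outside_center = frame[~outer_mask].mean(axis=0)

    uncertain = outer_mask & ~inner_mask       # band between the two boundaries
    label = inner_mask.copy()                  # certain object pixels stay True
    px = frame[uncertain].astype(float)
    d_in = np.linalg.norm(px - inside_center, axis=1)
    d_out = np.linalg.norm(px - outside_center, axis=1)
    label[uncertain] = d_in < d_out            # nearer inside center -> object
    return label
```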
9. The computer readable medium of claim 8 wherein the classifying is pixel-wise.
10. The computer readable medium of claim 8 wherein the classifying is morphological watershed-based.
11. The computer readable medium of claim 5 wherein E is a morphological erosion operator, wherein O is a morphological dilation operator, wherein Binit is the approximate boundary, wherein the defining the inner boundary satisfies the morphological relation Bin=E(Binit), and wherein the defining the outer boundary satisfies the morphological relation Bout=O(Binit).
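With standard morphological operators, the claim-11 relations Bin=E(Binit) and Bout=O(Binit) look like the following; the erosion/dilation width (number of iterations) and the default structuring element are assumed parameters, not specified by the claim.

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def inner_outer_boundaries(approx_mask, width=3):
    """Sketch of claim 11: erode the approximate boundary region to get
    the inner boundary, dilate it to get the outer boundary."""
    b_in = binary_erosion(approx_mask, iterations=width)    # Bin = E(Binit)
    b_out = binary_dilation(approx_mask, iterations=width)  # Bout = O(Binit)
    return b_in, b_out
```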
12. A method of tracking motion of an object across multiple video frames, the method comprising:
defining a first boundary approximating a perimeter of an object in a first video frame;
determining a global motion transformation indicating movement of the object between the first video frame and a second video frame; and
applying the global motion transformation to define a second boundary approximating the perimeter of the object in the second video frame.

13. The method of claim 12 further comprising:
defining an inner boundary inside the first boundary;
defining an outer boundary outside the first boundary; and
snapping the inner and outer boundaries to identify the perimeter of the object in the first frame, wherein the snapping includes expanding the inner boundary and contracting the outer boundary, and wherein the global motion transformation is applied to the identified perimeter to define the second boundary.
14. The method of claim 13 further including:
based upon the second boundary, automatically identifying the perimeter of the object in the second frame;
computing an error value for the automatically identified perimeter of the object in the second frame; and
if the error value exceeds a predetermined threshold, prompting a user to identify the perimeter of the object in the second frame.
15. The method of claim 12 wherein non-rigid motion is tracked across multiple video frames by determining a global motion transformation for movement of the object and by identifying a local motion transformation for movement of at least one sub-object within the object.
16. A computer readable medium having stored therein computer-executable instructions for causing a computer programmed thereby to perform the method of claim 12.
17. The method of claim 12 wherein the applying the global motion transformation includes, for each of plural pixels of the object in the second video frame:
applying the inverse of the global motion transformation to the pixel to determine a corresponding pixel in the first video frame;
if the corresponding pixel is within proximity of the perimeter of the object in the first video frame, classifying the pixel of the object in the second video frame as being part of the second boundary.
18. The method of claim 17 wherein a first corresponding pixel has non-integer coordinates, and wherein the first corresponding pixel is within the proximity of the perimeter of the object in the first video frame if at least one neighboring integer-coordinate pixel is part of the perimeter of the object in the first video frame.
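Claims 17 and 18 can be sketched for the special case of a translational inverse motion: each frame-2 pixel is mapped back to a (possibly fractional) frame-1 location, and it joins the new boundary if any of the surrounding integer-coordinate pixels lies on the frame-1 perimeter. The translation-only model and the choice of the four floor/ceil neighbors are assumptions of this sketch.

```python
import numpy as np

def warp_boundary(perimeter_mask, inverse_motion):
    """Sketch of claims 17-18. `perimeter_mask` marks the frame-1
    perimeter; `inverse_motion` is an assumed translation (dy, dx)."""
    h, w = perimeter_mask.shape
    new_boundary = np.zeros_like(perimeter_mask)
    dy, dx = inverse_motion
    for y in range(h):
        for x in range(w):
            sy, sx = y + dy, x + dx            # frame-1 location, may be fractional
            # Check the neighboring integer-coordinate pixels (claim 18).
            for ny in (int(np.floor(sy)), int(np.ceil(sy))):
                for nx in (int(np.floor(sx)), int(np.ceil(sx))):
                    if 0 <= ny < h and 0 <= nx < w and perimeter_mask[ny, nx]:
                        new_boundary[y, x] = True
    return new_boundary
```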
19. The method of claim 12 wherein the determining the global motion transformation includes comparing color information within the object in the first video frame to color information within each of plural prospective matching objects in the second video frame, thereby tracking evolution of color information within the object.
20. The method of claim 12 wherein the determining the global motion transformation includes solving for plural parameters of the global motion transformation by approximating minimum error between color information within the object in the first video frame and color information within each of plural prospective matching objects in the second video frame.
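As a brute-force, translation-only reduction of the claim-20 minimization: pick the motion parameters that minimize the color error between the object in frame 1 and the correspondingly shifted region of frame 2. The patent's model is perspective with more parameters; the exhaustive search, squared-error measure, and search radius here are assumptions of this sketch.

```python
import numpy as np

def estimate_translation(obj_mask, frame1, frame2, search=4):
    """Choose the shift (dy, dx) minimizing the squared color error
    between the object pixels in frame 1 and frame 2."""
    ys, xs = np.nonzero(obj_mask)
    h, w = frame1.shape[:2]
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ny, nx = ys + dy, xs + dx
            valid = (ny >= 0) & (ny < h) & (nx >= 0) & (nx < w)
            if not valid.all():
                continue  # candidate region falls outside frame 2
            err = np.sum((frame1[ys, xs].astype(float)
                          - frame2[ny, nx].astype(float)) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best
```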
21. The method of claim 20 wherein the plural parameters initialize a second global motion transformation indicating movement of the object between the second video frame and a third video frame.
22. A computer readable medium having stored therein computer-executable instructions for causing a computer programmed thereby to perform a method of automatically tracking a segmented, semantic video object in a video sequence, the method comprising:
segmenting a semantic video object within a first video frame of a video sequence, a boundary defining the semantic video object within the first video frame, wherein a user at least in part guides placement of the boundary based upon a semantic criterion; and
automatically tracking the semantic video object in one or more subsequent video frames based upon a global motion model of the semantic video object.

23. The computer readable medium of claim 22 wherein the segmenting comprises:
receiving placement input from the user, the placement input for placing an initial boundary approximation for the semantic video object based upon the semantic criterion;
automatically refining the initial boundary approximation to derive the boundary defining the semantic video object; and
receiving acceptability input from the user as to whether the automatic refinement accurately segments the semantic video object.
24. The computer readable medium of claim 22 further comprising:
requesting user assistance in a second segmenting operation if an error value for said automatically tracking exceeds an error threshold in a subsequent video frame.
25. The computer readable medium of claim 22 wherein the automatic tracking comprises iteratively:
determining the global motion model of the semantic video object;
based upon the global motion model, defining a boundary approximation for the semantic video object in one of the one or more subsequent video frames; and
automatically refining the boundary approximation.
26. A system for tracking a semantic video object through a sequence of video frames, the sequence including one or more I-frames and one or more P-frames, the system comprising code for:
segmenting a semantic video object within a first I-frame, wherein a boundary segments the semantic video object within the first I-frame;
tracking the semantic video object automatically into one or more subsequent frames, wherein the tracking for each of the one or more subsequent frames comprises:
estimating global motion of the semantic video object into the subsequent frame;
if a tracking error for the global motion estimation satisfies a tracking error threshold, determining the boundary for the semantic video object in the subsequent frame as a P-frame; and
if the tracking error fails the tracking error threshold, determining the boundary for the semantic video object in the subsequent frame as a second I-frame.
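The claim-26 control flow reduces to a small loop: frames whose motion-based boundary passes the error test are handled as P-frames, and on failure the object is re-segmented, making that frame a new I-frame. All of the callables and the threshold below are supplied by the caller and are assumptions of this sketch, not APIs from the patent.

```python
def track_sequence(frames, segment, estimate_and_warp, error_of, threshold):
    """Return the I/P classification of each frame under the claim-26
    scheme. `segment` segments an object in a frame; `estimate_and_warp`
    carries a boundary into the next frame via global motion estimation;
    `error_of` scores the candidate boundary against the frame."""
    boundary = segment(frames[0])                 # first frame is an I-frame
    kinds = ["I"]
    for frame in frames[1:]:
        candidate = estimate_and_warp(boundary, frame)
        if error_of(candidate, frame) <= threshold:
            boundary, kind = candidate, "P"       # tracking succeeded
        else:
            boundary, kind = segment(frame), "I"  # fall back to re-segmentation
        kinds.append(kind)
    return kinds
```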
Specification