Space-time video montage
Abstract
Systems and methods for space-time video montage are described. In one aspect, one or more arbitrary space-time volumes representing informative video portion(s) of at least one input video data sequence are identified. A video summary representing a montage of the at least one input video data sequence is generated from the one or more arbitrary space-time volumes for presentation to a user.
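As a concrete reading of the abstract, a per-pixel saliency value can be assigned to every pixel of the input sequence to form a saliency volume from which informative space-time portions are picked out. The local-contrast measure below is purely illustrative, not the measure the patent uses, and `saliency_volume` is a hypothetical name.

```python
import numpy as np

def saliency_volume(frames: np.ndarray) -> np.ndarray:
    """Assign an illustrative saliency value to every pixel of a
    grayscale video (T, H, W): absolute deviation from each frame's
    mean intensity, normalized to [0, 1] per frame."""
    sal = np.abs(frames - frames.mean(axis=(1, 2), keepdims=True))
    peak = sal.max(axis=(1, 2), keepdims=True)
    return sal / np.maximum(peak, 1e-8)

# Toy input: a bright square drifting across a dark background.
video = np.zeros((4, 16, 16))
for t in range(4):
    video[t, 4:8, 4 + t:8 + t] = 1.0
S = saliency_volume(video)
```

Pixels inside the moving square score near 1 while the background scores near 0, so thresholding `S` recovers the square's space-time track as the informative portion of the sequence.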
20 Claims
1. A method at least partially implemented by a computing device, the method comprising:

identifying one or more arbitrary space-time volumes representing one or more informative video portions of at least one input video data sequence;

segmenting the one or more informative video portions to generate one or more volumetric saliency blobs, each of the one or more volumetric saliency blobs comprising a high saliency video portion;

dilating the one or more volumetric saliency blobs using respective one or more mask volumes to simulate spread of respective high saliency video portions of the one or more volumetric saliency blobs on respective surrounding portions of the one or more volumetric saliency blobs to form one or more volumetric saliency layers; and

generating a video summary montage of the at least one input video data sequence based on the one or more volumetric saliency layers.

Dependent Claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
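The segmenting and dilating steps of claim 1 can be sketched in a few lines of NumPy. This is an illustrative toy, not the patent's actual segmentation: a hard threshold stands in for blob segmentation, and a cubic mask volume applied by OR-ing shifted copies stands in for the dilation that spreads a blob's saliency onto its surroundings. The function names `dilate` and `saliency_layer` are hypothetical.

```python
import numpy as np

def dilate(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    """Binary dilation of a 3-D (T, H, W) mask with a cubic mask volume
    of half-width `radius`, done by OR-ing shifted copies.
    Note: np.roll wraps at the volume borders, which is fine for
    blobs that sit in the interior of the volume, as here."""
    out = np.zeros_like(mask)
    for dt in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                out |= np.roll(mask, (dt, dy, dx), axis=(0, 1, 2))
    return out

def saliency_layer(saliency: np.ndarray, thresh: float, radius: int = 1):
    """Segment the high-saliency voxels into a blob (threshold as a
    stand-in for the patent's segmentation), then dilate the blob to
    simulate its spread onto surrounding voxels, yielding a
    volumetric saliency layer."""
    blob = saliency > thresh          # volumetric saliency blob
    layer = dilate(blob, radius)      # spread onto surrounding portion
    return blob, layer

# Toy saliency volume: one bright 2x2 patch on the middle frame.
S = np.zeros((3, 8, 8))
S[1, 3:5, 3:5] = 1.0
blob, layer = saliency_layer(S, thresh=0.5, radius=1)
```

With a radius-1 cubic mask, the 4-voxel blob on one frame grows into a 48-voxel layer spanning all three frames, so the layer always contains its blob.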
16. A computing device comprising:
one or more processors; and

memory coupled to the one or more processors, the memory storing computer-program instructions executable by the one or more processors, the computer-program instructions when executed by the one or more processors performing operations comprising:

extracting visually informative space-time portions from video frames of an input video data sequence, the informative space-time portions including spatio-temporal saliency measuring salient texture of the visually informative space-time portions on each of the video frames;

segmenting the visually informative space-time portions to obtain volumetric saliency layers, each volumetric saliency layer including a single saliency portion of the visually informative space-time portions;

positioning at least a subset of the volumetric saliency layers into a 3-D video volume to maximize saliency of pixels in the 3-D video volume; and

merging data associated with the at least a subset of the volumetric saliency layers in the 3-D video volume to regulate continuity of high-saliency portions of the pixels and provide color coherence at boundaries between respective ones of pixels in the volumetric saliency layers, wherein the 3-D video volume represents a video summary montage of the input video data sequence.

Dependent Claims: 17, 18, 19
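Claim 16's "positioning at least a subset of the volumetric saliency layers into a 3-D video volume to maximize saliency" can be approximated with a simple greedy placement: scan candidate spatial offsets for a layer and keep the one that adds the most saliency to the output canvas. This is a sketch under that assumption, not the patent's actual optimization; `place_layer` and its greedy gain measure are hypothetical.

```python
import numpy as np

def place_layer(canvas_sal, layer_sal, layer_mask):
    """Greedily position one saliency layer inside a 3-D canvas
    (T, H, W): try every spatial offset and keep the one whose layer
    voxels add the most saliency over what the canvas already holds.
    Returns the chosen (dy, dx) offset and writes the layer in place."""
    T, H, W = canvas_sal.shape
    t, h, w = layer_sal.shape
    best, best_gain = None, -np.inf
    for dy in range(H - h + 1):
        for dx in range(W - w + 1):
            region = canvas_sal[:t, dy:dy + h, dx:dx + w]
            # Gain: saliency added where the layer beats the canvas.
            gain = np.maximum(layer_sal - region, 0)[layer_mask].sum()
            if gain > best_gain:
                best_gain, best = gain, (dy, dx)
    dy, dx = best
    region = canvas_sal[:t, dy:dy + h, dx:dx + w]
    canvas_sal[:t, dy:dy + h, dx:dx + w] = np.where(
        layer_mask & (layer_sal > region), layer_sal, region)
    return best

# Two identical 4-wide layers packed into an 8-wide canvas:
canvas = np.zeros((2, 4, 8))
layer_sal = np.ones((2, 4, 4))
layer_mask = np.ones((2, 4, 4), dtype=bool)
p1 = place_layer(canvas, layer_sal, layer_mask)
p2 = place_layer(canvas, layer_sal, layer_mask)
```

The second layer gains nothing by overlapping the first, so the greedy scan pushes it to the empty half of the canvas and the two layers tile the volume without wasting saliency.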
20. A computing device comprising:
a processor; and

a memory coupled to the processor, the memory comprising computer-program instructions executable by the processor, the computer-program instructions when executed by the processor performing operations comprising:

receiving an input video data sequence;

assigning saliency values to each pixel of the input video data sequence to obtain a saliency volume, the saliency volume comprising one or more spatio-temporal informative video portions of the input video data sequence;

segmenting the one or more spatio-temporal informative video portions to generate one or more volumetric saliency blobs, each of the one or more volumetric saliency blobs comprising a set of pixels representing a high saliency video portion;

dilating the one or more volumetric saliency blobs using respective one or more mask volumes to simulate spread of respective high saliency video portions on respective surrounding portions to form at least a first volumetric saliency layer and a second volumetric saliency layer;

for the first volumetric saliency layer, assigning positive saliency values to locations corresponding to the high saliency portions of the first volumetric saliency layer, and assigning negative values to locations corresponding to the high saliency portions of the second volumetric saliency layer, wherein the negative values are used to reduce the importance of the high saliency portions of the second volumetric saliency layer in the first volumetric saliency layer;

positioning at least a subset of information associated with the first and second volumetric saliency layers into a 3-D video volume to maximize saliency of pixels in the 3-D video volume;

merging data associated with the at least a subset of the information in the 3-D video volume to regulate continuity of high-saliency portions of the pixels and provide color coherence at boundaries between respective ones of pixels in the one or more volumetric saliency layers; and

presenting the 3-D volume as a video summary of the input video data sequence.
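The distinctive element of claim 20 is the signed assignment: within the first layer, its own high-saliency locations keep positive values while the second layer's high-saliency locations get negative values, so the competing blob is de-emphasized. A minimal sketch of that assignment, assuming a simple sign flip as the negative weighting (the exact weighting is not specified here); `competing_layer_saliency` is a hypothetical name.

```python
import numpy as np

def competing_layer_saliency(mask_a, mask_b, sal):
    """Build the first layer's signed saliency field: positive saliency
    at the first layer's own high-saliency voxels (mask_a), negated
    saliency at the second layer's high-saliency voxels (mask_b) so
    the second blob's importance is reduced inside the first layer.
    The -sal weighting is an illustrative assumption."""
    layer_a = np.where(mask_a, sal, 0.0)
    layer_a = np.where(mask_b, -sal, layer_a)
    return layer_a

# Toy volume: first blob in the left half, second blob in the right.
sal = np.ones((1, 4, 4))
mask_a = np.zeros((1, 4, 4), dtype=bool)
mask_a[:, :, :2] = True
mask_b = np.zeros((1, 4, 4), dtype=bool)
mask_b[:, :, 2:] = True
first_layer = competing_layer_saliency(mask_a, mask_b, sal)
```

In the resulting field the first blob contributes +1 per voxel and the second contributes -1, so a placement optimizer that maximizes summed saliency is rewarded for showing the first blob and penalized for dragging the second one along.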
Specification