Forming a representation of a video item and use thereof
First Claim
1. A method for forming a representation of a video item, comprising:
receiving a video item;
dividing the video item into a plurality of segments;
extracting a key frame from each of the plurality of segments to form a plurality of key frames based on a user attention feature of each frame in the plurality of segments, each user attention feature being determined based on a quantity of face images present in a corresponding frame, each key frame serving as a representation of a video segment from the plurality of segments;
organizing the plurality of segments into one or more groups corresponding to one or more respective scenes based at least on the plurality of key frames;
identifying a final key frame from the plurality of key frames for each of the one or more groups; and
correlating a video vignette with each of the identified final key frames.
Abstract
Functionality is described for forming a summary representation of a video item to help a user decide whether to obtain a full version of the video item. The functionality operates by: (a) receiving a video item; (b) dividing the video item into a plurality of segments; (c) extracting at least one key frame from each of the plurality of segments to form a plurality of key frames; and (d) organizing the video segments into one or more groups corresponding to one or more respective scenes based on the plurality of key frames, to thereby form the representation of the video item. The functionality can be used to communicate search results to a user, to provide a sample of the video item in a message, etc.
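Steps (a) through (d) of the abstract can be sketched as a minimal pipeline. Everything below is illustrative rather than the patented implementation: the `Frame` record, the fixed segment length, and the face-count attention score are assumptions, since the document does not specify data structures or a detection method.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    index: int
    face_count: int  # number of face images detected in this frame (hypothetical input)

def attention_score(frame: Frame) -> float:
    """Claim 1 bases the user-attention feature on the quantity of faces."""
    return float(frame.face_count)

def divide_into_segments(frames: List[Frame], segment_len: int) -> List[List[Frame]]:
    """Step (b): split the frame sequence into fixed-length segments (an illustrative rule)."""
    return [frames[i:i + segment_len] for i in range(0, len(frames), segment_len)]

def extract_key_frames(segments: List[List[Frame]]) -> List[Frame]:
    """Step (c): one key frame per segment -- the frame with the highest attention score."""
    return [max(seg, key=attention_score) for seg in segments]
```

For example, six frames with face counts `[0, 2, 1, 3, 0, 1]` split into two segments of three yield the frames at indices 1 and 3 as key frames.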
20 Claims
1. A method for forming a representation of a video item, comprising:
receiving a video item;
dividing the video item into a plurality of segments;
extracting a key frame from each of the plurality of segments to form a plurality of key frames based on a user attention feature of each frame in the plurality of segments, each user attention feature being determined based on a quantity of face images present in a corresponding frame, each key frame serving as a representation of a video segment from the plurality of segments;
organizing the plurality of segments into one or more groups corresponding to one or more respective scenes based at least on the plurality of key frames;
identifying a final key frame from the plurality of key frames for each of the one or more groups; and
correlating a video vignette with each of the identified final key frames.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
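The "organizing the plurality of segments into one or more groups corresponding to one or more respective scenes based at least on the plurality of key frames" step could, for example, compare adjacent key frames and start a new scene where they differ enough. The feature vectors and the threshold below are assumptions; the claim does not fix a similarity measure.

```python
from typing import List

def group_into_scenes(key_features: List[List[float]], threshold: float = 0.25) -> List[List[int]]:
    """Group consecutive segments (by index) into scenes.

    key_features holds one feature vector per segment's key frame
    (e.g. a tiny color histogram); a new scene starts when the mean
    absolute difference to the previous key frame exceeds `threshold`.
    """
    if not key_features:
        return []
    groups: List[List[int]] = [[0]]
    for i in range(1, len(key_features)):
        prev, cur = key_features[i - 1], key_features[i]
        dist = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if dist > threshold:
            groups.append([i])       # scene change: start a new group
        else:
            groups[-1].append(i)     # same scene: extend the current group
    return groups
```

Within each resulting group, the final key frame for the scene could then be chosen as the member key frame with the highest attention score.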
16. A method comprising:
presenting a representation of a video item comprising one or more final key frames, wherein each of the one or more final key frames is identified by:
receiving the video item;
dividing the video item into a plurality of segments based on visual features of multiple frames in the video item and audio features associated with the multiple frames of the video item;
extracting a key frame from each of the plurality of segments to form one or more key frames based on a user attention feature of each frame in the plurality of segments, each user attention feature being determined based on a quantity of face images present in a corresponding frame, a brightness of the corresponding frame, and an amount of motion in the corresponding frame, each key frame serving as a representation of a video segment from the plurality of segments;
organizing the plurality of segments into one or more groups corresponding to one or more respective scenes based on the one or more key frames;
identifying a final key frame from the one or more key frames for each of the one or more groups; and
correlating a video vignette with each of the identified final key frames;
receiving a selection of a final key frame by a user; and
presenting the video vignette correlated with the selected final key frame in response to the selection of the final key frame by the user, wherein the video vignette is selected to give the user information to make a decision as to whether to receive all of the video item.
- View Dependent Claims (17, 18, 19)
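Claim 16's user attention feature combines three per-frame cues: face count, brightness, and amount of motion. A weighted sum is one plausible way to fuse them; the linear combination and the weights below are invented for illustration, since the claim only names the inputs.

```python
def attention_score(face_count: int, brightness: float, motion: float,
                    w_face: float = 0.5, w_bright: float = 0.2,
                    w_motion: float = 0.3) -> float:
    """Fuse the three per-frame cues named in claim 16 into one score.

    The weighted-sum form and the default weights are illustrative
    assumptions; the claim does not specify how the cues are combined.
    """
    return w_face * face_count + w_bright * brightness + w_motion * motion
```

With this scoring in place, the key frame for a segment would be the frame maximizing `attention_score` over that segment, exactly as in the single-cue case of claim 1.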
20. A system comprising:
an audio analysis module that classifies each frame of a video item, based at least on audio features of each frame of the video item, as corresponding to a vowel, a consonant, or a pause;
a video segmentation module that divides the video item into a plurality of segments at a frame that corresponds to a pause;
a key frame extraction module that extracts a key frame from each of the plurality of segments to form a plurality of key frames, the key frame serving as a representation of the video segment from which it is extracted;
a grouping module that organizes the video segments into one or more groups corresponding to one or more respective scenes based on the plurality of key frames, to thereby form the representation of the video item;
an output generating module that correlates a video vignette with each key frame; and
a video presentation module that presents each video vignette to a user in response to receipt of a user selection of one of the key frames that is correlated with the video vignette.
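Claim 20's segmentation rule — cut the video at frames whose audio corresponds to a pause — can be sketched as follows. The per-frame labels are assumed to come from an upstream audio classifier (the claim's audio analysis module); how that classifier works is not specified here.

```python
from typing import List

def segment_at_pauses(labels: List[str]) -> List[List[int]]:
    """Cut the frame index sequence into segments, closing a segment
    at each frame labeled "pause". `labels` holds one of "vowel",
    "consonant", or "pause" per frame (assumed given)."""
    segments: List[List[int]] = []
    current: List[int] = []
    for i, label in enumerate(labels):
        current.append(i)
        if label == "pause":   # claim 20: divide at a frame corresponding to a pause
            segments.append(current)
            current = []
    if current:                # trailing frames after the last pause
        segments.append(current)
    return segments
```

Cutting at speech pauses tends to keep whole spoken phrases inside one segment, which suits the stated goal of giving the user a coherent sample of the video item.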
Specification