COMPUTERIZED MACHINE LEARNING OF INTERESTING VIDEO SECTIONS

US 20160034786A1
Filed: 07/29/2014
Published: 02/04/2016
Est. Priority Date: 07/29/2014
Status: Active Grant

3 Assignments

0 Petitions

Abstract

This disclosure describes techniques for training models from video data and applying the learned models to identify desirable video data. Video data may be labeled to indicate a semantic category and/or a score indicative of desirability. The video data may be processed to extract low and high level features. A classifier and a scoring model may be trained based on the extracted features. The classifier may estimate a probability that the video data belongs to at least one of the categories in a set of semantic categories. The scoring model may determine a desirability score for the video data. New video data may be processed to extract low and high level features, and feature values may be determined based on the extracted features. The learned classifier and scoring model may be applied to the feature values to determine a desirability score associated with the new video data.
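
The two-stage arrangement described in the abstract and in claim 1 feeds a classifier's category probabilities (the second set of feature values) to a scoring model alongside the extracted feature values (the first set). The sketch below illustrates that data flow under loud assumptions. The classifier is a nearest-centroid stand-in with a softmax over negative squared distances, and the scoring model is a linear regression fit by gradient descent. Both are placeholders chosen for brevity, not necessarily the models the patent contemplates. All function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def train_classifier(features: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Nearest-centroid stand-in: one mean feature vector per semantic category."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for x, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def apply_classifier(centroids: Dict[str, Vector], x: Vector) -> Vector:
    """Second set of feature values: one probability per category (sorted by name),
    from a softmax over negative squared distances to the centroids."""
    logits = [-sum((a - b) ** 2 for a, b in zip(x, centroids[c])) for c in sorted(centroids)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_scoring_model(inputs: Sequence[Vector], targets: Sequence[float],
                        lr: float = 0.05, epochs: int = 2000) -> Vector:
    """Linear scoring model [bias, weights...] fit by batch gradient descent on squared error."""
    weights = [0.0] * (len(inputs[0]) + 1)
    n = len(inputs)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for z, target in zip(inputs, targets):
            row = [1.0] + list(z)
            err = sum(w * v for w, v in zip(weights, row)) - target
            for i, v in enumerate(row):
                grad[i] += 2.0 * err * v / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def train_pipeline(features: Sequence[Vector], labels: Sequence[str],
                   scores: Sequence[float]) -> Tuple[Dict[str, Vector], Vector]:
    """Train the classifier, then train the scoring model on first + second feature values."""
    centroids = train_classifier(features, labels)
    combined = [list(x) + apply_classifier(centroids, x) for x in features]
    return centroids, train_scoring_model(combined, scores)


def desirability(centroids: Dict[str, Vector], weights: Vector, x: Vector) -> float:
    """Apply both learned models to the first set of feature values of new video data."""
    row = [1.0] + list(x) + apply_classifier(centroids, x)
    return sum(w * v for w, v in zip(weights, row))


# Toy labeled data: two first-set feature values per video, a category, and a desirability score.
features = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.2]]
labels = ["sports", "sports", "indoor", "indoor"]
scores = [0.9, 0.8, 0.2, 0.1]
centroids, weights = train_pipeline(features, labels, scores)
print(desirability(centroids, weights, [0.85, 0.85]) > desirability(centroids, weights, [0.15, 0.15]))  # True
```

Because the classifier's probabilities are inputs to the scoring model, category membership can shift the desirability score in addition to the raw feature values; swapping in a stronger classifier or regressor would not change this wiring.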

141 Citations

20 Claims

1. A method comprising:
- receiving, by one or more computing devices, video data;
  
  extracting, by at least one of the one or more computing devices, a plurality of features from the video data;
  
  determining, by at least one of the one or more computing devices, a first set of feature values associated with the plurality of features, the first set of feature values for training a classifier and a scoring model;
  
  determining, by at least one of the one or more computing devices, a second set of feature values based on applying the classifier to the video data;
  
  training, by at least one of the one or more computing devices, the scoring model based on the first set of feature values and the second set of feature values, wherein the scoring model determines a desirability score associated with the video data.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, wherein the plurality of features includes low level features and high level features.
  - 3. The method of claim 2, wherein the first set of feature values represents:
    - one or more low level feature values associated with the low level features;
      
      one or more high level feature values associated with the high level features; and
      
      a plurality of derivative feature values, wherein individual derivative feature values of the plurality of derivative feature values are derived from at least some of the one or more low level feature values or one or more high level feature values.
  - 4. The method of claim 1, wherein the second set of feature values represents probabilities that the video data belongs to at least one semantic category of a predefined set of semantic categories.
  - 5. The method of claim 1, wherein the video data comprises a video collection, a video segment, or a video frame.
  - 6. The method of claim 1, wherein the first set of feature values represents at least one of video collection level feature values, video file level feature values, video segment level feature values, or video frame level feature values and the second set of feature values represents video file level feature values.
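
Claims 3 and 6 describe derivative feature values computed from low-level and high-level values, at several levels of granularity (frame, segment, file, collection). The sketch below is one hypothetical way to produce them, assuming per-frame values have already been extracted: it takes first differences between consecutive frames as frame-level derivatives, then aggregates each feature to file level as a mean, a population standard deviation, and the largest frame-to-frame change. The choice of derivatives, the feature names, and the function names are assumptions; the claims do not say which derivatives are used.

```python
import statistics
from typing import Dict, List


def frame_deltas(values: List[float]) -> List[float]:
    """Frame-level derivative: change from the previous frame (0.0 for the first frame)."""
    return [0.0] + [curr - prev for prev, curr in zip(values, values[1:])]


def file_level_values(per_frame: Dict[str, List[float]]) -> Dict[str, float]:
    """Aggregate per-frame feature values into hypothetical file-level derivative values."""
    out: Dict[str, float] = {}
    for name, values in per_frame.items():
        deltas = frame_deltas(values)
        out[f"{name}_mean"] = statistics.fmean(values)
        out[f"{name}_std"] = statistics.pstdev(values)
        out[f"{name}_max_change"] = float(max(abs(d) for d in deltas))
    return out


per_frame = {
    "exposure_quality": [0.5, 0.7, 0.6],  # a low-level feature value per frame
    "face_count": [0, 1, 1],              # a high-level feature value per frame
}
print(frame_deltas([1.0, 3.0, 2.0]))  # [0.0, 2.0, -1.0]
print(file_level_values(per_frame)["face_count_max_change"])  # 1.0
```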

7. A system comprising:
- memory;
  
  one or more processors; and
  
  one or more modules stored in the memory and executable by the one or more processors, the one or more modules including:
  
  an extracting module configured for extracting features from video data and determining a first set of feature values based on the extracted features; and
  
  a ranking module configured for determining a desirability score for the video data, the ranking module including:
  
  a classifying module configured for applying a classifier to the first set of feature values to determine a second set of feature values; and
  
  a scoring module configured for applying a scoring model to the first set of feature values and the second set of feature values to determine a desirability score for the video data.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The system of claim 7, wherein the features include at least one of:
    - exposure quality;
      
      saturation quality;
      
      hue variety;
      
      stability;
      
      face detection;
      
      face recognition;
      
      face tracking;
      
      saliency analysis;
      
      audio power analysis;
      
      speech detection; and
      
      motion analysis.
  - 9. The system of claim 7, wherein the first set of feature values represents:
    - one or more feature values associated with the features; and
      
      a plurality of derivative feature values, wherein individual derivative feature values of the plurality of derivative feature values are derived from the one or more feature values.
  - 10. The system of claim 7, wherein the second set of feature values represents probabilities that the video data belongs to at least one semantic category of a predefined set of semantic categories.
  - 11. The system of claim 7, wherein the one or more modules further include a segmenting module configured for:
    - identifying video segments in the video data based at least in part on the video segments having desirability scores above a predetermined threshold; and
      
      detecting boundaries in the video data associated with the video segments.
  - 12. The system of claim 11, wherein the one or more modules further include a post-processing module configured for:
    - ranking the video data based at least in part on the desirability scores for the video data;
      
      or creating a highlight video based at least in part on the detecting the boundaries in the video data and adding transitions between the video segments having the desirability scores above the predetermined threshold.
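
Claim 8 lists exposure quality, saturation quality, and hue variety among the extracted features without defining them here. The sketch below computes one hypothetical version of each for a single frame given as RGB pixels in [0, 1], using Python's standard `colorsys` module: exposure quality as the closeness of mean brightness (HSV value) to mid-gray, saturation quality as mean HSV saturation, and hue variety as the normalized entropy of a hue histogram over sufficiently saturated pixels. These formulas, the 12-bin histogram, the 0.2 saturation cutoff, and the function name are assumptions for illustration, not the patent's definitions.

```python
import colorsys
import math
from typing import Dict, List, Tuple

Pixel = Tuple[float, float, float]  # (r, g, b), each in [0, 1]


def frame_color_features(pixels: List[Pixel], hue_bins: int = 12,
                         min_saturation: float = 0.2) -> Dict[str, float]:
    """Hypothetical per-frame exposure, saturation, and hue-variety scores in [0, 1]."""
    hsv = [colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels]
    n = len(hsv)

    # Exposure quality: 1.0 when mean brightness (HSV value) is mid-gray, 0.0 at black or white.
    brightness = sum(v for _, _, v in hsv) / n
    exposure_quality = 1.0 - 2.0 * abs(brightness - 0.5)

    # Saturation quality: mean HSV saturation.
    saturation_quality = sum(s for _, s, _ in hsv) / n

    # Hue variety: entropy of a hue histogram, normalized by its maximum (log of bin count).
    # Near-gray pixels are skipped because their hue carries no color information.
    counts = [0] * hue_bins
    for h, s, _ in hsv:
        if s >= min_saturation:
            counts[min(int(h * hue_bins), hue_bins - 1)] += 1
    colored = sum(counts)
    entropy = 0.0
    for c in counts:
        if c:
            p = c / colored
            entropy -= p * math.log(p)
    hue_variety = entropy / math.log(hue_bins)

    return {
        "exposure_quality": exposure_quality,
        "saturation_quality": saturation_quality,
        "hue_variety": hue_variety,
    }


gray = [(0.5, 0.5, 0.5)] * 4
primaries = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(frame_color_features(gray))
# {'exposure_quality': 1.0, 'saturation_quality': 0.0, 'hue_variety': 0.0}
print(frame_color_features(primaries))
```

A frame of pure primaries scores zero on exposure quality (every pixel is at full brightness) but high on saturation and hue variety, which shows why such values would be combined by a learned model rather than used alone.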

13. One or more computer-readable storage media encoded with instructions that, when executed by a processor, perform acts comprising:
- receiving video data including a plurality of video frames;
  
  extracting a plurality of features from individual video frames of the plurality of video frames to determine a first set of feature values associated with the individual video frames;
  
  applying a classifier to the first set of feature values to determine a second set of feature values associated with the individual video frames, wherein the second set of feature values represents probabilities that the individual video frames belong to at least one semantic category of a predefined set of semantic categories; and
  
  applying a scoring model to the first set of feature values and the second set of feature values to determine a desirability score associated with the individual video frames.
- Dependent claims (14, 15, 16, 17, 18, 19, 20):
  - 14. The computer-readable storage media of claim 13, wherein the first set of feature values represents feature values associated with low level features, high level features, and derivatives of the low level features and high level features.
  - 15. The computer-readable storage media of claim 13, wherein the video data comprises video files and the acts further comprise determining a desirability score for individual video files based on an average desirability score associated with individual video frames associated with the individual video file.
  - 16. The computer-readable storage media of claim 15, wherein the acts further comprise ranking the individual video files based at least in part on the desirability scores for the video files.
  - 17. The computer-readable storage media of claim 13, wherein the acts further comprise:
    - identifying video segments having desirability scores above a predetermined threshold, wherein the video segments include two or more of the individual video frames having desirability scores above the predetermined threshold; and
      
      detecting boundaries in the video data associated with the video segments.
  - 18. The computer-readable storage media of claim 17, wherein the detecting boundaries associated with the video segments comprises:
    - analyzing the individual video frames for motion data;
      
      detecting a camera motion in the motion data including at least one of:
      
      pan to left;
      
      pan to right;
      
      pan to top;
      
      pan to bottom;
      
      zoom in;
      
      or zoom out; and
      
      identifying the boundaries associated with the video segments based at least in part on a change in camera motion between a first individual video frame of the individual video frames and a second individual video frame of the individual video frames.
  - 19. The computer-readable storage media of claim 17, wherein the detecting boundaries associated with the video segments comprises:
    - analyzing the individual video frames for motion data;
      
      determining object motion intensities for the individual video frames based on the motion data; and
      
      detecting the boundaries when an object motion intensity of a first video frame of the individual video frames and an object motion intensity of a second video frame of the individual video frames differ by a predetermined threshold.
  - 20. The computer-readable storage media of claim 17, wherein the acts further comprise creating a new video file including the video segments having desirability scores above a predetermined threshold by manipulating the video data based on the detecting of the boundaries in the video data and adding transitions between the video segments having desirability scores above the predetermined threshold.
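
Claims 15 through 19 describe a file-level score as the average of frame scores, segments of two or more consecutive frames scoring above a threshold, and boundaries detected from a change in camera motion or from object motion intensities that differ by a threshold. The sketch below strings these steps together under stated assumptions: frame scores, per-frame object motion intensities, and per-frame global camera displacement (dx, dy, scale) are taken as already computed; the threshold values, the sign convention for camera displacement, the policy of splitting a segment at interior motion boundaries, and all function names are hypothetical choices, not the patent's method.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (first_frame, last_frame), inclusive


def rank_files(frame_scores: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """File score = mean of its frame scores (claim 15); files ranked best first (claim 16)."""
    averages = {name: sum(s) / len(s) for name, s in frame_scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


def segments_above_threshold(scores: List[float], threshold: float,
                             min_frames: int = 2) -> List[Segment]:
    """Runs of at least `min_frames` consecutive frames whose score exceeds `threshold`."""
    segments: List[Segment] = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            if i - start >= min_frames:
                segments.append((start, i - 1))
            start = None
    if start is not None and len(scores) - start >= min_frames:
        segments.append((start, len(scores) - 1))
    return segments


def intensity_boundaries(intensities: List[float], jump: float) -> List[int]:
    """Frame i starts a new section when its object motion intensity differs from frame i-1 by >= jump."""
    return [i for i in range(1, len(intensities))
            if abs(intensities[i] - intensities[i - 1]) >= jump]


def camera_motion_label(dx: float, dy: float, scale: float,
                        pan_eps: float = 0.5, zoom_eps: float = 0.02) -> str:
    """Classify global camera motion. Assumes dx/dy are camera displacement in pixels
    (positive = right/down) and scale is the frame-to-frame zoom factor."""
    if scale > 1 + zoom_eps:
        return "zoom in"
    if scale < 1 - zoom_eps:
        return "zoom out"
    if abs(dx) >= abs(dy) and abs(dx) > pan_eps:
        return "pan to right" if dx > 0 else "pan to left"
    if abs(dy) > pan_eps:
        return "pan to bottom" if dy > 0 else "pan to top"
    return "static"


def label_change_boundaries(labels: List[str]) -> List[int]:
    """Frame i starts a new section when its camera-motion label differs from frame i-1."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


def split_at_boundaries(segments: List[Segment], boundaries: List[int],
                        min_frames: int = 2) -> List[Segment]:
    """Cut each segment at boundaries strictly inside it, dropping pieces shorter than min_frames."""
    result: List[Segment] = []
    for start, end in segments:
        piece_start = start
        for cut in sorted(c for c in boundaries if start < c <= end):
            if cut - piece_start >= min_frames:
                result.append((piece_start, cut - 1))
            piece_start = cut
        if end - piece_start + 1 >= min_frames:
            result.append((piece_start, end))
    return result


scores = [0.1, 0.8, 0.9, 0.85, 0.9, 0.7, 0.2]
motion = [0.1, 0.1, 0.1, 0.7, 0.7, 0.7, 0.1]
segments = segments_above_threshold(scores, threshold=0.5)  # [(1, 5)]
cuts = intensity_boundaries(motion, jump=0.4)               # [3, 6]
print(split_at_boundaries(segments, cuts))                  # [(1, 2), (3, 5)]
```

Camera-motion boundaries from `label_change_boundaries` can be passed to `split_at_boundaries` in the same way, so either kind of boundary (claim 18 or claim 19) refines the threshold-based segments before a highlight is assembled.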

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Suri, Nitin; Hua, Xian-Sheng; Wang, Tzong-Jhy; Sproule, William D.; Ivory, Andrew S.; Li, Jin

Granted Patent

US 9,646,227 B2
US Class Current

1/1
CPC Class Codes

G06F 18/214   Generating training pattern...

G06T 2207/10016   Video; Image sequence

G06T 7/20   Analysis of motion motion e...

G06V 10/774   Generating sets of training...

G06V 20/46   Extracting features or char...
