Computerized machine learning of interesting video sections

US 9,646,227 B2
Filed: 07/29/2014
Issued: 05/09/2017
Est. Priority Date: 07/29/2014
Status: Active Grant

Abstract

This disclosure describes techniques for training models from video data and applying the learned models to identify desirable video data. Video data may be labeled to indicate a semantic category and/or a score indicative of desirability. The video data may be processed to extract low and high level features. A classifier and a scoring model may be trained based on the extracted features. The classifier may estimate a probability that the video data belongs to at least one of the categories in a set of semantic categories. The scoring model may determine a desirability score for the video data. New video data may be processed to extract low and high level features, and feature values may be determined based on the extracted features. The learned classifier and scoring model may be applied to the feature values to determine a desirability score associated with the new video data.
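
The two-stage arrangement described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the nearest-centroid classifier, the gradient-descent linear scoring model, the function names (`train_classifier`, `classify`, `train_scoring_model`, `score`), the category labels, and all sample values are hypothetical choices made for brevity.

```python
import math

def train_classifier(features, labels):
    """Hypothetical classifier: one mean feature vector (centroid) per semantic category."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def classify(centroids, x):
    """Return per-category probabilities (softmax over negative squared distances)."""
    cats = sorted(centroids)
    logits = [-sum((a - b) ** 2 for a, b in zip(x, centroids[c])) for c in cats]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_scoring_model(inputs, targets, lr=0.05, epochs=2000):
    """Hypothetical scoring model: linear regression fitted by batch gradient descent."""
    n, d = len(inputs), len(inputs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(inputs, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - t
            for i in range(d):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def score(model, x):
    """Apply the linear scoring model to one combined feature vector."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Made-up labeled data: two feature values per frame, a semantic category
# label, and a labeled desirability score.
first = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]]
categories = ["party", "party", "indoor", "indoor"]
desirability = [0.9, 0.85, 0.2, 0.25]

centroids = train_classifier(first, categories)
second = [classify(centroids, x) for x in first]    # category probabilities
combined = [f + s for f, s in zip(first, second)]   # first + second feature values
model = train_scoring_model(combined, desirability)

# New video data: extract features, apply the classifier, then the scoring model.
new_frame = [0.85, 0.75]
new_score = score(model, new_frame + classify(centroids, new_frame))
```

The point of the sketch is the data flow: the classifier's output probabilities become additional inputs to the scoring model, alongside the originally extracted feature values.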

17 Claims

1. A method comprising:
- receiving, by one or more computing devices, video data;
  
  extracting, by at least one of the one or more computing devices, a plurality of features from the video data;
  
  determining, by at least one of the one or more computing devices, a first set of feature values associated with the plurality of features, the first set of feature values for training a classifier and a scoring model;
  
  determining, by at least one of the one or more computing devices, a second set of feature values based on applying the classifier to the video data;
  
  training, by at least one of the one or more computing devices, the scoring model based on the first set of feature values and the second set of feature values;
  
  using the scoring model to determine a plurality of desirability scores associated with the video data, wherein an individual desirability score indicative of video quality is associated with an individual video frame in the video data;
  
  identifying video frames in the video data that have a desirability score above a predetermined threshold desirability score;
  
  analyzing the video data to determine, in association with the video frames, changes in camera motion and changes in object motion; and
  
  locating, based at least in part on the changes in camera motion and the changes in object motion, boundaries in the video data to produce one or more video segments, wherein:
  
  an individual video segment includes at least one video frame that has the desirability score above the predetermined threshold desirability score, and
  
  the locating the boundaries in the video data comprises determining that object motion intensity of a first video frame and object motion intensity of a second video frame differ by a predetermined threshold.
- View Dependent Claims (2, 3, 4, 5, 6)
- - 2. The method of claim 1, wherein:
    - the plurality of features includes low level features and high level features;
      
      the first set of feature values includes at least a low level feature value and a first high level feature value; and
      
      the second set of feature values includes at least one second high level feature value that is not included in the first set of feature values.
  - 3. The method of claim 2, wherein the first set of feature values represents a plurality of derivative feature values, wherein individual derivative feature values of the plurality of derivative feature values are derived from at least some low level feature values or high level feature values.
  - 4. The method of claim 1, wherein the second set of feature values represents probabilities that the video data belongs to at least one semantic category of a predefined set of semantic categories.
  - 5. The method of claim 1, wherein the video data comprises a video collection.
  - 6. The method of claim 1, wherein the first set of feature values represents at least one of video collection level feature values, video file level feature values, video segment level feature values, or video frame level feature values and the second set of feature values represents video file level feature values.
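
The segmentation steps at the end of claim 1 can be illustrated with a short sketch. It assumes per-frame desirability scores, object motion intensities, and camera motion labels are already available. The function name `locate_segments`, the "at least the threshold" reading of "differ by a predetermined threshold", the use of consecutive frames, and the sample values are all assumptions for illustration only.

```python
def locate_segments(scores, object_motion, camera_motion,
                    score_threshold, motion_threshold):
    """Split frames into segments at motion changes; keep segments with a desirable frame.

    A boundary is placed before frame i when the object motion intensity of
    frames i-1 and i differs by at least motion_threshold, or when the camera
    motion label changes. Both rules are illustrative assumptions.
    """
    boundaries = [0]
    for i in range(1, len(scores)):
        motion_jump = abs(object_motion[i] - object_motion[i - 1]) >= motion_threshold
        camera_change = camera_motion[i] != camera_motion[i - 1]
        if motion_jump or camera_change:
            boundaries.append(i)
    boundaries.append(len(scores))

    segments = []
    for start, end in zip(boundaries, boundaries[1:]):
        # Keep only segments containing at least one frame above the score threshold.
        if any(s > score_threshold for s in scores[start:end]):
            segments.append((start, end))   # half-open frame range [start, end)
    return segments

# Made-up per-frame values for an eight-frame clip.
scores        = [0.2, 0.3, 0.8, 0.9, 0.7, 0.1, 0.2, 0.6]
object_motion = [0.1, 0.1, 0.6, 0.6, 0.6, 0.1, 0.1, 0.1]
camera_motion = ["static"] * 6 + ["pan_left"] * 2

segments = locate_segments(scores, object_motion, camera_motion,
                           score_threshold=0.5, motion_threshold=0.4)
print(segments)  # [(2, 5), (6, 8)]
```

Here boundaries fall before frames 2 and 5 (object motion jumps) and before frame 6 (camera motion changes), and the two low-scoring segments are discarded.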

7. A system comprising:
- memory;
  
  one or more processors; and
  
  one or more modules stored in the memory and executable by the one or more processors, the one or more modules including:
  
  an extracting module configured to extract features from video data and determine a first set of feature values based on the extracted features;
  
  a classifying module configured to apply a classifier to the first set of feature values to determine a second set of feature values and to use the second set of feature values to determine a probability that the video data belongs to at least one semantic category of a predefined set of semantic categories;
  
  a scoring module configured to apply a scoring model, based at least in part on the at least one semantic category, to the first set of feature values and the second set of feature values to determine a plurality of desirability scores for the video data, wherein an individual desirability score indicative of video quality is associated with an individual video frame in the video data; and
  
  a segmenting module configured to:
  
  identify video frames in the video data that have a desirability score above a predetermined threshold desirability score;
  
  analyze the video data to determine, in association with the video frames, changes in camera motion and changes in object motion; and
  
  locate, based at least in part on the changes in camera motion and the changes in object motion, boundaries in the video data to produce one or more video segments, wherein:
  
  an individual video segment includes at least one video frame that has the desirability score above the predetermined threshold desirability score, and
  
  the locating the boundaries in the video data comprises determining that object motion intensity of a first video frame and object motion intensity of a second video frame differ by a predetermined threshold.
- View Dependent Claims (8, 9, 16)
- - 8. The system of claim 7, wherein the features include at least one of:
    - exposure quality;
      
      saturation quality;
      
      hue variety;
      
      stability;
      
      face detection;
      
      face recognition;
      
      face tracking;
      
      saliency analysis;
      
      audio power analysis;
      
      speech detection;
      
      or motion analysis.
  - 9. The system of claim 7, wherein the one or more modules further include a post-processing module configured to:
    - rank the one or more video segments based at least in part on desirability scores of video frames included in an individual video segment; and
      
      create a highlight video based at least in part on the ranking.
  - 16. The system of claim 7, wherein:
    - the features include low level features and high level features;
      
      the first set of feature values includes at least a low level feature value and a first high level feature value; and
      
      the second set of feature values includes at least one second high level feature value that is not included in the first set of feature values.
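
The post-processing module of claim 9 ranks segments and builds a highlight video from the ranking. A minimal sketch of that idea is shown below. The claim does not say how frame scores are aggregated or how the ranking drives selection, so the mean-score criterion, the frame budget, and the names `rank_segments` and `build_highlight` are assumptions.

```python
def rank_segments(segments, frame_scores):
    """Order segments by mean frame desirability score, highest first (assumed criterion)."""
    def mean_score(seg):
        start, end = seg
        return sum(frame_scores[start:end]) / (end - start)
    return sorted(segments, key=mean_score, reverse=True)

def build_highlight(ranked, max_frames):
    """Pick top-ranked segments within a frame budget, then restore chronological order."""
    chosen, used = [], 0
    for start, end in ranked:
        if used + (end - start) <= max_frames:
            chosen.append((start, end))
            used += end - start
    return sorted(chosen)

# Made-up frame scores and half-open segment ranges [start, end).
frame_scores = [0.2, 0.3, 0.8, 0.9, 0.7, 0.1, 0.2, 0.6, 0.95, 0.9]
segments = [(2, 5), (6, 8), (8, 10)]

ranked = rank_segments(segments, frame_scores)
highlight = build_highlight(ranked, max_frames=5)
print(ranked)     # [(8, 10), (2, 5), (6, 8)]
print(highlight)  # [(2, 5), (8, 10)]
```

Sorting the chosen segments back into chronological order is a design choice of this sketch; a highlight could equally be assembled in rank order.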

10. One or more computer-readable storage media encoded with instructions that, when executed by a processor, perform acts comprising:
- receiving video data including a plurality of video frames;
  
  extracting a plurality of features from individual video frames of the plurality of video frames to determine a first set of feature values associated with the individual video frames;
  
  applying a classifier to the first set of feature values to determine a second set of feature values associated with the individual video frames;
  
  using the second set of feature values to determine individual probabilities that the individual video frames belong to at least one semantic category of a predefined set of semantic categories;
  
  applying a scoring model to the first set of feature values and the second set of feature values to determine desirability scores associated with the individual video frames;
  
  identifying a subset of the plurality of video frames in the video data that have a desirability score above a predetermined threshold desirability score;
  
  analyzing the video data to determine, in association with the subset of video frames, changes in camera motion and changes in object motion; and
  
  locating, based at least in part on the changes in camera motion and the changes in object motion, boundaries in the video data to produce one or more video segments, wherein:
  
  an individual video segment includes at least one video frame that has the desirability score above the predetermined threshold desirability score, and
  
  the locating the boundaries in the video data comprises determining that object motion intensity of a first video frame and object motion intensity of a second video frame differ by a predetermined threshold.
- View Dependent Claims (11, 12, 13, 14, 15, 17)
- - 11. The computer-readable storage media of claim 10, wherein the first set of feature values represents feature values associated with low level features, high level features, and derivatives of the low level features and high level features.
  - 12. The computer-readable storage media of claim 10, wherein the video data comprises video files and the acts further comprise determining a desirability score for individual video files based on an average desirability score associated with individual video frames associated with the individual video file.
  - 13. The computer-readable storage media of claim 12, wherein the acts further comprise ranking the individual video files based at least in part on the desirability scores for the video files.
  - 14. The computer-readable storage media of claim 10, wherein the camera motion comprises at least one of:
    - pan to left;
      
      pan to right;
      
      pan to top;
      
      pan to bottom;
      
      zoom in;
      
      or zoom out.
  - 15. The computer-readable storage media of claim 10, wherein the acts further comprise creating a new video file including the video segments having desirability scores above a predetermined threshold by manipulating the video data based on the detecting of the boundaries in the video data and adding transitions between the video segments having desirability scores above the predetermined threshold.
  - 17. The computer-readable storage media of claim 10, wherein:
    - the plurality of features includes low level features and high level features;
      
      the first set of feature values includes at least a low level feature value and a first high level feature value; and
      
      the second set of feature values includes at least one second high level feature value that is not included in the first set of feature values.
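
Claims 12 and 13 describe scoring a video file by the average desirability score of its frames and then ranking the files. One possible reading is sketched below. The file names, frame scores, and function names `file_score` and `rank_files` are hypothetical.

```python
from statistics import mean

def file_score(frame_scores):
    """Desirability score of a video file: the average of its frame scores."""
    return mean(frame_scores)

def rank_files(files):
    """Return file names ordered by file-level desirability score, highest first."""
    return sorted(files, key=lambda name: file_score(files[name]), reverse=True)

# Made-up file names and per-frame desirability scores.
files = {
    "beach.mp4":   [0.9, 0.8, 0.7],
    "parking.mp4": [0.2, 0.1, 0.3],
    "party.mp4":   [0.6, 0.9, 0.6],
}

ranking = rank_files(files)
print(ranking)  # ['beach.mp4', 'party.mp4', 'parking.mp4']
```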

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors
Suri, Nitin; Hua, Xian-Sheng; Wang, Tzong-Jhy; Sproule, William D.; Ivory, Andrew S.; Li, Jin
Primary Examiner(s)
Kholdebarin, Iman K

Application Number

US14/445,463
Publication Number

US 20160034786A1
Time in Patent Office

1,015 Days
Field of Search

None
US Class Current
CPC Class Codes

G06F 18/214   Generating training pattern...

G06T 2207/10016   Video; Image sequence

G06T 7/20   Analysis of motion motion e...

G06V 10/774   Generating sets of training...

G06V 20/46   Extracting features or char...
