Video summarization using semantic information
Abstract
An apparatus for video summarization using semantic information is described herein. The apparatus includes a controller, a scoring mechanism, and a summarizer. The controller is to segment an incoming video stream into a plurality of activity segments, wherein each frame is associated with an activity. The scoring mechanism is to calculate a score for each frame of each activity, wherein the score is based on a plurality of objects in each frame. The summarizer is to summarize the activity segments based on the score for each frame.
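The patent does not disclose an implementation of the scoring mechanism, but the abstract and claims describe a score that combines each frame's activity-classification probability with the co-occurrence of detected objects and that activity. The following is a minimal sketch of one way such a score could be computed; the function name, the co-occurrence table, and the combining rule (probability weighted by mean object co-occurrence) are illustrative assumptions, not taken from the patent.

```python
# Hypothetical co-occurrence weights: how strongly each object is
# associated with each activity. Values here are invented for illustration.
CO_OCCURRENCE = {
    ("cake", "birthday_party"): 0.9,
    ("ball", "birthday_party"): 0.3,
    ("ball", "soccer_game"): 0.95,
}

def frame_score(activity, class_prob, objects):
    """Score a frame by weighting its activity-classification probability
    with the average co-occurrence of its detected objects and the
    frame's activity label (an assumed combining rule)."""
    if not objects:
        return class_prob
    co = sum(CO_OCCURRENCE.get((obj, activity), 0.0) for obj in objects)
    return class_prob * (co / len(objects))

score = frame_score("birthday_party", 0.8, ["cake", "ball"])
# mean co-occurrence (0.9 + 0.3) / 2 = 0.6, so score = 0.8 * 0.6 = 0.48
```

A frame whose objects strongly co-occur with its activity (e.g. a cake in a birthday-party segment) scores higher than one with weakly related objects, which matches the claims' intent of surfacing the most salient frames.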
22 Claims
1. An apparatus, comprising:

a processor to segment an incoming video stream into a plurality of activity segments, wherein each frame is associated with an activity;

the processor to execute a scoring mechanism to calculate a score for each frame of each activity, wherein the score is based, at least partially, on a classification probability of each frame and a co-occurrence of a plurality of objects with each activity; and

the processor to execute a summarizer to summarize the activity segments based on the score for each frame and select a high score region of a frame of the summarized activity segment, wherein the high score region represents a most important and salient moment within the summarized activity segment.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
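Claim 1's first step, segmenting the incoming stream into activity segments in which each frame is associated with an activity, can be sketched as grouping consecutive identically-labeled frames. The function and label names below are assumptions for illustration; the patent does not specify this representation.

```python
from itertools import groupby

def segment_by_activity(frame_labels):
    """Group consecutive frames sharing an activity label into
    (activity, start_index, end_index) segments."""
    segments, idx = [], 0
    for activity, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((activity, idx, idx + length - 1))
        idx += length
    return segments

segments = segment_by_activity(["walk", "walk", "cook", "cook", "cook", "walk"])
# three segments: walk at frames 0-1, cook at 2-4, walk at 5
```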
10. A method for video summarization, comprising:

labeling each frame of a plurality of frames according to an activity class;

determining an object-to-activity correlation for an object within each frame, wherein the object-to-activity correlation is based, at least partially, on a classification probability of each frame and a co-occurrence of a plurality of objects with each activity, wherein an object is extracted from a frame via frame analysis using a convolutional neural network with tiled neurons; and

rendering a video summary that comprises the frames with object-to-activity correlations above a predetermined threshold for each frame in a shot boundary.

Dependent claims: 11, 12, 13, 14, 15.
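Claim 10's rendering step keeps the frames whose object-to-activity correlation exceeds a predetermined threshold, evaluated within each shot boundary. A minimal sketch, assuming frames are represented as (index, correlation) pairs and shots as (start, end) index ranges; these representations and the threshold value are illustrative, not from the patent:

```python
def summarize(frames, shot_boundaries, threshold=0.5):
    """Return the indices of frames whose correlation exceeds the
    threshold, checked shot by shot (assumed data layout)."""
    summary = []
    for start, end in shot_boundaries:
        kept = [i for i, corr in frames if start <= i <= end and corr > threshold]
        summary.extend(kept)
    return summary

frames = [(0, 0.9), (1, 0.2), (2, 0.7), (3, 0.4)]
summary = summarize(frames, [(0, 1), (2, 3)])
# keeps frames 0 and 2, one high-correlation frame from each shot
```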
16. A system, comprising:

a display;

an image capture mechanism;

a memory that is to store instructions and that is communicatively coupled to the image capture mechanism and the display; and

a processor communicatively coupled to the image capture mechanism, the display, and the memory, wherein, when the processor is to execute the instructions, the processor is to:

label each frame of a plurality of frames according to an activity class;

determine a score corresponding to each frame, wherein the score is based, at least partially, on a classification probability of each frame and a co-occurrence of a plurality of objects with each activity, wherein an object is extracted from a frame via frame analysis using a convolutional neural network with tiled neurons; and

render a video summary that comprises the frames with scores above a predetermined threshold for each frame in a shot boundary.

Dependent claims: 17, 18, 19.
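Claims 10, 16, and 20 all extract objects via "a convolutional neural network with tiled neurons." In tiled convolutional networks, nearby neurons use different filters and weights are shared only between neurons a fixed number of steps apart. The patent gives no architecture details; the 1-D sketch below is only an assumed illustration of that tiled weight-sharing pattern, not the patent's network.

```python
import numpy as np

def tiled_conv1d(x, filters):
    """1-D 'tiled' convolution: output neuron i uses filter i mod tile,
    so weights repeat with period `tile` instead of being shared by
    every position as in a standard convolution (assumed toy model)."""
    tile, k = filters.shape  # (number of tiled filters, filter width)
    out = np.empty(len(x) - k + 1)
    for i in range(len(out)):
        out[i] = x[i:i + k] @ filters[i % tile]
    return out

out = tiled_conv1d(np.array([1.0, 2.0, 3.0, 4.0]),
                   np.array([[1.0, 0.0], [0.0, 1.0]]))
# alternating filters pick x[i] and x[i+1]: [1.0, 3.0, 3.0]
```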
20. A tangible, non-transitory, computer-readable medium comprising instructions that, when executed by a processor, direct the processor to:

label each frame of a plurality of frames according to an activity class;

determine an object-to-activity correlation for an object within each frame, wherein the object-to-activity correlation is based, at least partially, on a classification probability of each frame and a co-occurrence of a plurality of objects with each activity, wherein an object is extracted from a frame via frame analysis using a convolutional neural network with tiled neurons; and

render a video summary that comprises the frames with object-to-activity correlations above a predetermined threshold for each frame in a shot boundary.

Dependent claims: 21, 22.
Specification