System and method for relevance estimation in summarization of videos of multi-step activities

US 9,977,968 B2
Filed: 03/04/2016
Issued: 05/22/2018
Est. Priority Date: 03/04/2016
Status: Active Grant


Abstract

A method and system for identifying content relevance comprises acquiring video data, mapping the acquired video data to a feature space to obtain a feature representation of the video data, assigning the acquired video data to at least one action class based on the feature representation of the video data, and determining a relevance of the acquired video data.
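The four steps named in the abstract (acquire, map to a feature space, assign an action class, determine relevance) can be sketched as a small pipeline. This is only an illustrative sketch, not the patented implementation: the names `ScoredSegment`, `estimate_relevance`, `mean_intensity`, and `threshold_classifier` are hypothetical, the toy feature (mean intensity) and toy classifier (a fixed threshold) stand in for the real feature extractor and trained classifier, and the confidence-to-relevance mapping is passed in as a function because the text does not fix one.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class ScoredSegment:
    index: int          # position of the segment in the video
    action_class: int   # class assigned by the classifier
    confidence: float   # classification confidence score
    relevance: float    # relevance score derived from the confidence


def estimate_relevance(
    segments: Sequence[Sequence[float]],
    extract_features: Callable[[Sequence[float]], List[float]],
    classify: Callable[[List[float]], Tuple[int, float]],
    to_relevance: Callable[[float], float],
) -> List[ScoredSegment]:
    """Run feature mapping, classification, and relevance conversion per segment."""
    results = []
    for i, segment in enumerate(segments):
        features = extract_features(segment)            # map to feature space
        action_class, confidence = classify(features)   # assign class + confidence
        results.append(
            ScoredSegment(i, action_class, confidence, to_relevance(confidence))
        )
    return results


# Toy stand-ins (assumptions for illustration only).
def mean_intensity(segment: Sequence[float]) -> List[float]:
    """One-dimensional 'feature representation': the mean pixel intensity."""
    return [sum(segment) / len(segment)]


def threshold_classifier(features: List[float]) -> Tuple[int, float]:
    """Class 1 if bright, else 0; confidence grows with distance from 0.5."""
    x = features[0]
    return (1 if x >= 0.5 else 0), min(1.0, 0.5 + abs(x - 0.5))


if __name__ == "__main__":
    video = [[0.9, 1.0], [0.5, 0.5], [0.0, 0.1]]
    for scored in estimate_relevance(
        video, mean_intensity, threshold_classifier, lambda c: c
    ):
        print(scored)
```

Passing the identity function as `to_relevance` treats confidence and relevance as the same number; any other monotone mapping could be substituted without changing the rest of the sketch.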

16 Claims

1. A computer implemented method for identifying content relevance in a video stream, said method comprising:
- acquiring at a computer, video data from a video camera;
  
  mapping extracted features of said acquired video data to a feature space to obtain a feature representation of said video data;
  
  assigning said acquired video data, with a classifier, to at least one action class based on said feature representation of said video data, said classifier comprising at least one of a support vector machine, a neural network, a decision tree, an expectation-maximization algorithm, and a k-nearest neighbor clustering algorithm; and
  
  determining a relevance of said acquired video data based on said at least one action class assigned, wherein determining a relevance of said acquired video data based on said at least one action class assigned comprises;
  
  assigning said acquired video data a classification confidence score; and
  
  converting said classification confidence score to a relevance score.
  - 2. The method of claim 1 wherein said video data comprises one of:
    - video acquired with an egocentric or wearable device;
      
      video acquired with a vehicle-mounted device; and
      
      surveillance or third-person view video.
  - 3. The method of claim 1 wherein determining a relevance of said acquired video data based on said classifier output further comprises:
    - enforcing a temporal smoothness requirement on at least one relevance score.
  - 4. The method of claim 1 wherein said extracted features comprise at least one of:
    - deep features; and
      
      hand-engineered features.
  - 5. The method of claim 1 wherein said classifier comprises a support vector machine described by parameters w and b, and wherein a magnitude of a classification score |w·x_j + b| for an input x_j is used to estimate said relevance of said acquired video data.
  - 6. The method of claim 1 wherein said classifier comprises a neural network wherein estimating said relevance of said acquired video data further comprises estimating a relevance of an input sample x_j based on outputs z_k where 1 ≤ k ≤ K, and K is a number of classes.
  - 7. The method of claim 1 further comprising an offline training stage comprising training said classifier to optimally discriminate between a plurality of different action classes according to their corresponding feature representations.
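
Claims 5 and 6 tie the relevance estimate to the classifier's raw outputs. The sketch below shows one way each could be computed, under stated assumptions: `svm_relevance` returns the magnitude |w·x_j + b| that claim 5 recites, while `nn_relevance` uses the largest softmax probability over the K outputs z_k, which is only one plausible reading because claim 6 does not say how the outputs are combined. Both function names are hypothetical.

```python
import math
from typing import List, Sequence


def svm_relevance(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Magnitude of the linear SVM decision value |w . x + b| for input x."""
    return abs(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the K network outputs z_1..z_K."""
    peak = max(z)
    exps = [math.exp(z_k - peak) for z_k in z]
    total = sum(exps)
    return [e / total for e in exps]


def nn_relevance(z: Sequence[float]) -> float:
    """Assumed mapping: the top softmax probability serves as the relevance."""
    return max(softmax(z))


if __name__ == "__main__":
    print(svm_relevance([1.0, -2.0], 0.5, [3.0, 1.0]))  # far from the boundary
    print(nn_relevance([10.0, 0.0, 0.0]))               # one output dominates
    print(nn_relevance([0.0, 0.0, 0.0]))                # no output dominates
```

Under this reading, an input far from the SVM decision boundary, or a network output vector dominated by one class, yields a high score, and an ambiguous input yields a low one.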

8. A system for identifying content relevance, said system comprising:
- a video acquisition module comprising a video camera for acquiring video data;
  
  a processor;
  
  a data bus coupled to said processor; and
  
  a computer-usable medium embodying computer program code, said computer-usable medium being coupled to said data bus, said computer program code comprising instructions executable by said processor and configured for;
  
  mapping extracted features of said acquired video data to a feature space to obtain a feature representation of said video data;
  
  assigning said acquired video data, via the use of a classifier, to at least one action class based on said feature representation of said video data, said classifier comprising at least one of a support vector machine, a neural network, a decision tree, an expectation-maximization algorithm, and a k-nearest neighbor clustering algorithm; and
  
  determining a relevance of said acquired video data based on said at least one action class assigned, wherein determining a relevance of said acquired video data based on said at least one action class assigned comprises;
  
  assigning said acquired video data a classification confidence score; and
  
  converting said classification confidence score to a relevance score.
  - 9. The system of claim 8 wherein said video data comprises at least one frame of:
    - video acquired with an egocentric or wearable device;
      
      video acquired with a vehicle-mounted device;
      
      or surveillance or third-person view video.
  - 10. The system of claim 8 wherein determining a relevance of said acquired video data based on said classifier output further comprises enforcing a temporal smoothness requirement on at least one relevance score.
  - 11. The system of claim 8 wherein said extracted features comprises at least one of:
    - deep features; and
      
      hand-engineered features.
  - 12. The system of claim 8 wherein said classifier comprises a support vector machine described by parameters w and b, and wherein a magnitude of a classification score |w·x_j + b| for an input x_j is used to estimate said relevance of said acquired video data.
  - 13. The system of claim 8 wherein said classifier comprises a neural network wherein estimating said relevance of said acquired video data further comprises estimating a relevance of an input sample x_j based on outputs z_k where 1 ≤ k ≤ K, and K is a number of classes.
  - 14. The system of claim 8 wherein said computer program code comprising instructions executable by said processor further configured for an offline training stage comprising training said classifier to optimally discriminate between a plurality of different action classes according to their corresponding feature representations.
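
Claims 3 and 10 recite enforcing a temporal smoothness requirement on the relevance scores without naming a technique. A centered moving average is one simple way to do this and is used below purely as an assumption; the function name `smooth_relevance` and the `window` parameter are hypothetical.

```python
from typing import List, Sequence


def smooth_relevance(scores: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving average over per-segment relevance scores.

    The window is truncated at the start and end of the sequence, so the
    output has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        neighborhood = scores[lo:hi]
        smoothed.append(sum(neighborhood) / len(neighborhood))
    return smoothed


if __name__ == "__main__":
    # A single isolated spike is spread over its neighbors.
    print(smooth_relevance([0.0, 0.0, 1.0, 0.0, 0.0], window=3))
```

Smoothing of this kind keeps an isolated high or low score from flipping the relevance of a single frame when its neighbors disagree.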

15. A non-transitory processor-readable medium storing computer code representing instructions to cause a process for identifying content relevance, said computer code comprising code to:
- train a classifier to optimally discriminate between a plurality of different action classes according to said feature representations, said classifier comprising at least one of a support vector machine, a neural network, a decision tree, an expectation-maximization algorithm, and a k-nearest neighbor clustering algorithm; and
  
  in an online stage;
  
  acquire video data said video data comprising one of video acquired with an egocentric or wearable device;
  
  video acquired with a vehicle-mounted device; and
  
  surveillance or third-person view video;
  
  segment said video data into at least one of a series of single frames and a series of groups of frames;
  
  map extracted features of said acquired video data to a feature space to obtain a feature representation of said video data;
  
  assign said acquired video data, via the use of a classifier, to at least one action class based on said feature representation of said video data; and
  
  assign said acquired video data a classification confidence score and convert said classification confidence score to a relevance score to determine a relevance of said acquired video data based on the at least one action class assigned.
  - 16. The processor-readable medium of claim 15 wherein said extracted features comprise at least one of:
    - deep features, wherein said deep features are learned via the use of at least one of;
      
      a long-short term memory network;
      
      a convolutional network;
      
      an autoencoder; and
      
      a deep Boltzmann machine; and
      
      hand-engineered features, wherein said hand-engineered features comprise at least one of;
      
      scale-invariant features;
      
      interest point and descriptors thereof;
      
      dense trajectories;
      
      histogram of oriented gradients; and
      
      local binary patterns.
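
Of the hand-engineered features listed in claim 16, local binary patterns are the easiest to show compactly. The sketch below computes the basic 8-neighbor variant on a grayscale frame given as a list of rows and returns a normalized 256-bin histogram as the feature vector. It is an illustrative sketch only: the claim does not specify an LBP variant, and the names `lbp_codes` and `lbp_histogram` and the bit ordering are assumptions.

```python
from typing import List, Sequence

# Offsets of the 8 neighbors, clockwise starting from the top-left pixel.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image: Sequence[Sequence[float]]) -> List[List[int]]:
    """Return the 8-bit LBP code of every interior pixel (border is skipped)."""
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBORS):
                # Set the bit when the neighbor is at least as bright as the center.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(image: Sequence[Sequence[float]]) -> List[float]:
    """Normalized 256-bin histogram of LBP codes, usable as a feature vector."""
    hist = [0.0] * 256
    count = 0
    for row in lbp_codes(image):
        for code in row:
            hist[code] += 1.0
            count += 1
    return [h / count for h in hist] if count else hist


if __name__ == "__main__":
    frame = [[9, 9, 9], [0, 5, 0], [0, 0, 0]]
    print(lbp_codes(frame))  # only the three top neighbors exceed the center
```

A histogram like this could serve as the feature representation passed to the classifier; the deep-feature alternatives in the same claim would replace it with learned activations.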

Current Assignee: Xerox Corporation (Xerox Holdings Corp.)
Original Assignee: Xerox Corporation (Xerox Holdings Corp.)
Inventors: Bernal, Edgar A.; Li, Qun; Zhang, Yun; Kumar, Jayant; Bala, Raja
Primary Examiner(s): Carter, Aaron W
Application Number: US15/061,463
Publication Number: US 20170255831A1
Time in Patent Office: 809 Days
Field of Search: 382156, 382159, 382224, 1 1, 345419, 345502, 345520, 348 46, 348E13074, 712228
US Class Current
CPC Class Codes

G06F 18/2411   based on the proximity to a...

G06F 18/285   Selection of pattern recogn...

G06V 10/82   using neural networks

G06V 20/41   Higher-level, semantic clus...

G06V 20/47   Detecting features for summ...

G06V 20/49   Segmenting video sequences,...

H04L 65/612   for unicast

H04L 65/765   intermediate

H04L 67/01   Protocols

H04L 67/04   specially adapted for termi...

H04L 69/16   Implementation or adaptatio...
