Method for detecting scene boundaries in genre independent videos
First Claim
1. A computer implemented method for detecting scene boundaries in videos, comprising the steps of:
- extracting audio features from an audio signal of the videos, wherein the audio features are Mel-frequency cepstral coefficients (MFCCs), and the audio signals are classified into semantic classes, and wherein each feature vector includes variables x1, x2, x3 indicating a number of the audio class labels within a time window of duration [t−WL, t], where WL is about fourteen seconds, and variables x4, x5, x6 indicating a number of the audio classes in a window of duration, and variables x7, x8, x9 indicating a number of audio classes within a window [t, t+WL], and variables x10, x11 are a Bhattacharyya shape and a Mahalanobis distance between the MFCC coefficients for the window [t−WL, t] and the window [t, t+WL], respectively, and variable x12 is twice an average number of shot boundaries present in the video within a window [t−WL, t+WL];
- extracting visual features from frames of the videos;
- combining the audio and visual features into the feature vectors;
- extracting the feature vectors from videos of different genres;
- classifying the feature vectors as scene boundaries using a support vector machine, in which the support vector machine is trained to be independent of the different genres of the videos; and
- segmenting the videos according to the scene boundaries, wherein the steps are performed in a computer.
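The distance features x10 and x11 above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the standard Gaussian Mahalanobis and Bhattacharyya-shape formulas with the diagonal covariances the claim mentions, and the function name and `eps` regularizer are the author's own.

```python
import numpy as np

def window_distances(mfcc_left, mfcc_right, eps=1e-8):
    """Distances between MFCC frames in [t-WL, t] and [t, t+WL].

    Each input is an (n_frames, n_coeffs) array. Diagonal covariances
    are used, as in the claim. The scaling constants follow the usual
    Gaussian-distance definitions and are an assumption here.
    """
    mu_i, mu_j = mfcc_left.mean(axis=0), mfcc_right.mean(axis=0)
    ci = mfcc_left.var(axis=0) + eps   # diagonal of C_i
    cj = mfcc_right.var(axis=0) + eps  # diagonal of C_j
    avg = (ci + cj) / 2.0
    # Mahalanobis distance between the two window means (feature x11)
    diff = mu_i - mu_j
    mahalanobis = float(diff @ (diff / avg))
    # Bhattacharyya "shape" term, comparing only covariances (feature x10)
    shape = 0.5 * float(np.sum(np.log(avg) - 0.5 * (np.log(ci) + np.log(cj))))
    return shape, mahalanobis
```

Identical windows yield both distances near zero; shifting one window's mean moves only the Mahalanobis term, while changing its spread moves the shape term.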
Abstract
A computer implemented method detects scene boundaries in videos by first extracting feature vectors from videos of different genres. The feature vectors are then classified as scene boundaries using a support vector machine. The support vector machine is trained to be independent of the different genres of the videos.
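The classification stage can be illustrated with a small self-contained sketch. The patent specifies a support vector machine trained on feature vectors pooled across genres; the stand-in below uses a plain linear SVM trained by Pegasos-style sub-gradient descent (no ML library), which is an assumption for illustration only, as are the function name and hyperparameters.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Toy linear SVM trained by Pegasos sub-gradient descent.

    X: (n_samples, n_features) feature vectors pooled from videos of
    several genres, so the model depends on no single genre.
    y: labels in {-1, +1} (+1 = scene boundary).
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)        # decaying step size
            margin = y[i] * (w @ X[i])
            w *= (1 - eta * lam)         # shrink (regularization)
            if margin < 1:               # hinge-loss sub-gradient step
                w += eta * y[i] * X[i]
    return w
```

A new candidate time t is then labeled a scene boundary when `sign(w @ x)` is positive for its feature vector `x`.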
16 Citations
10 Claims
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
and the Mahalanobis distance is where covariance matrices Ci and Cj, and means μi and μj, represent the diagonal covariances and means of the MFCC vectors before and after the time t.
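The equation the claim refers to did not survive extraction. For orientation only, the standard Mahalanobis and Bhattacharyya-shape distances between two Gaussians with means μi, μj and covariances Ci, Cj are usually written as below; these are the textbook forms, assumed here rather than quoted from the patent:

```latex
D_{\mathrm{Mahal}} = (\mu_i - \mu_j)^\top \left(\tfrac{C_i + C_j}{2}\right)^{-1} (\mu_i - \mu_j),
\qquad
D_{\mathrm{shape}} = \tfrac{1}{2}\,\ln \frac{\left|\tfrac{C_i + C_j}{2}\right|}{\sqrt{|C_i|\,|C_j|}}
```

With the diagonal covariances named in the claim, both determinants and the matrix inverse reduce to elementwise products and divisions.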
-
9. The method of claim 1, further comprising:
transforming the feature vectors to a higher dimensional feature space using a kernel function.
-
10. The method of claim 9, in which the kernel function is a radial basis kernel.
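Claims 9 and 10 describe the kernel trick: the SVM compares feature vectors via inner products in a higher-dimensional space without constructing that space. A minimal sketch of the radial basis kernel follows; the function name and the `gamma` value are illustrative assumptions, since the claims do not fix them.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.1):
    """Radial basis function kernel k(x, z) = exp(-gamma * ||x - z||^2).

    Equals 1 when x == z and decays toward 0 as the feature vectors
    move apart; gamma is a free hyperparameter.
    """
    d = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * (d @ d)))
```

In a kernelized SVM, this function replaces every inner product between feature vectors during both training and classification.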
Specification