Capture-intention detection for video content analysis

US 7,773,813 B2
Filed: 10/31/2005
Issued: 08/10/2010
Est. Priority Date: 10/31/2005
Status: Active Grant

Abstract

Systems and methods are described for detecting capture-intention in order to analyze video content. In one implementation, a system decomposes video structure into sub-shots, extracts intention-oriented features from the sub-shots, delineates intention units via the extracted features, and classifies the intention units into intention categories via the extracted features. A video library can be organized via the categorized intention units.
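The last step of the abstract, organizing a video library by categorized intention units, can be illustrated with a minimal sketch. Only the seven category names are taken from the claims; the `IntentionUnit` record, its fields, the file names, and the `organize_library` helper are hypothetical and are not the patent's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass

# Category names as listed in the claims; everything else here is hypothetical.
INTENTION_CATEGORIES = (
    "static scene", "dynamic event", "close-up view", "beautiful scenery",
    "switch record", "longtime record", "just record",
)


@dataclass(frozen=True)
class IntentionUnit:
    video_id: str    # hypothetical identifier of the source recording
    start_s: float   # start time of the unit, in seconds
    end_s: float     # end time of the unit, in seconds
    category: str    # one of INTENTION_CATEGORIES


def organize_library(units):
    """Group already-classified intention units by category.

    Within each category, units are ordered by recording and start time.
    """
    index = defaultdict(list)
    for unit in units:
        if unit.category not in INTENTION_CATEGORIES:
            raise ValueError(f"unknown intention category: {unit.category!r}")
        index[unit.category].append(unit)
    for members in index.values():
        members.sort(key=lambda u: (u.video_id, u.start_s))
    return dict(index)


# Made-up example data.
units = [
    IntentionUnit("trip.avi", 42.0, 60.5, "beautiful scenery"),
    IntentionUnit("party.avi", 0.0, 12.0, "dynamic event"),
    IntentionUnit("trip.avi", 3.0, 20.0, "beautiful scenery"),
]
library = organize_library(units)
```

Here `library["beautiful scenery"]` holds the two `trip.avi` segments in playback order, which is one plausible way a browsing interface could be backed by the classification.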

146 Citations

15 Claims

1. A computer-implemented method, comprising:
- on a video content analysis device:

  delineating video data into intention units;

  extracting features from the video data, wherein each feature is used to estimate one or more human intentions wherein the extracting features includes extracting attention-specific features, and wherein each attention-specific feature represents one dimension of human attention, and wherein the extracting attention-specific features includes analyzing four dimensions of attention (DoA): an attention stability, an attention energy, an attention window, and a camera pattern;

  classifying the intention units into intention categories; and

  selecting a number of categories to be the intention categories and defining each of the intention categories according to a type of video content characteristic of one of the human intentions, wherein the intention categories include a static scene category, a dynamic event category, a close-up view category, a beautiful scenery category, a switch record category, a longtime record category, and a just record category.
  - 2. The computer-implemented method as recited in claim 1, wherein each of the intention categories represents one of multiple human intentions for capturing images with a camera to create the video data.
  - 3. The computer-implemented method as recited in claim 1, wherein the delineating video data into intention units further includes decomposing a video structure into sub-shots based on camera motion thresholds.
  - 4. The computer-implemented method as recited in claim 1, wherein the extracting features includes extracting content generic features, wherein each content generic feature comprises a low-level visual feature that reflects a capture intention by strengthening a relationship between the capture-intention and low-level features.
  - 5. The computer-implemented method as recited in claim 1, wherein each different human intention corresponding to one of the intention categories is capable of being indicated in the video content by intention-oriented features.
  - 6. The computer-implemented method as recited in claim 1, wherein the delineating video data into intention units includes dividing the video data into temporal sub-shots, wherein each sub-shot comprises a camera motion, and each intention unit comprises one or more sub-shots assignable to the same one or more intention categories.
  - 7. The computer-implemented method as recited in claim 6, wherein the delineating the video data into the intention units includes comparing both attention-specific features and content generic features of the contiguous sub-shots, this comparing performed to determine the number of sub-shots to be included in a given intention unit.
  - 8. The computer-implemented method as recited in claim 7, further comprising classifying each intention unit as belonging to one or more of the intention categories based on the attention-specific features and the content generic features.
  - 9. The computer-implemented method as recited in claim 1, wherein the classifying each intention unit to one or more intention categories includes a learning-based classification of the intention units.
  - 10. The computer-implemented method as recited in claim 9, wherein the learning-based classification includes applying one of a support vector machine (SVM) classification schema or a Boosting classification schema.
  - 11. The computer-implemented method as recited in claim 1, further comprising organizing a video recording or multiple video recordings in a video library according to the classification of the intention units into the intention categories.
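Claims 3, 6, and 7 describe splitting video into sub-shots using camera motion thresholds and then comparing the features of contiguous sub-shots to decide how many belong to one intention unit. The following is a rough sketch of that idea under stated assumptions: a single scalar motion value per frame, a single threshold, and Euclidean distance between feature vectors are all illustrative choices, and the names `SubShot`, `split_into_subshots`, and `group_into_units` are hypothetical. The claims do not specify any of these details.

```python
from dataclasses import dataclass


@dataclass
class SubShot:
    start: int        # first frame index (inclusive)
    end: int          # last frame index (exclusive)
    features: tuple   # hypothetical per-sub-shot feature vector


def split_into_subshots(motion, threshold):
    """Return (start, end) frame ranges, cutting wherever the per-frame
    camera motion crosses the threshold, so each range holds frames of
    one motion state (below or at/above the threshold)."""
    if not motion:
        return []
    boundaries = [0]
    for i in range(1, len(motion)):
        if (motion[i] >= threshold) != (motion[i - 1] >= threshold):
            boundaries.append(i)
    boundaries.append(len(motion))
    return list(zip(boundaries[:-1], boundaries[1:]))


def _distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def group_into_units(subshots, max_distance):
    """Merge each sub-shot into the current unit when its features are
    within max_distance of the previous sub-shot; otherwise start a new unit."""
    units = []
    for shot in subshots:
        if units and _distance(units[-1][-1].features, shot.features) <= max_distance:
            units[-1].append(shot)
        else:
            units.append([shot])
    return units


# Made-up per-frame motion magnitudes and sub-shot features.
ranges = split_into_subshots([0.1, 0.2, 0.1, 0.9, 0.8, 0.9, 0.2, 0.1], threshold=0.5)
subshots = [
    SubShot(0, 3, (0.10, 0.80)),
    SubShot(3, 6, (0.15, 0.75)),
    SubShot(6, 8, (0.90, 0.20)),
]
units = group_into_units(subshots, max_distance=0.3)
```

With these made-up inputs the motion signal is cut at frames 3 and 6, and the first two sub-shots merge into one intention unit because their feature vectors are close, while the third starts a new unit.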

12. A system, comprising:
- a processing device to enable operation of one or more system components;
  
  a shot detector to determine temporal segments of video shots in video data;
  
  a sub-shot detector to determine temporal segments of sub-shots in the video shots;
  
  a feature analyzer to determine both attention-specific characteristics and content-generic characteristics for each of multiple features of each sub-shot, wherein the attention characteristic indicates a person's attention degree on the scene or object to be captured or having been captured wherein the multiple features of a sub-shot include attention-specific features, the attention-specific features including: an attention stability, an attention energy, an attention window, and a camera pattern;
  
  an intention unit segmenter to delineate intention units composed of the sub-shots according to the attention characteristics of the features of the sub-shots; and
  
  an intention classifier to assign each intention unit to an intention category, such that the video data is capable of being organized by intention units, wherein the intention categories include a static scene category, a dynamic event category, a close-up view category, a beautiful scenery category, a switch record category, a longtime record category, and a just record category.
  - 13. The system as recited in claim 12, wherein the multiple features of a sub-shot further includes content generic features.
  - 14. The system as recited in claim 12, wherein the intention classifier includes a learning engine to train the classification of intention units into intention categories by applying one of a support vector machine (SVM) classification schema or a Boosting classification schema.
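Claim 14 names a Boosting classification schema as one option for the learning engine. As a hedged illustration only, the sketch below implements a small binary AdaBoost with single-feature threshold rules (decision stumps) in plain Python. The two-feature input (attention stability, attention energy), the label encoding (+1 for "static scene", -1 for "dynamic event"), and the training values are all made up; the patent does not disclose this particular formulation.

```python
import math


def _stump_predict(row, j, thr, sign):
    """Predict +sign when feature j is at or above thr, else -sign."""
    return sign if row[j] >= thr else -sign


def train_stump(X, y, w):
    """Find the single-feature threshold rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if _stump_predict(row, j, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost(X, y, rounds=5):
    """Train a boosted ensemble of stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, j, thr, sign = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, sign))
        # Re-weight so that misclassified samples count more next round.
        w = [wi * math.exp(-alpha * yi * _stump_predict(row, j, thr, sign))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Return the sign of the weighted vote of all stumps."""
    score = sum(alpha * _stump_predict(row, j, thr, sign)
                for alpha, j, thr, sign in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical (attention stability, attention energy) pairs per intention unit.
X = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.2, 0.9), (0.1, 0.8), (0.3, 0.7)]
y = [1, 1, 1, -1, -1, -1]  # +1 = "static scene", -1 = "dynamic event"
model = adaboost(X, y)
```

Covering all seven intention categories would need a multi-class extension, for example one binary model per category with the highest score winning; that extension is an assumption here, not something the claims spell out.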

15. A system, comprising:
- a processing device to enable operation of one or more system components;
  
  means for delineating video data into intention units;
  
  means for extracting features from the video data, wherein each feature is used to estimate one or more of the human intentions wherein the extracting features includes extracting attention-specific features, and wherein each attention-specific feature represents one dimension of human attention, and wherein the extracting attention-specific features includes analyzing four dimensions of attention (DoA): an attention stability, an attention energy, an attention window, and a camera pattern; and
  
  means for classifying the intention units into intention categories, wherein the intention categories include a static scene category, a dynamic event category, a close-up view category, a beautiful scenery category, a switch record category, a longtime record category, and a just record category.


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Mei, Tao; Hua, Xian-Sheng; Li, Shipeng
Primary Examiner(s): Bella; Matthew C
Assistant Examiner(s): Rush; Eric
Application Number: US11/263,081
Publication Number: US 20070101269A1
Time in Patent Office: 1,744 Days
Field of Search: 382/100, 382/155, 382/156, 382/224-228, 348/231.2, 715/723
US Class Current: 382/224

CPC Class Codes:
G06F 16/786   using motion, e.g. object m...
G06V 20/40   in video content extracting...
G11B 27/28   by using information signal...
H04N 21/4147   PVR [Personal Video Recorde...
H04N 21/4223   Cameras H04N23/00 takes pre...
H04N 21/4334   Recording operations record...
H04N 21/44008   involving operations for an...
H04N 21/84   Generation or processing of...
H04N 21/8456   by decomposing the content ...
H04N 21/854   Content authoring
