Capture-intention detection for video content analysis
First Claim
1. A computer-implemented method, comprising:
on a video content analysis device:
delineating video data into intention units;
extracting features from the video data, wherein each feature is used to estimate one or more human intentions, wherein the extracting features includes extracting attention-specific features, wherein each attention-specific feature represents one dimension of human attention, and wherein the extracting attention-specific features includes analyzing four dimensions of attention (DoA): an attention stability, an attention energy, an attention window, and a camera pattern;
classifying the intention units into intention categories; and
selecting a number of categories to be the intention categories and defining each of the intention categories according to a type of video content characteristic of one of the human intentions, wherein the intention categories include a static scene category, a dynamic event category, a close-up view category, a beautiful scenery category, a switch record category, a longtime record category, and a just record category.
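The vocabulary of claim 1 (seven intention categories and four dimensions of attention) can be written down as a small data model. The following is only an illustrative sketch: the names `IntentionCategory` and `AttentionFeatures` are hypothetical, and the field types are assumptions, since the claim does not specify how any dimension is represented.

```python
from dataclasses import dataclass
from enum import Enum


class IntentionCategory(Enum):
    """The seven intention categories listed in claim 1."""
    STATIC_SCENE = "static scene"
    DYNAMIC_EVENT = "dynamic event"
    CLOSE_UP_VIEW = "close-up view"
    BEAUTIFUL_SCENERY = "beautiful scenery"
    SWITCH_RECORD = "switch record"
    LONGTIME_RECORD = "longtime record"
    JUST_RECORD = "just record"


@dataclass(frozen=True)
class AttentionFeatures:
    """The four dimensions of attention (DoA) listed in claim 1.

    Assumption: three dimensions are modeled as floats and the camera
    pattern as a string label; the claim fixes no representation.
    """
    stability: float
    energy: float
    window: float
    camera_pattern: str


# Example: one hypothetical feature record for a single segment of video.
example = AttentionFeatures(stability=0.8, energy=0.3, window=0.5, camera_pattern="pan")
```

Keeping the categories in an enumeration makes it harder for a classifier to emit a label outside the claimed set.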
Abstract
Systems and methods are described for detecting capture-intention in order to analyze video content. In one implementation, a system decomposes video structure into sub-shots, extracts intention-oriented features from the sub-shots, delineates intention units via the extracted features, and classifies the intention units into intention categories via the extracted features. A video library can be organized via the categorized intention units.
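The flow in the abstract (sub-shots, then intention units, then categories, then an organized library) can be sketched as below. This is a toy illustration under stated assumptions, not the patented method: the abstract does not say how units are delineated or classified, so the distance threshold in `delineate_units` and the caller-supplied `classify` function are hypothetical stand-ins, as are all names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class SubShot:
    start: float                # start time in seconds
    end: float                  # end time in seconds
    features: Sequence[float]   # hypothetical intention-oriented feature vector


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def delineate_units(sub_shots: List[SubShot], threshold: float) -> List[List[SubShot]]:
    """Group consecutive sub-shots into intention units.

    Assumption: a new unit starts whenever the features of a sub-shot differ
    from those of the previous sub-shot by more than `threshold`.
    """
    units: List[List[SubShot]] = []
    for shot in sub_shots:
        if units and distance(units[-1][-1].features, shot.features) <= threshold:
            units[-1].append(shot)
        else:
            units.append([shot])
    return units


def organize(units: List[List[SubShot]],
             classify: Callable[[List[SubShot]], str]) -> Dict[str, List[List[SubShot]]]:
    """Build a library that maps each category label to its intention units."""
    library: Dict[str, List[List[SubShot]]] = defaultdict(list)
    for unit in units:
        library[classify(unit)].append(unit)
    return dict(library)


# Toy usage with a one-dimensional feature and a placeholder classifier.
shots = [SubShot(0, 2, [0.10]), SubShot(2, 4, [0.15]),
         SubShot(4, 6, [0.90]), SubShot(6, 8, [0.95])]
units = delineate_units(shots, threshold=0.2)


def toy_classify(unit: List[SubShot]) -> str:
    mean = sum(s.features[0] for s in unit) / len(unit)
    return "dynamic event" if mean > 0.5 else "static scene"


library = organize(units, toy_classify)
```

In a real system the classifier would be learned from labeled footage; the placeholder here only shows where it plugs in.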
146 Citations
15 Claims
1. A computer-implemented method, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
12. A system, comprising:
a processing device to enable operation of one or more system components;
a shot detector to determine temporal segments of video shots in video data;
a sub-shot detector to determine temporal segments of sub-shots in the video shots;
a feature analyzer to determine both attention-specific characteristics and content-generic characteristics for each of multiple features of each sub-shot, wherein the attention characteristic indicates a person's attention degree on the scene or object to be captured or having been captured, wherein the multiple features of a sub-shot include attention-specific features, the attention-specific features including: an attention stability, an attention energy, an attention window, and a camera pattern;
an intention unit segmenter to delineate intention units composed of the sub-shots according to the attention characteristics of the features of the sub-shots; and
an intention classifier to assign each intention unit to an intention category, such that the video data is capable of being organized by intention units, wherein the intention categories include a static scene category, a dynamic event category, a close-up view category, a beautiful scenery category, a switch record category, a longtime record category, and a just record category.
Dependent claims: 13, 14.
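The first two components of claim 12, a shot detector and a sub-shot detector that each produce temporal segments, can be illustrated with a short sketch. The claim does not say how either detector works, so both heuristics here are assumptions chosen for brevity: shots are cut where a frame-to-frame difference exceeds a threshold, and sub-shots are fixed-length pieces of a shot. The names `detect_shots`, `detect_sub_shots`, `cut_threshold`, and `max_len` are hypothetical.

```python
from typing import List, Sequence, Tuple

# A temporal segment as a half-open frame range: [start_frame, end_frame).
Segment = Tuple[int, int]


def detect_shots(frame_diffs: Sequence[float], cut_threshold: float) -> List[Segment]:
    """Split a video into shots.

    frame_diffs[i] is a dissimilarity score between frame i and frame i + 1,
    so a video with n frames has n - 1 scores. Assumption: a score above
    `cut_threshold` marks a shot boundary between those two frames.
    """
    n_frames = len(frame_diffs) + 1
    shots: List[Segment] = []
    start = 0
    for i, diff in enumerate(frame_diffs):
        if diff > cut_threshold:
            shots.append((start, i + 1))
            start = i + 1
    shots.append((start, n_frames))
    return shots


def detect_sub_shots(shot: Segment, max_len: int) -> List[Segment]:
    """Split one shot into sub-shots of at most `max_len` frames (assumed rule)."""
    start, end = shot
    return [(s, min(s + max_len, end)) for s in range(start, end, max_len)]


# Toy usage: 7 frames with one abrupt change between frames 2 and 3.
shots = detect_shots([0.1, 0.1, 0.9, 0.1, 0.1, 0.1], cut_threshold=0.5)
sub_shots = [ss for shot in shots for ss in detect_sub_shots(shot, max_len=3)]
```

The resulting sub-shot segments are the units on which the claimed feature analyzer would then operate.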
15. A system, comprising:
a processing device to enable operation of one or more system components;
means for delineating video data into intention units;
means for extracting features from the video data, wherein each feature is used to estimate one or more of the human intentions, wherein the extracting features includes extracting attention-specific features, wherein each attention-specific feature represents one dimension of human attention, and wherein the extracting attention-specific features includes analyzing four dimensions of attention (DoA): an attention stability, an attention energy, an attention window, and a camera pattern; and
means for classifying the intention units into intention categories, wherein the intention categories include a static scene category, a dynamic event category, a close-up view category, a beautiful scenery category, a switch record category, a longtime record category, and a just record category.
Specification