Method for automatic extraction of semantically significant events from video
Abstract
Semantically meaningful events are detected in video with a multi-level technique. Video sequences are visually analyzed at the first level of the technique to detect shot boundaries, to measure color and texture of the content, and to detect objects in the content. At the second level of the technique, objects are classified and the content in each shot is summarized. At the third level of the technique, an event inference module infers events on the basis of temporal and spatial phenomena disclosed in the shot summaries. The technique may be extended to additional domains by incorporating additional domain related processes at the upper levels of the technique which utilize data produced at the first level of the technique by processes which are domain independent.
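The second and third levels described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the record types `FrameAnalysis` and `ShotSummary`, the functions `summarize_shots` and `infer_event`, the "animal" label, and the speed and shot-count thresholds are all hypothetical, chosen only to show how per-frame analysis might be condensed into shot summaries and then tested against a domain rule. First-level analysis (shot boundaries, object detection) is assumed to have been done already.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FrameAnalysis:
    """Assumed first-level output for one frame (illustrative fields only)."""
    shot_id: int        # index of the shot this frame belongs to
    object_label: str   # label of the detected object, "" if none
    object_x: float     # horizontal object position in pixels


@dataclass
class ShotSummary:
    """Second-level output: one record per shot."""
    shot_id: int
    num_frames: int
    dominant_label: str  # object descriptor: most frequent label in the shot
    mean_speed: float    # temporal descriptor: mean |dx| between frames


def summarize_shots(frames: List[FrameAnalysis]) -> List[ShotSummary]:
    """Group frames by shot and reduce each group to a summary record."""
    by_shot: Dict[int, List[FrameAnalysis]] = {}
    for frame in frames:
        by_shot.setdefault(frame.shot_id, []).append(frame)

    summaries = []
    for shot_id in sorted(by_shot):
        group = by_shot[shot_id]
        labels = [f.object_label for f in group if f.object_label]
        dominant = Counter(labels).most_common(1)[0][0] if labels else ""
        steps = [abs(b.object_x - a.object_x) for a, b in zip(group, group[1:])]
        mean_speed = sum(steps) / len(steps) if steps else 0.0
        summaries.append(ShotSummary(shot_id, len(group), dominant, mean_speed))
    return summaries


def infer_event(summaries: List[ShotSummary], label: str = "animal",
                min_speed: float = 5.0, min_shots: int = 2) -> bool:
    """Hypothetical domain rule: the event is inferred when `label` moves at
    or above `min_speed` in at least `min_shots` consecutive shots."""
    run = 0
    for summary in summaries:
        if summary.dominant_label == label and summary.mean_speed >= min_speed:
            run += 1
            if run >= min_shots:
                return True
        else:
            run = 0
    return False
```

Keeping the domain rule in `infer_event` separate from the summarization step mirrors the abstract's point that upper levels can be swapped per domain while the lower-level data stays domain independent.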
125 Citations
15 Claims
1. A computerized method of inferring an event portrayed in a video sequence having semantic content that belongs to at least one known domain, said video sequence having one or more intervals each containing a plurality of sequential frames, said method comprising the steps of:
(a) a computer analyzing the content of said video over at least one interval of said one or more intervals;
(b) a computer summarizing said analysis over said at least one interval; and
(c) a computer inferring from said summary whether or not said event is portrayed in said video sequence based on criteria associated with said semantic content.
Dependent claims: 2, 3, 4, 5, 6, 7
(a) segmenting said video sequence into said one or more intervals where each of said intervals contains relatively homogeneous content among said plurality of sequential frames within said interval;
(b) detecting an object in said content; and
(c) measuring at least one of a color and a texture of said content.
3. The method of claim 2 wherein a color histogram evaluates the content of each of said plurality of sequential frames within said one or more intervals.
4. The method of claim 2 wherein said object is detected by comparing said content of a first frame with said content of a second frame.
5. The method of claim 3 further comprising the step of adjusting the content of said plurality of sequential frames within at least one interval of said one or more intervals for a global motion of said content within said at least one interval.
6. The method of claim 1 wherein said step of summarizing said analysis comprises characterizing said content with at least one of a spatial descriptor, a temporal descriptor and an object descriptor.
7. The method of claim 1 wherein said event is inferred from at least one of a spatial descriptor, a temporal descriptor and an object descriptor in said summary.
8. A computerized method of inferring an event portrayed in a video sequence having semantic content that belongs to at least one known domain, said method comprising the steps of:
(a) a computer decomposing said video into at least one plurality of frames having relatively homogenous content;
(b) a computer detecting an object in said content;
(c) a computer classifying said object;
(d) a computer characterizing said content of said at least one plurality of frames by at least one of a spatial descriptor, a temporal descriptor, and an object descriptor; and
(e) a computer inferring from said characterization of said content whether or not said event is portrayed in said video sequence based on criteria associated with said semantic content.
Dependent claims: 9, 10, 11, 12
13. A computerized method of inferring an event portrayed in a video sequence having semantic content that belongs to at least one known domain, said video sequence having one or more intervals each containing a plurality of sequential frames, said method comprising the steps of:
(a) a computer decomposing said video into said one or more intervals where each of said intervals has relatively homogenous content;
(b) a computer detecting an object moving independent of a global motion of said content;
(c) a computer measuring the position of said object in an initial frame and a subsequent frame of said one or more intervals;
(d) a computer measuring the size of said object;
(e) a computer measuring at least one of a color and a texture of said content;
(f) a computer classifying said object from at least one of said at least one of a color and texture measure, said position measure, and said size measure;
(g) a computer summarizing said content over said one or more intervals by characterizing at least one of said classification of said object in said initial frame and said subsequent frame; and
(h) a computer inferring from said summary whether or not said event is portrayed in said video sequence based on criteria associated with said semantic content.
Dependent claims: 14, 15
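The first-level steps that the claims recite (segmenting the video into intervals of relatively homogeneous content, evaluating frames with a color histogram, and detecting an object by comparing two frames) can be sketched in Python as follows. This is only an illustration under stated assumptions: the claims do not say how histograms are compared, so the L1 histogram distance, both thresholds, and the function names `color_histogram`, `segment_into_shots`, and `detect_moving_region` are hypothetical. The sketch also assumes a static camera and does not attempt the global motion adjustment of claim 5.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]   # (r, g, b), each channel in 0..255
Frame = List[List[Pixel]]      # rows of pixels


def color_histogram(frame: Frame, bins: int = 4) -> List[float]:
    """Normalized RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    count = 0
    for row in frame:
        for r, g, b in row:
            # Map each channel value from 0..255 to a bin index 0..bins-1.
            ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
            hist[(ri * bins + gi) * bins + bi] += 1
            count += 1
    return [h / count for h in hist] if count else hist


def segment_into_shots(frames: List[Frame],
                       threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Split frames into (start, end) index ranges, end inclusive.

    Assumption: a boundary is declared when the L1 distance between the
    histograms of consecutive frames exceeds `threshold`.
    """
    if not frames:
        return []
    shots = []
    start = 0
    prev = color_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = color_histogram(frames[i])
        distance = sum(abs(a - b) for a, b in zip(prev, cur))
        if distance > threshold:
            shots.append((start, i - 1))
            start = i
        prev = cur
    shots.append((start, len(frames) - 1))
    return shots


def detect_moving_region(first: Frame, second: Frame,
                         min_change: int = 40) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (top, left, bottom, right) of pixels whose summed channel
    difference between the two frames exceeds `min_change`, or None."""
    ys, xs = [], []
    for y, (row_a, row_b) in enumerate(zip(first, second)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            if sum(abs(c1 - c2) for c1, c2 in zip(pa, pb)) > min_change:
                ys.append(y)
                xs.append(x)
    if not ys:
        return None
    return (min(ys), min(xs), max(ys), max(xs))
```

A position and size for the detected object, as measured in claim 13, could be read directly off the returned bounding box.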
Specification