Intelligent TV system and method
First Claim
1. A method for an intelligent user-interaction system based on object detection, comprising:
receiving an input video sequence corresponding to a video program;
dividing the input video sequence into a plurality of video shots, each containing one or more video frames;
detecting possible object occurrences in each of the plurality of video shots;
analyzing possible paths of an object in a video shot using a multimodal-cue approach;
aggregating the path-based selected object occurrences across the plurality of video shots to detect objects; and
generating a complete list of the object occurrences across the plurality of video shots;
wherein analyzing the possible paths using a multimodal-cue approach further includes:
combining an appearance cue, a spatio-temporal cue, and a topological cue to aid object detection in the plurality of video shots;
dictating a usage of an object's visual features to detect possible object locations in a video frame using the appearance cue;
injecting information across a sequence of frames via relational constraints between a target object class and a related object class using the spatio-temporal cue and the topological cue;
fusing the multimodal cue information to create links between object occurrences across the video frames in a current video shot; and
applying dynamic programming to find optimal object paths.
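The final step of the claim, applying dynamic programming to find optimal object paths over linked occurrences, can be illustrated with a minimal Viterbi-style sketch. This is an illustration under stated assumptions, not the patented implementation: the names `best_path`, `center_distance`, and `motion_weight` are hypothetical, the appearance cue is reduced to a single per-candidate score, the spatio-temporal cue is reduced to a penalty on box-center displacement between consecutive frames, and the topological cue is omitted.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x, y, width, height)
Candidate = Tuple[Box, float]             # (box, appearance score); layout is an assumption


def center_distance(a: Box, b: Box) -> float:
    """Euclidean distance between the centers of two boxes."""
    ax, ay = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx, by = b[0] + b[2] / 2, b[1] + b[3] / 2
    return math.hypot(ax - bx, ay - by)


def best_path(frames: List[List[Candidate]], motion_weight: float = 0.01) -> List[int]:
    """Return, for each frame of one shot, the index of the chosen candidate.

    The path score is the sum of appearance scores minus a hypothetical
    motion penalty (motion_weight * center displacement) on every link.
    """
    if not frames or any(not f for f in frames):
        raise ValueError("every frame needs at least one candidate")

    score = [s for _, s in frames[0]]      # best path score ending at each candidate
    back: List[List[int]] = []             # back-pointers, one list per frame after the first

    for t in range(1, len(frames)):
        new_score, pointers = [], []
        for box, appearance in frames[t]:
            best_prev, best_val = 0, float("-inf")
            for j, (prev_box, _) in enumerate(frames[t - 1]):
                val = score[j] - motion_weight * center_distance(prev_box, box)
                if val > best_val:
                    best_prev, best_val = j, val
            new_score.append(best_val + appearance)
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Backtrack from the best final candidate.
    idx = max(range(len(score)), key=score.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return path


frames = [
    [((0, 0, 10, 10), 0.9), ((100, 100, 10, 10), 0.8)],
    [((2, 0, 10, 10), 0.5), ((100, 102, 10, 10), 0.4)],
    [((4, 0, 10, 10), 0.9), ((200, 200, 10, 10), 0.95)],
]
print(best_path(frames))  # [0, 0, 0]
```

In the last frame the second candidate has the higher appearance score, yet the selected path keeps the first candidate because it is consistent with the motion in earlier frames. A per-frame greedy choice would not do this, which is the motivation for a path-based selection.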
Abstract
A method is provided for an intelligent user-interaction system based on object detection. The method includes receiving an input video sequence corresponding to a video program, and dividing the input video sequence into a plurality of video shots, each containing one or more video frames. The method also includes detecting possible object occurrences in each of the plurality of video shots, and analyzing possible paths of an object in a video shot using a multimodal-cue approach. Further, the method includes aggregating the path-based selected object occurrences across the plurality of video shots to detect objects, and generating a complete list of the object occurrences across the plurality of video shots.
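The abstract does not say how the input video sequence is divided into shots. One common approach, shown here purely as an assumption, is to start a new shot wherever the color-histogram difference between consecutive frames exceeds a threshold. The function name `split_into_shots`, the normalized-histogram input, and the `threshold` value are all hypothetical.

```python
from typing import List, Sequence, Tuple


def split_into_shots(
    histograms: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Split a frame sequence into shots, returned as (start, end) index pairs.

    histograms[i] is assumed to be the normalized histogram of frame i
    (bins sum to 1). The end index is exclusive, so each shot holds at
    least one frame.
    """
    if not histograms:
        return []

    shots: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(histograms)):
        # Half the L1 distance between normalized histograms lies in [0, 1].
        diff = 0.5 * sum(abs(a - b) for a, b in zip(histograms[i - 1], histograms[i]))
        if diff > threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(histograms)))
    return shots


histograms = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(split_into_shots(histograms))  # [(0, 2), (2, 4)]
```

Each returned range could then be passed to the per-shot detection and path-analysis steps independently.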
15 Claims
1. A method for an intelligent user-interaction system based on object detection, comprising:
receiving an input video sequence corresponding to a video program;
dividing the input video sequence into a plurality of video shots, each containing one or more video frames;
detecting possible object occurrences in each of the plurality of video shots;
analyzing possible paths of an object in a video shot using a multimodal-cue approach;
aggregating the path-based selected object occurrences across the plurality of video shots to detect objects; and
generating a complete list of the object occurrences across the plurality of video shots;
wherein analyzing the possible paths using a multimodal-cue approach further includes:
combining an appearance cue, a spatio-temporal cue, and a topological cue to aid object detection in the plurality of video shots;
dictating a usage of an object's visual features to detect possible object locations in a video frame using the appearance cue;
injecting information across a sequence of frames via relational constraints between a target object class and a related object class using the spatio-temporal cue and the topological cue;
fusing the multimodal cue information to create links between object occurrences across the video frames in a current video shot; and
applying dynamic programming to find optimal object paths.
View Dependent Claims (2, 3, 4, 5, 6, 7)
8. An intelligent user-interaction system, comprising:
a video decoder configured to decode an incoming bit stream;
a data storage configured to store a certain time of incoming bit-stream as an input video sequence corresponding to a video program to be shown to a user on a display;
a preprocessing unit configured to divide the input video sequence into a plurality of video shots, each containing one or more video frames;
a detection unit configured to detect possible object occurrences in each video shot;
a path analysis unit configured to analyze possible paths of an object in a video shot using a multimodal-cue approach; and
an aggregation unit configured to aggregate the path-based selected object occurrences across the plurality of video shots to detect objects;
wherein the path analysis unit is further configured to:
combine an appearance cue, a spatio-temporal cue, and a topological cue to aid object detection in the plurality of video shots;
dictate a usage of an object's visual features to detect possible object locations in a video frame based on the appearance cue;
inject information across a sequence of frames via relational constraints between a target object class and a related object class using the spatio-temporal cue and the topological cue;
fuse the multimodal cue information to create links between object occurrences across the video frames in a current video shot; and
apply dynamic programming to find optimal object paths.
View Dependent Claims (9, 10, 11, 12, 13, 14, 15)
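The aggregation unit's role, collecting the path-selected occurrences from every shot into one list for the whole video sequence, could be sketched as follows. The claim does not specify a data layout, so the `Occurrence` record, the `aggregate` function, and the convention that `None` marks a frame with no selected occurrence are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x, y, width, height)


@dataclass(frozen=True)
class Occurrence:
    frame: int    # frame index within the whole input video sequence
    box: Box
    label: str


def aggregate(
    shots: Sequence[Tuple[int, int]],
    paths: Sequence[Sequence[Optional[Box]]],
    label: str,
) -> List[Occurrence]:
    """Merge per-shot object paths into one list ordered by frame index.

    shots[k] is a (start, end) frame range with exclusive end, and
    paths[k][i] is the box selected for frame start + i of shot k,
    or None when no occurrence was selected for that frame.
    """
    if len(shots) != len(paths):
        raise ValueError("need exactly one path per shot")

    occurrences: List[Occurrence] = []
    for (start, end), path in zip(shots, paths):
        if len(path) != end - start:
            raise ValueError("path length must match shot length")
        for offset, box in enumerate(path):
            if box is not None:
                occurrences.append(Occurrence(start + offset, box, label))
    return sorted(occurrences, key=lambda o: o.frame)


shots = [(0, 2), (2, 4)]
paths = [[(0, 0, 10, 10), (2, 0, 10, 10)], [None, (50, 50, 10, 10)]]
for occ in aggregate(shots, paths, "face"):
    print(occ.frame, occ.box, occ.label)
```

Keeping frame indices relative to the whole sequence, rather than to each shot, lets the resulting list be looked up directly against the playback position.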
Specification