Apparatus and software system for and method of performing a visual-relevance-rank subsequent search
Abstract
A method analyzes the visual content of media such as videos to group visually similar appearances in their constituent images (e.g., the same scenes, the same objects, or faces of the same people). The most relevant and salient visual appearances depicted in the videos (those of clearest and largest presence) are then presented to the user. This serves two purposes: it summarizes the video content so that users can “see before they watch” (that is, judge from a filmstrip-like summary of the depicted content whether they want to click on the video and spend time watching it), and it allows users to further refine their video search result set according to the most relevant and salient video content returned (e.g., the faces with the largest screen time).
19 Claims
1. A computer implemented method of identifying video content and displaying representative frames of said video content, comprising the steps of:
extracting frames from the video content, the extracting identifying for the video content, sequences of frames of each video comprising scenes and determining representative frames for the scenes;
identifying one or more visual objects appearing in each of the scenes that correspond to a target object specified by search criteria;
extracting the visual objects from the scenes;
extracting contiguous regions of interest containing detected and tracked visual objects within each scene by identifying contiguous regions of interest within each scene each region of interest comprising a region within a video's frames containing a tracked object;
grouping similar regions of interest together into tubes;
grouping similar non-contiguous tubes;
identifying representative frames in each contiguous region of interest; and
determining, for the video content, a visual relevance rank of each of the scenes, wherein the step of determining the visual relevance rank of each of the scenes comprises scoring each of the representative frames by assigning an importance level to each of the representative frames and based on a group size of groups of tubes, determining, for each of the video content, whether to display the representative frame within each of selected scenes based on the visual relevance rank associated with each of the scenes; and
displaying at least the representative frame based on the visual relevance rank for each of the video content.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
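
The grouping and scoring steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not say how similarity is measured or how importance level and group size are combined. The `Tube` class, its `label` field (a stand-in for visual similarity), the `importance` value, and the product `importance * group_size` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tube:
    scene_id: int
    label: str         # hypothetical appearance label standing in for visual similarity
    start_frame: int
    end_frame: int     # inclusive
    importance: float  # hypothetical importance level of the tube's representative frame


def group_tubes(tubes):
    """Group non-contiguous tubes that share an appearance label."""
    groups = {}
    for tube in tubes:
        groups.setdefault(tube.label, []).append(tube)
    return groups


def rank_scenes(tubes):
    """Score each scene as the sum over its tubes of importance * group size.

    Returns (scene ids ordered from highest to lowest score, score per scene).
    The scoring formula is an assumption, not taken from the claim.
    """
    groups = group_tubes(tubes)
    scores = {}
    for tube in tubes:
        group_size = len(groups[tube.label])
        scores[tube.scene_id] = scores.get(tube.scene_id, 0.0) + tube.importance * group_size
    order = sorted(scores, key=lambda scene: (-scores[scene], scene))
    return order, scores


tubes = [
    Tube(scene_id=0, label="face_a", start_frame=0, end_frame=40, importance=0.9),
    Tube(scene_id=1, label="face_b", start_frame=50, end_frame=70, importance=0.8),
    Tube(scene_id=2, label="face_a", start_frame=90, end_frame=120, importance=0.5),
]
order, scores = rank_scenes(tubes)
print(order)  # [0, 2, 1]
```

In this toy input, scene 2 outranks scene 1 despite a lower importance level because its object recurs in another tube, which is the effect of weighting by group size.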
10. An apparatus for identifying and displaying selected video content comprising:
a scene detection engine operable to identify, for each of multiple videos, sequences of frames of each video as comprising respective scenes, the scene detection engine operable to identify one or more visual objects appearing in each of the scenes that correspond to a target object specified by search criteria;
a scoring engine operable to determine, for each of the video content, a visual relevance rank of each of the scenes by;
extracting contiguous regions of interest containing detected and tracked visual objects within each scene by identifying contiguous regions of interest within each scene;
grouping similar regions of interest together into tubes;
grouping similar non-contiguous tubes;
identifying representative frames in each contiguous region of interest;
scoring each of the scenes by assigning an importance level to each of the representative frames, and based at least on a length of grouped similar tubes containing the detected and tracked object; and
a scene selection engine operable to select, for each of the video content, a number of the scenes based on the visual relevance rank associated with each of the scenes; and
a frame extraction engine operable to identify, for each of the video content, a representative frame for each of the scenes; and
a display coupled to a display engine adapted to display at least the representative frame for each of the video content.
Dependent Claims (11, 12, 13, 14)
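
Claim 10 scores scenes based at least on the length of grouped similar tubes and then selects a number of scenes by rank. The sketch below shows one way this could look, under stated assumptions: tubes arrive as plain tuples already assigned to a group, length is measured in frames, a scene takes the total length of its longest-running group, and the importance level is omitted for brevity. The function names are hypothetical.

```python
from collections import defaultdict


def scene_scores_by_tube_length(tubes):
    """Score scenes by the total frame length of the tube group they contain.

    tubes: iterable of (scene_id, group_id, start_frame, end_frame), end inclusive.
    A scene holding tubes from several groups takes the largest group length
    (an assumption; the claim does not specify how lengths are combined).
    """
    group_length = defaultdict(int)
    for _scene, group, start, end in tubes:
        group_length[group] += end - start + 1
    scores = defaultdict(int)
    for scene, group, _start, _end in tubes:
        scores[scene] = max(scores[scene], group_length[group])
    return dict(scores)


def select_scenes(scores, count):
    """Return the `count` highest-scoring scene ids, ties broken by scene id."""
    return sorted(scores, key=lambda scene: (-scores[scene], scene))[:count]


tubes = [
    (0, "a", 0, 9),    # 10 frames of group "a"
    (1, "b", 10, 14),  # 5 frames of group "b"
    (2, "a", 20, 49),  # 30 more frames of group "a"
    (3, "c", 50, 51),  # 2 frames of group "c"
]
scores = scene_scores_by_tube_length(tubes)
print(select_scenes(scores, 2))  # [0, 2]
```

Both scenes showing group "a" receive that group's combined length of 40 frames, so an object with long total screen time lifts every scene in which it appears.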
15. A computer storage device having computer readable program code embodied therein, the computer readable program code causing a processing device to perform steps comprising:
receiving a search query specifying search criteria;
searching for videos satisfying the search criteria;
identifying one or more visual objects that correspond to a target object specified by search criteria in each video of video search results returned by the search query by;
identifying, for each of video search results, sequences of frames of each video as comprising respective scenes;
extracting contiguous regions of interest containing detected and tracked objects within each scene by identifying contiguous regions of interest within each scene;
determining, for each of the video search results, a visual relevance rank of each of the scenes, wherein the step of determining the visual relevance rank of each of the scenes comprises;
grouping similar regions of interest together into tubes;
grouping similar non-contiguous tubes;
identifying representative frames in each contiguous region of interest;
scoring each of the representative frames by assigning an importance level to each of the frames; and
grouping the visual objects present in multiple of the videos of the video search results;
scoring each of the scenes by assigning an importance level to each of the representative frames, and based at least on grouped similar tubes containing the detected and tracked object; and
displaying an image for each visual object grouping along with the video search results.
Dependent Claims (16, 17, 18, 19)
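
Claim 15 additionally groups visual objects that appear in multiple videos of the search results and displays one image per grouping. A minimal sketch of that step follows, assuming each detection already carries a shared `object_key` (a hypothetical stand-in for cross-video visual matching), a frame reference, and an importance level. Picking the highest-importance frame as the displayed image is also an assumption.

```python
from collections import defaultdict


def group_objects_across_videos(detections):
    """Group detections by object and keep objects seen in at least two videos.

    detections: iterable of (video_id, object_key, frame_ref, importance).
    Returns {object_key: {"videos": [...], "image": frame_ref}}, where "image"
    is the frame reference with the highest importance level in the group.
    """
    groups = defaultdict(list)
    for video_id, object_key, frame_ref, importance in detections:
        groups[object_key].append((video_id, frame_ref, importance))

    result = {}
    for object_key, members in groups.items():
        videos = {video_id for video_id, _frame, _importance in members}
        if len(videos) < 2:
            continue  # object appears in a single video only
        best = max(members, key=lambda member: member[2])
        result[object_key] = {"videos": sorted(videos), "image": best[1]}
    return result


detections = [
    ("v1", "face_a", "v1#12", 0.4),
    ("v2", "face_a", "v2#88", 0.9),
    ("v2", "face_b", "v2#3", 0.7),
]
print(group_objects_across_videos(detections))
# {'face_a': {'videos': ['v1', 'v2'], 'image': 'v2#88'}}
```

Each resulting group could then be shown next to the search results as a clickable image for refining the result set, as the abstract describes.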
Specification