Classification, search, and retrieval of complex video events
Abstract
A complex video event classification, search and retrieval system can generate a semantic representation of a video or of segments within the video, based on one or more complex events that are depicted in the video, without the need for manual tagging. The system can use the semantic representations to, among other things, provide enhanced video search and retrieval capabilities.
45 Claims
1. A video search assistant embodied in one or more machine readable storage media and accessible by a computing system to assist a user with a video search by:
receiving a user-specified search request;

determining a higher level complex event of interest, based on the user-specified search request;

accessing a video event model, the video event model comprising: (i) a plurality of semantic elements associated with a plurality of higher level complex events depicted in a plurality of videos, each higher level complex event evidenced by at least two different lower level complex events, each of the semantic elements describing one or more of a scene, an action, an actor, and an object depicted in one or more of the videos, and (ii) data indicative of evidentiary relationships of different combinations of semantic elements forming the lower level complex events, wherein the video event model is derived by: (i) executing one or more event classifiers on one or more dynamic low level features of the videos to identify one or more semantic elements associated with the dynamic low level features, (ii) computing a strength of association of the one or more semantic elements with ones of the lower level complex events, and (iii) computing a strength of association of the associated ones of the lower level complex events with the higher level complex event;

determining, based on the video event model, one or more semantic elements of interest associated with the higher level complex event of interest; and

formulating a search for one or more videos depicting the higher level complex event of interest, the search comprising one or more of the semantic elements of interest.

Dependent claims: 2-15.
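The flow recited in claim 1 can be illustrated with a minimal sketch: a toy "video event model" maps a higher level complex event to lower level complex events, each evidenced by semantic elements with strengths of association, and a search is formulated from the strongly associated elements. All names, the model structure, and the threshold are hypothetical illustrations, not the patented implementation.

```python
# Toy video event model: higher level event -> lower level events, each a
# list of (semantic_element, strength_of_association) pairs. Values are
# made-up examples.
VIDEO_EVENT_MODEL = {
    "birthday party": {
        "blowing out candles": [("cake", 0.9), ("person", 0.8), ("kitchen", 0.4)],
        "opening presents": [("gift box", 0.85), ("person", 0.8), ("living room", 0.5)],
    },
}

def semantic_elements_of_interest(model, higher_level_event, threshold=0.6):
    """Collect semantic elements strongly associated with the event."""
    elements = set()
    for assoc in model.get(higher_level_event, {}).values():
        for element, strength in assoc:
            if strength >= threshold:
                elements.add(element)
    return elements

def formulate_search(higher_level_event, elements):
    """Build a simple keyword query from the elements of interest."""
    return {"event": higher_level_event, "terms": sorted(elements)}

query = formulate_search(
    "birthday party",
    semantic_elements_of_interest(VIDEO_EVENT_MODEL, "birthday party"),
)
```

In this sketch the query for "birthday party" expands to the keywords `cake`, `gift box`, and `person`; weakly associated scene elements fall below the threshold and are dropped.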
16. A video search assistant embodied in one or more machine readable storage media and accessible by a computing system to generate a description of a video to assist with a video search, by:
accessing a set of semantic elements associated with the video, each of the semantic elements describing one or more of a scene, an action, an actor, and an object depicted in the video, wherein at least a portion of the set is, in combination, indicative of at least one lower level complex event;

recognizing a higher level complex event as being likely depicted in the video, as evidenced by a combination of the at least one lower level complex event;

algorithmically generating a human-intelligible representation of the higher level complex event based on the semantic elements evidencing the higher level complex event, the human-intelligible representation comprising one or more of: a natural language description, one or more non-textual visual elements representative of the natural language description, and one or more audio elements representative of the natural language description; and

associating the human-intelligible representation with the video.

Dependent claims: 17-29.
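Claim 16's description step can be sketched as template-based caption generation: semantic elements evidencing the recognized event are slotted into a natural-language template. The element categories and template are illustrative assumptions, not the claimed method.

```python
def describe_event(event, elements):
    """Render a human-intelligible description of a recognized event.

    `elements` maps semantic categories (scene/action/actor/object)
    to detected values; missing categories get neutral fallbacks.
    """
    actor = elements.get("actor", "someone")
    action = elements.get("action", "doing something")
    parts = [f"{actor} {action}"]
    if "object" in elements:
        parts.append(f"a {elements['object']}")
    if "scene" in elements:
        parts.append(f"in a {elements['scene']}")
    return f"Likely '{event}': " + " ".join(parts) + "."

caption = describe_event(
    "birthday party",
    {"actor": "a child", "action": "blowing out candles on",
     "object": "cake", "scene": "kitchen"},
)
```

The same structured output could instead be rendered as icons or synthesized speech, matching the claim's non-textual and audio alternatives.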
30. A video search assistant embodied in one or more machine readable storage media and accessible by a computing system to recognize a video as likely depicting a complex event, by, algorithmically:
identifying a plurality of visual and non-visual features included in the video;

deriving a plurality of semantic elements from the visual and non-visual features, each of the semantic elements describing one or more of a scene, an action, an actor, and an object depicted in the video;

deriving a plurality of lower level complex events comprising combinations of the semantic elements;

recognizing a higher level complex event as being likely depicted in the video based on a combination of the lower level complex events; and

associating the higher level complex event with the video.

Dependent claims: 31-44.
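The hierarchy in claim 30 (features, then semantic elements, then lower level complex events, then a higher level complex event) can be sketched with set matching: a lower level event fires when all of its required semantic elements are present, and a higher level event is recognized when enough lower level events fire. The rule-based scoring is a stand-in for the trained classifiers the claim implies.

```python
# Illustrative event definitions; real systems would learn these.
LOWER_EVENTS = {
    "blowing out candles": {"cake", "person", "flame"},
    "opening presents": {"gift box", "person"},
}
HIGHER_EVENTS = {
    "birthday party": {"blowing out candles", "opening presents"},
}

def recognize(semantic_elements, min_lower=2):
    """Return higher level events evidenced by >= min_lower lower events."""
    detected_lower = {
        name for name, required in LOWER_EVENTS.items()
        if required <= semantic_elements  # all required elements present
    }
    return {
        name for name, lowers in HIGHER_EVENTS.items()
        if len(lowers & detected_lower) >= min_lower
    }

events = recognize({"cake", "person", "flame", "gift box"})
```

With `min_lower=2` the higher level event is only recognized when at least two different lower level events support it, mirroring the "at least two different lower level complex events" limitation of claim 1.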
45. A system for creating a highlight reel by algorithmically recognizing and extracting complex events from a video, the system comprising a plurality of instructions embodied in one or more non-transitory machine readable storage media, the instructions executable by a processor to cause an electronic device to:
detect a plurality of visual features in the video;

identify a plurality of semantic elements associated with the detected visual features;

select a subset of the semantic elements, the subset comprising semantic elements evidencing at least two different actions associated with a plurality of events;

select a subset of the associated plurality of events indicating the complex event;

extract, from the video, a plurality of images corresponding to the actions associated with the complex event; and

by a display of the electronic device, present the extracted images as a highlight reel of the complex event.
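Claim 45's highlight-reel assembly can be sketched as a filter over per-frame action detections: keep the frames whose detected actions belong to the target complex event and present them in order. The frame records and the event-to-action mapping are made-up examples, not the patented method.

```python
# Hypothetical mapping from a complex event to the actions that evidence it.
COMPLEX_EVENT_ACTIONS = {
    "birthday party": {"blowing out candles", "opening presents"},
}

def extract_highlights(frames, complex_event):
    """Select highlight frames for a complex event.

    `frames` is a list of (frame_index, detected_action) pairs, in
    temporal order; the result preserves that order.
    """
    wanted = COMPLEX_EVENT_ACTIONS[complex_event]
    return [idx for idx, action in frames if action in wanted]

reel = extract_highlights(
    [(0, "walking"), (42, "blowing out candles"),
     (90, "eating"), (120, "opening presents")],
    "birthday party",
)
```

A display step would then render the selected frames (here, indices 42 and 120) as the highlight reel.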
Specification