SYSTEM AND METHOD FOR DEEP ANNOTATION AND SEMANTIC INDEXING OF VIDEOS
First Claim
1. A method of a deep annotation and a semantic indexing of a multimedia content based on a script, wherein said script is associated with said multimedia content, said method comprising:
determining of a plurality of multimedia scenes of said multimedia content;
determining of a plurality of script segments of said script;
obtaining of a script segment structure associated with a script segment of said plurality of script segments, wherein said script segment structure comprises:
a plurality of objects, a plurality of object descriptions of said plurality of objects, a plurality of persons, a plurality of person descriptions of said plurality of persons, a plurality of locations, a plurality of location descriptions of said plurality of locations, a plurality of scene descriptions, a plurality of dialog descriptions, a plurality of action descriptions, and a plurality of directives;
determining of a plurality of closed-world key phrases based on said script;
determining of a coarse-grained annotation associated with a script segment of said plurality of script segments based on the analysis of a plurality of objects, a plurality of object descriptions of said plurality of objects, a plurality of persons, a plurality of person descriptions of said plurality of persons, a plurality of locations, a plurality of location descriptions of said plurality of locations, a plurality of scene descriptions, a plurality of dialog descriptions, a plurality of action descriptions, and a plurality of directives associated with said script segment;
determining of a plurality of coarse-grained annotations associated with a plurality of multimedia key frames of a multimedia scene of said plurality of multimedia scenes based on said plurality of closed-world key phrases;
determining of a plurality of plurality of matched script segments associated with said plurality of multimedia key frames based on said plurality of script segments and said plurality of coarse-grained annotations;
determining of a best matched script segment associated with said multimedia scene based on said plurality of plurality of matched script segments;
analyzing of said best matched script segment to result in a fine-grained annotation of said multimedia scene;
making of said fine-grained annotation a part of said deep annotation of said multimedia content;
performing of said semantic indexing of said multimedia content based on a fine-grained annotation associated with each of said plurality of multimedia scenes of said multimedia content; and
determining of a plurality of homogeneous scenes of said plurality of multimedia scenes based on said semantic indexing.
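The claimed steps can be illustrated with a minimal Python sketch of how they might fit together. It rests on assumptions that do not come from the patent text. Script segments are taken as already parsed into simplified fields. Key-frame labels are assumed to come from some detector restricted to the closed-world vocabulary, and are simulated here as hand-written sets. Each key frame is matched to script segments by Jaccard overlap of coarse-grained annotations, and the per-frame match lists are combined by a rank-weighted vote. "Homogeneous" is taken to mean "indexed under the same location". `ScriptSegment` and every function name below are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class ScriptSegment:
    # Hypothetical, simplified stand-in for the claimed script segment structure;
    # scene/dialog/action descriptions and directives are lumped into `descriptions`.
    seg_id: str
    objects: list[str]
    persons: list[str]
    locations: list[str]
    descriptions: list[str]


def closed_world_key_phrases(segments):
    """Key-phrase vocabulary drawn only from the script (the 'closed world')."""
    return {p.lower() for s in segments for p in s.objects + s.persons + s.locations}


def coarse_annotation(segment, vocab):
    """Coarse-grained annotation: vocabulary phrases mentioned anywhere in the segment."""
    text = " ".join(segment.objects + segment.persons + segment.locations
                    + segment.descriptions).lower()
    return {p for p in vocab if p in text}  # naive substring test, for illustration only


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def match_key_frame(frame_labels, seg_annotations, top_k=2):
    """Matched script segments for one key frame: the top_k by annotation overlap."""
    return sorted(seg_annotations,
                  key=lambda sid: jaccard(frame_labels, seg_annotations[sid]),
                  reverse=True)[:top_k]  # stable sort: ties keep script order


def best_matched_segment(key_frames, seg_annotations, top_k=2):
    """Rank-weighted vote over the per-key-frame match lists."""
    votes = Counter()
    for labels in key_frames:
        for rank, sid in enumerate(match_key_frame(labels, seg_annotations, top_k)):
            votes[sid] += top_k - rank  # higher-ranked matches earn more votes
    return votes.most_common(1)[0][0]


def fine_grained_annotation(segment):
    """Fine-grained annotation: the full structure of the best-matched segment."""
    return {"objects": segment.objects, "persons": segment.persons,
            "locations": segment.locations, "descriptions": segment.descriptions}


def semantic_index(scene_annotations):
    """Inverted index from lower-cased entity phrase to the scenes annotated with it."""
    index = defaultdict(set)
    for scene_id, ann in scene_annotations.items():
        for phrase in ann["objects"] + ann["persons"] + ann["locations"]:
            index[phrase.lower()].add(scene_id)
    return index


def homogeneous_scenes(index, locations):
    """Hypothetical criterion: scenes indexed under the same location form a group."""
    return {loc: sorted(index[loc]) for loc in sorted(locations)
            if len(index.get(loc, ())) > 1}


segments = [
    ScriptSegment("s1", ["gun"], ["Anna"], ["warehouse"], ["Anna hides the gun in the warehouse."]),
    ScriptSegment("s2", ["car"], ["Ben"], ["street"], ["Ben drives the car down the street."]),
    ScriptSegment("s3", ["letter"], ["Anna", "Ben"], ["warehouse"], ["Anna hands Ben a letter."]),
]
by_id = {s.seg_id: s for s in segments}
vocab = closed_world_key_phrases(segments)
seg_annotations = {s.seg_id: coarse_annotation(s, vocab) for s in segments}

# Coarse labels per key frame, assumed to come from a detector restricted to `vocab`.
scenes = {
    "scene_A": [{"anna", "gun"}, {"warehouse", "gun"}],
    "scene_B": [{"car", "street"}, {"ben", "car"}],
    "scene_C": [{"anna", "ben"}, {"letter", "warehouse"}],
}
scene_to_segment = {sc: best_matched_segment(kfs, seg_annotations) for sc, kfs in scenes.items()}
index = semantic_index({sc: fine_grained_annotation(by_id[sid])
                        for sc, sid in scene_to_segment.items()})
groups = homogeneous_scenes(index, {loc.lower() for s in segments for loc in s.locations})
print(scene_to_segment)  # {'scene_A': 's1', 'scene_B': 's2', 'scene_C': 's3'}
print(groups)            # {'warehouse': ['scene_A', 'scene_C']}
```

Voting across several key frames, rather than trusting a single frame, reflects the claim's use of a "plurality of plurality of matched script segments". One ambiguous frame (here, the tie between `s1` and `s2` for `{"anna", "ben"}`) is outvoted by the other frames of the same scene.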
Abstract
Video-on-demand services rely on frequent viewing and downloading of content to improve the return on investment in such services. Videos in general, and movies in particular, hosted by video portals need extensive annotations to support greater monetization of content. Such deep annotations help in creating content packages from bits and pieces extracted from specific videos to suit individuals' queries, thereby providing multiple opportunities for piece-wise monetization. Considering the complexity involved in extracting deep semantics for deep annotation from video and audio analyses, a system and method for deep annotation uses the video/movie scripts associated with the content to support video-audio analysis in deep annotation.
9 Claims
Specification