Automated Video-To-Text System
Abstract
A method for converting video to text is disclosed that automatically generates text descriptions of the content of a video. The present invention first segments an input video sequence according to predefined semantic classes using a Mixture-of-Experts blob segmentation algorithm. The resulting segmentation is coerced into a semantic concept graph based on domain knowledge and a semantic concept hierarchy. The initial semantic concept graph is then summarized and pruned. Finally, according to the summarized semantic concept graph and its changes over time, text and/or speech descriptions are automatically generated using one of three description schemes: key-frame, key-object, and key-change descriptions.
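The pruning and key-change steps summarized in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a semantic concept graph is a set of (subject, relation, object) triples; the function names `prune` and `key_change_description` and the sentence templates are hypothetical and are not taken from the patent.

```python
# Hypothetical sketch: a concept graph is modelled as a set of
# (subject, relation, object) triples. This representation is an
# assumption made for illustration, not the patent's data structure.

def prune(graph, keep_labels):
    """Keep only edges whose two endpoints are both labels of interest."""
    return {(s, r, o) for (s, r, o) in graph
            if s in keep_labels and o in keep_labels}

def key_change_description(before, after):
    """Describe edges that appeared or disappeared between two graphs."""
    sentences = []
    for s, r, o in sorted(after - before):      # newly appeared edges
        sentences.append(f"The {s} is now {r} the {o}.")
    for s, r, o in sorted(before - after):      # edges that vanished
        sentences.append(f"The {s} is no longer {r} the {o}.")
    return sentences

# Two graphs taken at different times (made-up example data).
graph_t0 = {("car", "on", "road"), ("person", "near", "car"), ("cloud", "in", "sky")}
graph_t1 = {("car", "on", "road"), ("person", "in", "car"), ("cloud", "in", "sky")}

labels_of_interest = {"car", "road", "person"}
changes = key_change_description(prune(graph_t0, labels_of_interest),
                                 prune(graph_t1, labels_of_interest))
print(changes)
```

Here pruning discards the cloud and sky edge before the comparison, so only the change in the relation between the person and the car is reported.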
24 Claims
1. A method for converting video to text, comprising the steps of:
receiving at least one frame of video;
partitioning the at least one frame of a video into a plurality of blobs;
providing a semantic class label for each blob;
constructing a graph from a plurality of the semantic class labels representing blobs at the vertices and a plurality of edges representing the spatial interactions between blobs; and
traversing the graph to generate text associated with the video.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20
19. An apparatus for converting a video to a text description, comprising:
a receiver for receiving at least one frame of video;
a segmenter module for partitioning said at least one frame of video into a plurality of blobs and for providing a semantic class label for each of said blobs;
a tracking module for providing a global identifier for each of said blobs;
a summarization module for constructing a graph from a plurality of semantic class labels representing said blobs at the vertices and a plurality of edges representing the spatial interactions between said blobs; and
a description generation module for traversing the graph to generate text associated with the video.
Dependent claims: 21, 22
23. A computer-readable medium carrying one or more sequences of instructions for converting video to text, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
receiving at least one frame of video;
partitioning the at least one frame of a video into a plurality of blobs;
providing a semantic class label for each blob;
constructing a graph from a plurality of the semantic class labels representing blobs at the vertices and a plurality of edges representing the spatial interactions between blobs; and
traversing the graph to generate text associated with the video.
Dependent claims: 24
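The graph construction and traversal steps recited in the independent claims can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: blobs are assumed to arrive already segmented and labeled as axis-aligned bounding boxes, the `Blob` class, the `spatial_relation` rule, and the sentence template are all hypothetical, and a breadth-first traversal is used only as one possible traversal order.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Blob:
    """Hypothetical labeled blob: a semantic class label plus a bounding box.
    Image coordinates are assumed, with y increasing downward."""
    label: str
    x0: int
    y0: int
    x1: int
    y1: int

def spatial_relation(a, b):
    """Assumed toy rule for the spatial interaction between two blobs."""
    if a.y1 <= b.y0:
        return "above"
    if a.y0 >= b.y1:
        return "below"
    if a.x1 <= b.x0:
        return "left of"
    if a.x0 >= b.x1:
        return "right of"
    return "overlapping"

def build_graph(blobs):
    """Vertices hold class labels; edges hold (relation, target index)."""
    vertices = [b.label for b in blobs]
    edges = {i: [(spatial_relation(blobs[i], blobs[j]), j)
                 for j in range(i + 1, len(blobs))]
             for i in range(len(blobs))}
    return vertices, edges

def describe(vertices, edges):
    """Breadth-first traversal that emits one sentence per edge."""
    sentences, seen = [], set()
    queue = deque([0] if vertices else [])
    while queue:
        i = queue.popleft()
        if i in seen:
            continue
        seen.add(i)
        for relation, j in edges[i]:
            sentences.append(f"The {vertices[i]} is {relation} the {vertices[j]}.")
            queue.append(j)
    return " ".join(sentences)

# Made-up blobs standing in for the output of segmentation and labeling.
blobs = [Blob("sky", 0, 0, 100, 40),
         Blob("road", 0, 60, 100, 100),
         Blob("car", 30, 50, 60, 80)]
vertices, edges = build_graph(blobs)
text = describe(vertices, edges)
print(text)
```

Each pair of blobs gets a single directed edge from the earlier blob to the later one, so every spatial interaction is described once rather than from both sides.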
Specification