System and method for automated multimedia content indexing and retrieval
First Claim
1. A method for processing a multimedia event, comprising:
separating text components from a multimedia data stream associated with the multimedia event to yield separated text components;
generating a plurality of semantically coherent text blocks from the separated text components using an automated multimedia content indexing and retrieval system, wherein at least one semantically coherent text block is generated by merging disconnected text blocks;
identifying a target speaker based on audio features in the multimedia data stream to yield an identified target speaker;
deriving a topic for each text block of the plurality of semantically coherent text blocks based on a set of topic category models to yield derived topics; and
generating a multimedia description of the multimedia event based at least on the identified target speaker, the plurality of semantically coherent text blocks, and the derived topics, wherein the multimedia description comprises at least a timeline representation having a plurality of layers showing multiple categorizations of the multimedia data stream for each instance of time.
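The final element of the claim, a timeline with a plurality of layers that each categorize the stream at every instant, can be pictured with a small data structure. The following is only an illustrative sketch: the `Segment` and `Timeline` names, the layer names, and the sample time spans are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the start of the stream (inclusive)
    end: float    # seconds from the start of the stream (exclusive)
    label: str    # category assigned to this span


@dataclass
class Timeline:
    # One list of segments per layer, e.g. "speaker" or "topic" (hypothetical names).
    layers: Dict[str, List[Segment]] = field(default_factory=dict)

    def add(self, layer: str, start: float, end: float, label: str) -> None:
        """Append a labeled span to the named layer, creating the layer if needed."""
        self.layers.setdefault(layer, []).append(Segment(start, end, label))

    def at(self, t: float) -> Dict[str, Optional[str]]:
        """Return the label on every layer at time t, or None where a layer is empty."""
        return {
            name: next((s.label for s in segs if s.start <= t < s.end), None)
            for name, segs in self.layers.items()
        }


# Made-up sample data: two layers categorizing the same 45-second stream.
timeline = Timeline()
timeline.add("speaker", 0.0, 30.0, "target speaker")
timeline.add("speaker", 30.0, 45.0, "other")
timeline.add("topic", 0.0, 20.0, "weather")
timeline.add("topic", 20.0, 45.0, "sports")

snapshot = timeline.at(25.0)
```

Querying one instant returns one category per layer, which is the "multiple categorizations ... for each instance of time" idea in miniature.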
Abstract
The invention provides a system and method for automatically indexing and retrieving multimedia content. The method may include separating a multimedia data stream into audio, visual and text components, segmenting the audio, visual and text components based on semantic differences, identifying at least one target speaker using the audio and visual components, identifying a topic of the multimedia event using the segmented text and topic category models, generating a summary of the multimedia event based on the audio, visual and text components, the identified topic and the identified target speaker, and generating a multimedia description of the multimedia event based on the identified target speaker, the identified topic, and the generated summary.
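The abstract says a topic is identified from segmented text using topic category models but does not say what form those models take. As a rough illustration only, the sketch below assumes each model is a dictionary of keyword weights and picks the highest-scoring category. The `TOPIC_MODELS` table, its weights, and the `derive_topic` function are hypothetical stand-ins, not the patented technique.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical topic category models: keyword -> weight (made-up values).
TOPIC_MODELS: Dict[str, Dict[str, float]] = {
    "weather": {"rain": 2.0, "storm": 2.0, "forecast": 1.5, "temperature": 1.0},
    "sports": {"game": 1.5, "score": 1.5, "team": 1.0, "season": 1.0},
}


def derive_topic(text: str, models: Dict[str, Dict[str, float]] = TOPIC_MODELS) -> str:
    """Score a text block against every category model and return the best match."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        topic: sum(weight * counts[word] for word, weight in weights.items())
        for topic, weights in models.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to "unknown" when no model keyword appears in the block.
    return best if scores[best] > 0 else "unknown"


topic = derive_topic("The forecast calls for rain and a storm tonight")
```

A production system would more plausibly use statistical language models trained per category; the keyword table is used here only because it keeps the scoring step visible.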
56 Citations
13 Claims
1. A method for processing a multimedia event, comprising:
separating text components from a multimedia data stream associated with the multimedia event to yield separated text components;
generating a plurality of semantically coherent text blocks from the separated text components using an automated multimedia content indexing and retrieval system, wherein at least one semantically coherent text block is generated by merging disconnected text blocks;
identifying a target speaker based on audio features in the multimedia data stream to yield an identified target speaker;
deriving a topic for each text block of the plurality of semantically coherent text blocks based on a set of topic category models to yield derived topics; and
generating a multimedia description of the multimedia event based at least on the identified target speaker, the plurality of semantically coherent text blocks, and the derived topics, wherein the multimedia description comprises at least a timeline representation having a plurality of layers showing multiple categorizations of the multimedia data stream for each instance of time.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computing device that automatically indexes and retrieves a multimedia event, the computing device comprising:
an automated multimedia content indexing and retrieval system;
a first module, using the automated multimedia content indexing and retrieval system, configured to separate text components from a multimedia data stream associated with a multimedia event;
a second module configured to generate a plurality of semantically coherent text blocks from the separated text components;
a third module configured to identify a target speaker based on audio features in the multimedia data stream;
a fourth module configured to derive a topic for each text block of the plurality of semantically coherent text blocks based on a set of topic category models, wherein at least one semantically coherent text block is generated by merging disconnected text blocks; and
a fifth module configured to generate a multimedia description of the multimedia event based at least on the semantically coherent text blocks and the derived topics, wherein the multimedia description comprises at least a timeline representation having a plurality of layers showing multiple categorizations of the multimedia data stream for each instance of time.
Dependent claims: 9, 10, 11, 12, 13
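Both independent claims require that at least one semantically coherent text block be generated by merging disconnected text blocks, without fixing how the merge is decided. One way to picture it, offered purely as an assumption, is to group non-adjacent blocks whose vocabularies overlap strongly. The Jaccard similarity measure, the 0.3 threshold, and the function names below are all hypothetical choices, not taken from the patent.

```python
import re
from typing import List, Set, Tuple


def words(text: str) -> Set[str]:
    """Lowercased word set of a text block (no stop-word handling in this sketch)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two word sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_disconnected(blocks: List[str], threshold: float = 0.3) -> List[List[int]]:
    """Group block indices; a block joins the first earlier group it overlaps enough with."""
    groups: List[Tuple[List[int], Set[str]]] = []
    for i, text in enumerate(blocks):
        vocab = words(text)
        for indices, group_vocab in groups:
            if jaccard(vocab, group_vocab) >= threshold:
                indices.append(i)
                group_vocab |= vocab  # grow the group's vocabulary in place
                break
        else:
            groups.append(([i], set(vocab)))
    return [indices for indices, _ in groups]


# Made-up blocks: the first and third discuss the same subject but are not adjacent.
blocks = [
    "rain storm forecast tonight",
    "team game score",
    "storm forecast rain tomorrow",
]
merged = merge_disconnected(blocks)
```

Here blocks 0 and 2 end up in one group even though block 1 separates them in the stream, which is the sense of "disconnected" this sketch assumes.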
Specification