System and method for automated multimedia content indexing and retrieval
Abstract
The invention provides a system and method for automatically indexing and retrieving multimedia content. The method may include separating a multimedia data stream into audio, visual and text components, segmenting the audio, visual and text components based on semantic differences, identifying at least one target speaker using the audio and visual components, identifying a topic of the multimedia event using the segmented text and topic category models, generating a summary of the multimedia event based on the audio, visual and text components, the identified topic and the identified target speaker, and generating a multimedia description of the multimedia event based on the identified target speaker, the identified topic, and the generated summary.
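The abstract names topic category models but does not say what form they take. As a rough illustration only, the sketch below assumes each topic category model is a smoothed unigram word-count model and picks the topic with the highest log-likelihood for a block of text. The function names `train_topic_models` and `derive_topic`, the add-one smoothing, and the sample training texts are all hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def train_topic_models(labeled_texts):
    """Build one unigram word-count model per topic category (assumed model form)."""
    models = {}
    for topic, texts in labeled_texts.items():
        models[topic] = Counter(
            word for text in texts for word in text.lower().split()
        )
    return models


def derive_topic(text, models):
    """Return the topic whose model assigns the text the highest log-likelihood."""
    vocab = set().union(*models.values())
    words = text.lower().split()
    best_topic, best_score = None, -math.inf
    for topic, counts in models.items():
        total = sum(counts.values())
        # Add-one smoothing (an assumption) keeps unseen words from
        # driving the likelihood to zero.
        denom = total + len(vocab) + 1
        score = sum(math.log((counts[w] + 1) / denom) for w in words)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Hypothetical training data for two topic categories.
models = train_topic_models({
    "weather": ["rain and snow expected tonight", "storm brings heavy rain"],
    "sports": ["the team won the final game", "players scored in the game"],
})
```

With these toy models, `derive_topic("heavy snow and rain", models)` selects the weather category, because those words are far more frequent in the weather counts than in the sports counts.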
25 Claims
1. A method for automatically indexing and retrieving a multimedia event, comprising:
separating a multimedia data stream into audio, visual and text components;
segmenting the audio, visual and text components of the multimedia data stream based on semantic differences, wherein frame-level features are extracted from the segmented audio component in a plurality of subbands;
identifying at least one target speaker using the audio and visual components;
identifying semantic boundaries of text for at least one of the identified target speakers to generate semantically coherent text blocks;
generating a summary of multimedia content based on the audio, visual and text components, the semantically coherent text blocks and the identified target speaker;
deriving a topic for each of the semantically coherent text blocks based on a set of topic category models; and
generating a multimedia description of the multimedia event based on the identified target speaker, the semantically coherent text blocks, the topic, and the generated summary.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
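Claim 1 recites frame-level features extracted from the audio component in a plurality of subbands, without fixing which feature is used. One possible reading, shown here purely as a sketch, is per-frame spectral energy summed over equal-width frequency bands. The helper names, the naive DFT, the equal-width band split, and the frame and hop sizes are assumptions made for illustration, not details from the claims.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into fixed-length, possibly overlapping frames."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, hop)
    ]


def power_spectrum(frame):
    """Power at each non-negative frequency bin, via a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_energies(frame, num_subbands):
    """One frame-level feature vector: spectral energy in each subband."""
    spectrum = power_spectrum(frame)
    n_bins = len(spectrum)
    energies = []
    for b in range(num_subbands):
        # Equal-width split of the bins (an assumption; other band
        # layouts would work the same way).
        lo = b * n_bins // num_subbands
        hi = (b + 1) * n_bins // num_subbands
        energies.append(sum(spectrum[lo:hi]))
    return energies


# Hypothetical input: a pure tone that falls exactly on DFT bin 4 of a 64-sample frame.
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(128)]
features = [subband_energies(f, 4) for f in frame_signal(tone, 64, 32)]
```

Each entry of `features` is the subband energy vector for one frame. For this low-frequency tone, nearly all of the energy lands in the lowest subband of every frame.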
15. A system that automatically indexes and retrieves a multimedia event, comprising:
a multimedia data stream separation unit that separates a multimedia data stream into audio, visual and text components;
a data stream component segmentation unit that segments the audio, visual and text components of the multimedia data stream based on semantic differences;
a feature extraction unit that extracts audio features from the audio component and the audio features comprising a frame-level feature in a plurality of subbands;
a target speaker detection unit that identifies at least one target speaker using the audio and visual components;
a content segmentation unit that identifies semantic boundaries of text for at least one of the identified target speakers, to generate semantically coherent text blocks;
a summary generator that generates a summary of multimedia content based on the audio, visual and text components, the semantically coherent text blocks and the identified target speaker;
a topic categorization unit that derives a topic for each of the semantically coherent text blocks based on a set of topic category models; and
a multimedia description generator that generates a multimedia description of the multimedia event based on the identified target speaker, the semantically coherent text blocks, the topic and the generated summary.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25
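The content segmentation unit in claim 15 identifies semantic boundaries in a speaker's text, but the claim does not state how. A common lexical-cohesion heuristic, used here as a stand-in and not as the patented method, places a boundary wherever adjacent sentences share too little vocabulary. The names `cosine` and `coherent_blocks`, the bag-of-words representation, and the 0.1 threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def coherent_blocks(sentences, threshold=0.1):
    """Group consecutive sentences, starting a new block at each low-similarity gap."""
    blocks, current, prev_vec = [], [], None
    for sentence in sentences:
        vec = Counter(sentence.lower().split())
        # A drop in word overlap with the previous sentence is treated as
        # a semantic boundary (assumed heuristic).
        if prev_vec is not None and cosine(prev_vec, vec) < threshold:
            blocks.append(current)
            current = []
        current.append(sentence)
        prev_vec = vec
    if current:
        blocks.append(current)
    return blocks


# Hypothetical transcript sentences attributed to one target speaker.
transcript = [
    "rain is expected tonight",
    "heavy rain tonight across northern towns",
    "the team won the game",
    "the game went to overtime",
]
blocks = coherent_blocks(transcript)
```

Here the second and third sentences share no words, so the sketch splits the transcript into a two-sentence weather block and a two-sentence sports block. A practical system would likely compare wider windows of text and filter common function words before measuring overlap.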
Specification