System and method for automated multimedia content indexing and retrieval

US 8,131,552 B1
Filed: 01/17/2007
Issued: 03/06/2012
Est. Priority Date: 11/21/2000
Status: Expired due to Fees

Abstract

The invention provides a system and method for automatically indexing and retrieving multimedia content. The method may include separating a multimedia data stream into audio, visual and text components, segmenting the audio, visual and text components based on semantic differences, identifying at least one target speaker using the audio and visual components, identifying a topic of the multimedia event using the segmented text and topic category models, generating a summary of the multimedia event based on the audio, visual and text components, the identified topic and the identified target speaker, and generating a multimedia description of the multimedia event based on the identified target speaker, the identified topic, and the generated summary.
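
The topic identification step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each topic category model is a smoothed unigram word-count model and that the topic with the highest log-likelihood is selected; the record above does not state how the models are built. The names `train_topic_models` and `derive_topic` are hypothetical.

```python
import math
from collections import Counter


def train_topic_models(labeled_texts):
    """Build one unigram count model per topic from (topic, text) pairs.

    Assumption: a topic category model is just word counts; the patent
    record does not specify the model form.
    """
    models = {}
    for topic, text in labeled_texts:
        models.setdefault(topic, Counter()).update(text.lower().split())
    return models


def derive_topic(text, models):
    """Return the topic whose smoothed unigram model scores the text highest."""
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    words = text.lower().split()
    best_topic, best_score = None, -math.inf
    for topic, counts in models.items():
        total = sum(counts.values())
        # Add-one smoothing (with one extra slot for unseen words) keeps
        # out-of-vocabulary words from zeroing out the likelihood.
        denom = total + len(vocab) + 1
        score = sum(math.log((counts[w] + 1) / denom) for w in words)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


if __name__ == "__main__":
    models = train_topic_models([
        ("sports", "the team won the game with a late goal"),
        ("finance", "the market fell as bank shares dropped"),
    ])
    print(derive_topic("bank shares and the market", models))
```

In practice each segmented text block would be passed to `derive_topic` separately, so that every block receives its own topic label.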

13 Claims

1. A method for processing a multimedia event, comprising:
- separating text components from a multimedia data stream associated with the multimedia event to yield separated text components;
- generating a plurality of semantically coherent text blocks from the separated text components using an automated multimedia content indexing and retrieval system, wherein at least one semantically coherent text block is generated by merging disconnected text blocks;
- identifying a target speaker based on audio features in the multimedia data stream to yield an identified target speaker;
- deriving a topic for each text block of the plurality of semantically coherent text blocks based on a set of topic category models to yield derived topics; and
- generating a multimedia description of the multimedia event based at least on the identified target speaker, the plurality of semantically coherent text blocks, and the derived topics, wherein the multimedia description comprises at least a timeline representation having a plurality of layers showing multiple categorizations of the multimedia data stream for each instance of time.

2. The method of claim 1, further comprising:
- separating from the multimedia data stream audio components and visual components in addition to the text components.

3. The method of claim 2, further comprising:
- segmenting the audio components, the visual components, and the text components of the multimedia data stream based on semantic differences, wherein frame level features are extracted from the audio component in a plurality of subbands.

4. The method of claim 3, further comprising:
- identifying at least one target speaker using the audio components and the visual components.

5. The method of claim 4, further comprising:
- generating a summary of multimedia content based on the audio components, the visual components, the text components, the semantically coherent text blocks, and the identified target speaker.

6. The method of claim 5, wherein generating the multimedia description of the multimedia event is further based on the identified target speaker and the summary.

7. The method of claim 1, wherein generating the plurality of semantically coherent text blocks from the separated text components is based at least in part on identified semantic boundaries of text for at least one identified target speaker.
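
Two elements of claim 1, merging disconnected text blocks and the layered timeline representation, can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: word-set (Jaccard) overlap stands in for whatever semantic coherence measure the system uses, and the `TextBlock` type, the `threshold` value, and the layer names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    spans: list  # (start_sec, end_sec) intervals covered by this block
    text: str


def jaccard(a, b):
    """Word-set overlap, used here as a stand-in for semantic similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def merge_disconnected(blocks, threshold=0.3):
    """Greedily fold each block into the first earlier block it resembles."""
    merged = []
    for block in blocks:
        for target in merged:
            if jaccard(target.text, block.text) >= threshold:
                target.spans.extend(block.spans)  # spans may be non-adjacent
                target.text += " " + block.text
                break
        else:
            # No similar block found: start a new coherent block (copied,
            # so the caller's input is not mutated by later merges).
            merged.append(TextBlock(list(block.spans), block.text))
    return merged


def categories_at(timeline, t):
    """Return {layer: label} for every layer with a segment covering time t."""
    result = {}
    for layer, segments in timeline.items():
        for start, end, label in segments:
            if start <= t < end:
                result[layer] = label
                break
    return result


if __name__ == "__main__":
    blocks = [
        TextBlock([(0, 10)], "election results senate vote count"),
        TextBlock([(10, 20)], "storm warning coast heavy rain"),
        TextBlock([(20, 30)], "senate vote election recount"),
    ]
    merged = merge_disconnected(blocks)
    timeline = {
        "speaker": [(0, 10, "anchor"), (10, 20, "reporter"), (20, 30, "anchor")],
        "topic": [(s, e, f"story {i}")
                  for i, b in enumerate(merged) for s, e in b.spans],
    }
    print(categories_at(timeline, 25))
```

Each layer holds one categorization of the stream, so querying a single instant returns one label per layer, which is the property the timeline limitation describes.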

8. A computing device that automatically indexes and retrieves a multimedia event, the computing device comprising:
- an automated multimedia content indexing and retrieval system;
- a first module, using the automated multimedia content indexing and retrieval system, configured to separate text components from a multimedia data stream associated with a multimedia event;
- a second module configured to generate a plurality of semantically coherent text blocks from the separated text components;
- a third module configured to identify a target speaker based on audio features in the multimedia data stream;
- a fourth module configured to derive a topic for each text block of the plurality of semantically coherent text blocks based on a set of topic category models, wherein at least one semantically coherent text block is generated by merging disconnected text blocks; and
- a fifth module configured to generate a multimedia description of the multimedia event based at least on the semantically coherent text blocks and the derived topics, wherein the multimedia description comprises at least a timeline representation having a plurality of layers showing multiple categorizations of the multimedia data stream for each instance of time.

9. The computing device of claim 8, further comprising:
- a sixth module configured to separate from the multimedia data stream audio components and visual components in addition to the text components.

10. The computing device of claim 9, further comprising:
- a seventh module configured to segment the audio components, the visual components, and the text components of the multimedia data stream based on semantic differences, wherein frame level features are extracted from the audio component in a plurality of subbands.

11. The computing device of claim 10, further comprising:
- an eighth module configured to identify at least one target speaker using the audio components and the visual components.

12. The computing device of claim 11, further comprising:
- a ninth module configured to generate a summary of multimedia content based on the audio components, the visual components, the text components, the semantically coherent text blocks and the identified target speaker.

13. The computing device of claim 8, wherein the second module configured to generate the plurality of semantically coherent text blocks from the separated text components is based at least in part on identified semantic boundaries of text for at least one identified target speaker.
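
The frame-level subband features recited in claims 3 and 10 can be illustrated with a short sketch. It assumes, for illustration only, that the feature is the spectral energy in equal-width frequency bands of each fixed-length frame; the claims do not say which features or band layout are used. The function names, frame length, and hop size are hypothetical, and a naive DFT is used to keep the example dependency-free.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into fixed-length, possibly overlapping frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def subband_energies(frame, num_bands):
    """Return the spectral energy of one frame in equal-width subbands.

    Assumption: equal-width bands over the lower half of the DFT bins.
    Leftover bins are ignored when the bin count is not divisible by
    num_bands.
    """
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):
        # Naive O(n^2) DFT; adequate for a short illustrative frame.
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        power.append(abs(acc) ** 2)
    band_size = half // num_bands
    return [sum(power[b * band_size:(b + 1) * band_size])
            for b in range(num_bands)]


if __name__ == "__main__":
    # A pure tone at DFT bin 10 of a 32-sample frame falls in band 2 of 4.
    samples = [math.sin(2 * math.pi * 10 * t / 32) for t in range(64)]
    for frame in frame_signal(samples, frame_len=32, hop=16):
        energies = subband_energies(frame, num_bands=4)
        print(energies.index(max(energies)))
```

A segmenter could then compare these per-frame vectors across time to locate points where the audio character changes.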

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property II LP (AT&T, Inc.)
Inventors: Gibbon, David Crawford; Huang, Qian; Liu, Zhu; Rosenberg, Aaron Edward; Shahraray, Behzad
Primary Examiner(s): Lerner, Martin
Application Number: US11/623,955
Time in Patent Office: 1,875 Days
Field of Search: 704/231, 704/235, 704/246, 704/251, 704/255, 704/270, 704/278, 725/40, 725/45, 715/201, 715/716
US Class Current: 704/270
CPC Class Codes:
G06F 16/739: in form of a video summary,...
G06F 16/7834: using audio features
G06F 16/7844: using original textual cont...
G10L 17/00: Speaker identification or v...
Y10S 707/99933: Query processing, i.e. sear...
Y10S 707/99943: Generating database or data...
