Indexing multimedia communications
Abstract
A network based platform uses face recognition, speech recognition, background change detection and key scene events to index multimedia communications. Before the multimedia communication begins, active participants register their speech and face models with a server. The process consists of creating a speech sample, capturing a sample image of the participant and storing the data in a database. The server provides an indexing function for the multimedia communication. During the multimedia communication, metadata including time stamping is retained along with the multimedia content. The time stamping information is used for synchronizing the multimedia elements. The multimedia communication is then processed through the server to identify the multimedia communication participants based on speaker and face recognition models. This allows the server to create an index table that becomes an index of the multimedia communication. In addition, through scene change detection and background recognition, certain backgrounds and key scene information can be used for indexing. Therefore, through this indexing apparatus and method, a specific participant can be recognized as speaking and the content that the participant discussed can also be used for indexing.
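The index table described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the patented implementation: the `IndexedPacket` type, its field names, and the `build_index_table` function are hypothetical, and recognition is assumed to have already produced a participant identity and keywords for each time-stamped packet.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class IndexedPacket:
    """Hypothetical record for one time-stamped multimedia packet."""
    timestamp: float                  # seconds from the start of the communication
    participant: Optional[str]        # identity from recognition, None if unrecognized
    keywords: Set[str] = field(default_factory=set)  # assumed content keywords


def build_index_table(
    packets: List[IndexedPacket],
) -> Tuple[Dict[str, List[float]], Dict[str, List[float]]]:
    """Return (participant -> timestamps, keyword -> timestamps), in time order."""
    by_participant: Dict[str, List[float]] = defaultdict(list)
    by_keyword: Dict[str, List[float]] = defaultdict(list)
    # Sorting by time stamp keeps every index entry in playback order.
    for packet in sorted(packets, key=lambda p: p.timestamp):
        if packet.participant is not None:
            by_participant[packet.participant].append(packet.timestamp)
        for word in packet.keywords:
            by_keyword[word.lower()].append(packet.timestamp)
    return dict(by_participant), dict(by_keyword)
```

Looking up a participant or a keyword in the returned tables yields the time stamps at which the matching packets occur, which is the kind of lookup the abstract attributes to the index.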
18 Claims
1. A method for indexing a multimedia communication, comprising:
receiving the multimedia communication, the multimedia communication including a plurality of multimedia data packets;
processing the plurality of multimedia data packets to identify distinguishing features and associating each of the plurality of multimedia data packets with one of a plurality of objects within the multimedia communication;
comparing the distinguishing features with representations of the plurality of objects stored within a database to create verified distinguishing features;
associating the verified distinguishing features with each one of a plurality of objects;
indexing the plurality of multimedia data packets based on the verified distinguishing features; and
rebroadcasting in near real-time, during the multimedia communication, the processed plurality of multimedia data packets, wherein when the object is a speaker participating in the multimedia communication, combination of the speaker's audio speech patterns and video statistical sampling of a face of the speaker is a portion of the distinguishing features.
identifying background changes in the video data;
identifying key scene events based on the video data; and
attaching a second header identifier to each multimedia data packet containing the background change and the key scene event, the second header identifier identifying the multimedia data packet as containing a background change and a key scene event.
7. The method of claim 6, wherein the multimedia communication is a multicast multimedia communication, the rebroadcasting step including multicasting the processed plurality of multimedia data packets.
8. The method of claim 7, wherein a time stamp is provided to synchronize the audio and the video data.
9. The method of claim 8, further comprising storing the indexed plurality of multimedia data packets, wherein the indexed plurality of multimedia data packets can be searched to retrieve audio and video multimedia data packets corresponding to selected distinguishing features.
10. The method of claim 9, wherein the indexed plurality of multimedia data packets can be searched using key words.
11. The method of claim 10, wherein the multimedia communication is conducted using a local area network.
12. The method of claim 1, wherein the indexing and the processing steps are performed at a multicast network.
13. An apparatus for indexing a multimedia communication, comprising:
a server that receives multimedia communication in multimedia data packets including audio, visual and data communications and identifies distinguishing features in the multimedia communication based on at least one of audio and video recognition and a source of the multimedia communications;
a header function module connected to the server, the header function module entering metadata in a header segment corresponding to the multimedia data packets received by the server, the metadata being related to the distinguishing features;
an index server that rebroadcasts in near real-time the multimedia communication;
a storage device that stores the multimedia data packets, wherein the distinguishing features include audio, voice and video face patterns of participants in the multimedia communication.
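As a rough illustration of the header function module in claim 13, the sketch below enters feature-related metadata into a packet's header segment. The `MediaPacket` layout, the header keys, and the `enter_metadata` function are assumptions made for this example; the claim does not specify a packet format.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class MediaPacket:
    """Hypothetical packet: a time stamp, a payload, and a header segment."""
    timestamp: float
    payload: bytes
    header: Dict[str, str] = field(default_factory=dict)


def enter_metadata(packet: MediaPacket, features: Dict[str, str]) -> MediaPacket:
    """Return a copy of the packet whose header also carries the given features."""
    # Build a new header so the original packet is left unchanged.
    merged = {**packet.header, **features}
    return replace(packet, header=merged)
```

Returning a new packet rather than mutating the input is a design choice of this sketch; it lets the received packet and the tagged packet coexist, for example when one is stored and the other rebroadcast.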
18. A method of identifying participants to a multimedia communication that is rebroadcast in near real-time, comprising:
comparing audio speech patterns for each participant to speech models;
comparing video face patterns for each participant to face models;
determining an identity of a particular participant when both the audio speech patterns and the video face patterns match the speech and the face models for the particular participant; and
creating an index of the participants based on identification of speech and face patterns of the participants, the index being used to segment the multimedia communication.
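The steps of claim 18 can be sketched as follows. This is a minimal illustration, assuming the speech and face comparisons have already been reduced to a best-matching participant name (or `None` for no match) per time-stamped observation; the function names and the tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

Observation = Tuple[float, Optional[str], Optional[str]]  # (time, speech match, face match)
Segment = Tuple[str, float, float]                        # (participant, start, end)


def identify(speech_match: Optional[str], face_match: Optional[str]) -> Optional[str]:
    """Return an identity only when both models match the same participant."""
    if speech_match is not None and speech_match == face_match:
        return speech_match
    return None


def segment_by_participant(observations: List[Observation]) -> List[Segment]:
    """Merge consecutive observations of one participant into segments.

    Observations are assumed to be sorted by time stamp. An observation
    with no agreed identity closes the current segment.
    """
    segments: List[Segment] = []
    current = None  # [participant, start, end] of the open segment, if any
    for timestamp, speech_match, face_match in observations:
        who = identify(speech_match, face_match)
        if current is not None and current[0] == who:
            current[2] = timestamp  # same participant: extend the open segment
            continue
        if current is not None:
            segments.append((current[0], current[1], current[2]))
        current = [who, timestamp, timestamp] if who is not None else None
    if current is not None:
        segments.append((current[0], current[1], current[2]))
    return segments
```

Each resulting segment names one participant and the time span over which both patterns matched, which is one way an index of participants could be used to segment the communication.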
Specification