Method and system for person identification using video-speech matching
Abstract
A method and system are disclosed for determining who is the speaking person in video data. This may be used to aid in person identification in video content analysis and retrieval applications. A correlation is used to improve the person recognition rate, relying on both face recognition and speaker identification. A Latent Semantic Association (LSA) process may also be used to improve the association of a speaker's face with his voice. Other sources of data (e.g., text) may be integrated for a broader domain of video content understanding applications.
82 Citations
20 Claims
1. An audio-visual system for processing video data comprising:
an object detection module capable of providing a plurality of object features from the video data;
an audio processor module capable of providing a plurality of audio features from the video data;
a processor coupled to the object detection and the audio segmentation modules, wherein the processor is arranged to determine a correlation between the plurality of object features and the plurality of audio features.
Dependent claims: 2, 3, 4, 5, 6, 7

8. A method for identifying a speaking person within video data, the method comprising the steps of:
receiving video data including image and audio information;
determining a plurality of face image features from one or more faces in the video data;
determining a plurality of audio features related to audio information;
calculating a correlation between the plurality of face image features and the audio features; and
determining the speaking person based upon the correlation.
Dependent claims: 9, 10, 11, 12, 13, 14

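The steps of claim 8 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes each face is reduced to a single per-frame feature (here a hypothetical mouth-opening measure), that the audio is reduced to a per-frame energy value, and that a plain Pearson correlation stands in for the claimed correlation. The names `pearson`, `pick_speaker`, `mouth_opening`, and `audio_energy`, and all sample values, are illustrative assumptions.

```python
from math import sqrt
from typing import Dict, List, Optional, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def pick_speaker(
    face_features: Dict[str, List[float]],
    audio_feature: List[float],
) -> Tuple[Optional[str], Dict[str, float]]:
    """Correlate every face track with the audio track; return the best match and all scores."""
    scores = {
        face_id: pearson(track, audio_feature)
        for face_id, track in face_features.items()
    }
    speaker = max(scores, key=scores.get) if scores else None
    return speaker, scores


# Hypothetical per-frame data (assumption): mouth opening for two detected faces
# and the audio energy measured over the same six frames.
mouth_opening = {
    "face_A": [0.1, 0.8, 0.2, 0.9, 0.1, 0.7],
    "face_B": [0.3, 0.3, 0.4, 0.3, 0.3, 0.4],
}
audio_energy = [0.0, 1.0, 0.1, 1.0, 0.0, 0.9]

speaker, scores = pick_speaker(mouth_opening, audio_energy)
print(speaker, scores)
```

In this toy data the first face's mouth movement tracks the audio energy closely while the second face's does not, so the first face is selected. A real system would likely use many features per face and per audio frame, and would need a threshold for the case in which no visible face is speaking.
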
15. A memory medium including code for processing a video including images and audio, the code comprising:
code to obtain a plurality of object features from the video;
code to obtain a plurality of audio features from the video;
code to determine a correlation between the plurality of object features and the plurality of audio features; and
code to determine an association between one or more objects in the video and the audio.
Dependent claims: 16, 17, 18, 19, 20

Specification