METHOD FOR DETECTING VOICE SECTION FROM TIME-SPACE BY USING AUDIO AND VIDEO INFORMATION AND APPARATUS THEREOF
Abstract
The present invention relates to a method for detecting a voice section in time-space by using audio and video information. According to an embodiment of the present invention, the method comprises the steps of: detecting a voice section in an audio signal input to a microphone array; verifying the speaker in the detected voice section; if the speaker is successfully verified, detecting the speaker's face by using a video signal input to a camera and estimating the direction of the speaker's face; and determining the detected voice section to be the speaker's voice section if the estimated face direction corresponds to a previously stored reference direction.
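The sequence of steps in the abstract can be sketched as a short decision pipeline. This is a minimal illustration, not the patented implementation: the `Detectors` container, its three callables, the representation of a voice section as a `(start, end)` pair in seconds, the use of a single yaw angle in degrees for the face direction, and the `tolerance_deg` value used to decide that two directions "match" are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Section = Tuple[float, float]  # hypothetical (start_s, end_s) representation


@dataclass
class Detectors:
    """Hypothetical stand-ins for the components named in the abstract."""
    detect_voice_sections: Callable[[object], List[Section]]
    verify_speaker: Callable[[object, Section], bool]
    estimate_face_direction: Callable[[object, Section], Optional[float]]


def angle_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def speaker_voice_sections(audio, video, d: Detectors,
                           reference_deg: float,
                           tolerance_deg: float = 15.0) -> List[Section]:
    """Return the detected voice sections attributed to the speaker."""
    accepted = []
    for section in d.detect_voice_sections(audio):
        # Step 2: speaker verification; the video is consulted only on success.
        if not d.verify_speaker(audio, section):
            continue
        # Step 3: face detection and face direction estimation.
        face_deg = d.estimate_face_direction(video, section)
        if face_deg is None:
            continue  # assumed convention: None means no face was found
        # Step 4: compare with the stored reference direction.
        # Treating "matches" as "within tolerance_deg" is an assumption.
        if angle_diff_deg(face_deg, reference_deg) <= tolerance_deg:
            accepted.append(section)
    return accepted
```

Passing the detectors in as callables keeps the sketch independent of any particular voice activity detector, speaker model, or face tracker, none of which are specified in this section.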
13 Claims
1. A method for detecting a time-space voice section using audio and video information, comprising:
detecting a voice section from an audio signal input to a microphone array;
performing speaker verification in the detected voice section;
detecting a speaker's face by using a video signal input to a camera and estimating a speaker's face direction when the speaker verification succeeds; and
determining the detected voice section as a speaker's voice section when the estimated face direction matches a previously stored reference direction.
Dependent claims: 2, 3, 4, 5, 6, 9
7. A method for detecting a time-space voice section using audio and video information, comprising:
estimating a position of a sound source by using an audio signal input to a microphone array;
detecting a voice section in the audio signal when the estimated position of the sound source does not match a previously stored reference position by a threshold value or more after comparing them with each other;
performing speaker verification in the detected voice section;
detecting a speaker's face using a video signal input to a camera and estimating a speaker's face direction when the speaker verification succeeds; and
determining the detected voice section as a speaker's voice section when the estimated face direction matches the previously stored reference direction.
Dependent claims: 8, 13
10. An apparatus for detecting a time-space voice section using audio and video information, comprising:
a voice section detection unit that detects a voice section in an audio signal input to a microphone array;
a speaker verification unit that performs speaker verification in the detected voice section; and
a face direction verification unit that detects a speaker's face using a video signal input to a camera and estimates a speaker's face direction when the speaker verification succeeds and determines the detected voice section as a speaker's voice section when the estimated face direction matches a previously stored reference direction.
11. An apparatus for detecting a time-space voice section using audio and video information, comprising:
a sound source position tracking unit that estimates a position of a sound source by using an audio signal input to a microphone array;
a voice section detection unit that detects a voice section in the audio signal when the estimated position of the sound source does not match the previously stored reference position by a threshold value or more after comparing them with each other;
a speaker verification unit that performs speaker verification in the detected voice section; and
a face direction verification unit that detects a speaker's face using a video signal input to a camera and estimates a speaker's face direction when the speaker verification succeeds and determines the detected voice section as a speaker's voice section when the estimated face direction matches a previously stored reference direction.
Dependent claims: 12
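The position comparison recited in claims 7 and 11 can be illustrated with a small helper. This is only a sketch under stated assumptions: the claims do not say how a position is represented or how the deviation is measured, so the 2-D `(x, y)` coordinates, the Euclidean distance, and both function names are hypothetical. The helper only reports whether the estimated position deviates from the stored reference by the threshold value or more; how that result gates voice section detection is left to the caller, as governed by the claim language.

```python
import math
from typing import Tuple

Position = Tuple[float, float]  # hypothetical (x, y) coordinates in metres


def position_deviation(estimated: Position, reference: Position) -> float:
    """Euclidean distance between the estimated and the stored position."""
    return math.dist(estimated, reference)


def deviates_by_threshold_or_more(estimated: Position,
                                  reference: Position,
                                  threshold: float) -> bool:
    """True when the deviation is at least the threshold value."""
    return position_deviation(estimated, reference) >= threshold
```

A microphone array often yields a direction of arrival rather than a full position; in that case the same comparison could be applied to an angle instead of a coordinate pair.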
Specification