METHOD FOR DETECTING VOICE SECTION FROM TIME-SPACE BY USING AUDIO AND VIDEO INFORMATION AND APPARATUS THEREOF

US 20120078624A1
Filed: 02/10/2010
Published: 03/29/2012
Est. Priority Date: 02/27/2009
Status: Active Grant

First Claim

1. A method for detecting a time-space voice section using audio and video information, comprising:

detecting a voice section from an audio signal input to a microphone array;

performing speaker verification in the detected voice section;

detecting a speaker's face by using a video signal input to a camera and estimating a speaker's face direction when the speaker verification succeeds; and

determining the detected voice section as a speaker's voice section when the estimated face direction matches a previously stored reference direction.

Abstract

The present invention relates to a method for detecting a voice section in time-space by using audio and video information. According to an embodiment of the present invention, a method for detecting a voice section from time-space by using audio and video information comprises the steps of: detecting a voice section in an audio signal which is inputted into a microphone array; verifying a speaker from the detected voice section; sensing the face of the speaker by using a video signal which is inputted into a camera if the speaker is successfully verified, and then estimating the direction of the face of the speaker; and determining the detected voice section as the voice section of the speaker if the estimated face direction corresponds to a reference direction which is previously stored.
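
The gating order described in the abstract can be sketched as a short cascade. This is a minimal illustration, not the patented implementation: the three stage functions are passed in as hypothetical callables, the face direction is assumed to be a single angle in degrees, and `tolerance_deg` is an assumed way of deciding that the estimated direction "corresponds to" the stored reference direction.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class VoiceSection:
    start_s: float  # section start time in seconds
    end_s: float    # section end time in seconds


def angular_difference_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def detect_speaker_voice_section(
    audio: Any,
    video: Any,
    detect_voice_section: Callable[[Any], Optional[VoiceSection]],
    verify_speaker: Callable[[Any, VoiceSection], bool],
    estimate_face_direction: Callable[[Any, VoiceSection], Optional[float]],
    reference_direction_deg: float,
    tolerance_deg: float = 15.0,  # hypothetical matching tolerance
) -> Optional[VoiceSection]:
    """Return the section only if every stage of the cascade passes."""
    # Stage 1: voice section detection on the microphone-array signal.
    section = detect_voice_section(audio)
    if section is None:
        return None

    # Stage 2: speaker verification within the detected section.
    if not verify_speaker(audio, section):
        return None

    # Stage 3: face detection and direction estimation, run only after
    # verification succeeds. None means no face was found.
    direction_deg = estimate_face_direction(video, section)
    if direction_deg is None:
        return None

    # Stage 4: accept only when the face direction matches the reference.
    if angular_difference_deg(direction_deg, reference_direction_deg) > tolerance_deg:
        return None
    return section
```

Because the video stage runs only after speaker verification succeeds, a rejected speaker never triggers face processing in this sketch.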

230 Citations

13 Claims

1. A method for detecting a time-space voice section using audio and video information, comprising:
- detecting a voice section from an audio signal input to a microphone array;
  
  performing speaker verification in the detected voice section;
  
  detecting a speaker's face by using a video signal input to a camera and estimating a speaker's face direction when the speaker verification succeeds; and
  
  determining the detected voice section as a speaker's voice section when the estimated face direction matches a previously stored reference direction.
- - 2. The method for detecting the time-space voice section using the audio and video information according to claim 1, wherein the detecting the voice section includes:
    - estimating a position of a sound source by using the audio signal input to the microphone array; and
      
      distinguishing noise by comparing the estimated position of the sound source and a previously stored reference position with each other.
  - 3. The method for detecting the time-space voice section using the audio and video information according to claim 2, wherein the performing the speaker verification includes:
    - changing a value of the reference position as the estimated position of the sound source when the speaker verification succeeds.
  - 4. The method for detecting the time-space voice section using the audio and video information according to claim 2, wherein the estimating the position of the sound source is using a signal with a certain SNR or more in the audio signal input to the microphone array.
  - 5. The method for detecting the time-space voice section using the audio and video information according to claim 2, wherein the detecting the voice section further includes:
    - removing the distinguished noise; and
      
      detecting a voice section on the basis of a single microphone in the signal of which the noise is removed.
  - 6. The method for detecting the time-space voice section using the audio and video information according to claim 5, wherein the removing the distinguished noise includes:
    - removing a signal of a sound source estimated as a position different from the previously stored position.
  - 9. A non-transitory recording medium readable by a computer system and recording a program causing the method for detecting the time-space voice section using the audio and video information according to claim 1 to be executed by the computer system.
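
The position-based steps in claims 2 to 4 can be sketched as a small gate object. This is an illustration under stated assumptions, not the claimed method: positions are assumed to be 2-D coordinates compared by Euclidean distance, and the names `ReferencePositionGate`, `max_distance`, and `min_snr_db`, together with their default values, are hypothetical. The claims do not specify how positions are represented, how a match is decided, or what SNR level is required.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Position = Tuple[float, float]  # assumed 2-D coordinates


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from two positive power values."""
    if signal_power <= 0.0 or noise_power <= 0.0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


@dataclass
class ReferencePositionGate:
    reference: Position        # previously stored reference position
    max_distance: float = 0.5  # hypothetical match radius
    min_snr_db: float = 10.0   # hypothetical SNR floor for localization

    def usable_for_localization(self, signal_power: float, noise_power: float) -> bool:
        # Claim 4: estimate the position only from sufficiently clean signal.
        return snr_db(signal_power, noise_power) >= self.min_snr_db

    def is_noise(self, estimated: Position) -> bool:
        # Claim 2: a source away from the reference position is treated as noise.
        return math.dist(estimated, self.reference) > self.max_distance

    def on_verification_success(self, estimated: Position) -> None:
        # Claim 3: the reference follows the verified speaker's position.
        self.reference = estimated
```

Updating the reference only after verification succeeds means that, in this sketch, the gate follows the verified speaker rather than whichever source happens to be loudest.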

7. A method for detecting a time-space voice section using audio and video information, comprising:
- estimating a position of a sound source by using an audio signal input to a microphone array;
  
  detecting a voice section in the audio signal when the estimated position of the sound source does not match a previously stored reference position by a threshold value or more after comparing them each other;
  
  performing speaker verification in the detected voice section;
  
  detecting a speaker's face using a video signal input to a camera and estimating a speaker's face direction when the speaker verification succeeds; and
  
  determining the detected voice section as a speaker's voice section when the estimated face direction matches the previously stored reference direction.
- - 8. The method for detecting the time-space voice section using the audio and video information according to claim 7, wherein the performing the speaker verification includes:
    - changing a value of the reference position as the estimated position of the sound source when the speaker verification succeeds.
  - 13. A non-transitory recording medium readable by a computer system and recording a program causing the method for detecting the time-space voice section using the audio and video information according to claim 7 to be executed by the computer system.

10. An apparatus for detecting a time-space voice section using audio and video information, comprising:
- a voice section detection unit that detects a voice section in an audio signal input to a microphone array;
  
  a speaker verification unit that performs speaker verification in the detected voice section; and
  
  a face direction verification unit that detects a speaker's face using a video signal input to a camera and estimates a speaker's face direction when the speaker verification succeeds and determines the detected voice section as a speaker's voice section when the estimated face direction matches a previously stored reference direction.

11. An apparatus for detecting a time-space voice section using audio and video information, comprising:
- a sound source position tracking unit that estimates a position of a sound source by using an audio signal input to a microphone array;
  
  a voice section detection unit that detects a voice section in the audio signal when the estimated position of the sound source does not match the previously stored reference position by a threshold value or more after comparing them each other;
  
  a speaker verification unit that performs speaker verification in the detected voice section; and
  
  a face direction verification unit that detects a speaker's face using a video signal input to a camera and estimates a speaker's face direction when the speaker verification succeeds and determines the detected voice section as a speaker's voice section when the estimated face direction matches a previously stored reference direction.
- - 12. The apparatus for detecting the time-space voice section using the audio and video information according to claim 11, wherein the speaker verification unit changes a value of the reference position as the position of the estimated sound source when the speaker verification succeeds.

Current Assignee
Korea University Industrial & Academic Collaboration Foundation
Original Assignee
Korea University Industrial & Academic Collaboration Foundation
Inventors
Yook, Dongsuk; Lee, Hyeowoo

Granted Patent

US 9,431,029 B2
US Class Current

704/233
CPC Class Codes

G10L 17/00   Speaker identification or v...

G10L 2021/02166   Microphone arrays; Beamforming

G10L 25/78   Detection of presence or ab...