Apparatus and method for speech segment detection and system for speech recognition
Abstract
Provided are an apparatus and method for speech segment detection, and a system for speech recognition. The apparatus is equipped with a sound receiver and an image receiver and includes: a lip motion signal detector for detecting a motion region from image frames output from the image receiver, applying lip motion image feature information to the detected motion region, and detecting a lip motion signal; and a speech segment detector for detecting a speech segment using sound frames output from the sound receiver and the lip motion signal detected from the lip motion signal detector. Since lip motion image information is checked in a speech segment detection process, it is possible to prevent dynamic noise from being misrecognized as speech.
12 Claims
1. An apparatus for speech segment detection including a sound receiver and an image receiver, the apparatus comprising:
a lip motion signal detector for detecting a motion region from image frames output from the image receiver, applying lip motion image feature information to the detected motion region, and detecting a lip motion signal for determining whether or not a sound frame is a speech frame; and
a speech segment detector for detecting a speech segment using sound frames output from the sound receiver and the lip motion signal detected from the lip motion signal detector, the speech segment detector determining whether the sound frame is a potential speech frame or is stationary noise, and when it is determined that the sound frame is a potential speech frame, determining whether the potential speech frame is a speech frame or dynamic noise according to the lip motion signal.
Dependent claims: 2, 3
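For illustration only, the two-stage decision recited in claim 1 can be sketched in Python. The claim does not state how a potential speech frame is identified, so the energy threshold used here is an assumption, and the names FrameLabel, classify_frame, and energy_threshold are hypothetical and do not come from the patent.

```python
from enum import Enum


class FrameLabel(Enum):
    STATIONARY_NOISE = "stationary_noise"
    DYNAMIC_NOISE = "dynamic_noise"
    SPEECH = "speech"


def classify_frame(frame_energy: float,
                   lip_motion_detected: bool,
                   energy_threshold: float = 0.1) -> FrameLabel:
    """Label one sound frame using sound first, then the lip motion signal."""
    # Stage 1 (assumed criterion): a frame whose energy stays below the
    # threshold is treated as stationary noise, not a potential speech frame.
    if frame_energy < energy_threshold:
        return FrameLabel.STATIONARY_NOISE
    # Stage 2 (from the claim): a potential speech frame is speech only if a
    # lip motion signal is present; otherwise it is dynamic noise.
    if lip_motion_detected:
        return FrameLabel.SPEECH
    return FrameLabel.DYNAMIC_NOISE
```

The point of the second stage is that a loud frame with no accompanying lip motion, such as a door slamming, is labeled dynamic noise rather than speech.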
4. A method for speech segment detection in a speech recognition system including a sound receiver and an image receiver, the method comprising the steps of:
removing stationary noise from a sound frame output from the sound receiver, and determining whether or not the sound frame from which the noise is removed is a potential speech frame;
when it is determined that the sound frame is a potential speech frame, determining whether or not a lip motion signal is detected from image frames at a point of time when the potential speech frame is detected;
when it is determined that the lip motion signal is detected from the image frames, determining that the potential speech frame is a speech frame, storing the speech frame, and determining whether or not the number of speech frames is at least a predetermined number; and
when it is determined that the number of speech frames is at least the predetermined number, detecting the speech frames as a speech segment.
Dependent claims: 5, 6, 7, 8, 9, 10
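The steps of claim 4 can be read as a loop over time-aligned sound and image frames. The Python sketch below is one possible reading, not the patented implementation. It assumes each input item is a pair of (frame energy after stationary noise removal, lip motion flag), that stored speech frames must be consecutive, and that a run shorter than the predetermined number is discarded. The claim specifies none of these details, and the names detect_speech_segments, is_potential_speech, and min_frames are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def detect_speech_segments(
    frames: Iterable[Tuple[float, bool]],
    is_potential_speech: Callable[[float], bool],
    min_frames: int,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame index ranges detected as speech."""
    segments: List[Tuple[int, int]] = []
    start = None  # index of the first stored speech frame in the current run
    count = 0     # number of speech frames stored in the current run

    for index, (energy, lip_motion) in enumerate(frames):
        # A frame is stored as speech only if it is a potential speech frame
        # AND a lip motion signal is detected at the same point in time.
        if is_potential_speech(energy) and lip_motion:
            if start is None:
                start = index
            count += 1
            continue
        # Run ended (assumed behavior): keep it only if it is long enough.
        if start is not None and count >= min_frames:
            segments.append((start, index))
        start, count = None, 0

    # Flush a run that is still open when the input ends.
    if start is not None and count >= min_frames:
        segments.append((start, start + count))
    return segments
```

With a minimum of three frames, for example, a run of three lip-confirmed frames becomes a segment, while an isolated confirmed frame or a loud frame without lip motion does not.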
11. A system for speech recognition, comprising:
a sound receiver for converting a sound signal input by a user into a digital signal and framing the digital signal;
an image receiver for framing an image signal obtained by an image recorder;
a lip motion signal detector for detecting a motion region from the image frames output from the image receiver, applying lip motion image feature information to the detected motion region, and detecting a lip motion signal;
a speech segment detector for determining whether the sound frame output from the sound receiver is a potential speech frame or is stationary noise, and when it is determined that the sound frame is a potential speech frame, detecting a speech segment according to the lip motion signal detected by the lip motion signal detector;
a feature vector extractor for extracting a feature vector from the speech segment detected by the speech segment detector; and
a speech recognizer for performing speech recognition using the feature vector extracted by the feature vector extractor to convert the sound signal to characters.
Dependent claims: 12
Specification