Apparatus and method for speech segment detection and system for speech recognition
Abstract
Provided are an apparatus and method for speech segment detection, and a system for speech recognition. The apparatus is equipped with a sound receiver and an image receiver and includes: a lip motion signal detector for detecting a motion region from image frames output from the image receiver, applying lip motion image feature information to the detected motion region, and detecting a lip motion signal; and a speech segment detector for detecting a speech segment using sound frames output from the sound receiver and the lip motion signal detected from the lip motion signal detector. Since lip motion image information is checked in a speech segment detection process, it is possible to prevent dynamic noise from being misrecognized as speech.
12 Claims
1. An apparatus for speech segment detection including a sound receiver and an image receiver, the apparatus comprising:
a lip motion signal detector for detecting a motion region from image frames output from the image receiver, applying lip motion image feature information to the detected motion region, and detecting a lip motion signal; and
a speech segment detector for detecting a speech segment using sound frames output from the sound receiver and the lip motion signal detected from the lip motion signal detector.
Dependent claims: 2, 3
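The lip motion signal detector of claim 1 can be illustrated with a minimal sketch. The claim does not state how the motion region is found or what the lip motion image feature information contains, so everything here is an assumption made for illustration: frames are small grayscale grids, the motion region is the bounding box of pixels that changed between two consecutive frames, and the feature check is a hypothetical size and aspect-ratio test. The names `detect_motion_region`, `looks_like_lip_motion`, and `lip_motion_signal` and all thresholds are invented for this example.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale image: rows of 0..255 values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def detect_motion_region(prev: Frame, curr: Frame,
                         diff_threshold: int = 30) -> Optional[Box]:
    """Return the bounding box of pixels that changed by more than
    diff_threshold between two frames, or None if nothing moved.
    Frame differencing is an assumed technique, not taken from the claim."""
    rows = [r for r, (p_row, c_row) in enumerate(zip(prev, curr))
            if any(abs(p - c) > diff_threshold for p, c in zip(p_row, c_row))]
    if not rows:
        return None
    cols = [c for c in range(len(curr[0]))
            if any(abs(prev[r][c] - curr[r][c]) > diff_threshold for r in rows)]
    return (rows[0], cols[0], rows[-1], cols[-1])


def looks_like_lip_motion(box: Box, min_area: int = 4,
                          max_aspect: float = 4.0) -> bool:
    """Hypothetical stand-in for 'lip motion image feature information':
    accept regions that are large enough and wider than they are tall."""
    top, left, bottom, right = box
    height = bottom - top + 1
    width = right - left + 1
    return (height * width >= min_area
            and width >= height
            and width / height <= max_aspect)


def lip_motion_signal(prev: Frame, curr: Frame) -> bool:
    """True when a motion region exists and passes the feature check."""
    box = detect_motion_region(prev, curr)
    return box is not None and looks_like_lip_motion(box)
```

In this sketch a static scene yields no signal, and a moving region that fails the shape test (for example a tall, narrow one) is rejected before it can influence speech segment detection.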
4. A method for speech segment detection in a speech recognition system including a sound receiver and an image receiver, the method comprising the steps of:
removing stationary noise from a sound frame output from the sound receiver, and determining whether or not the sound frame from which the noise is removed is a potential speech frame;
when it is determined that the sound frame is a potential speech frame, determining whether or not a lip motion signal is detected from image frames at a point of time when the potential speech frame is detected;
when it is determined that the lip motion signal is detected from the image frames, determining that the potential speech frame is a speech frame, storing the speech frame, and determining whether or not the number of speech frames is at least a predetermined number; and
when it is determined that the number of speech frames is at least the predetermined number, detecting the speech frames as a speech segment.
Dependent claims: 5, 6, 7, 8, 9, 10
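The steps of claim 4 can be sketched as a single loop over time-aligned sound and image observations. This is a minimal illustration under stated assumptions, not the patented implementation: each sound frame is reduced to one energy value, stationary noise removal is modeled as subtracting a fixed noise floor, a frame is a potential speech frame when the remaining energy exceeds a threshold, and the stored frames are discarded whenever a frame fails either check. The claim as quoted does not specify any of these details, and the function name and parameters are hypothetical.

```python
from typing import List, Optional, Sequence


def detect_speech_segment(energies: Sequence[float],
                          lip_motion: Sequence[bool],
                          noise_floor: float,
                          energy_threshold: float,
                          min_frames: int) -> Optional[List[int]]:
    """Return indices of the stored speech frames once at least min_frames
    have accumulated, or None if that never happens.

    energies[i] and lip_motion[i] are assumed to describe the same instant.
    """
    stored: List[int] = []
    for i, (energy, lips_moving) in enumerate(zip(energies, lip_motion)):
        # Step 1: remove stationary noise (assumed: subtract a fixed floor)
        # and test whether the frame is a potential speech frame.
        denoised = max(energy - noise_floor, 0.0)
        is_potential = denoised > energy_threshold

        # Steps 2-3: a potential speech frame counts as a speech frame
        # only if a lip motion signal is present at the same time.
        if is_potential and lips_moving:
            stored.append(i)
            # Step 4: enough speech frames make a speech segment.
            if len(stored) >= min_frames:
                return stored
        else:
            # Assumption: a failed frame resets the run of stored frames.
            stored.clear()
    return None
```

With `energies = [0.1, 0.9, 0.9, 0.9, 0.9, 0.9]` and lip motion present only from index 3 onward, loud frames 1 and 2 are rejected as noise and the segment is reported from frames 3 to 5. The sketch stops as soon as the minimum count is reached; how the end of a segment is determined is outside the text of claim 4 shown here.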
11. A system for speech recognition, comprising:
a sound receiver for converting a sound signal input by a user into a digital signal and framing the digital signal;
an image receiver for framing an image signal obtained by an image recorder;
a lip motion signal detector for detecting a motion region from the image frames output from the image receiver, applying lip motion image feature information to the detected motion region, and detecting a lip motion signal;
a speech segment detector for detecting a speech segment using the sound frames output from the sound receiver and the lip motion signal detected by the lip motion signal detector;
a feature vector extractor for extracting a feature vector from the speech segment detected by the speech segment detector; and
a speech recognizer for performing speech recognition using the feature vector extracted by the feature vector extractor.
Dependent claims: 12
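The data flow between the components of claim 11 can be sketched as a small pipeline. This is only an illustration of how the stages connect: the class name `SpeechRecognitionSystem`, the helper `frame_signal`, the callable signatures, and the non-overlapping framing scheme are all assumptions introduced for this example, and the image frames are taken as already framed by the image receiver.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

SoundFrames = List[List[float]]


def frame_signal(samples: Sequence[float], frame_len: int) -> SoundFrames:
    """Split digitized samples into consecutive non-overlapping frames,
    dropping any short tail (an assumed framing scheme)."""
    return [list(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


@dataclass
class SpeechRecognitionSystem:
    """Wires the claimed stages together; each stage is a pluggable callable."""
    detect_lip_motion: Callable[[Sequence], List[bool]]
    detect_segment: Callable[[SoundFrames, List[bool]], Optional[SoundFrames]]
    extract_features: Callable[[SoundFrames], List[float]]
    recognize: Callable[[List[float]], str]
    frame_len: int = 4

    def run(self, samples: Sequence[float],
            image_frames: Sequence) -> Optional[str]:
        sound_frames = frame_signal(samples, self.frame_len)     # sound receiver
        lip_signal = self.detect_lip_motion(image_frames)        # lip motion signal detector
        segment = self.detect_segment(sound_frames, lip_signal)  # speech segment detector
        if segment is None:
            return None  # no speech segment, so nothing is passed to recognition
        features = self.extract_features(segment)                # feature vector extractor
        return self.recognize(features)                          # speech recognizer
```

Keeping each stage as a callable makes the ordering in the claim explicit: the recognizer only ever sees feature vectors computed from segments that the speech segment detector has accepted using both sound frames and the lip motion signal.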
Specification