Speech recognition
Abstract
A method and apparatus for automatically controlling the operation of a speech recognition system without requiring unusual or unnatural activity of the speaker by passively determining if received sound is speech of the user before activating the speech recognition system. A video camera and microphone are located in a hand-held device. The video camera records a video image of the speaker's face, i.e., of speech articulators of the user such as the lips and/or mouth. The recorded characteristics of the articulators are analyzed to identify the sound that the articulators would be expected to make, as in “lip reading”. A microphone concurrently records the acoustic properties of received sound proximate the user. The recorded acoustic properties of the received sound are then compared to the characteristics of speech that would be expected to be generated by the recorded speech articulators to determine whether they match. If so, then the received sound is identified as having emanated from the user, and the speech recognition system is operated to perform speech recognition of the received sound.
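The matching step described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Frame` fields, the toy `expected_energy` model (a wider mouth opening implies a louder expected sound), and the `tolerance` and `min_match_fraction` thresholds are hypothetical and do not come from the patent, which does not specify which acoustic properties are compared or how.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Frame:
    # Hypothetical features, both normalized to the range 0.0 .. 1.0.
    lip_opening: float   # mouth aperture measured in one video frame
    audio_energy: float  # energy of the sound recorded concurrently


def expected_energy(lip_opening: float) -> float:
    """Toy stand-in for 'lip reading': a wider mouth implies a louder sound."""
    return max(0.0, min(1.0, lip_opening))


def sound_matches_articulators(frame: Frame, tolerance: float = 0.25) -> bool:
    """Compare the expected acoustic property with the recorded one."""
    return abs(expected_energy(frame.lip_opening) - frame.audio_energy) <= tolerance


def should_run_recognizer(frames: Sequence[Frame],
                          min_match_fraction: float = 0.8) -> bool:
    """Treat the sound as the user's speech if enough frames match."""
    if not frames:
        return False
    matches = sum(sound_matches_articulators(f) for f in frames)
    return matches / len(frames) >= min_match_fraction
```

With this sketch, frames in which the lips move in step with the sound pass the check, while loud background sound recorded over a closed mouth does not, so the recognizer would stay idle.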
52 Citations
17 Claims
1. A method of controlling the operation of a speech recognition device, comprising the steps of:
recording at least one frame of a video image of speech articulators of a user while the user is speaking;
recording acoustic properties of speech that occurs concurrent with the recording of the at least one video frame;
identifying acoustic properties of speech that would be expected to be generated by a condition of the speech articulators recorded in the at least one frame of the video image; and
comparing the identified acoustic properties of speech with the recorded acoustic properties to determine whether the speech of the recorded properties emanated from the user. - View Dependent Claims (2, 3, 4, 5, 6, 7)
8. A method of controlling the operation of a speech recognition device comprising the steps of:
recording a series of frames of video images of speech articulators of a user while speaking;
recording acoustic properties of speech that occurs concurrent with the recording of each of the series of video frames;
identifying each frame of the series of frames of video images with the acoustic properties of sounds which are obtained concurrent with the recording of the series of video frames;
examining the video frames for a face;
examining the video frames that have a face for a change of the speech articulators of the face;
identifying acoustic properties of speech that would be expected to be generated by a condition of the speech articulator recorded in the video frame that has a changed speech articulator;
identifying the recorded acoustic properties of speech that occurred at the time that the video frame of a face having a change of speech articulators was obtained; and
comparing the identified acoustic properties of speech that occurred at the time that the video frame of a face having a change of speech articulators with the identified acoustic properties that would be expected to be generated to determine whether the speech of the identified acoustic properties emanated from the user. - View Dependent Claims (9, 10, 11)
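The sequence of steps in claim 8 can be sketched roughly as follows. This is an illustrative reading, not the claimed implementation: `TaggedFrame`, the use of `None` to mark a frame with no detected face, the voiced/unvoiced thresholds, and `min_change` are all hypothetical choices. Face detection and lip measurement are assumed to have happened upstream.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class TaggedFrame:
    # One video frame identified with the audio recorded at the same time.
    time_ms: int
    lip_opening: Optional[float]  # None when no face was found in the frame
    audio_energy: float           # hypothetical normalized energy, 0.0 .. 1.0


def changed_frames(frames: Sequence[TaggedFrame],
                   min_change: float = 0.1) -> List[TaggedFrame]:
    """Keep frames that show a face whose lip opening differs from the last face frame."""
    selected = []
    previous = None
    for frame in frames:
        if frame.lip_opening is None:
            continue  # no face in this frame, so nothing to examine
        if previous is not None and abs(frame.lip_opening - previous) >= min_change:
            selected.append(frame)
        previous = frame.lip_opening
    return selected


def expected_voiced(lip_opening: float) -> bool:
    """Assumed mapping from articulator condition to an expected acoustic property."""
    return lip_opening > 0.2


def recorded_voiced(audio_energy: float) -> bool:
    """Assumed acoustic property extracted from the recorded sound."""
    return audio_energy > 0.3


def speech_emanated_from_user(frames: Sequence[TaggedFrame],
                              min_match_fraction: float = 0.8) -> bool:
    """Compare expected and recorded properties only at frames where the articulators changed."""
    candidates = changed_frames(frames)
    if not candidates:
        return False  # no articulator movement, so no evidence the user spoke
    matches = sum(expected_voiced(f.lip_opening) == recorded_voiced(f.audio_energy)
                  for f in candidates)
    return matches / len(candidates) >= min_match_fraction
```

Restricting the comparison to frames with a changed articulator, as the claim does, means that sound arriving while the user's mouth is still is never attributed to the user in this sketch.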
12. Apparatus for controlling the operation of a speech recognition device comprising:
video means for recording at least one video image of the speech articulators of a user and analyzing the video image to identify the acoustic properties of speech that would be expected to be generated by the condition of the speech articulators;
acoustic means for recording acoustic properties of speech by the user that occur concurrently with the recording of the at least one video image;
comparing means for comparing the acoustic properties of speech that would be expected to be generated by the condition of the speech articulators with the recorded acoustic properties of speech by the user, and control means to activate the speech recognition device when the comparing means identifies a match. - View Dependent Claims (13, 14, 15)
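One way to picture how the four means of claim 12 fit together is as pluggable components behind a single gate. The class name `RecognitionGate`, its method `process`, and the callable signatures below are hypothetical and chosen only for illustration; the claim does not prescribe any software structure.

```python
from typing import Any, Callable, Optional


class RecognitionGate:
    """Hypothetical wiring of the video, acoustic, comparing and control means."""

    def __init__(self,
                 video_means: Callable[[Any], Any],
                 acoustic_means: Callable[[Any], Any],
                 comparing_means: Callable[[Any, Any], bool],
                 recognizer: Callable[[Any], Any]) -> None:
        self.video_means = video_means          # image -> expected acoustic properties
        self.acoustic_means = acoustic_means    # sound -> recorded acoustic properties
        self.comparing_means = comparing_means  # (expected, recorded) -> match?
        self.recognizer = recognizer            # the speech recognition device

    def process(self, image: Any, sound: Any) -> Optional[Any]:
        """Control means: activate the recognizer only when the properties match."""
        expected = self.video_means(image)
        recorded = self.acoustic_means(sound)
        if self.comparing_means(expected, recorded):
            return self.recognizer(sound)
        return None  # recognizer is not activated
```

Passing the components in as callables keeps the gate independent of any particular image analysis or audio feature, which mirrors the means-plus-function wording of the claim.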
16. An article comprising:
a computer program in a machine readable medium wherein the computer program executes on a suitable platform to control the operation of a speech recognition unit and is operative to automatically analyze at least one video image to detect a change of the speech articulators of the face of a user and generate a characteristic of speech which can be made by the shape of the speech articulators. - View Dependent Claims (17)
Specification