Method of recognizing speech using a lip image
First Claim
1. A method of recognizing speech by inputting a lip image, comprising the steps of:
using image pickup means for picking up the lip image from the lips during speech;
receiving and processing, using data processing means connected to said image pickup means, lip data from the image pickup means in the form of an image signal indicative of the lip image;
collating said lip data with language data previously stored in a first memory provided in said data processing means;
selecting from said language data the language corresponding to said lip data; and
thereby recognizing the speech.
Abstract
A speech recognition method is implemented with an image pickup apparatus such as a TV camera which picks up a lip image during speech and with a small computer which has a small memory capacity and is connected to the TV camera. The computer receives and processes as lip data an image signal from the TV camera which represents the lip image. The lip data is collated with language data stored in the memory of the computer so as to select the language corresponding to the lip data, thereby recognizing the speech. A microphone may also be provided to output to the system a voice waveform signal serving as voice data. This voice data is collated with the language data stored in the memory of the computer to select the language corresponding to the voice data, thereby recognizing the speech on the basis of the language selected using the lip data and using the voice data. Image pattern data and voice pattern data may be extracted and processed for every word, or for every unit sound. With the inventive method, the speech recognition ratio and processing speed are improved, particularly with respect to use of a computer with a small memory capacity.
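The abstract does not say how the language selected from the lip data and the language selected from the voice data are combined. As an illustration only, the sketch below assumes each modality yields a distance score per candidate word (lower is better) and picks the candidate with the lowest summed distance. The function name `recognize`, the score dictionaries, and the summing rule are all hypothetical and are not taken from the patent.

```python
def recognize(lip_scores, voice_scores):
    """Pick the word with the lowest combined distance across both modalities.

    lip_scores / voice_scores: hypothetical dicts mapping a candidate word to
    its distance from the stored language data (lower means a closer match).
    Returns None when the two modalities share no candidate.
    """
    common = lip_scores.keys() & voice_scores.keys()
    if not common:
        return None
    # Sorting first makes tie-breaking deterministic (alphabetical order).
    return min(sorted(common), key=lambda w: lip_scores[w] + voice_scores[w])


# Invented scores: "pat" and "bat" look alike on the lips, but the voice
# data separates them.
lip = {"pat": 0.2, "bat": 0.2, "cat": 0.9}
voice = {"pat": 0.3, "bat": 0.8, "cat": 0.35}
print(recognize(lip, voice))
```

Restricting the choice to candidates found by both modalities is one possible reading of "on the basis of the language selected using the lip data and using the voice data"; a weighted sum or a fallback to a single modality would be equally plausible designs.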
64 Citations
11 Claims
1. A method of recognizing speech by inputting a lip image, comprising the steps of:
using image pickup means for picking up the lip image from the lips during speech;
receiving and processing, using data processing means connected to said image pickup means, lip data from the image pickup means in the form of an image signal indicative of the lip image;
collating said lip data with language data previously stored in a first memory provided in said data processing means;
selecting from said language data the language corresponding to said lip data; and
thereby recognizing the speech.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method of speech recognition, comprising the steps of:
producing a time-varying image of the lips of a person making a verbal statement, said verbal statement being made up of at least one unit of language;
converting said lip image into a signal;
extracting from said signal a plurality of characteristic parameters of said lip image which vary with time;
accessing reference language data previously stored in a memory, said reference language data including a predetermined plurality of units of language and including for each such unit of language a set of characteristic parameters corresponding thereto;
comparing said extracted characteristic parameters representative of said lip image to said stored sets of characteristic parameters so as to successively identify stored sets of characteristic parameters substantially equivalent to successive portions of said extracted characteristic parameters;
retrieving the stored unit of language associated with each said identified set of characteristic parameters; and
arranging said retrieved units of language in the chronological order of occurrence of said successive portions of said extracted characteristic parameters to thereby produce an accurate representation of the entire verbal statement.
Dependent claims: 10, 11
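The comparing, retrieving, and arranging steps of claim 9 can be pictured with a minimal sketch. It assumes each video frame has already been reduced to a pair of characteristic parameters (lip width, lip height) and that "substantially equivalent" means a mean Euclidean distance within a tolerance. The parameter choice, the `REFERENCE` table and its values, the `tolerance` threshold, and the greedy left-to-right scan are all assumptions made for illustration, not the patented implementation.

```python
from math import dist

# Hypothetical reference language data: unit of language -> stored set of
# characteristic parameters, here one (lip_width, lip_height) pair per frame.
# All values are invented.
REFERENCE = {
    "a": [(4.0, 3.0), (4.2, 3.4)],
    "o": [(2.0, 2.5), (1.8, 2.8)],
    "m": [(3.0, 0.1), (3.0, 0.0)],
}


def match_units(frames, reference=REFERENCE, tolerance=0.5):
    """Return stored units matching successive portions of `frames`, in order."""
    units = []
    i = 0
    while i < len(frames):
        best_unit, best_score, best_len = None, tolerance, 0
        for unit, template in reference.items():
            window = frames[i:i + len(template)]
            if len(window) < len(template):
                continue  # not enough frames left for this template
            # Mean per-frame distance between the window and the template.
            score = sum(dist(f, t) for f, t in zip(window, template)) / len(template)
            if score <= best_score:
                best_unit, best_score, best_len = unit, score, len(template)
        if best_unit is None:
            i += 1  # no stored set is close enough; skip one frame
        else:
            units.append(best_unit)  # appended in chronological order
            i += best_len
    return units


frames = [(3.0, 0.1), (3.0, 0.0), (4.1, 3.0), (4.2, 3.3), (2.0, 2.4), (1.9, 2.8)]
print(match_units(frames))
```

Because the scan moves forward through the frames and appends each match as it is found, the retrieved units come out in the chronological order of the portions they matched, which is the ordering the final step of the claim describes.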
Specification