Method of recognizing speech using a lip image

US 4,769,845 A
Filed: 04/06/1987
Issued: 09/06/1988
Est. Priority Date: 04/10/1986
Status: Expired due to Fees

Abstract

A speech recognition method is implemented with an image pickup apparatus such as a TV camera which picks up a lip image during speech and with a small computer which has a small memory capacity and is connected to the TV camera. The computer receives and processes as lip data an image signal from the TV camera which represents the lip image. The lip data is collated with language data stored in the memory of the computer so as to select the language corresponding to the lip data, thereby recognizing the speech. A microphone may also be provided to output to the system a voice waveform signal serving as voice data. This voice data is collated with the language data stored in the memory of the computer to select the language corresponding to the voice data, thereby recognizing the speech on the basis of the language selected using the lip data and using the voice data. Image pattern data and voice pattern data may be extracted and processed for every word, or for every unit sound. With the inventive method, the speech recognition ratio and processing speed are improved, particularly with respect to use of a computer with a small memory capacity.
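
The collation step described in the abstract can be illustrated with a minimal sketch. It assumes the lip data has already been reduced to a short sequence of feature vectors and that each stored entry has the same number of frames; the `LANGUAGE_DATA` table, its values, and the `collate` function are hypothetical and are not taken from the patent, which does not prescribe a distance measure.

```python
from math import dist

# Hypothetical stored "language data": word -> sequence of lip feature vectors
# (normalized opening area, height/width ratio), one vector per sampled frame.
LANGUAGE_DATA = {
    "ah": [(0.2, 0.5), (0.9, 0.9), (0.3, 0.5)],
    "oo": [(0.1, 0.8), (0.3, 1.0), (0.1, 0.8)],
    "ee": [(0.2, 0.2), (0.4, 0.3), (0.2, 0.2)],
}


def collate(lip_data, language_data=LANGUAGE_DATA):
    """Return (word, score) for the stored entry closest to lip_data.

    The score is the mean Euclidean distance between corresponding frames
    (lower is better). Equal-length sequences are assumed.
    """
    def score(template):
        return sum(dist(a, b) for a, b in zip(lip_data, template)) / len(template)

    word = min(language_data, key=lambda w: score(language_data[w]))
    return word, score(language_data[word])


# Illustrative observation: lips open wide in the middle frame.
observed = [(0.25, 0.55), (0.85, 0.95), (0.3, 0.45)]
word, score = collate(observed)
print(word)
```

A real system would also need to align sequences of different lengths before comparing them; that step is left out here to keep the sketch short.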

64 Citations

11 Claims

1. A method of recognizing speech by inputting a lip image, comprising the steps of:
- using image pickup means for picking up the lip image from the lips during speech;
  
  receiving and processing, using data processing means connected to said image pickup means, lip data from the image pickup means in the form of an image signal indicative of the lip image;
  
  collating said lip data with language data previously stored in a first memory provided in said data processing means;
  
  selecting from said language data the language corresponding to said lip data; and
  
  thereby recognizing the speech.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. A speech recognizing method according to claim 1, including the steps of inputting said lip image to said data processing means as the lip data for every word, and collating the lip data for every word with said stored language data to select the corresponding language, thereby recognizing the input speech.
  - 3. A speech recognizing method according to claim 1, including the steps of causing said data processing means to receive the lip image as the lip data for every unit sound;
    - collating the lip data for every unit sound with unit sound data previously stored into a second memory provided in said data processing means;
      
      identifying each unit sound corresponding to said lip data;
      
      collating a combination of the identified unit sounds with said stored language data;
      
      selecting the language corresponding to said combination; and
      
      thereby recognizing the input speech.
  - 4. A speech recognizing method according to claim 1, wherein said image pickup means is a television camera which picks up the lip image and outputs the image data to said data processing means.
  - 5. A speech recognizing method according to claim 1, further comprising the step of providing microphone means for picking up the sound of the speech and outputting a waveform signal indicative of the waveform of said speech to said data processing means.
  - 6. A speech recognizing method according to claim 5, further comprising the steps of:
    - receiving said waveform signal in said data processing means as voice data;
      
      collating said voice data with language data previously stored in a second memory provided in the data processing means;
      
      selecting from said language data the language corresponding to said voice data; and
      
      thereby recognizing the speech on the basis of the language selected using said lip data and the language selected using said voice data.
  - 7. A speech recognizing method according to claim 5, including the steps of causing said data processing means to receive said lip image as the lip data for every word;
    - collating said lip data for every word with said stored language data;
      
      selecting from said language data the language corresponding to said lip data;
      
      receiving said voice waveform as voice data for every unit sound;
      
      collating said voice data for every unit sound with said stored unit sound data;
      
      identifying each unit sound corresponding to said voice data;
      
      collating a combination of said identified unit sounds with the stored language data;
      
      selecting from said language data the language corresponding to said combination of unit sounds; and
      
      thereby recognizing the speech on the basis of the language selected using said lip data and the language selected using said voice data.
  - 8. A voice recognizing method according to claim 5, including the steps of causing said data processing means to receive said lip image as the lip data for every unit sound;
    - collating said lip data for every unit sound with unit sound data previously stored in a second memory provided in said data processing means;
      
      identifying each unit sound corresponding to said lip data;
      
      collating a first combination of said identified unit sounds corresponding to said lip data with said language data;
      
      selecting from said language data the language corresponding to said first combination;
      
      receiving said voice waveform as voice data for every unit sound;
      
      collating said voice data for every unit sound with said stored unit sound data;
      
      identifying each unit sound corresponding to said voice data;
      
      collating a second combination of said identified unit sounds corresponding to said voice data with said language data;
      
      selecting from said language data the language corresponding to said second combination; and
      
      thereby recognizing the speech on the basis of the language selected using said lip data and the language selected using said voice data.
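
Claims 6 to 8 recognize the speech from both the language selected using lip data and the language selected using voice data, but the claims do not state how the two selections are reconciled. The sketch below shows one possible approach, a weighted sum of per-word distances. The `recognize` function, the `lip_weight` parameter, and all scores are assumptions made for illustration only.

```python
def recognize(lip_scores, voice_scores, lip_weight=0.5):
    """Pick the word with the lowest weighted distance across both modalities.

    lip_scores and voice_scores map candidate words to distances
    (lower is better). Only words scored by both modalities are considered.
    """
    candidates = lip_scores.keys() & voice_scores.keys()
    if not candidates:
        raise ValueError("no word was scored by both modalities")

    def combined(word):
        return lip_weight * lip_scores[word] + (1 - lip_weight) * voice_scores[word]

    # Sort first so that ties are broken deterministically.
    return min(sorted(candidates), key=combined)


# Illustrative scores: "me" and "knee" sound alike but look different on the
# lips, while "me" and "bee" look alike but sound different.
voice_scores = {"me": 0.30, "knee": 0.28, "bee": 0.70}
lip_scores = {"me": 0.10, "knee": 0.60, "bee": 0.12}

print(recognize(lip_scores, voice_scores))
```

With these made-up numbers the voice data alone would favor "knee", while the combined score selects "me", which is the kind of disambiguation the two-channel claims aim at.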

9. A method of speech recognition, comprising the steps of:
- producing a time-varying image of the lips of a person making a verbal statement, said verbal statement being made up of at least one unit of language;
  
  converting said lip image into a signal;
  
  extracting from said signal a plurality of characteristic parameters of said lip image which vary with time;
  
  accessing reference language data previously stored in a memory, said reference language data including a predetermined plurality of units of language and including for each such unit of language a set of characteristic parameters corresponding thereto;
  
  comparing said extracted characteristic parameters representative of said lip image to said stored sets of characteristic parameters so as to successively identify stored sets of characteristic parameters substantially equivalent to successive portions of said extracted characteristic parameters;
  
  retrieving the stored unit of language associated with each said identified set of characteristic parameters; and
  
  arranging said retrieved units of language in the chronological order of occurrence of said successive portions of said extracted characteristic parameters to thereby produce an accurate representation of the entire verbal statement.
- Dependent claims (10, 11):
  - 10. A method according to claim 9, wherein each said language unit is one word.
  - 11. A method according to claim 9, wherein said extracted characteristic parameters include the area in said lip image of the opening between the lips and the ratio of the vertical dimension to the horizontal dimension of said opening between said lips in said lip image.
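
The two characteristic parameters named in claim 11 can be computed per frame roughly as follows. This is a sketch under stated assumptions: the lip opening is taken to be already segmented into a binary mask, and the vertical and horizontal dimensions are measured on its bounding box. The `lip_opening_features` function and the mask format are hypothetical, and the patent may measure these quantities differently.

```python
def lip_opening_features(mask):
    """Return (area, height/width ratio) of the opening between the lips.

    mask is a list of rows of 0/1 values, where 1 marks a pixel inside the
    opening. Area is the pixel count; the ratio uses the bounding box.
    """
    points = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not points:
        return 0, 0.0  # closed lips: no opening in this frame

    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return len(points), height / width


# Illustrative frame: an opening 2 pixels tall and 4 pixels wide.
frame = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(lip_opening_features(frame))  # (8, 0.5)
```

Applying the function to each frame of the lip image yields the time-varying parameter sequence that claim 9 compares against the stored sets.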

Current Assignee: Kabushiki Kaisha Carrylab
Original Assignee: Kabushiki Kaisha Carrylab
Inventors: Nakamura, Hiroyuki
Primary Examiner(s): Roskoski, Bernard
Application Number: US07/035,123
Time in Patent Office: 519 Days
Field of Search: 381/41-53, 364/513.5
US Class Current: 704/231
CPC Class Codes:
G06V 40/168 Feature extraction; Face re...
G10L 15/25 using position of the lips,...