Audio visual speech recognition

US 4,757,541 A
Filed: 12/01/1986
Issued: 07/12/1988
Est. Priority Date: 11/05/1985
Status: Expired due to Fees


Abstract

A method and apparatus for indicating at least some of a sequence of spoken phonemes in which detected sounds are analyzed to determine a group of phonemes to which a phoneme belongs, the lipshape is optically detected and the respective signals correlated by a computer to indicate the detected phoneme.
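
The correlation step described in the abstract can be illustrated with a minimal sketch. It assumes the acoustic analysis yields a group of candidate phonemes, the optical scan yields a lipshape that is itself associated with a set of phonemes, and the computer keeps whatever phoneme the two sets share. The group names, lipshape names, phoneme assignments, and the `correlate` function are all hypothetical, chosen only for illustration; the patent record does not list them.

```python
from typing import Optional

# Hypothetical phoneme groups that an acoustic analyzer might report.
# These groupings are illustrative assumptions, not taken from the patent.
ACOUSTIC_GROUPS = {
    "nasal": frozenset({"m", "n"}),
    "voiced_stop": frozenset({"b", "d", "g"}),
    "unvoiced_stop": frozenset({"p", "t", "k"}),
}

# Hypothetical lipshapes and the phonemes each one may accompany.
LIPSHAPE_PHONEMES = {
    "closed": frozenset({"m", "b", "p"}),
    "open_tongue_visible": frozenset({"n", "d", "t"}),
    "open_back": frozenset({"g", "k"}),
}


def correlate(acoustic_group: str, lipshape: str) -> Optional[str]:
    """Return the single phoneme consistent with both signals, else None."""
    candidates = ACOUSTIC_GROUPS[acoustic_group] & LIPSHAPE_PHONEMES[lipshape]
    if len(candidates) == 1:
        return next(iter(candidates))
    return None  # ambiguous or contradictory evidence


if __name__ == "__main__":
    print(correlate("nasal", "closed"))        # m
    print(correlate("voiced_stop", "open_back"))  # g
```

Neither signal identifies the phoneme by itself in this sketch: the acoustic group narrows the choice to a few phonemes, and the lipshape selects among them.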

5 Claims

1. An apparatus for producing an output indicating at least some of a sequence of spoken phonemes from a human speaker comprising:
- means for detecting sounds and converting said sounds into an electrical signal;
- means for analyzing said signal to detect said phonemes to produce an electrical acoustic output signal indicating for each of at least some of said detected phonemes one group of a plurality of phoneme groups including the detected phoneme, each of said phoneme groups including at least one phoneme;
- means for optically scanning the face of said speaker and producing an electrical lipshape signal representing the visual manifestation for at least some of said spoken phonemes indicating one of a plurality of lipshapes, each lipshape being associated with at least one phoneme; and
- means for receiving and correlating said lipshape signal and said acoustic output signal to produce said output.

2. An apparatus as in claim 1 wherein said receiving and correlating means includes a multiplexer for receiving signals from said scanning and analyzing means, an analog to digital converter connected to the output of said multiplexer and a digital computer connected to the output of said converter.

3. An apparatus as in claim 1 or 2 wherein said scanning means includes an optical scanner, means for normalizing the distance between said scanner and the speaker's lips, means for extracting the mouth area, means for extracting the lip contour and means for detecting teeth and tongue positions.

4. An apparatus as in claim 1 or 2 wherein said analyzing means includes a low pass filter, means for analyzing the output of said low pass filter, a high pass filter and means for analyzing the output of said high pass filter.

5. A method of producing an output indicating at least some of a sequence of spoken phonemes from a human speaker comprising the steps of:
- detecting sounds and converting said sounds into an electrical signal;
- analyzing said signal to detect said phonemes to produce an electrical acoustic output signal indicating for each of at least some of said detected phonemes one group of a plurality of phoneme groups including the detected phoneme, each of said phoneme groups including at least one phoneme;
- optically scanning the face of said speaker and producing an electrical lipshape signal representing the visual manifestation for at least some of said spoken phonemes indicating one of a plurality of lipshapes, each lipshape being associated with at least one phoneme; and
- correlating said lip-shape signal and said acoustic output signal to produce said output.
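
Claim 4 recites an analyzing means built from a low pass filter, a high pass filter, and a separate analysis of each filter's output. The record gives no filter design, so the following is only a rough sketch under stated assumptions: a one-pole digital low-pass filter, a high-pass branch formed as the residual of the input minus the low-pass output, and mean squared amplitude as the "analysis" of each branch. The function names and the `alpha` smoothing parameter are hypothetical.

```python
import math
from typing import List, Tuple


def one_pole_low_pass(samples: List[float], alpha: float) -> List[float]:
    """Smooth the signal; a smaller alpha gives a lower cutoff (assumed design)."""
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)  # move the state a fraction alpha toward the input
        out.append(y)
    return out


def band_energies(samples: List[float], alpha: float = 0.1) -> Tuple[float, float]:
    """Return (low-band energy, high-band energy) as mean squared amplitude."""
    low = one_pole_low_pass(samples, alpha)
    # Assumed high-pass branch: whatever the low-pass branch did not keep.
    high = [x - l for x, l in zip(samples, low)]
    n = len(samples)
    return sum(v * v for v in low) / n, sum(v * v for v in high) / n


if __name__ == "__main__":
    slow = [math.sin(2 * math.pi * i / 200) for i in range(1000)]
    fast = [1.0 if i % 2 == 0 else -1.0 for i in range(1000)]
    print(band_energies(slow))  # low-band energy dominates
    print(band_energies(fast))  # high-band energy dominates
```

Comparing the two band energies is one simple way such an analyzer could assign a sound to a coarse acoustic group before the lipshape signal is consulted; the actual analysis used in the patent is not stated in this record.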

Current Assignee: Robert L. Beadles
Original Assignee: Research Triangle Institute
Inventors: Beadles, Robert L.
Primary Examiner(s): KEMENY, EMANUEL
Application Number: US06/936,954
Time in Patent Office: 589 Days
Field of Search: 381/41-43, 382/2, 382/25, 382/30, 382/36, 364/513.5
US Class Current: 704/254
CPC Class Codes: G10L 15/24 Speech recognition using no...
