Speech recognition

US 20030171932A1
Filed: 03/07/2002
Published: 09/11/2003
Est. Priority Date: 03/07/2002
Status: Abandoned Application

First Claim

1. A method of controlling the operation of a speech recognition device, comprising the steps of:

recording at least one frame of a video image of speech articulators of a user while the user is speaking;

recording acoustic properties of speech that occurs concurrent with the recording of the at least one video frame;

identifying acoustic properties of speech that would be expected to be generated by a condition of the speech articulators recorded in the at least one frame of the video image; and

comparing the identified acoustic properties of speech with the recorded acoustic properties to determine whether the speech of the recorded properties emanated from the user.

1 Assignment

0 Petitions

Abstract

A method and apparatus for automatically controlling the operation of a speech recognition system without requiring unusual or unnatural activity of the speaker by passively determining if received sound is speech of the user before activating the speech recognition system. A video camera and microphone are located in a hand-held device. The video camera records a video image of the speaker's face, i.e., of speech articulators of the user such as the lips and/or mouth. The recorded characteristics of the articulators are analyzed to identify the sound that the articulators would be expected to make, as in "lip reading". A microphone concurrently records the acoustic properties of received sound proximate the user. The recorded acoustic properties of the received sound are then compared to the characteristics of speech that would be expected to be generated by the recorded speech articulators to determine whether they match. If so, the received sound is identified as having emanated from the user, and the speech recognition system is operated to perform speech recognition of the received sound.
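The comparison the abstract describes can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the applicants' implementation: it assumes some face tracker (not shown) already yields one normalised lip-aperture value per video frame, that the audio has been cut into one list of samples per video frame, and it reduces "expected acoustic properties" to a single yes/no question (should this mouth shape be producing sound?). The thresholds and function names are hypothetical.

```python
from typing import Sequence

# Hypothetical thresholds; the patent gives no numeric values.
APERTURE_OPEN = 0.3   # normalised lip opening above which the mouth counts as "open"
ENERGY_SOUND = 0.01   # mean-square energy above which an audio frame counts as "sound present"
MATCH_RATIO = 0.8     # fraction of frames that must agree before declaring a match


def expected_sound(lip_aperture: float) -> bool:
    """Crude 'lip reading': an open mouth is expected to produce sound."""
    return lip_aperture > APERTURE_OPEN


def observed_sound(samples: Sequence[float]) -> bool:
    """Acoustic property of the concurrent audio frame: is there energy in it?"""
    if not samples:
        return False
    energy = sum(s * s for s in samples) / len(samples)
    return energy > ENERGY_SOUND


def speech_from_user(apertures: Sequence[float],
                     audio_frames: Sequence[Sequence[float]]) -> bool:
    """Compare expected and recorded properties frame by frame."""
    if not apertures or len(apertures) != len(audio_frames):
        raise ValueError("need one audio frame per video frame")
    heard = [observed_sound(frame) for frame in audio_frames]
    if not any(heard):
        # Nothing was heard, so there is no received sound to attribute.
        return False
    agree = sum(expected_sound(a) == h for a, h in zip(apertures, heard))
    return agree / len(apertures) >= MATCH_RATIO
```

A frame where the mouth is closed and the audio is silent counts as agreement, so the function also requires that some sound was actually heard; otherwise a silent, idle user would produce a match.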

52 Citations

17 Claims

1. A method of controlling the operation of a speech recognition device, comprising the steps of:
- recording at least one frame of a video image of speech articulators of a user while the user is speaking;
  
  recording acoustic properties of speech that occurs concurrent with the recording of the at least one video frame;
  
  identifying acoustic properties of speech that would be expected to be generated by a condition of the speech articulators recorded in the at least one frame of the video image; and
  
  comparing the identified acoustic properties of speech with the recorded acoustic properties to determine whether the speech of the recorded properties emanated from the user.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1 further comprising the step of:
    - activating the speech recognition device when there is a match between the acoustic properties of speech which would be expected to be generated by the condition of the speech articulators recorded in the at least one frame of video image with the acoustic properties of speech recorded concurrent with the recording of the at least one video frame.
  - 3. The method of claim 2 further comprising the step of:
    - maintaining the speech recognition device active for a preset time interval after being activated.
  - 4. The method of claim 3 further comprising the step of:
    - maintaining the speech recognition device active beyond the end of the preset time interval upon obtaining a match between the acoustic properties of speech which would be expected to be generated by the condition of the speech articulators recorded in a subsequently recorded frame of a video image with the acoustic properties of speech recorded concurrent with the recording of the subsequently recorded video frame before the fixed period of time expires.
  - 5. The method of claim 1 wherein a camera is used to record the video image of the speech articulators of the user.
  - 6. The method of claim 1 wherein a microphone is used to record the acoustic properties of speech of the user.
  - 7. The method of claim 1 wherein a handheld device contains a microphone for recording the acoustic properties of speech of the user and a camera for recording the video image of speech articulators of the user.
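Dependent claims 2 to 4 describe an activation window: switch on at a match, stay on for a preset interval, and extend the interval when another match arrives before it runs out. A minimal sketch of that timing logic follows; the class name, the `update` method, and the 2-second default are hypothetical, and timestamps are assumed to be seconds from a monotonic clock.

```python
from typing import Optional


class RecognizerGate:
    """Keeps a speech recognizer active for a preset interval after each audio-visual match."""

    def __init__(self, hold_seconds: float = 2.0) -> None:
        self.hold_seconds = hold_seconds          # hypothetical "preset time interval"
        self._active_until: Optional[float] = None

    def is_active(self, now: float) -> bool:
        return self._active_until is not None and now < self._active_until

    def update(self, now: float, matched: bool) -> bool:
        """Feed one comparison result; return whether the recognizer should be on."""
        if matched:
            # A match activates the recognizer, or pushes the expiry time
            # further out if it is already active.
            self._active_until = now + self.hold_seconds
        elif not self.is_active(now):
            # Interval lapsed without a new match: deactivate.
            self._active_until = None
        return self.is_active(now)
```

Resetting the expiry time on every match covers both first activation (claim 2) and extension before expiry (claim 4) with a single rule.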

8. A method of controlling the operation of a speech recognition device comprising the steps of:
- recording a series of frames of video images of speech articulators of a user while speaking;
  
  recording acoustic properties of speech that occurs concurrent with the recording of each of the series of video frames;
  
  identifying each frame of the series of frames of video images with the acoustic properties of sounds which are obtained concurrent with the recording of the series of video frames;
  
  examining the video frames for a face;
  
  examining the video frames that have a face for a change of the speech articulators of the face;
  
  identifying acoustic properties of speech that would be expected to be generated by a condition of the speech articulator recorded in the video frame that has a changed speech articulator;
  
  identifying the recorded acoustic properties of speech that occurred at the time that the video frame of a face having a change of speech articulators was obtained; and
  
  comparing the identified acoustic properties of speech that occurred at the time that the video frame of a face having a change of speech articulators with the identified acoustic properties that would be expected to be generated to determine whether the speech of the identified acoustic properties emanated from the user.
- Dependent Claims (9, 10, 11)
  - 9. The method of claim 8 further comprising the step of:
    - activating the speech recognition device when there is a match between the identified acoustic properties of speech that occurred at the time that the video frame of a face having a change of speech articulators with the identified acoustic properties that would be expected to be generated concurrently with the video frame.
  - 10. The method of claim 9 further comprising the step of:
    - maintaining the speech recognition device activated for a preset time interval after activating the speech recognition device.
  - 11. The method of claim 10 further comprising the step of:
    - deactivating the speech recognition device at the end of the preset time interval in the absence of the occurrence of a subsequent match between the identified acoustic properties of speech that occurred at the time that the video frame of a face having a change of speech articulators with the identified acoustic properties that would be expected to be generated concurrently with the video frame.
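Claim 8 adds two filtering steps (keep only frames that show a face, then only those where the articulators changed) and ties each surviving frame to the audio recorded at the same moment. The sketch below covers both steps under assumptions not stated in the patent: a hypothetical face tracker reports `None` when no face is found and a lip-aperture value otherwise, video and audio capture start together, and both run at fixed frame and sample rates. The function names and the `min_change` default are illustrative.

```python
from typing import List, Optional, Sequence, Tuple


def audio_span(frame_index: int, fps: float, sample_rate: int) -> Tuple[int, int]:
    """Sample indices [start, end) recorded concurrently with one video frame."""
    start = round(frame_index * sample_rate / fps)
    end = round((frame_index + 1) * sample_rate / fps)
    return start, end


def changed_frames(apertures: Sequence[Optional[float]],
                   min_change: float = 0.1) -> List[int]:
    """Indices of face frames whose lip aperture changed since the previous face frame.

    None marks a frame in which no face was found (hypothetical detector output).
    """
    selected: List[int] = []
    previous: Optional[float] = None
    for i, aperture in enumerate(apertures):
        if aperture is None:   # no face in this frame: skip it
            continue
        if previous is not None and abs(aperture - previous) >= min_change:
            selected.append(i)
        previous = aperture
    return selected
```

For example, at 30 frames/s and 16 kHz audio, frame 3 corresponds to samples 1600 up to (but not including) 2133; only those samples would be compared against the sound expected from that frame's mouth shape.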

12. Apparatus for controlling the operation of a speech recognition device comprising:
- video means for recording at least one video image of the speech articulators of a user and analyzing the video image to identify the acoustic properties of speech that would be expected to be generated by the condition of the speech articulators;
  
  acoustic means for recording acoustic properties of speech by the user that occur concurrently with the recording of the at least one video image;
  
  comparing means for comparing the acoustic properties of speech that would be expected to be generated by the condition of the speech articulators with the recorded acoustic properties of speech by the user, and control means to activate the speech recognition device when the comparing means identifies a match.
- Dependent Claims (13, 14, 15)
  - 13. Apparatus according to claim 12 further comprising:
    - a video signal processing means for analyzing the at least one video image to identify the acoustic properties of speech that would be generated by the condition of the speech articulators.
  - 14. The apparatus of claim 12 wherein the video means is a video camera and the acoustic means is a microphone.
  - 15. The apparatus of claim 14 wherein the video camera and microphone are in a handheld device.

16. An article comprising:
- a computer program in a machine readable medium wherein the computer program executes on a suitable platform to control the operation of a speech recognition unit and is operative to automatically analyze at least one video image to detect a change of the speech articulators of the face of a user and generate a characteristic of speech which can be made by the shape of the speech articulators.
- Dependent Claims (17)
  - 17. The article of claim 16 wherein the computer program automatically compares the generated speech with actual speech made at the time that the video image was obtained to determine if the actual speech is the speech of the user at the time that the video image was made.
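Claims 16 and 17 speak of generating "a characteristic of speech which can be made by the shape of the speech articulators" and comparing it with the actual speech. One common way to model this, substituted here as an illustration rather than taken from the patent, is a viseme table: each visible mouth shape maps to the set of phonemes it can produce. The table below is deliberately coarse and hypothetical, and the phoneme labels are assumed to come from a separate acoustic recognizer using ARPAbet-style symbols, time-aligned one per video frame.

```python
from typing import Dict, FrozenSet, Sequence

# Hypothetical, coarse viseme-to-phoneme table (ARPAbet-style symbols).
VISEME_PHONEMES: Dict[str, FrozenSet[str]] = {
    "lips_closed": frozenset({"P", "B", "M"}),
    "lip_to_teeth": frozenset({"F", "V"}),
    "rounded": frozenset({"W", "UW", "OW"}),
    "open": frozenset({"AA", "AE", "AH"}),
}


def phonemes_for(viseme: str) -> FrozenSet[str]:
    """Sounds the observed mouth shape could plausibly produce (empty if unknown)."""
    return VISEME_PHONEMES.get(viseme, frozenset())


def is_users_speech(visemes: Sequence[str], heard: Sequence[str],
                    min_ratio: float = 0.75) -> bool:
    """Check whether the phoneme heard at each frame fits the mouth shape seen then."""
    if len(visemes) != len(heard):
        raise ValueError("visemes and heard phonemes must be time-aligned")
    checked = hits = 0
    for viseme, phoneme in zip(visemes, heard):
        candidates = phonemes_for(viseme)
        if not candidates:   # shape not in the table: no evidence either way
            continue
        checked += 1
        hits += phoneme in candidates
    return checked > 0 and hits / checked >= min_ratio
```

Because several phonemes share one mouth shape (P, B and M all close the lips), this check can rule speech out but cannot identify words; that is sufficient for the gating decision the claims describe.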


Current Assignee
Avaya Technology Corporation, Miami Lakes, FLA, US
Original Assignee
Avaya Technology Corporation, Miami Lakes, FLA, US
Inventors
Juang, Biing-Hwang; Zhong, Jialin

Application Number

US10/092,876
Publication Number

US 20030171932A1
US Class Current

704/276
CPC Class Codes

G10L 15/24 Speech recognition using no...
