Video control of speech recognition

US 6,243,683 B1
Filed: 12/29/1998
Issued: 06/05/2001
Est. Priority Date: 12/29/1998
Status: Expired due to Term

First Claim

1. A method of controlling the operation of a speech recognition unit, comprising automatically analyzing at least one video image to detect a gesture of a user that signifies a command, and supplying the command to the speech recognition unit to control operation of the speech recognition unit.

1 Assignment

0 Petitions

Abstract

Method and apparatus for using video input to control speech recognition systems is disclosed. In one embodiment, gestures of a user of a speech recognition system are detected from a video input and are used to turn a speech recognition unit on and off. In another embodiment, the position of a user is detected from a video input, and the position information is supplied to a microphone-array point-of-source filter to help the filter select the voice of a user who is moving about in the field of view of the camera supplying the video input.

323 Citations

21 Claims

1. A method of controlling the operation of a speech recognition unit, comprising automatically analyzing at least one video image to detect a gesture of a user that signifies a command, and supplying the command to the speech recognition unit to control operation of the speech recognition unit.
- 2. A method according to claim 1 wherein the command comprises a start or stop command used to start or stop speech recognition.
- 3. A method according to claim 1 wherein the gesture comprises a motion.
- 4. A method according to claim 1 wherein the gesture comprises the user looking into the camera.
- 5. A method according to claim 1 wherein a gesture includes one or more of the group including motions and positions of a user, and wherein both a motion and a position are used to signify a command.
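Claims 1 to 5 describe mapping a gesture detected in video to a start or stop command for the recognizer. The following is a minimal Python sketch of only that command-mapping step. It assumes a separate, hypothetical vision detector already reports for each frame whether the user is looking into the camera and whether a hand wave is in progress; `GestureObservation`, `GestureController` and `SpeechRecognitionUnit` are illustrative names, not taken from the patent, and toggling on the gesture's first frame is one possible design choice.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Command(Enum):
    START = auto()
    STOP = auto()


@dataclass
class GestureObservation:
    """Per-frame output of a hypothetical video gesture detector."""
    looking_at_camera: bool  # a position cue (cf. claim 4)
    hand_wave: bool          # a motion cue (cf. claim 3)


class SpeechRecognitionUnit:
    """Stand-in recognizer that only tracks whether it is listening."""

    def __init__(self) -> None:
        self.listening = False

    def handle(self, cmd: Command) -> None:
        self.listening = cmd is Command.START


class GestureController:
    """Turns detected gestures into start/stop commands (cf. claims 1, 2, 5)."""

    def __init__(self, recognizer: SpeechRecognitionUnit) -> None:
        self.recognizer = recognizer
        self._was_active = False

    def process(self, obs: GestureObservation) -> Optional[Command]:
        # Require both a position and a motion to signify the command.
        active = obs.looking_at_camera and obs.hand_wave
        cmd = None
        # Fire only on the first frame of the gesture, not on every frame.
        if active and not self._was_active:
            cmd = Command.STOP if self.recognizer.listening else Command.START
            self.recognizer.handle(cmd)
        self._was_active = active
        return cmd
```

Requiring both cues, as in claim 5, makes an accidental wave while looking away harmless, and acting only on the rising edge keeps a wave that spans many frames from toggling recognition repeatedly.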

6. A method comprising filtering with a filter an audio input signal and supplying it to a speech recognition unit, wherein the position of a user supplying speech to be recognized is automatically determined by a computer analysis of at least one video image obtained from one or more cameras having a field of view encompassing the user, and position information obtained from the analysis is used by the filter to at least in part isolate the user's voice from other sounds in the user's environment.
- 7. A method according to claim 6 wherein the audio input signal is obtained from a microphone array.
- 8. A method according to claim 6 wherein the position is determined using a face tracking algorithm.
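Claims 6 to 8 use a user position derived from video to steer a filter applied to a microphone-array input. The claims do not specify the filter; as an illustrative substitute, the Python sketch below uses a simple delay-and-sum beamformer with whole-sample delays. It assumes the video analysis (for example, a face tracker) already yields the user's (x, y) position in meters in the same coordinate frame as the microphones, and that sound travels at about 343 m/s; `steering_delays` and `delay_and_sum` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature

Point = Tuple[float, float]


def steering_delays(mics: Sequence[Point], source: Point, fs: int) -> List[int]:
    """Per-microphone sample advances that time-align sound from `source`.

    `mics` and `source` are (x, y) positions in meters in one shared frame;
    the source position is assumed to come from the video analysis.
    """
    dists = [math.dist(m, source) for m in mics]
    nearest = min(dists)
    # Extra travel time relative to the nearest mic, rounded to whole samples.
    return [round((d - nearest) / SPEED_OF_SOUND * fs) for d in dists]


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Average the channels after advancing channel m by delays[m] samples.

    Samples shifted past the end of a channel are treated as zero.
    """
    n = len(channels[0])
    out = []
    for i in range(n):
        acc = 0.0
        for ch, k in zip(channels, delays):
            j = i + k
            if j < n:
                acc += ch[j]
        out.append(acc / len(channels))
    return out
```

Sound from the tracked position adds coherently, while sound from other positions is spread across different samples and partly cancels. A practical system would more likely use fractional delays or frequency-domain steering rather than rounding to whole samples.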

9. A method, comprising filtering an audio input signal and supplying it to a speech recognition unit, wherein the position of a user supplying speech to be recognized is automatically determined by a computer analysis of a video image obtained at least in part from at least one camera having a field of view encompassing the user, and position information obtained from the analysis is used by a filter to at least in part isolate the user's voice from other sounds in the user's environment; and controlling the operation of the speech recognition unit, comprising analyzing one or more video images to detect a gesture of a user that signifies a command, and supplying that command to the speech recognition unit.
- 10. A method according to claim 9 wherein the command comprises a start or stop command used to start or stop speech recognition.
- 11. A method according to claim 9 wherein the gesture comprises a motion.
- 12. A method according to claim 9 wherein the gesture comprises the user looking into the camera.
- 13. A method according to claim 9 wherein a gesture includes one or more of the group including motions and positions of a user, and wherein both a motion and a position are used to signify a command.
- 14. A method according to claim 9 further comprising obtaining the audio input signal from a microphone array.
- 15. A method according to claim 9 wherein the position is determined using a face tracking algorithm.

16. Apparatus for controlling the operation of a speech recognition unit in response to a gesture by a user, comprising a unit receiving at least one video image of the user, automatically analyzing the video image to detect a gesture of the user that signifies a command, and outputting the command to the speech recognition unit to control operation of the speech recognition unit.

17. Apparatus comprising a filtering unit that receives an audio input signal and position information about a position of a user supplying speech as the source of the audio input signal, wherein the position of the user is automatically determined by analysis of at least one video image of the user, wherein the filter unit further outputs a filtered audio signal based on the position information to a speech recognition unit.

18. Apparatus comprising:
- a speech recognition unit;
- a unit that receives at least one video image, automatically analyzes the video image to detect a gesture of a user that signifies a command, outputs the command to the speech recognition unit, and outputs position information about the position of the user in the image that signifies that a user has made the gesture that signifies the command; and
- a filtering unit that receives an audio input signal and the position information about the position of a user supplying speech as the source of the audio input signal, the filter unit further supplies a filtered audio signal to the speech recognition unit, wherein the filtered audio signal produced by the filtering unit depends on the position information.
- 19. Apparatus according to claim 18 further comprising

20. An article comprising a computer program in a machine readable medium wherein the computer program will execute on a suitable platform to control the operation of a speech recognition unit and is operative to automatically analyze at least one video image to detect a gesture of a user that signifies a command, and supply the command to the speech recognition unit.

21. An article comprising a computer program embodied in a machine readable medium wherein the computer program executes on a suitable platform to analyze at least one video image obtained from one or more cameras having a field of view encompassing a user, and automatically determines information specifying the location of the user in the field of view, and supplies the position information to a filter unit which at least in part isolates the user's voice from other sounds in the user's environment in response to the position information.
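Claim 21 has the program determine the user's location in the camera's field of view and pass that information to the filter unit. One simple way to express that location, sketched below in Python, is to convert the horizontal center of a face bounding box (in pixels) into an azimuth angle and then into far-field steering delays for a linear microphone array. This assumes an ideal pinhole camera with a known horizontal field of view, a non-mirrored image, and microphones on a line parallel to the image plane with positive offsets to the camera's right; `box_center_x`, `pixel_to_azimuth` and `far_field_delays` are hypothetical names, not from the patent.

```python
import math
from typing import List, Sequence, Tuple

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def box_center_x(box: Tuple[float, float, float, float]) -> float:
    """Horizontal center of an (x, y, width, height) bounding box, in pixels."""
    x, _y, w, _h = box
    return x + w / 2


def pixel_to_azimuth(x_pixel: float, image_width: int, hfov_deg: float) -> float:
    """Angle in radians between the optical axis and a pixel column.

    Ideal pinhole model: focal length in pixels is (W/2) / tan(HFOV/2).
    Positive angles are to the right of the image center.
    """
    focal_px = (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    return math.atan((x_pixel - image_width / 2) / focal_px)


def far_field_delays(mic_x: Sequence[float], azimuth: float, fs: int) -> List[int]:
    """Integer sample advances that align a plane wave arriving from `azimuth`.

    `mic_x` holds microphone offsets in meters along the array axis,
    positive to the camera's right (an assumed convention).
    """
    # A wave from the right reaches right-hand mics first: arrival time
    # relative to the array origin is -x * sin(azimuth) / c.
    arrival = [-x * math.sin(azimuth) / SPEED_OF_SOUND for x in mic_x]
    earliest = min(arrival)
    # Later-arriving channels get larger advances, rounded to whole samples.
    return [round((t - earliest) * fs) for t in arrival]
```

With a 640-pixel-wide image and a 60-degree horizontal field of view, a face centered on the right edge maps to 30 degrees; the resulting per-microphone delays are the kind of position information a beamforming filter could consume.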

Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Inventors: Peters, Geoffrey W.
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Nolan, Daniel A.

Application Number: US09/223,074
Time in Patent Office: 889 Days
Field of Search: 704/270, 704/231, 704/260, 704/275, 381/111-117, 379/69, 379/70, 379/79, 379/80, 379/902, 434/4
US Class Current: 704/273
CPC Class Codes: G10L 15/24 Speech recognition using no...