Speech Signal Enhancement Using Visual Information
7 Assignments
0 Petitions
Abstract
Visual information is used to alter or set an operating parameter of an audio signal processor, other than a beamformer. A digital camera captures visual information about a scene that includes a human speaker and/or a listener. The visual information is analyzed to ascertain information about the acoustics of a room. A distance between the speaker and a microphone may be estimated, and this estimate may be used to adjust an overall gain of the system. Distances among, and locations of, the speaker, the listener, the microphone, a loudspeaker and/or a sound-reflecting surface may also be estimated. These estimates may be used to estimate reverberation within the room and to adjust the aggressiveness of an anti-reverberation filter, based on an estimated ratio of direct to indirect (reverberated) sound energy expected to reach the microphone. In addition, the orientation of the speaker or the listener relative to the microphone or the loudspeaker can be estimated, and this estimate may be used to adjust frequency-dependent filter weights that compensate for the uneven propagation of different frequencies from a mouth, or to a human ear, around a human head.
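The abstract links two visual estimates to audio settings: speaker-to-microphone distance to overall gain, and a direct-to-reverberant energy ratio (DRR) to the aggressiveness of an anti-reverberation filter. The sketch below is a minimal illustration under textbook free-field and diffuse-field assumptions, not the patented method. The calibration distance, the Sabine-based critical-distance formula (r_c ≈ 0.057·√(Q·V/RT60)) and the linear DRR-to-strength mapping are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math


def distance_gain_db(distance_m: float, reference_m: float = 0.5) -> float:
    """Gain (dB) that offsets the free-field loss of direct sound between a
    hypothetical calibration distance reference_m and distance_m.

    Direct sound pressure falls as 1/r, i.e. about 6 dB per doubling of distance.
    """
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(distance_m / reference_m)


def critical_distance_m(room_volume_m3: float, rt60_s: float,
                        directivity: float = 1.0) -> float:
    """Sabine-based estimate of the distance at which direct and reverberant
    energy are equal: r_c ~= 0.057 * sqrt(Q * V / RT60)."""
    return 0.057 * math.sqrt(directivity * room_volume_m3 / rt60_s)


def direct_to_reverberant_db(distance_m: float, crit_m: float) -> float:
    """Diffuse-field DRR estimate: direct energy ~ 1/r^2 while reverberant
    energy is roughly constant, so DRR = 10*log10((r_c/r)^2) = 20*log10(r_c/r)."""
    return 20.0 * math.log10(crit_m / distance_m)


def dereverb_aggressiveness(drr_db: float, low_db: float = -10.0,
                            high_db: float = 10.0) -> float:
    """Hypothetical linear map from DRR to a 0..1 suppression strength:
    1.0 at or below low_db (mostly reverberant), 0.0 at or above high_db."""
    t = (high_db - drr_db) / (high_db - low_db)
    return min(1.0, max(0.0, t))


# Example: speaker 2 m from the microphone in a 60 m^3 room with RT60 = 0.6 s.
d = 2.0
rc = critical_distance_m(60.0, 0.6)        # 0.057 * sqrt(100) = 0.57 m
gain = distance_gain_db(d)                 # about +12.0 dB relative to 0.5 m
drr = direct_to_reverberant_db(d, rc)      # about -10.9 dB
strength = dereverb_aggressiveness(drr)    # clamped to 1.0 (strongest)
```

Because the speaker sits well beyond the critical distance in this example, reverberant energy dominates and the sketch drives the filter to full strength while raising gain by roughly 12 dB.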
109 Citations
53 Claims
1-33. (canceled)
34. A method, comprising:
providing a microphone to detect speech uttered by a speaker and generate audio signals from the speech received by the microphone; coupling an audio signal processor to the microphone to receive the audio signals and process the received audio signals; providing a camera that can be at least partially orientated toward the microphone to generate a scene image; coupling an image analyzer to the camera to automatically analyze the scene image for estimating a distance between the speaker and the microphone; and coupling a tuner to the image analyzer and to the audio signal processor to automatically alter an operating parameter of the audio signal processor, based at least in part on the estimated distance between the speaker and the microphone.
Dependent claims: 35-51.
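Claim 34 chains a camera, an image analyzer that estimates speaker-to-microphone distance, and a tuner that alters an audio-processor parameter. One common way to get distance from a single image is the pinhole-camera relation Z = f·W / w, using an object of roughly known size such as a face. The sketch below is a hedged illustration of that chain, not the claimed implementation: the face-width constant, the FaceBox and ProcessorParams types, the co-location of camera and microphone, and the choice of gain as the altered parameter are all assumptions.

```python
import math
from dataclasses import dataclass

# Hypothetical assumption: average adult face width, used as a known-size
# reference object in the scene image.
ASSUMED_FACE_WIDTH_M = 0.15


@dataclass
class FaceBox:
    """Hypothetical image-analyzer output: width of a detected face, in pixels."""
    width_px: float


@dataclass
class ProcessorParams:
    """Operating parameters of the audio signal processor (only gain here)."""
    gain_db: float = 0.0


def estimate_distance_m(face: FaceBox, focal_length_px: float,
                        face_width_m: float = ASSUMED_FACE_WIDTH_M) -> float:
    """Pinhole model: width_px = f_px * W / Z, so Z = f_px * W / width_px.

    Assumes the camera is co-located with the microphone, so camera-to-face
    distance approximates speaker-to-microphone distance.
    """
    if face.width_px <= 0:
        raise ValueError("face width must be positive")
    return focal_length_px * face_width_m / face.width_px


class Tuner:
    """Maps an estimated distance to a gain setting on the audio processor."""

    def __init__(self, reference_m: float = 0.5, max_gain_db: float = 20.0):
        self.reference_m = reference_m    # hypothetical calibration distance
        self.max_gain_db = max_gain_db    # hypothetical safety limit

    def update(self, params: ProcessorParams, distance_m: float) -> ProcessorParams:
        # Offset the ~6 dB-per-doubling loss of direct sound, then clamp.
        gain = 20.0 * math.log10(distance_m / self.reference_m)
        params.gain_db = max(-self.max_gain_db, min(self.max_gain_db, gain))
        return params


# Example: a 1000 px focal length and a 150 px-wide face give Z = 1.0 m.
params = ProcessorParams()
distance = estimate_distance_m(FaceBox(width_px=150.0), focal_length_px=1000.0)
Tuner().update(params, distance)   # params.gain_db is about +6.02 dB
```

The clamp keeps a bad detection (for example, a tiny face box) from driving the gain to an extreme value; a real system would probably also smooth successive estimates, which this sketch omits.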
52. An audio system for use by a plurality of speakers, the system comprising:
a microphone configured to detect speech uttered by at least one of the plurality of speakers and generate corresponding audio signals; an audio signal processor coupled to the microphone to receive the audio signals and configured to process the received audio signals; a camera orientable at least partially toward the microphone and configured to generate a scene image; an image analyzer coupled to the camera and configured to automatically analyze the scene image, so as to detect a gesture by at least one of the speakers; and a tuner coupled to the image analyzer and to the audio signal processor and configured to automatically alter an operating parameter of the audio signal processor, and process audio signals corresponding to the at least one of the speakers who gestured.
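Claim 52 has the tuner react to a detected gesture by favoring the audio of the speaker who gestured. The sketch below illustrates one possible reading under loudly stated assumptions: the gesture is a raised hand reported by a hypothetical image analyzer, each speaker's audio is already available on its own channel (for example from per-speaker microphones or a separate source-separation stage), and "altering an operating parameter" means setting per-channel gains. None of these names or choices come from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DetectedPerson:
    """Hypothetical image-analyzer output for one person in the scene."""
    person_id: int
    raised_hand: bool  # the gesture of interest (an assumption)


@dataclass
class ProcessorState:
    active_speaker: Optional[int] = None
    channel_gains_db: Dict[int, float] = field(default_factory=dict)


def retune(state: ProcessorState, people: List[DetectedPerson],
           duck_db: float = -12.0) -> ProcessorState:
    """Give the first person who gestured full gain and attenuate the others.

    If nobody gestured, the current settings are left unchanged.
    """
    gesturers = [p.person_id for p in people if p.raised_hand]
    if not gesturers:
        return state
    speaker = gesturers[0]
    state.active_speaker = speaker
    state.channel_gains_db = {
        p.person_id: 0.0 if p.person_id == speaker else duck_db for p in people
    }
    return state


def mix(channels: Dict[int, List[float]], gains_db: Dict[int, float]) -> List[float]:
    """Apply per-speaker gains (dB) to equal-length channels and sum them."""
    if not channels:
        return []
    n = len(next(iter(channels.values())))
    out = [0.0] * n
    for pid, samples in channels.items():
        g = 10.0 ** (gains_db.get(pid, 0.0) / 20.0)  # dB -> linear amplitude
        for i, s in enumerate(samples):
            out[i] += g * s
    return out


# Example: person 2 raises a hand, so person 1 is ducked by 12 dB.
state = retune(ProcessorState(), [DetectedPerson(1, False), DetectedPerson(2, True)])
mixed = mix({1: [1.0, 1.0], 2: [0.5, 0.5]}, state.channel_gains_db)
```

Keeping the previous settings when no gesture is seen avoids switching away from the current speaker on every frame; choosing the first gesturer is an arbitrary tie-break for illustration.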
53. A tangible non-transitory computer-readable storage medium with an executable program stored thereon for automatically processing speech uttered by a speaker into a microphone, wherein the program enables a machine to:
detect the speech uttered by the speaker and generate corresponding audio signals; process the audio signals by an audio signal processor, other than a beamformer; generate a scene image with a camera; analyze the scene image, so as to estimate a distance between the speaker and the microphone; and automatically alter an operating parameter of the audio signal processor, based at least in part on the estimated distance between the speaker and the microphone.
Specification