Speech signal enhancement using visual information
Abstract
Visual information is used to alter or set an operating parameter of an audio signal processor, other than a beamformer. A digital camera captures visual information about a scene that includes a human speaker and/or a listener. The visual information is analyzed to ascertain information about acoustics of a room. A distance between the speaker and a microphone may be estimated, and this distance estimate may be used to adjust an overall gain of the system. Distances among, and locations of, the speaker, the listener, the microphone, a loudspeaker and/or a sound-reflecting surface may be estimated. These estimates may be used to estimate reverberations within the room and adjust aggressiveness of an anti-reverberation filter, based on an estimated ratio of direct to indirect (reverberated) sound energy expected to reach the microphone. In addition, orientation of the speaker or the listener, relative to the microphone or the loudspeaker, can also be estimated, and this estimate may be used to adjust frequency-dependent filter weights to compensate for uneven frequency propagation of acoustic signals from a mouth, or to a human ear, about a human head.
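As an illustration of the distance-based gain adjustment described in the abstract, the following Python sketch maps an estimated speaker-to-microphone distance to a compensating gain. It assumes free-field inverse-distance attenuation (about 6 dB per doubling of distance), a 1 m reference distance, and a clamp of +/-20 dB. None of these choices, nor the function name `gain_db_for_distance`, come from the patent, which does not specify a particular mapping.

```python
import math


def gain_db_for_distance(distance_m, reference_m=1.0, max_gain_db=20.0):
    """Return a gain in dB that offsets level loss relative to reference_m.

    Assumption (not from the patent): sound pressure falls off as 1/distance,
    so the level changes by 20*log10(distance/reference) dB.
    """
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    gain = 20.0 * math.log10(distance_m / reference_m)
    # Clamp so that a poor distance estimate cannot drive the gain to extremes.
    return max(-max_gain_db, min(max_gain_db, gain))


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0, 4.0):
        print(f"{d:.1f} m -> {gain_db_for_distance(d):+.1f} dB")
```

In a real room the level falls off more slowly than this once reverberant energy dominates, so a practical tuner would likely use a gentler curve than the free-field assumption made here.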
17 Claims
1. A method, comprising:
providing a microphone to detect speech uttered by a speaker and generate audio signals from the speech received by the microphone;
coupling an audio signal processor to the microphone to receive the audio signals and process the received audio signals;
providing a camera that can be at least partially orientated toward the microphone to generate a scene image;
coupling an image analyzer to the camera to automatically analyze the scene image for estimating a distance between the speaker and the microphone;
coupling a tuner to the image analyzer and to the audio signal processor to automatically alter an operating parameter of the audio signal processor, based at least in part on the estimated distance between the speaker and the microphone;
detecting a sound-reflecting surface disposed proximate the microphone and the speaker by analyzing the scene image, so as to estimate a ratio of: sound energy reaching the microphone directly from the speaker and sound energy indirectly reaching the microphone from the speaker after being reflected from the sound-reflecting surface; and
altering the operating parameter of the audio signal processor, based at least in part on the estimated ratio, wherein the audio signal processor comprises an anti-reverberation filter and the tuner to reduce aggressiveness of the anti-reverberation filter when the estimated ratio is less than a predetermined value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
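The last two elements of claim 1 (estimating a direct-to-reflected energy ratio and reducing filter aggressiveness when that ratio is below a predetermined value) can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes a 2-D plan view, a single flat wall at `x = wall_x`, a mirror-image model of the reflection, inverse-square energy spreading, and a hypothetical energy reflectance of 0.8. The function names, the threshold of 2.0 and the halving step are likewise assumptions.

```python
import math


def direct_to_reflected_ratio(speaker, mic, wall_x, reflectance=0.8):
    """Estimate direct/reflected sound energy for one wall at x = wall_x.

    Assumptions (not from the patent): positions are (x, y) in metres,
    energy falls off as 1/r**2, and the wall reflects a fixed fraction
    `reflectance` of the incident energy.
    """
    direct = math.dist(speaker, mic)
    if direct == 0:
        raise ValueError("speaker and microphone positions coincide")
    # Mirror the speaker in the wall; the reflected path length equals the
    # straight-line distance from this image source to the microphone.
    image = (2.0 * wall_x - speaker[0], speaker[1])
    reflected = math.dist(image, mic)
    return (reflected / direct) ** 2 / reflectance


def tune_aggressiveness(current, ratio, threshold=2.0, step=0.5):
    """Scale down filter aggressiveness when the ratio is below threshold."""
    return current * step if ratio < threshold else current


if __name__ == "__main__":
    ratio = direct_to_reflected_ratio((1.9, 0.0), (0.0, 0.0), wall_x=2.0)
    print(round(ratio, 3), tune_aggressiveness(1.0, ratio))
```

The positions and the wall location would come from the image analyzer. A fuller model would sum several reflecting surfaces and account for frequency-dependent absorption.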
15. A method, comprising:
providing a microphone to detect speech uttered by a speaker and generate audio signals from the speech received by the microphone;
coupling an audio signal processor to the microphone to receive the audio signals and process the received audio signals;
providing a camera that can be at least partially orientated toward the microphone to generate a scene image;
coupling an image analyzer to the camera to automatically analyze the scene image for estimating a distance between the speaker and the microphone; and
coupling a tuner to the image analyzer and to the audio signal processor to automatically alter an operating parameter of the audio signal processor, based at least in part on the estimated distance between the speaker and the microphone;
analyzing the scene image to estimate an orientation of the speaker, relative to the microphone; and
altering the operating parameter of the audio signal processor, based at least in part on the estimated orientation of the speaker, wherein the operating parameter comprises a plurality of gains, wherein each of the plurality of gains is associated with a range of frequencies; and
causing at least one of the plurality of gains, associated with a high range of frequencies (“high-frequency gain”), to be set, relative to another at least one of the plurality of gains, associated with a low range of frequencies (“low-frequency gain”), based on the estimated orientation of the speaker, such that when the speaker is oriented away from the microphone, the high-frequency gain is set higher, relative to the low-frequency gain, than when the speaker is oriented toward the microphone;
detecting a sound-reflecting surface disposed proximate the microphone and the speaker by analyzing the scene image, so as to estimate a ratio of: sound energy reaching the microphone directly from the speaker and sound energy indirectly reaching the microphone from the speaker after being reflected from the sound-reflecting surface; and
altering the operating parameter of the audio signal processor, based at least in part on the estimated ratio, wherein the audio signal processor comprises an anti-reverberation filter and the tuner to reduce aggressiveness of the anti-reverberation filter when the estimated ratio is less than a predetermined value.
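The orientation-dependent gains recited in claim 15 can be illustrated with a two-band sketch in Python. The claim only requires that the high-frequency gain be set higher, relative to the low-frequency gain, when the speaker faces away from the microphone. The cosine mapping, the 6 dB maximum boost, the two-band split and the function name `band_gains_db` are all assumptions made for this example.

```python
import math


def band_gains_db(angle_deg, max_high_boost_db=6.0):
    """Return (low_gain_db, high_gain_db) for a speaker turned angle_deg
    away from the microphone (0 = facing it, 180 = facing directly away).

    Assumption (not from the patent): the boost grows smoothly with the
    angle following (1 - cos(angle)) / 2.
    """
    theta = math.radians(angle_deg)
    away = (1.0 - math.cos(theta)) / 2.0  # 0 when facing, 1 when facing away
    low_gain_db = 0.0                     # low band left unchanged
    high_gain_db = max_high_boost_db * away
    return low_gain_db, high_gain_db


if __name__ == "__main__":
    for angle in (0, 45, 90, 135, 180):
        low, high = band_gains_db(angle)
        print(f"{angle:3d} deg: low {low:+.1f} dB, high {high:+.1f} dB")
```

Because the cosine is symmetric, turning left or right by the same angle gives the same boost. A measured head-directivity curve could replace the cosine mapping without changing the interface.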
16. An audio system for use by a plurality of speakers, the system comprising:
a microphone configured to detect speech uttered by at least one of the plurality of speakers and generate corresponding audio signals;
an audio signal processor coupled to the microphone to receive the audio signals and configured to process the received audio signals;
a camera orientable at least partially toward the microphone and configured to generate a scene image;
an image analyzer coupled to the camera and configured to automatically analyze the scene image, so as to detect a gesture by at least one of the speakers; and
a tuner coupled to the image analyzer and to the audio signal processor to automatically alter an operating parameter of the audio signal processor, based at least in part on the estimated distance between the speaker and the microphone;
wherein the system is configured to detect a sound-reflecting surface disposed proximate the microphone and the speaker by analyzing the scene image, so as to estimate a ratio of: sound energy reaching the microphone directly from the speaker and sound energy indirectly reaching the microphone from the speaker after being reflected from the sound-reflecting surface; and
alter the operating parameter of the audio signal processor, based at least in part on the estimated ratio, wherein the audio signal processor comprises an anti-reverberation filter and the tuner to reduce aggressiveness of the anti-reverberation filter when the estimated ratio is less than a predetermined value.
17. A tangible non-transitory computer-readable storage medium with an executable program stored thereon for automatically processing speech uttered by a speaker into a microphone, wherein the program enables a machine to:
detect the speech uttered by the speaker and generate corresponding audio signals;
process the audio signals by an audio signal processor, other than a beamformer;
generate a scene image with a camera;
analyze the scene image, so as to estimate a distance between the speaker and the microphone; and
automatically alter an operating parameter of the audio signal processor, based at least in part on the estimated distance between the speaker and the microphone;
detect a sound-reflecting surface disposed proximate the microphone and the speaker by analyzing the scene image, so as to estimate a ratio of: sound energy reaching the microphone directly from the speaker and sound energy indirectly reaching the microphone from the speaker after being reflected from the sound-reflecting surface; and
alter the operating parameter of the audio signal processor, based at least in part on the estimated ratio, wherein the audio signal processor comprises an anti-reverberation filter and the tuner to reduce aggressiveness of the anti-reverberation filter when the estimated ratio is less than a predetermined value.
Specification