Energy-based sound source localization and gain normalization
Abstract
An energy-based technique estimates the positions of people speaking from an ad hoc network of microphones. The technique does not require accurate synchronization of the microphones. In addition, a technique to normalize the gains of the microphones based on people's speech is presented, which allows the audio channels from the ad hoc microphone network to be aggregated into a single stream for audio conferencing. The technique is invariant to the speakers' volumes, which makes the system easy to deploy in practice.
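As a rough illustration of the gain normalization and aggregation mentioned in the abstract, the sketch below rescales each channel to a common gain and then merges the channels. It is not taken from the patent: the function names are hypothetical, the gains are assumed to be energy-domain gains relative to microphone 0 (so sample amplitudes scale with their square root), and picking the loudest normalized channel per frame is only one possible aggregation rule.

```python
import math

def normalize_channels(channels, energy_gains):
    """Rescale every channel to the gain of microphone 0.

    Assumption: energy_gains[i] is the energy-domain gain of microphone i
    relative to microphone 0, so amplitudes are divided by its square root.
    """
    return [[x / math.sqrt(g) for x in ch]
            for ch, g in zip(channels, energy_gains)]

def mix_loudest(channels, frame_len):
    """Build one stream by keeping, per frame, the channel with most energy.

    This selection rule is an illustrative choice, not the patent's method.
    """
    n = min(len(ch) for ch in channels)
    out = []
    for start in range(0, n, frame_len):
        end = min(start + frame_len, n)
        frames = [ch[start:end] for ch in channels]
        out.extend(max(frames, key=lambda f: sum(x * x for x in f)))
    return out
```

Without the normalization step, a microphone with a high gain would win the per-frame comparison even when the speaker is closer to another device.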
20 Claims
1. A computer-implemented process for determining the location of one or more people speaking in a room captured by an ad hoc microphone network, comprising the process actions of:
- inputting audio streams of people speaking, each audio signal being captured with a microphone on a computing device; and
- segmenting each audio stream to find the person closest to each microphone;
- finding the average energy of the person closest to each microphone;
- using the average energy of the person closest to each microphone, to compute the gain of each microphone;
- using the average energy of the person closest to each microphone, computing the attenuation of each person's speech when it reaches each microphone;
- using the attenuation of each person's speech to find the distance between each microphone; and
- using the distance between each microphone to find the coordinates of each microphone and the person closest to each microphone, assuming that the person closest to each microphone is at the same location as the microphone.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
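The steps of claim 1 that follow segmentation can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the average energy `a[i][j]` of speaker j at microphone i is assumed to factor as microphone gain times speaker energy times attenuation, the attenuation is assumed symmetric and equal to 1 for the co-located speaker, and energy is assumed to fall off with the square of distance. All function names are hypothetical, and the placement step handles only three microphones.

```python
import math

def attenuation_matrix(a):
    """a[i][j] is the average energy of speaker j as captured by microphone i.

    Assumed model: a[i][j] = m[i] * s[j] * c[i][j], with c[i][i] == 1 and
    c[i][j] == c[j][i]. Gains m and speaker energies s cancel in the ratio.
    """
    n = len(a)
    return [[math.sqrt(a[i][j] * a[j][i] / (a[i][i] * a[j][j]))
             for j in range(n)] for i in range(n)]

def relative_gains(a):
    """Gain of each microphone relative to microphone 0 (same assumed model)."""
    return [math.sqrt(a[i][i] * a[i][0] / (a[0][0] * a[0][i]))
            for i in range(len(a))]

def distances(c):
    """Assumption: energy attenuation is 1/d^2, so d = c ** -0.5 (relative units)."""
    n = len(c)
    return [[0.0 if i == j else c[i][j] ** -0.5 for j in range(n)]
            for i in range(n)]

def place_three(d):
    """Coordinates of three microphones from their pairwise distances.

    Microphone 0 sits at the origin, microphone 1 on the +x axis, and
    microphone 2 in the upper half-plane (law of cosines).
    """
    x2 = (d[0][1] ** 2 + d[0][2] ** 2 - d[1][2] ** 2) / (2 * d[0][1])
    y2 = math.sqrt(max(d[0][2] ** 2 - x2 ** 2, 0.0))
    return [(0.0, 0.0), (d[0][1], 0.0), (x2, y2)]
```

The recovered layout is only defined up to rotation, reflection, and overall scale. Following the claim, each closest speaker is then assigned the coordinates of the corresponding microphone. For more than three microphones, a method such as multidimensional scaling would be needed in place of `place_three`.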
10. A computer-implemented process for determining and using the location of people speaking in a room captured by an ad hoc microphone network, comprising:
- inputting audio streams of people speaking, each audio signal being captured with a microphone on a computing device; and
- segmenting each audio stream to find the person closest to each microphone;
- finding the average energy of the person closest to each microphone;
- using the average energy of the person closest to each microphone, to compute the gain of each microphone;
- using the average energy of the person closest to each microphone, computing the attenuation of each person's speech when it reaches each microphone;
- using the attenuation of each person's speech that is closest to each microphone to find the distance between each microphone;
- using the distance between each microphone to find the coordinates of each microphone and the person closest to each microphone assuming the microphone and the person closest to it are co-located;
- computing an average energy ratio, the ratio of the average energy of the audio stream of a speaker that does not have a microphone to a first microphone over the average energy of the audio stream of the speaker that does not have a microphone to a second microphone;
- using the average energy ratio to compute an attenuation ratio, the ratio of the attenuation of the audio stream of the speaker that does not have a microphone to a first microphone over the attenuation of the audio stream of the speaker that does not have a microphone to a second microphone;
- using the attenuation ratio to find a distance ratio, the ratio of the distance of the speaker that does not have a microphone to a first microphone over the attenuation of the distance of the speaker that does not have a microphone to a second microphone; and
- using the distance ratio to find the coordinates of the speaker that does not have a microphone.
Dependent claims: 11, 12, 13, 14
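The last four steps of claim 10, which locate a speaker who has no microphone, can be illustrated with the sketch below. It assumes the microphone coordinates and relative gains are already known, that captured energy is gain times speaker energy times attenuation (so the unknown speaker energy cancels in a ratio), and that energy falls off with the square of distance. The function names and the coarse-to-fine grid search are illustrative choices, not the patent's procedure.

```python
import math

def distance_ratios(energies, gains):
    """Return d_i / d_0 for every microphone i.

    energies[i]: average energy of the unmiked speaker at microphone i.
    gains[i]: gain of microphone i (only ratios to microphone 0 matter).
    Assumption: attenuation is 1/d^2, so the distance ratio is the
    attenuation ratio raised to the power -1/2.
    """
    ratios = []
    for e, g in zip(energies, gains):
        energy_ratio = e / energies[0]
        attenuation_ratio = energy_ratio / (g / gains[0])
        ratios.append(attenuation_ratio ** -0.5)
    return ratios

def locate_by_ratios(mics, ratios, lo, hi, levels=8, n=21):
    """Search the box lo..hi for the point whose distance ratios fit best."""
    def cost(x, y):
        d = [max(math.hypot(x - mx, y - my), 1e-9) for mx, my in mics]
        return sum((math.log(d[i] / d[0]) - math.log(ratios[i])) ** 2
                   for i in range(1, len(mics)))

    cx, cy = (lo[0] + hi[0]) / 2, (lo[1] + hi[1]) / 2
    hx, hy = (hi[0] - lo[0]) / 2, (hi[1] - lo[1]) / 2
    offsets = [2 * k / (n - 1) - 1 for k in range(n)]  # evenly spaced in [-1, 1]
    for _ in range(levels):
        # Evaluate an n-by-n grid around the current centre and keep the best point.
        _, cx, cy = min((cost(cx + hx * u, cy + hy * v), cx + hx * u, cy + hy * v)
                        for u in offsets for v in offsets)
        # Shrink the window to two grid steps on each side of the best point.
        hx, hy = hx * 4 / (n - 1), hy * 4 / (n - 1)
    return cx, cy
```

With only three microphones, two points generally share the same distance ratios, so restricting the search to the room or using a fourth microphone that does not lie on the same circle as the other three helps resolve the ambiguity.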
15. A system for improving the audio and video quality of a recorded event, comprising:
- a general purpose computing device;
- a computer program comprising program modules executable by the general purpose computing device, wherein the computing device is directed by the program modules of the computer program to,
  - find one or more speakers' positions by using the average energy of a captured audio segment for each person speaking; and
  - apply the one or more speakers' positions to improve the audio or video of a captured event.
Dependent claims: 16, 17, 18, 19, 20
Specification