APPARATUS AND METHOD FOR PROVIDING AN INFORMED MULTICHANNEL SPEECH PRESENCE PROBABILITY ESTIMATION
First Claim
1. An apparatus for providing a speech probability estimation, comprising:
- a first speech probability estimator for estimating speech probability information indicating a first probability on whether a sound field of a scene comprises speech or on whether the sound field of the scene does not comprise speech, and an output interface for outputting the speech probability estimation depending on the speech probability information, wherein the first speech probability estimator is configured to estimate the first speech probability information based on at least spatial information about the sound field or spatial information on the scene.
Abstract
An apparatus for providing a speech probability estimation is provided. The apparatus includes a first speech probability estimator for estimating speech probability information indicating a first probability on whether a sound field of a scene includes speech or on whether the sound field of the scene does not include speech. Moreover, the apparatus includes an output interface for outputting the speech probability estimation depending on the speech probability information. The first speech probability estimator is configured to estimate the first speech probability information based on at least spatial information about the sound field or spatial information on the scene.
-
20 Claims
-
1. An apparatus for providing a speech probability estimation, comprising:
-
a first speech probability estimator for estimating speech probability information indicating a first probability on whether a sound field of a scene comprises speech or on whether the sound field of the scene does not comprise speech, and an output interface for outputting the speech probability estimation depending on the speech probability information, wherein the first speech probability estimator is configured to estimate the first speech probability information based on at least spatial information about the sound field or spatial information on the scene.
-
-
2. An apparatus according to claim 1,
wherein the apparatus furthermore comprises a second speech probability estimator for estimating the speech probability estimation indicating a second probability on whether the sound field comprises speech or on whether the sound field does not comprise speech, wherein the second speech probability estimator is configured to estimate the speech probability estimation based on the speech probability information estimated by the first speech probability estimator, and based on one or more acoustic sensor signals, which depend on the sound field.
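For illustration only, and not as part of the claims: one way a second estimator could combine the a priori information from the first estimator with evidence from the acoustic sensor signals is a plain Bayes update. The sketch below assumes the sensor signals have already been reduced to a likelihood ratio; the function name and both parameters are hypothetical and do not appear in the source.

```python
def posterior_speech_probability(prior, likelihood_ratio):
    """Bayes update of a speech presence probability.

    prior            -- a priori speech presence probability in [0, 1],
                        e.g. derived from spatial information (assumption).
    likelihood_ratio -- p(signals | speech) / p(signals | no speech),
                        assumed to be computed elsewhere from the sensor signals.
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    numerator = prior * likelihood_ratio
    denominator = numerator + (1.0 - prior)
    if denominator == 0.0:
        # Degenerate case: prior is 1 but the evidence has zero likelihood.
        return 0.0
    return numerator / denominator
```

With an uninformative likelihood ratio of 1 the posterior equals the prior; a ratio above 1 raises it and a ratio below 1 lowers it.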
-
3. An apparatus according to claim 1,
wherein the first speech probability estimator is configured to estimate the speech probability information based on directionality information, wherein the directionality information indicates how directional sound of the sound field is, wherein the first speech probability estimator is configured to estimate the speech probability information based on location information, wherein the location information indicates at least one location of a sound source of the scene, or wherein the first speech probability estimator is configured to estimate the speech probability information based on proximity information, wherein the proximity information indicates at least one proximity of at least one possible sound object to at least one proximity sensor.
-
4. An apparatus according to claim 1, wherein the first speech probability estimator is configured to estimate the speech probability estimation by determining a direct-to-diffuse ratio estimation of a direct-to-diffuse ratio as the spatial information, the direct-to-diffuse ratio indicating a ratio of direct sound comprised by the acoustic sensor signals to diffuse sound comprised by the acoustic sensor signals.
-
5. An apparatus according to claim 4,
wherein the first speech probability estimator is configured to determine the direct-to-diffuse ratio estimation by determining a coherence estimation of a complex coherence between a first acoustic signal of the acoustic sensor signals, the first acoustic signal being recorded by a first acoustic sensor p, and a second acoustic signal of the acoustic sensor signals, the second acoustic signal being recorded by a second acoustic sensor q, and wherein the first speech probability estimator is moreover configured to determine the direct-to-diffuse ratio based on a phase shift estimation of a phase shift of the direct sound between the first acoustic signal and the second acoustic signal.
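For illustration only: the formula of the dependent claim that follows is not reproduced in this text, so the sketch below is not that formula. It assumes a generic mixed sound field model in which the measured coherence between sensors p and q is a weighted mix of the direct-sound coherence, exp(jθ) with θ the phase shift, and a known diffuse-field coherence. Solving that model for the ratio gives the expression used here. All names are hypothetical.

```python
import cmath


def estimate_ddr(coherence, phase_shift, diffuse_coherence):
    """Estimate a direct-to-diffuse ratio from a complex coherence.

    Assumed model (not taken from the source):
        coherence = (ddr * exp(1j * phase_shift) + diffuse_coherence) / (ddr + 1)
    Solving for ddr gives
        ddr = (diffuse_coherence - coherence) / (coherence - exp(1j * phase_shift)).
    """
    direct_coherence = cmath.exp(1j * phase_shift)
    denominator = coherence - direct_coherence
    if abs(denominator) < 1e-12:
        # Measured coherence equals the direct-sound coherence: purely direct sound.
        return float("inf")
    ratio = (diffuse_coherence - coherence) / denominator
    # Estimation errors can make the value complex or negative;
    # keep the real part and clip it at zero.
    return max(ratio.real, 0.0)
```

The diffuse-field coherence is treated as a given input; how it is obtained for a particular sensor pair is outside this sketch.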
-
6. An apparatus according to claim 5,
wherein the first speech probability estimator is configured to determine the direct-to-diffuse ratio estimation $\hat{\Gamma}(k, n)$ between the first acoustic signal and the second acoustic signal by applying the formula:
-
7. An apparatus according to claim 4, wherein the first speech probability estimator is configured to estimate the speech probability information by determining $f[\hat{\Gamma}(k, n)]$, wherein $\hat{\Gamma}(k, n)$ is the direct-to-diffuse ratio estimation, and wherein $f[\hat{\Gamma}(k, n)]$ is a mapping function representing a mapping of the direct-to-diffuse ratio estimation to a value between 0 and 1.
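For illustration only: any monotonic function that squashes the direct-to-diffuse ratio into the interval from 0 to 1 would satisfy the mapping described in claim 7. The sketch below uses a logistic curve on the ratio in decibels. This choice, the function name, and the parameters `midpoint_db`, `slope`, `lower` and `upper` are assumptions and are not taken from the source.

```python
import math


def map_ddr_to_unit_interval(ddr, midpoint_db=0.0, slope=0.5, lower=0.0, upper=1.0):
    """Map a non-negative direct-to-diffuse ratio to a value in [lower, upper].

    A logistic curve over the ratio in dB is used purely as an example;
    midpoint_db is the ratio (in dB) that maps to the middle of the range.
    """
    ddr_db = 10.0 * math.log10(max(ddr, 1e-12))  # floor avoids log10(0)
    sigmoid = 1.0 / (1.0 + math.exp(-slope * (ddr_db - midpoint_db)))
    return lower + (upper - lower) * sigmoid
```

A strongly directional sound field (high ratio) then yields a value near `upper`, and a mostly diffuse field yields a value near `lower`.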
-
8. An apparatus according to claim 7, wherein the mapping function $f[\hat{\Gamma}(k, n)]$ is defined by the formula:
-
9. An apparatus according to claim 1, wherein the first speech probability estimator is configured to determine a location parameter Pb based on a probability distribution of an estimated location of a sound source and based on an area of interest to acquire the speech probability information.
-
10. An apparatus according to claim 9, wherein the first speech probability estimator is configured to determine the location parameter Pb by employing the formula
-
11. An apparatus according to claim 4,
wherein the first speech probability estimator is configured to determine the a priori speech presence probability q(k, n) as the speech probability information by applying the formula:
-
12. An apparatus according to claim 1, wherein the first speech probability estimator is configured to determine a proximity parameter as the spatial information,
wherein the proximity parameter comprises a first parameter value, when the first speech probability estimator detects one or more possible sound sources within a predefined distance from a proximity sensor, and wherein the proximity parameter comprises a second parameter value, being smaller than the first parameter value, when the first speech probability estimator does not detect possible sound sources in the direct proximity of the proximity sensor, and wherein the first speech probability estimator is configured to determine a first speech probability value as the speech probability information when the proximity parameter comprises the first parameter value, and wherein the first speech probability estimator is configured to determine a second speech probability value as the speech probability information when the proximity parameter comprises the second parameter value, the first speech probability value indicating a first probability that the sound field comprises speech, wherein the first probability is greater than a second probability that the sound field comprises speech, the second probability being indicated by the second speech probability value.
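For illustration only: the two-valued proximity parameter and the two speech probability values of claim 12 can be sketched as a pair of lookups. The distance threshold and the numeric values below are placeholders chosen for the example and are not taken from the source.

```python
def proximity_parameter(distances_m, max_distance_m=0.5,
                        first_value=1.0, second_value=0.0):
    """Return first_value if any detected object is within max_distance_m.

    distances_m -- distances (in metres) of possible sound objects reported
                   by a proximity sensor; an empty list means nothing detected.
    """
    detected = any(d <= max_distance_m for d in distances_m)
    return first_value if detected else second_value


def speech_probability_from_proximity(parameter, first_value=1.0,
                                      first_probability=0.7,
                                      second_probability=0.2):
    """Map the proximity parameter to one of two speech probability values.

    first_probability must exceed second_probability, mirroring the claim.
    """
    return first_probability if parameter == first_value else second_probability
```

For example, `speech_probability_from_proximity(proximity_parameter([0.3, 2.0]))` selects the higher probability because one object lies within the threshold.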
-
13. An apparatus for determining a noise power spectral density estimation, comprising:
-
a first speech probability estimator for estimating speech probability information indicating a first probability on whether a sound field of a scene comprises speech or on whether the sound field of the scene does not comprise speech, and an output interface for outputting the speech probability estimation depending on the speech probability information, wherein the first speech probability estimator is configured to estimate the first speech probability information based on at least spatial information about the sound field or spatial information on the scene, and a noise power spectral density estimation unit, wherein the apparatus is configured to provide the speech probability estimation to the noise power spectral density estimation unit, and wherein the noise power spectral density estimation unit is configured to determine the noise power spectral density estimation based on the speech probability estimation and a plurality of input audio channels.
-
-
14. An apparatus according to claim 13,
wherein the apparatus is configured to compute one or more spatial parameters, the one or more spatial parameters indicating spatial information about the sound field, wherein the apparatus is configured to compute the speech probability estimation by employing the one or more spatial parameters, and wherein the noise power spectral density estimation unit is configured to determine the noise power spectral density estimation by updating a previous noise power spectral density matrix depending on the speech probability estimation to acquire an updated noise power spectral density matrix as the noise power spectral density estimation.
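For illustration only: a common way to update a noise power spectral density matrix depending on a speech probability is recursive averaging, with a forgetting factor that moves toward 1 as speech becomes more likely, so that the matrix is frozen during speech. The sketch below assumes that scheme for a single time-frequency bin. The function name and the parameter `alpha` are hypothetical and not taken from the source.

```python
def update_noise_psd(previous_psd, y, speech_probability, alpha=0.9):
    """Speech-probability-controlled recursive update of a noise PSD matrix.

    previous_psd       -- M x M matrix (list of lists of complex) from the last frame
    y                  -- length-M list of complex microphone values for this bin
    speech_probability -- value in [0, 1]; 1 leaves the matrix unchanged
    alpha              -- base forgetting factor in [0, 1) (assumption)
    """
    effective_alpha = alpha + (1.0 - alpha) * speech_probability
    m = len(y)
    return [
        [
            effective_alpha * previous_psd[i][j]
            + (1.0 - effective_alpha) * y[i] * y[j].conjugate()
            for j in range(m)
        ]
        for i in range(m)
    ]
```

When the speech probability is 0 the update reduces to ordinary exponential smoothing of the outer product of the current observation with its conjugate.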
-
15. An apparatus for estimating a steering vector, comprising:
-
a first speech probability estimator for estimating speech probability information indicating a first probability on whether a sound field of a scene comprises speech or on whether the sound field of the scene does not comprise speech, and an output interface for outputting the speech probability estimation depending on the speech probability information, wherein the first speech probability estimator is configured to estimate the first speech probability information based on at least spatial information about the sound field or spatial information on the scene, and a steering vector estimation unit, wherein the apparatus is configured to provide the speech probability estimation to the steering vector estimation unit, and wherein the steering vector estimation unit is configured to estimate the steering vector based on the speech probability estimation and a plurality of input audio channels.
-
-
16. An apparatus for multichannel noise reduction, comprising:
-
a first speech probability estimator for estimating speech probability information indicating a first probability on whether a sound field of a scene comprises speech or on whether the sound field of the scene does not comprise speech, and an output interface for outputting the speech probability estimation depending on the speech probability information, wherein the first speech probability estimator is configured to estimate the first speech probability information based on at least spatial information about the sound field or spatial information on the scene, and a filter unit, wherein the filter unit is configured to receive a plurality of audio input channels, wherein the apparatus is configured to provide the speech probability information to the filter unit, and wherein the filter unit is configured to filter the plurality of audio input channels to acquire filtered audio channels based on the speech probability information.
-
-
17. An apparatus according to claim 16, wherein the first speech probability estimator is configured to generate a tradeoff parameter, wherein the tradeoff parameter depends on at least one spatial parameter indicating spatial information about the sound field or spatial information on the scene.
-
18. An apparatus according to claim 17, wherein the filter unit is configured to filter the plurality of audio input channels depending on the tradeoff parameter.
-
19. A method for providing a speech probability estimation, comprising:
-
estimating speech probability information indicating a first probability on whether a sound field comprises speech or on whether the sound field does not comprise speech, and outputting the speech probability estimation depending on the speech probability information, wherein estimating the first speech probability information is based on at least spatial information about the sound field or spatial information on the scene.
-
-
20. A computer program for implementing the method of claim 19 when being executed on a computer or signal processor.
Specification