Sound source localization confidence estimation using machine learning
First Claim
1. A system to use machine learning to perform sound source localization confidence estimation, the system comprising:
memory; and
one or more processors coupled to the memory and configured to:
perform a sound source localization (SSL) operation with regard to a sound to determine an SSL direction estimate, which indicates an estimated direction from which the sound is received, and an SSL-based confidence associated with the SSL direction estimate based at least in part on a multi-channel representation of the sound, the SSL-based confidence indicating an estimated probability that the sound is received from the estimated direction, the multi-channel representation including a plurality of representations of the sound that are detected by a plurality of respective sensors;
automatically determine one or more additional characteristics of the sound; and
perform a machine learning (ML) operation based at least in part on the SSL direction estimate, the SSL-based confidence, and the one or more additional characteristics to determine an ML-based confidence associated with the SSL direction estimate, wherein the machine learning operation is capable of incorporating arbitrary features associated with respective characteristics of the sound into a determination of the ML-based confidence on-the-fly without a manual modification of code associated with the machine learning operation to accommodate the arbitrary features.
Abstract
Techniques are described herein that are capable of performing sound source localization (SSL) confidence estimation using machine learning. An SSL operation is performed with regard to a sound to determine an SSL direction estimate and an SSL-based confidence associated with the SSL direction estimate based at least in part on a multi-channel representation of the sound. The SSL direction estimate indicates an estimated direction from which the sound is received. The SSL-based confidence indicates an estimated probability that the sound is received from the estimated direction. The multi-channel representation includes representations of the sound that are detected by respective sensors (e.g., microphones). Additional characteristic(s) of the sound are automatically determined. A machine learning (ML) operation is performed based at least in part on the SSL direction estimate, the SSL-based confidence, and the additional characteristic(s) to determine an ML-based confidence associated with the SSL direction estimate.
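The flow summarized in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the two-channel cross-correlation used as the SSL operation, the share-of-correlation score used as the SSL-based confidence, RMS energy as the additional characteristic, and a logistic model with hand-picked (untrained) weights as the ML operation. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def ssl_estimate(ch_a, ch_b, max_lag):
    """Toy SSL operation on a two-channel representation of a sound.

    Returns (lag, confidence). A positive lag means ch_b is delayed relative
    to ch_a. The confidence is the winning lag's share of the total positive
    cross-correlation, a stand-in for an estimated probability.
    """
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, a in enumerate(ch_a):
            j = i + lag
            if 0 <= j < len(ch_b):
                total += a * ch_b[j]
        scores[lag] = max(total, 0.0)
    best_lag = max(scores, key=scores.get)
    mass = sum(scores.values())
    confidence = scores[best_lag] / mass if mass > 0 else 0.0
    return best_lag, confidence


def lag_to_angle_deg(lag, sample_rate, mic_spacing):
    """Convert an inter-microphone lag (in samples) to a far-field angle."""
    s = SPEED_OF_SOUND * lag / (sample_rate * mic_spacing)
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))


def rms(samples):
    """One possible 'additional characteristic' of the sound: RMS energy."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def ml_confidence(features, weights, bias=0.0):
    """Logistic model over named features; unknown features get weight 0."""
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Synthetic input: an impulse that reaches the second sensor two samples later.
ch_a = [0.0] * 16
ch_a[5] = 1.0
ch_b = [0.0] * 16
ch_b[7] = 1.0

lag, ssl_conf = ssl_estimate(ch_a, ch_b, max_lag=3)
features = {
    "ssl_angle_deg": lag_to_angle_deg(lag, sample_rate=16000, mic_spacing=0.1),
    "ssl_confidence": ssl_conf,
    "rms": rms(ch_a),
}
weights = {"ssl_confidence": 2.0, "rms": 1.0}  # hypothetical, not trained
ml_conf = ml_confidence(features, weights, bias=-1.0)
```

Keying the features by name means a new characteristic can be passed in without changing `ml_confidence` itself. This is only one simple way to picture the feature handling; it is not presented as the mechanism the patent describes.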
20 Claims
11. A method of using machine learning to perform sound source localization confidence estimation using at least one of (a) one or more processors, (b) hardware logic, or (c) electrical circuitry, the method comprising:
performing a sound source localization (SSL) operation with regard to a sound to determine an SSL direction estimate, which indicates an estimated direction from which the sound is received, and an SSL-based confidence associated with the SSL direction estimate based at least in part on a multi-channel representation of the sound, the SSL-based confidence indicating an estimated probability that the sound is received from the estimated direction, the multi-channel representation including a plurality of representations of the sound that are detected by a plurality of respective sensors;
automatically determining one or more additional characteristics of the sound; and
performing a machine learning (ML) operation based at least in part on the SSL direction estimate, the SSL-based confidence, and the one or more additional characteristics to determine an ML-based confidence associated with the SSL direction estimate and to generate a feature set of a machine learning model that is usable in a subsequent machine learning operation to determine an ML-based confidence inference based at least in part on a second SSL direction estimate that indicates a second estimated direction from which a second sound is received, a second SSL-based confidence that indicates an estimated probability that the second sound is received from the second estimated direction, and one or more additional characteristics of the second sound.
20. A system to use machine learning to perform sound source localization confidence estimation, the system comprising:
memory; and
one or more processors coupled to the memory and configured to:
perform a sound source localization (SSL) operation with regard to a sound to determine an SSL direction estimate, which indicates an estimated direction from which the sound is received, and an SSL-based confidence associated with the SSL direction estimate based at least in part on a multi-channel representation of the sound, the SSL-based confidence indicating an estimated probability that the sound is received from the estimated direction, the multi-channel representation including a plurality of representations of the sound that are detected by a plurality of respective sensors;
automatically determine one or more additional characteristics of the sound;
perform a machine learning (ML) operation based at least in part on the SSL direction estimate, the SSL-based confidence, and the one or more additional characteristics to determine an ML-based confidence associated with the SSL direction estimate; and
determine an angle in which to point a video camera in accordance with a dynamic video zoom operation based at least in part on the ML-based confidence.
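The final element of claim 20 can be pictured with a short sketch. The claim states only that the camera angle is determined based at least in part on the ML-based confidence. The thresholding policy, the default threshold of 0.7, and the function name `choose_camera_angle` are assumptions made for illustration.

```python
def choose_camera_angle(current_angle_deg, ssl_angle_deg, ml_confidence,
                        threshold=0.7):
    """Return the angle at which to point the camera.

    Assumed policy: follow the SSL direction estimate only when the ML-based
    confidence reaches the threshold; otherwise hold the current angle.
    """
    if not 0.0 <= ml_confidence <= 1.0:
        raise ValueError("ml_confidence must lie in [0, 1]")
    if ml_confidence >= threshold:
        return ssl_angle_deg
    return current_angle_deg


# A confident estimate moves the camera; an uncertain one leaves it in place.
moved = choose_camera_angle(0.0, 30.0, ml_confidence=0.9)
held = choose_camera_angle(0.0, 30.0, ml_confidence=0.4)
```

Holding the current angle at low confidence avoids swinging the camera toward a spurious direction estimate. Other policies, such as a confidence-weighted step toward the target, would also fit the claim language.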
Specification