Sound source localization confidence estimation using machine learning

US 10,649,060 B2
Filed: 07/24/2017
Issued: 05/12/2020
Est. Priority Date: 07/24/2017
Status: Active Grant

Abstract

Techniques are described herein that are capable of performing sound source localization (SSL) confidence estimation using machine learning. An SSL operation is performed with regard to a sound to determine an SSL direction estimate and an SSL-based confidence associated with the SSL direction estimate based at least in part on a multi-channel representation of the sound. The SSL direction estimate indicates an estimated direction from which the sound is received. The SSL-based confidence indicates an estimated probability that the sound is received from the estimated direction. The multi-channel representation includes representations of the sound that are detected by respective sensors (e.g., microphones). Additional characteristic(s) of the sound are automatically determined. A machine learning (ML) operation is performed based at least in part on the SSL direction estimate, the SSL-based confidence, and the additional characteristic(s) to determine an ML-based confidence associated with the SSL direction estimate.
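The flow described in the abstract can be sketched in a few lines. This is a minimal illustration only, assuming a logistic model over named features; the source does not say which model is used, and the function name `ml_confidence`, the feature names, and the weight values are all hypothetical.

```python
import math


def ml_confidence(ssl_direction_deg, ssl_confidence, extra, weights, bias=0.0):
    """Map SSL outputs plus extra named features to a confidence in (0, 1).

    Assumption: a logistic model; the source does not specify the model type.
    """
    features = {
        "ssl_confidence": ssl_confidence,
        # Encode the angle on the unit circle so 359 and 1 degrees stay close.
        "dir_sin": math.sin(math.radians(ssl_direction_deg)),
        "dir_cos": math.cos(math.radians(ssl_direction_deg)),
    }
    # Additional characteristics arrive as a dict, so new ones need no code change.
    features.update(extra)
    # A feature without a weight contributes nothing to the score.
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights, chosen by hand for illustration rather than learned.
WEIGHTS = {"ssl_confidence": 4.0, "speech_probability": 2.0, "noise_level": -3.0}
BIAS = -3.0

quiet = ml_confidence(30.0, 0.8, {"speech_probability": 0.9, "noise_level": 0.1}, WEIGHTS, BIAS)
noisy = ml_confidence(30.0, 0.8, {"speech_probability": 0.9, "noise_level": 0.9}, WEIGHTS, BIAS)
print(quiet > noisy)
```

With these illustrative weights, the same SSL direction estimate and SSL-based confidence yield a lower ML-based confidence when the noise feature is high.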

20 Claims

1. A system to use machine learning to perform sound source localization confidence estimation, the system comprising:
- memory; and
  
  one or more processors coupled to the memory and configured to:
  
  perform a sound source localization (SSL) operation with regard to a sound to determine an SSL direction estimate, which indicates an estimated direction from which the sound is received, and an SSL-based confidence associated with the SSL direction estimate based at least in part on a multi-channel representation of the sound, the SSL-based confidence indicating an estimated probability that the sound is received from the estimated direction, the multi-channel representation including a plurality of representations of the sound that are detected by a plurality of respective sensors;
  
  automatically determine one or more additional characteristics of the sound; and
  
  perform a machine learning (ML) operation based at least in part on the SSL direction estimate, the SSL-based confidence, and the one or more additional characteristics to determine an ML-based confidence associated with the SSL direction estimate, wherein the machine learning operation is capable of incorporating arbitrary features associated with respective characteristics of the sound into a determination of the ML-based confidence on-the-fly without a manual modification of code associated with the machine learning operation to accommodate the arbitrary features.
  - 2. The system of claim 1, wherein the one or more processors are configured to automatically determine a probability that the sound is of a designated type;
    - and wherein the one or more processors are configured to perform the machine learning operation based at least in part on the SSL direction estimate, the SSL-based confidence, and the probability to determine the ML-based confidence associated with the SSL direction estimate.
  - 3. The system of claim 2, wherein the one or more processors are configured to classify the sound to be the designated type selected from a plurality of types based at least in part on the probability;
    - and wherein the one or more processors are configured to determine the ML-based confidence to be more accurate than the SSL-based confidence based at least in part on the sound being classified to be the designated type.
  - 4. The system of claim 3, wherein the one or more processors are configured to perform a frequency analysis operation with respect to the sound to determine a frequency response of the sound;
    - wherein the one or more processors are configured to determine whether the frequency response corresponds to the designated type; and
      
      wherein the one or more processors are configured to determine the ML-based confidence to be more accurate than the SSL-based confidence further based at least in part on a determination that the frequency response corresponds to the designated type.
  - 5. The system of claim 1, wherein the one or more processors are configured to perform an analysis of an environment in which the sound is produced;
    - wherein the one or more processors are configured to determine a characteristic of the environment in which the sound is produced based at least in part on the analysis; and
      
      wherein the one or more processors are configured to perform the machine learning operation based at least in part on the SSL direction estimate, the SSL-based confidence, and the characteristic of the environment to determine the ML-based confidence.
  - 6. The system of claim 5, wherein the one or more processors are configured to determine whether a volume of background noise in the environment is greater than or equal to a volume threshold;
    - and wherein the one or more processors are configured to determine the ML-based confidence to be less than the SSL-based confidence based at least in part on a determination that the volume of the background noise in the environment is greater than or equal to the volume threshold.
  - 7. The system of claim 5, wherein the one or more processors are configured to determine whether reverberance of the environment is greater than or equal to a reverberance threshold;
    - and wherein the one or more processors are configured to determine the ML-based confidence to be less than the SSL-based confidence based at least in part on a determination that the reverberance of the environment is greater than or equal to the reverberance threshold.
  - 8. The system of claim 1, wherein the one or more processors are configured to determine an angle in which to point a video camera in accordance with a dynamic video zoom operation based at least in part on the ML-based confidence.
  - 9. The system of claim 1, wherein the one or more processors are configured to change directionality of a beamformer steering operation associated with the plurality of sensors to correspond to the estimated direction from which the sound is received, as indicated by the SSL direction estimate, based at least in part on the ML-based confidence.
  - 10. The system of claim 1, wherein the one or more processors are configured to:
    - perform the machine learning operation based at least in part on the SSL direction estimate, the SSL-based confidence, and the one or more additional characteristics to generate a feature set of a machine learning model that is usable in a subsequent machine learning operation to determine an ML-based confidence inference based at least in part on a second SSL direction estimate that indicates a second estimated direction from which a second sound is received, a second SSL-based confidence that indicates an estimated probability that the second sound is received from the second estimated direction, and one or more additional characteristics of the second sound.
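Claims 6 and 7 compare an environment characteristic against a threshold. As a rough sketch of how a background-noise volume could be measured and compared, the following computes an RMS level in dB relative to full scale. The helper names, the use of RMS as the volume measure, and the -40 dBFS default are assumptions made for illustration and are not taken from the claims.

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    # The floor avoids log10(0) when the input is digital silence.
    return 10.0 * math.log10(max(mean_square, 1e-12))


def noise_exceeds_threshold(background_samples, threshold_dbfs=-40.0):
    """True when background noise is at or above the (hypothetical) threshold."""
    return rms_dbfs(background_samples) >= threshold_dbfs


loud_room = [0.5] * 100     # constant-amplitude stand-in for loud background noise
quiet_room = [0.001] * 100  # stand-in for a quiet room
print(noise_exceeds_threshold(loud_room), noise_exceeds_threshold(quiet_room))
```

The resulting boolean, or the level itself, could then be passed to the machine learning operation as one of the additional characteristics.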

11. A method of using machine learning to perform sound source localization confidence estimation using at least one of (a) one or more processors, (b) hardware logic, or (c) electrical circuitry, the method comprising:
- performing a sound source localization (SSL) operation with regard to a sound to determine an SSL direction estimate, which indicates an estimated direction from which the sound is received, and an SSL-based confidence associated with the SSL direction estimate based at least in part on a multi-channel representation of the sound, the SSL-based confidence indicating an estimated probability that the sound is received from the estimated direction, the multi-channel representation including a plurality of representations of the sound that are detected by a plurality of respective sensors;
  
  automatically determining one or more additional characteristics of the sound; and
  
  performing a machine learning (ML) operation based at least in part on the SSL direction estimate, the SSL-based confidence, and the one or more additional characteristics to determine an ML-based confidence associated with the SSL direction estimate and to generate a feature set of a machine learning model that is usable in a subsequent machine learning operation to determine an ML-based confidence inference based at least in part on a second SSL direction estimate that indicates a second estimated direction from which a second sound is received, a second SSL-based confidence that indicates an estimated probability that the second sound is received from the second estimated direction, and one or more additional characteristics of the second sound.
  - 12. The method of claim 11, wherein automatically determining the one or more additional characteristics comprises:
    - performing an analysis of an environment in which the sound is produced; and
      
      determining a characteristic of the environment in which the sound is produced based at least in part on the analysis; and
      
      wherein performing the machine learning operation comprises:
      
      performing the machine learning operation based at least in part on the SSL direction estimate, the SSL-based confidence, and the characteristic of the environment to determine the ML-based confidence.
  - 13. The method of claim 12, wherein determining the characteristic of the environment comprises:
    - determining that at least one of a volume of background noise in the environment or reverberance of the environment is less than or equal to a threshold; and
      
      wherein performing the machine learning operation comprises:
      
      determining the ML-based confidence to be greater than the SSL-based confidence based at least in part on a determination that the at least one of the volume of the background noise in the environment or the reverberance of the environment is less than or equal to the threshold.
  - 14. The method of claim 11, wherein the sound includes human voice;
    - wherein automatically determining the one or more additional characteristics comprises:
      
      determining that an attribute of the human voice causes detectability of the human voice to be compromised; and
      
      wherein performing the machine learning operation comprises:
      
      determining the ML-based confidence to be less than the SSL-based confidence based at least in part on a determination that the attribute of the human voice causes the detectability of the human voice to be compromised.
  - 15. The method of claim 11, wherein automatically determining the one or more additional characteristics comprises:
    - determining that a first sample of the sound, which is captured during a first time period, corresponds to the estimated direction; and
      
      determining that a second sample of the sound, which is captured during a second time period that follows the first time period, corresponds to a second direction that is different from the estimated direction to which the first sample of the sound corresponds; and
      
      wherein performing the machine learning operation comprises:
      
      determining the ML-based confidence to be more accurate than the SSL-based confidence based at least in part on the first sample of the sound corresponding to the estimated direction and further based at least in part on the first sample being captured before the second sample.
  - 16. The method of claim 11, wherein performing the sound source localization operation comprises:
    - determining a first weight to be applied to the SSL-based confidence;
      
      wherein automatically determining the one or more additional characteristics comprises:
      
      determining that the sound is received from a speaker of a device that performs the sound source localization operation; and
      
      wherein performing the machine learning operation comprises:
      
      determining a second weight, which is to be applied to the ML-based confidence, to be less than the first weight based at least in part on a determination that the sound is received from the speaker of the device.
  - 17. The method of claim 11, wherein performing the machine learning operation comprises:
    - performing the machine learning operation utilizing a feature set of a machine learning model based at least in part on the SSL direction estimate, the SSL-based confidence, and the one or more additional characteristics to determine an ML-based confidence inference associated with the SSL direction estimate.
  - 18. The method of claim 11, wherein automatically determining the one or more additional characteristics of the sound comprises:
    - automatically determining a probability that the sound is of a designated type; and
      
      classifying the sound to be the designated type selected from a plurality of types based at least in part on the probability; and
      
      wherein performing the machine learning operation comprises:
      
      performing the machine learning operation based at least in part on the SSL direction estimate, the SSL-based confidence, and the probability to determine the ML-based confidence to be more accurate than the SSL-based confidence based at least in part on the sound being classified to be the designated type.
  - 19. The method of claim 11, wherein performing the machine learning operation comprises:
    - performing the machine learning operation based at least in part on the SSL direction estimate, the SSL-based confidence, and the one or more additional characteristics further to determine an updated estimate, indicating a different estimated direction from which the sound is received.
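Claims 11 and 17 describe generating a feature set of a model from one sound and reusing it for a later inference on a second sound. One loose reading, offered here only as an assumption, treats that feature set as learned per-feature weights. The sketch below fits such weights with plain stochastic gradient descent on a logistic model; the training data, feature names, and hyperparameters are invented for illustration.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5):
    """Fit logistic-regression weights by stochastic gradient descent.

    examples: list of (features_dict, label); label is 1 when the SSL
    direction estimate was correct and 0 otherwise (hypothetical labels).
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in examples:
            z = bias + sum(weights.get(k, 0.0) * v for k, v in features.items())
            error = _sigmoid(z) - label  # gradient of log loss with respect to z
            bias -= lr * error
            for k, v in features.items():
                weights[k] = weights.get(k, 0.0) - lr * error * v
    return weights, bias


def infer(weights, bias, features):
    """Confidence inference for a later sound, reusing the learned weights."""
    z = bias + sum(weights.get(k, 0.0) * v for k, v in features.items())
    return _sigmoid(z)


# Invented training examples: (SSL-based confidence, noise level) -> correct?
EXAMPLES = [
    ({"ssl_confidence": 0.9, "noise_level": 0.1}, 1),
    ({"ssl_confidence": 0.8, "noise_level": 0.2}, 1),
    ({"ssl_confidence": 0.3, "noise_level": 0.9}, 0),
    ({"ssl_confidence": 0.4, "noise_level": 0.8}, 0),
]
W, B = train(EXAMPLES)
print(infer(W, B, {"ssl_confidence": 0.85, "noise_level": 0.15}) > 0.5)
```

Because weights are keyed by feature name, a training example that carries a previously unseen feature simply adds a new entry to the weight dictionary.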

20. A system to use machine learning to perform sound source localization confidence estimation, the system comprising:
- memory; and
  
  one or more processors coupled to the memory and configured to:
  
  perform a sound source localization (SSL) operation with regard to a sound to determine an SSL direction estimate, which indicates an estimated direction from which the sound is received, and an SSL-based confidence associated with the SSL direction estimate based at least in part on a multi-channel representation of the sound, the SSL-based confidence indicating an estimated probability that the sound is received from the estimated direction, the multi-channel representation including a plurality of representations of the sound that are detected by a plurality of respective sensors;
  
  automatically determine one or more additional characteristics of the sound;
  
  perform a machine learning (ML) operation based at least in part on the SSL direction estimate, the SSL-based confidence, and the one or more additional characteristics to determine an ML-based confidence associated with the SSL direction estimate; and
  
  determine an angle in which to point a video camera in accordance with a dynamic video zoom operation based at least in part on the ML-based confidence.
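Claims 8 and 20 tie the camera angle to the ML-based confidence. A minimal sketch of one way that decision could look is shown below: the camera follows the SSL direction estimate only when the confidence clears a cutoff. The function name, the 0.7 cutoff, and the hold-position fallback are assumptions; the claims do not specify a decision rule.

```python
def choose_camera_angle(current_angle_deg, ssl_direction_deg, ml_confidence,
                        min_confidence=0.7):
    """Return the pan angle in [0, 360): follow SSL only when confidence is high."""
    if not 0.0 <= ml_confidence <= 1.0:
        raise ValueError("ml_confidence must be a probability in [0, 1]")
    if ml_confidence >= min_confidence:
        # Confident enough: point at the estimated direction of the sound.
        return ssl_direction_deg % 360.0
    # Otherwise hold the current angle to avoid chasing unreliable estimates.
    return current_angle_deg % 360.0


print(choose_camera_angle(10.0, 95.0, 0.9))
print(choose_camera_angle(10.0, 95.0, 0.4))
```

Holding the current angle on low confidence is one conservative option; falling back to a wide shot would be another.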

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Venalainen, Kevin Juho
Primary Examiner(s): Truong, Kenny H
Application Number: US15/657,475
Publication Number: US 20190025400A1
Time in Patent Office: 1,023 Days
Field of Search: None
US Class Current:
CPC Class Codes:
G01S 3/8083   determining direction of so...
G06N 20/00   Machine learning
G06N 3/08   Learning methods
H04R 2430/20   Processing of the output si...
H04R 3/005   for combining the signals o...
