Acoustic echo cancellation based sub band domain active speaker detection for audio and video conferencing applications
First Claim
1. A method for detecting an active speaker in at least a two-way conference comprising:
analyzing real time audio in one or more sub band domains according to an echo canceller model, wherein the echo canceller model includes at least in part processing the real time audio using an acoustic echo cancellation linear filter;
determining, based on the analyzed real time audio, one or more audio metrics;
weighting, via a trained machine learning model, the one or more audio metrics based on importance of the one or more audio metrics for active speaker determination in the one or more sub band domains;
summing the one or more weighted audio metrics;
comparing the one or more summed weighted audio metrics and a hysteresis threshold;
in response to the one or more summed weighted audio metrics being greater than the hysteresis threshold, determining a speaker status as active; and
in response to the speaker status being active, removing one or more of residual echo or noise from the real time audio based on the weighted one or more audio metrics.
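The weighting, summing, and hysteresis comparison recited above can be illustrated with a minimal sketch. It is not the patented implementation. The class name `HysteresisDetector`, the fixed weights (placeholders for values a trained model would supply), and the use of two thresholds to realize the hysteresis are all assumptions made for illustration; the claim itself recites only "a hysteresis threshold".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HysteresisDetector:
    """Weighted-sum speaker detector with a two-level (hysteresis) threshold."""
    weights: Sequence[float]     # hypothetical: would come from a trained model
    on_threshold: float = 0.6    # score must exceed this to switch to active
    off_threshold: float = 0.4   # once active, status holds while score exceeds this
    active: bool = False

    def update(self, metrics: Sequence[float]) -> bool:
        if len(metrics) != len(self.weights):
            raise ValueError("metrics and weights must have the same length")
        # Weight each metric by its importance, then sum.
        score = sum(w * m for w, m in zip(self.weights, metrics))
        # Compare against the higher threshold while inactive and the lower
        # one while active, so a score hovering near one level does not
        # toggle the status on every frame.
        threshold = self.off_threshold if self.active else self.on_threshold
        self.active = score > threshold
        return self.active


# Four frames of three made-up audio metrics each.
detector = HysteresisDetector(weights=[0.5, 0.3, 0.2])
frames = [[0.2, 0.1, 0.1], [0.9, 0.8, 0.9], [0.5, 0.5, 0.5], [0.1, 0.2, 0.1]]
statuses = [detector.update(f) for f in frames]
print(statuses)
```

The third frame scores about 0.5, below the switch-on level but above the switch-off level, so the status stays active; that is the behavior the hysteresis is meant to provide.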
Abstract
Systems, methods, and devices are disclosed for detecting an active speaker in a two-way conference. Real time audio in one or more sub band domains is analyzed according to an echo canceller model. Based on the analyzed real time audio, one or more audio metrics are determined from the output of an acoustic echo cancellation linear filter. The one or more audio metrics are weighted based on a priority, and a speaker status is determined by analyzing the weighted one or more audio metrics according to an active speaker detection model. For an active speaker status, one or more of residual echo or noise is removed from the real time audio based on the one or more audio metrics.
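To make the idea of deriving a metric from the linear filter output concrete, the following sketch runs a normalized least mean squares (NLMS) filter on a single sub band and computes echo return loss enhancement (ERLE). Both choices are assumptions: the text above does not name the adaptation algorithm or the specific metrics. The echo path `h`, the helper names, and the synthetic signals are hypothetical, and the filter bank that would split audio into sub bands is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple


def nlms_subband(far: Sequence[complex], mic: Sequence[complex],
                 taps: int = 4, mu: float = 0.5,
                 eps: float = 1e-8) -> Tuple[List[complex], List[complex]]:
    """Run a complex NLMS linear filter on one sub band.

    Returns (error, weights): the echo-cancelled signal and the final filter.
    """
    w = [0j] * taps
    history = [0j] * taps            # recent far-end samples, newest first
    error = []
    for x, d in zip(far, mic):
        history = [x] + history[:-1]
        y = sum(wk * xk for wk, xk in zip(w, history))   # echo estimate
        e = d - y                                        # residual after cancellation
        norm = sum(abs(xk) ** 2 for xk in history) + eps
        # NLMS update: step along e * conj(x), normalized by input power.
        w = [wk + mu * e * xk.conjugate() / norm for wk, xk in zip(w, history)]
        error.append(e)
    return error, w


def erle_db(mic: Sequence[complex], error: Sequence[complex],
            eps: float = 1e-12) -> float:
    """ERLE: microphone power over residual power, in dB."""
    p_mic = sum(abs(d) ** 2 for d in mic)
    p_err = sum(abs(e) ** 2 for e in error)
    return 10.0 * math.log10((p_mic + eps) / (p_err + eps))


def simulate(n: int = 600, near_gain: float = 0.0, seed: int = 0):
    """Synthetic far-end and microphone signals for one sub band."""
    rng = random.Random(seed)
    h = [0.5, 0.3j, -0.2, 0.1]       # hypothetical echo path for this band
    far = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    mic = []
    for i in range(n):
        echo = sum(h[k] * far[i - k] for k in range(len(h)) if i - k >= 0)
        near = near_gain * complex(rng.gauss(0, 1), rng.gauss(0, 1))
        mic.append(echo + near)
    return far, mic


# Echo only: the filter can cancel nearly everything, so ERLE is high.
far, mic = simulate(near_gain=0.0)
err, _ = nlms_subband(far, mic)
echo_only = erle_db(mic[300:], err[300:])

# Near-end signal present: the residual keeps the local talker, so ERLE is low.
far, mic = simulate(near_gain=1.0)
err, _ = nlms_subband(far, mic)
double_talk = erle_db(mic[300:], err[300:])
```

Under these assumptions, a low ERLE in a band that still carries microphone energy hints that someone is speaking locally, which is the kind of per-band evidence a weighting stage could consume.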
19 Claims
8. A system for detecting an active speaker in at least a two-way conference comprising:
at least one receiver for receiving real time audio;
a conference server communicatively coupled to the at least one receiver, wherein the conference server is configured to:
analyze the real time audio in one or more sub band domains according to an echo canceller model, wherein the echo canceller model includes at least in part processing the real time audio using an acoustic echo cancellation linear filter;
determine, based on the analyzed real time audio, one or more audio metrics;
weight, via a trained machine learning model, the one or more audio metrics based on importance of the one or more audio metrics for active speaker determination in the one or more sub band domains;
sum the one or more weighted audio metrics;
compare the one or more summed weighted audio metrics and a hysteresis threshold;
in response to the one or more summed weighted audio metrics being greater than the hysteresis threshold, determine a speaker status as active; and
in response to the speaker status being active, remove one or more of residual echo or noise from the real time audio based on the weighted one or more audio metrics; and
a loudspeaker in communication with the conference server configured to output the real time audio.
14. A non-transitory computer-readable medium containing instructions that, when executed by a computing system, cause the computing system to:
analyze real time audio in one or more sub band domains according to an echo canceller model, wherein the echo canceller model includes at least in part processing the real time audio using an acoustic echo cancellation linear filter;
determine, based on the analyzed real time audio, one or more audio metrics;
weight, via a trained machine learning model, the one or more audio metrics based on importance of the one or more audio metrics for active speaker determination in the one or more sub band domains;
sum the one or more weighted audio metrics;
compare the one or more summed weighted audio metrics and a hysteresis threshold;
in response to the one or more summed weighted audio metrics being greater than the hysteresis threshold, determine a speaker status is active; and
in response to the speaker status being active, remove one or more of residual echo or noise from the real time audio based on the weighted one or more audio metrics.
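The final step, removing residual echo or noise based on the weighted metrics once the speaker status is active, might look like the sketch below. Everything specific here is an assumption: the function name `suppress_residual`, the convention that each sub band has one weighted metric in [0, 1] where higher means more likely near-end speech, the gain floor, and the pass-through behavior when the status is inactive (the claims describe only the active case).

```python
from typing import List, Sequence


def suppress_residual(subbands: Sequence[complex],
                      weighted_metrics: Sequence[float],
                      speaker_active: bool,
                      gain_floor: float = 0.1) -> List[complex]:
    """Attenuate sub bands that the weighted metrics rate as mostly echo or noise.

    weighted_metrics holds one value per sub band, assumed to lie in [0, 1],
    where higher means more likely near-end speech (a hypothetical convention).
    """
    if len(subbands) != len(weighted_metrics):
        raise ValueError("one weighted metric is required per sub band")
    if not speaker_active:
        # The claims only cover the active case; pass-through is an assumption.
        return list(subbands)
    out = []
    for x, m in zip(subbands, weighted_metrics):
        gain = max(gain_floor, min(1.0, m))   # clamp to [gain_floor, 1]
        out.append(gain * x)
    return out


# One frame of three sub-band samples with made-up weighted metrics.
frame = [1 + 1j, 2 + 0j, 0.5j]
cleaned = suppress_residual(frame, [0.9, 0.05, 1.5], speaker_active=True)
untouched = suppress_residual(frame, [0.9, 0.05, 1.5], speaker_active=False)
```

The gain floor keeps heavily attenuated bands from dropping to complete silence, a common design choice in suppressors to avoid audible pumping; it is not something the claims require.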
Specification