Robust short-time Fourier transform acoustic echo cancellation during audio playback

  • US 10,446,165 B2
  • Filed: 09/27/2017
  • Issued: 10/15/2019
  • Est. Priority Date: 09/27/2017
  • Status: Active Grant
First Claim

1. A system comprising:

  • an audio stage comprising an audio processor and an audio amplifier;

    one or more speakers;

    one or more microphones;

    one or more processors;

    data storage storing instructions executable by the one or more processors that cause the system to perform operations comprising:

    causing, via the audio stage, the one or more speakers to play back audio content;

    while the audio content is playing back via the one or more speakers, capturing, via the one or more microphones, audio within an acoustic environment, wherein the captured audio comprises audio signals representing sound produced by the one or more speakers in playing back the audio content;

    receiving a playback signal from the audio stage representing the audio content being played back by the one or more speakers;

    transforming into a short time Fourier transform (STFT) domain the captured audio within the acoustic environment to generate a measured signal in the STFT domain comprising a series of frames representing the captured audio within the acoustic environment;

    transforming into the STFT domain the received output signal from the audio stage to generate a reference signal in the STFT domain comprising a series of frames representing the audio content being played back via the one or more speakers;

    during each nth iteration of an acoustic echo canceller (AEC):

    determining an nth frame of an output signal, wherein determining the nth frame of the output signal comprises:

    generating an nth frame of a model signal by passing an nth frame of the reference signal through an nth instance of an adaptive filter, wherein the first instance of the adaptive filter is an initial filter; and

    generating the nth frame of the output signal by redacting the nth frame of the model signal from an nth frame of the measured signal;

    determining an n+1th instance of the adaptive filter for a next iteration of the AEC, wherein determining the n+1th instance of the adaptive filter for the next iteration of the AEC comprises:

    determining an nth frame of an error signal, the nth frame of the error signal representing a difference between the nth frame of the model signal and the nth frame of the reference signal less audio signals representing sound from sources other than an nth frame of the audio signals representing sound produced by the one or more speakers in playing back the nth frame of the reference signal;

    determining a normalized least mean square (NMLS) of the nth frame of the error signal;

    determining a sparse NMLS of the nth frame of the error signal by applying to the NMLS of the nth frame of the error signal, a sparse partition criterion that zeroes out frequency bands of the NMLS having less than a threshold energy;

    converting the sparse NMLS of the nth frame of the error signal to an nth update filter; and

    generating the n+1th instance of the adaptive filter for the next iteration of the AEC by summing the nth instance of the adaptive filter with the nth update filter; and

    sending the output signal as a voice input to one or more voice services for processing of the voice input.
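
The per-iteration steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each STFT frame is a list of complex frequency bins, that the adaptive filter has a single complex tap per bin, and that the output frame (measured minus model) serves as the error signal. The function name `aec_iteration` and the parameters `mu`, `eps`, and `energy_threshold` are hypothetical. The quantity the claim abbreviates as "NMLS" is computed here with the update rule commonly known as NLMS.

```python
def aec_iteration(ref_frame, mic_frame, filt,
                  mu=0.5, eps=1e-8, energy_threshold=1e-6):
    """Run one illustrative AEC iteration on a single STFT frame.

    ref_frame, mic_frame, filt: equal-length lists of complex values,
    one entry per frequency bin (an assumption of this sketch).
    Returns (output_frame, next_filter).
    """
    # Model signal: reference frame passed through the current filter instance.
    model = [h * x for h, x in zip(filt, ref_frame)]

    # Output frame: measured frame with the modelled echo taken out.
    output = [d - y for d, y in zip(mic_frame, model)]

    # Normalized least-mean-square term per bin, using the output as the error
    # and normalizing by the reference energy in that bin.
    nlms = [mu * x.conjugate() * e / (abs(x) ** 2 + eps)
            for x, e in zip(ref_frame, output)]

    # Sparse partition: zero out bins whose update energy is below threshold.
    update = [u if abs(u) ** 2 >= energy_threshold else 0j for u in nlms]

    # Next filter instance: current filter summed with the update filter.
    next_filt = [h + u for h, u in zip(filt, update)]
    return output, next_filt
```

Calling `aec_iteration` once per frame, starting from an initial filter such as all zeros and feeding each returned filter into the next call, mirrors the nth and n+1th instances in the claim. In this sketch the sparsity threshold also stops updates in bins where the filter has already converged, so small residual errors do not keep perturbing it.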
