
Mitigating effects of electronic audio sources in expression detection

  • US 9,734,845 B1
  • Filed: 06/26/2015
  • Issued: 08/15/2017
  • Est. Priority Date: 06/26/2015
  • Status: Active Grant
First Claim

1. A system comprising:

    a microphone array configured to produce microphone audio signals;

    an audio beamformer configured to process the microphone audio signals to produce directional audio signals, wherein a first directional audio signal of the directional audio signals corresponds to a first direction with respect to the microphone array and wherein a second directional audio signal of the directional audio signals corresponds to a second direction with respect to the microphone array, wherein the first directional audio signal and the second directional audio signal emphasize sound from the first direction and the second direction, respectively;

    a speech activity detector configured to analyze one or more frequency characteristics of the first directional audio signal and the second directional audio signal to determine a first level of speech presence and a second level of speech presence occurring in the first direction and the second direction, respectively, over time;

    a source detector configured to analyze the first level of speech presence and the second level of speech presence occurring over a past time period to determine that an electronic source of sound is located in the first direction or the second direction; and

    an expression detector configured to perform actions comprising:

    identifying the first direction where a first occurring level of speech presence is a highest level of speech presence;

    determining that the first direction corresponds to a direction in which the electronic source of sound is located;

    identifying the second direction where a second occurring level of speech presence is a second highest level of speech presence;

    analyzing the first directional audio signal corresponding to the first direction to produce a first score indicating a first likelihood that a trigger expression is represented in the first directional audio signal;

    analyzing the second directional audio signal corresponding to the second direction to produce a second score indicating a second likelihood that the trigger expression is represented in the second directional audio signal;

    comparing the first score to a first threshold;

    comparing the second score to a second threshold, wherein the second threshold is less than the first threshold;

    determining that (i) the first score is greater than the first threshold or (ii) the second score is greater than the second threshold;

    concluding that the trigger expression has been uttered; and

    performing speech recognition on subsequent speech, based at least in part on the trigger expression.
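Claim 1 requires an audio beamformer that turns the microphone audio signals into directional audio signals, but it does not name a beamforming method. As one illustration only, the Python sketch below assumes a delay-and-sum beamformer over a uniform linear array with integer-sample steering delays. The function names, the 343 m/s speed of sound, and the linear array geometry are assumptions for this example, not details taken from the patent.

```python
import math
from typing import Dict, List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(num_mics: int, spacing_m: float, angle_deg: float,
                    sample_rate: int) -> List[int]:
    """Integer sample delays for a uniform linear array steered to angle_deg.

    angle_deg is measured from broadside; a plane wave from that angle reaches
    microphone m about m * spacing * sin(angle) / c seconds after microphone 0.
    """
    s = math.sin(math.radians(angle_deg))
    return [round(m * spacing_m * s / SPEED_OF_SOUND * sample_rate)
            for m in range(num_mics)]


def delay_and_sum(mic_signals: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Advance each channel by its steering delay, then average the channels.

    Sound from the steered direction adds coherently, while sound from other
    directions partially cancels. Samples shifted past either end count as 0.
    """
    num_mics = len(mic_signals)
    n_samples = len(mic_signals[0])
    out = [0.0] * n_samples
    for signal, k in zip(mic_signals, delays):
        for n in range(n_samples):
            idx = n + k
            if 0 <= idx < n_samples:
                out[n] += signal[idx]
    return [v / num_mics for v in out]


def directional_signals(mic_signals: Sequence[Sequence[float]],
                        spacing_m: float, sample_rate: int,
                        angles_deg: Sequence[float]) -> Dict[float, List[float]]:
    """One directional audio signal (beam) per look direction, keyed by angle."""
    return {a: delay_and_sum(mic_signals,
                             steering_delays(len(mic_signals), spacing_m,
                                             a, sample_rate))
            for a in angles_deg}
```

With four microphones spaced 343/16000 m apart at 16 kHz, a source at 90 degrees arrives one sample later at each successive microphone, so the 90-degree beam realigns the channels and carries the most energy. Delay-and-sum is chosen here only because it is the simplest beamformer to show; the claim would equally cover other designs.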
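The claim states only that the speech activity detector works from frequency characteristics of each directional signal, and that the source detector examines speech presence over a past time period to locate an electronic source of sound. The sketch below fills both gaps with assumptions: a crude speech-band energy ratio (300 to 3400 Hz) stands in for the speech-presence level, and a hypothetical heuristic flags the direction whose presence stays high in most frames of a long window, on the assumption that a television or radio tends to produce speech far more continuously than a person in the room. All names, bands, and thresholds are illustrative.

```python
import cmath
import math
from collections import deque
from typing import Deque, Dict, Optional, Sequence, Tuple


def speech_presence(frame: Sequence[float], sample_rate: int,
                    band_hz: Tuple[float, float] = (300.0, 3400.0)) -> float:
    """Crude speech-presence level: share of frame energy inside band_hz.

    Uses a naive DFT over the positive-frequency bins (DC excluded); a real
    detector would use richer spectral features. Returns a value in [0, 1].
    """
    n = len(frame)
    total = in_band = 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        power = abs(coeff) ** 2
        total += power
        if band_hz[0] <= k * sample_rate / n <= band_hz[1]:
            in_band += power
    return in_band / total if total > 0 else 0.0


class ElectronicSourceDetector:
    """Hypothetical heuristic: flag the direction whose speech presence is at
    least presence_threshold in at least active_fraction of the last
    window_frames frames."""

    def __init__(self, window_frames: int, presence_threshold: float = 0.5,
                 active_fraction: float = 0.8) -> None:
        self.window_frames = window_frames
        self.presence_threshold = presence_threshold
        self.active_fraction = active_fraction
        self.history: Dict[float, Deque[float]] = {}

    def update(self, levels: Dict[float, float]) -> None:
        """Record one frame of speech-presence levels, keyed by direction."""
        for direction, level in levels.items():
            hist = self.history.setdefault(
                direction, deque(maxlen=self.window_frames))
            hist.append(level)

    def electronic_direction(self) -> Optional[float]:
        """Most persistently active qualifying direction, or None."""
        best: Optional[float] = None
        best_fraction = 0.0
        for direction, hist in self.history.items():
            if len(hist) < self.window_frames:
                continue  # not enough history for this direction yet
            active = sum(lvl >= self.presence_threshold for lvl in hist)
            fraction = active / len(hist)
            if fraction >= self.active_fraction and fraction > best_fraction:
                best, best_fraction = direction, fraction
        return best
```

Because the deque keeps only the most recent window_frames values, a direction stops being flagged once its speech presence has been low for enough of the recent window, which matches the claim's focus on a past time period rather than all history.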
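The expression-detector actions in claim 1 amount to a two-threshold decision over the two directions with the most speech presence: the top direction, which points at the electronic source, must clear a stricter threshold, while the runner-up only needs to clear a lower one. A minimal Python sketch, assuming per-direction trigger scores come from some keyword spotter that is not shown, and assuming illustrative threshold values of 0.9 and 0.6. When the top direction is not the flagged electronic direction, this sketch simply applies the lower threshold to it, a case the claim itself does not address.

```python
from typing import Dict, Optional


def trigger_detected(presence: Dict[float, float],
                     scores: Dict[float, float],
                     electronic_direction: Optional[float],
                     base_threshold: float = 0.6,
                     electronic_threshold: float = 0.9) -> bool:
    """Two-threshold trigger decision over the two most speech-active beams.

    presence: current speech-presence level per direction (e.g. degrees).
    scores: trigger-expression likelihood per direction, assumed to come from
        a keyword spotter that is not shown here.
    electronic_direction: direction flagged as an electronic source, or None.
    """
    if electronic_threshold <= base_threshold:
        raise ValueError("electronic_threshold must exceed base_threshold")
    # Rank directions by speech presence, highest first.
    ranked = sorted(presence, key=lambda d: presence[d], reverse=True)
    if not ranked:
        return False
    first = ranked[0]
    # The top beam gets the stricter threshold only if it points at the
    # electronic source; the runner-up always gets the lower threshold.
    thresholds = {first: electronic_threshold
                  if first == electronic_direction else base_threshold}
    if len(ranked) > 1:
        thresholds[ranked[1]] = base_threshold
    # Conclude the trigger was uttered if either beam clears its own threshold.
    return any(scores.get(d, 0.0) > t for d, t in thresholds.items())
```

For example, with the television at 0 degrees and a person at 90 degrees, a score of 0.8 on the television beam is rejected, but a score of 0.65 on the person's beam still triggers speech recognition. Directions ranked below second place are ignored, as in the claim.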

  • 1 Assignment