Detecting Self-Generated Wake Expressions
Abstract
A speech-based audio device may be configured to detect a user-uttered wake expression and to respond by interpreting subsequent words or phrases as commands. In order to distinguish between utterance of the wake expression by the user and generation of the wake expression by the device itself, directional audio signals may be analyzed to detect whether the wake expression has been received from multiple directions. If the wake expression has been received from many directions, it is declared as being generated by the audio device and ignored. Otherwise, if the wake expression is received from a single direction or a limited number of directions, it is declared as being uttered by the user, and subsequent words or phrases are interpreted and acted upon by the audio device.
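The direction-count rule described in the abstract can be sketched in a few lines. The function name, the boolean-flag representation, and the threshold value below are illustrative assumptions, not details specified by the patent.

```python
# Sketch of the abstract's decision rule: a wake expression detected in many
# (or all) directional signals is attributed to the device's own speaker,
# while detection in only a few directions is attributed to the user.
# All names and the threshold value are illustrative assumptions.

def classify_wake_detections(detections, threshold=4):
    """detections: one boolean per directional audio signal, True if the
    wake expression was recognized in that direction's audio."""
    hits = sum(detections)
    if hits == 0:
        return "no wake expression"
    if hits < threshold:
        return "uttered by user"       # few directions: a localized source
    return "generated by device"       # many directions: the device's own output

# A user's voice reaches the microphone array mainly from one direction:
print(classify_wake_detections([True, False, False, False, False, False]))  # -> "uttered by user"
# Output from the device's own speaker is picked up in every beam:
print(classify_wake_detections([True] * 6))  # -> "generated by device"
```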
20 Claims
1. An audio device configured to respond to a trigger expression uttered by a user, comprising:
a speaker configured to generate output audio;
a microphone array configured to produce a plurality of input audio signals;
an audio beamformer configured to produce a plurality of directional audio signals based at least in part on the input audio signals, wherein the directional audio signals represent audio from respectively corresponding directions relative to the audio device;
one or more speech recognition components configured to detect whether the trigger expression occurs in the audio represented by each of the respective directional audio signals; and
an expression detector configured to (a) determine that the trigger expression has been uttered by the user if the trigger expression occurs in the audio represented by less than a threshold number of the directional audio signals; and (b) determine that the trigger expression has been generated by the speaker if the trigger expression occurs in the audio represented by all of the directional audio signals. - View Dependent Claims (2, 3, 4)
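The "audio beamformer" element of claim 1 can be illustrated with a minimal delay-and-sum sketch, in which each directional signal is formed by delaying the microphone channels by per-direction steering delays and averaging them. The function names, integer-sample delays, and toy two-microphone geometry are illustrative assumptions, not taken from the patent.

```python
# Minimal delay-and-sum beamformer sketch (illustrative names and geometry):
# audio arriving from the steered direction is time-aligned across channels
# and reinforced; audio from other directions sums out of phase.

def delay_and_sum(channels, delays):
    """channels: equal-length lists of samples, one per microphone.
    delays: per-channel integer sample delays steering the beam."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(n):
            j = i - d
            if 0 <= j < n:          # samples shifted past the edges are dropped
                out[i] += ch[j]
    return [v / len(channels) for v in out]

def beamform(channels, steering_delay_sets):
    """Produce one directional audio signal per set of steering delays."""
    return [delay_and_sum(channels, delays) for delays in steering_delay_sets]

# A wavefront that reaches mic 1 one sample later than mic 0 is re-aligned
# by steering delays (0, -1), recovering the mic-0 waveform (edge aside):
mics = [[1, 2, 3, 0], [0, 1, 2, 3]]
print(delay_and_sum(mics, [0, -1]))  # -> [1.0, 2.0, 3.0, 0.0]
```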
5. A method comprising:
producing output audio in a user environment;
receiving a plurality of audio signals representing input audio from respectively corresponding portions of the user environment;
generating one or more recognition parameters indicating which one or more of the audio signals contain a predefined expression; and
determining that an occurrence of the predefined expression in the input audio is a result of the predefined expression occurring in the output audio based at least in part on the one or more recognition parameters. - View Dependent Claims (6, 7, 8, 9, 10, 11, 12, 13, 14)
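Claim 5's "recognition parameters" can be pictured as per-direction recognizer confidences. The sketch below assumes one confidence score per directional signal; the function name and both threshold values are illustrative assumptions.

```python
# Sketch of attributing a detected expression to the device's own output
# audio, using per-direction recognition scores as the claim's "recognition
# parameters". Both thresholds are illustrative assumptions.

def is_from_output_audio(recognition_scores, score_threshold=0.5, direction_threshold=4):
    """recognition_scores: one recognizer confidence per directional signal.
    Returns True when the expression is recognized in enough directions to
    attribute it to the device's own speaker output rather than a user."""
    detections = sum(score >= score_threshold for score in recognition_scores)
    return detections >= direction_threshold

# An echo of the device's own output is recognized in nearly every direction:
print(is_from_output_audio([0.9, 0.8, 0.7, 0.9, 0.6, 0.8]))  # -> True
# A user's utterance is recognized only in the beams facing the user:
print(is_from_output_audio([0.9, 0.6, 0.1, 0.0, 0.1, 0.2]))  # -> False
```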
15. An audio device comprising:
one or more processors;
memory storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
receiving a plurality of audio signals representing input audio from respectively corresponding portions of a user environment;
evaluating the audio signals to generate indications corresponding respectively to the audio signals, wherein each indication indicates whether the input audio represented by the corresponding audio signal contains a predefined expression; and
evaluating the indications to distinguish between utterance of the predefined expression by a user and production of the predefined expression by an audio speaker based at least in part on which one or more of the audio signals represent input audio containing the predefined expression. - View Dependent Claims (16, 17, 18, 19, 20)
Specification