Detecting Self-Generated Wake Expressions
Abstract
A speech-based audio device may be configured to detect a user-uttered wake expression and to respond by interpreting subsequent words or phrases as commands. In order to distinguish between utterance of the wake expression by the user and generation of the wake expression by the device itself, directional audio signals may be analyzed to detect whether the wake expression has been received from multiple directions. If the wake expression has been received from many directions, it is declared as being generated by the audio device and ignored. Otherwise, if the wake expression is received from a single direction or a limited number of directions, it is declared as being uttered by the user, and subsequent words or phrases are interpreted and acted upon by the audio device.
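The direction-count rule described in the abstract can be sketched in a few lines. The function name, the boolean-flag representation, and the threshold value below are illustrative assumptions, not details specified by the patent.

```python
# Sketch of the abstract's decision rule: a wake expression detected in many
# (or all) directional signals is attributed to the device's own speaker,
# while detection in only a few directions is attributed to the user.
# All names and the threshold value are illustrative assumptions.

def classify_wake_detections(detections, threshold=4):
    """detections: one boolean per directional audio signal, True if the
    wake expression was recognized in that direction's audio."""
    hits = sum(detections)
    if hits == 0:
        return "no wake expression"
    if hits < threshold:
        return "uttered by user"       # few directions: a localized source
    return "generated by device"       # many directions: the device's own output

# A user's voice reaches the microphone array mainly from one direction:
print(classify_wake_detections([True, False, False, False, False, False]))  # -> "uttered by user"
# Output from the device's own speaker is picked up in every beam:
print(classify_wake_detections([True] * 6))  # -> "generated by device"
```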
20 Claims
1. An audio device configured to respond to a trigger expression uttered by a user, comprising:
a speaker configured to generate output audio;
a microphone array configured to produce a plurality of input audio signals;
an audio beamformer configured to produce a plurality of directional audio signals based at least in part on the input audio signals, wherein the directional audio signals represent audio from respectively corresponding directions relative to the audio device;
one or more speech recognition components configured to detect whether the trigger expression occurs in the audio represented by each of the respective directional audio signals; and
an expression detector configured to (a) determine that the trigger expression has been uttered by the user if the trigger expression occurs in the audio represented by less than a threshold number of the directional audio signals; and (b) determine that the trigger expression has been generated by the speaker if the trigger expression occurs in the audio represented by all of the directional audio signals. - View Dependent Claims (2, 3, 4)
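The "audio beamformer" element of claim 1 can be illustrated with a minimal delay-and-sum sketch, in which each directional signal is formed by delaying the microphone channels by per-direction steering delays and averaging them. The function names, integer-sample delays, and toy two-microphone geometry are illustrative assumptions, not taken from the patent.

```python
# Minimal delay-and-sum beamformer sketch (illustrative names and geometry):
# audio arriving from the steered direction is time-aligned across channels
# and reinforced; audio from other directions sums out of phase.

def delay_and_sum(channels, delays):
    """channels: equal-length lists of samples, one per microphone.
    delays: per-channel integer sample delays steering the beam."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(n):
            j = i - d
            if 0 <= j < n:          # samples shifted past the edges are dropped
                out[i] += ch[j]
    return [v / len(channels) for v in out]

def beamform(channels, steering_delay_sets):
    """Produce one directional audio signal per set of steering delays."""
    return [delay_and_sum(channels, delays) for delays in steering_delay_sets]

# A wavefront that reaches mic 1 one sample later than mic 0 is re-aligned
# by steering delays (0, -1), recovering the mic-0 waveform (edge aside):
mics = [[1, 2, 3, 0], [0, 1, 2, 3]]
print(delay_and_sum(mics, [0, -1]))  # -> [1.0, 2.0, 3.0, 0.0]
```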
5. A method comprising:
producing output audio in a user environment;
receiving a plurality of audio signals representing input audio from respectively corresponding portions of the user environment;
generating one or more recognition parameters indicating which one or more of the audio signals contain a predefined expression; and
determining that an occurrence of the predefined expression in the input audio is a result of the predefined expression occurring in the output audio based at least in part on the one or more recognition parameters. - View Dependent Claims (6, 7, 8, 9, 10, 11, 12, 13, 14)
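Claim 5's "recognition parameters" can be pictured as per-direction recognizer confidences. The sketch below assumes one confidence score per directional signal; the function name and both threshold values are illustrative assumptions.

```python
# Sketch of attributing a detected expression to the device's own output
# audio, using per-direction recognition scores as the claim's "recognition
# parameters". Both thresholds are illustrative assumptions.

def is_from_output_audio(recognition_scores, score_threshold=0.5, direction_threshold=4):
    """recognition_scores: one recognizer confidence per directional signal.
    Returns True when the expression is recognized in enough directions to
    attribute it to the device's own speaker output rather than a user."""
    detections = sum(score >= score_threshold for score in recognition_scores)
    return detections >= direction_threshold

# An echo of the device's own output is recognized in nearly every direction:
print(is_from_output_audio([0.9, 0.8, 0.7, 0.9, 0.6, 0.8]))  # -> True
# A user's utterance is recognized only in the beams facing the user:
print(is_from_output_audio([0.9, 0.6, 0.1, 0.0, 0.1, 0.2]))  # -> False
```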
15. An audio device comprising:
one or more processors;
memory storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
receiving a plurality of audio signals representing input audio from respectively corresponding portions of a user environment;
evaluating the audio signals to generate indications corresponding respectively to the audio signals, wherein each indication indicates whether the input audio represented by the corresponding audio signal contains a predefined expression; and
evaluating the indications to distinguish between utterance of the predefined expression by a user and production of the predefined expression by an audio speaker based at least in part on which one or more of the audio signals represent input audio containing the predefined expression. - View Dependent Claims (16, 17, 18, 19, 20)
Specification