Detecting self-generated wake expressions

US 10,720,155 B2
Filed: 07/17/2017
Issued: 07/21/2020
Est. Priority Date: 06/27/2013
Status: Active Grant

Abstract

A speech-based audio device may be configured to detect a user-uttered wake expression. For example, the audio device may generate a parameter indicating whether output audio is currently being produced by an audio speaker, whether the output audio contains speech, whether the output audio contains a predefined expression, loudness of the output audio, loudness of input audio, and/or an echo characteristic. Based on the parameter, the audio device may determine whether an occurrence of the predefined expression in the input audio is a result of an utterance of the predefined expression by a user.

20 Claims

1. An audio device comprising:
- one or more processors;
  
  a microphone;
  
  an audio speaker configured to produce output audio; and
  
  memory storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
  
  receiving, from the microphone, an audio signal generated by the microphone to represent input audio received at the microphone;
  
  determining a confidence level that the audio signal includes a predefined expression;
  
  determining that the confidence level is greater than a predetermined threshold;
  
  generating a parameter that indicates at least one of:
  
  whether the output audio is currently being produced by the audio speaker, whether the output audio contains speech, whether the output audio contains the predefined expression, loudness of the output audio, loudness of the input audio, or an echo characteristic of the audio signal; and
  
  determining, based at least in part on the parameter and the confidence level being greater than the predetermined threshold, that an occurrence of the predefined expression in the input audio is a result of the output audio produced by the audio speaker.
- - 2. The audio device of claim 1, wherein the computer-executable instructions further comprise instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
    - in response to the determining that the occurrence of the predefined expression in the input audio is the result of the output audio produced by the speaker, refraining from receiving other input audio that is received after the input audio represented by the audio signal.
  - 3. The audio device of claim 1, wherein the computer-executable instructions further comprise instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
    - determining that a text-to-speech component of the audio device is not currently generating speech data representing the predefined expression;
      
      wherein the generating the parameter comprises generating the parameter that indicates that the output audio contains the predefined expression.
  - 4. The audio device of claim 1, wherein the parameter indicates at least one of the loudness of the output audio or the loudness of the input audio, the computer-executable instructions further comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising at least one of:
    - determining that the loudness of the output audio is above a threshold;
      
      or determining that the loudness of the input audio is below the threshold or another threshold;
      
      wherein the determining that the occurrence of the predefined expression in the input audio is the result of the output audio produced by the audio speaker is based on at least one of the determining that the loudness of the output audio is above the threshold or the determining that the loudness of the input audio is below the threshold or the other threshold.
  - 5. The audio device of claim 1, wherein the parameter indicates an amount of echo reduction that has been applied to the audio signal, the computer-executable instructions further comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
    - determining that the amount of echo reduction that has been applied to the audio signal is less than a threshold amount;
      
      wherein the determining that the occurrence of the predefined expression in the input audio is the result of the output audio produced by the audio speaker is based at least in part on the determining that the amount of echo reduction that has been applied to the audio signal is less than the threshold amount.
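
Claims 1 through 5 can be read as a two-stage check: a wake-expression detector first reports a confidence level above a threshold, and a parameter describing the output audio then decides whether the detection should be attributed to the device's own speaker. The following is a minimal sketch of that reading, not the patented implementation. The `Parameter` class, the function name, every threshold value, and the rule that a silent speaker can never be the source are assumptions made for illustration; the claims do not specify any of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Parameter:
    """Hypothetical container for some of the indicators listed in claim 1."""
    output_active: bool = False                 # speaker is currently producing audio
    output_has_expression: bool = False         # output audio contains the wake expression
    output_loudness_db: Optional[float] = None  # loudness of the output audio
    input_loudness_db: Optional[float] = None   # loudness of the input audio
    echo_reduction_db: Optional[float] = None   # echo reduction applied to the signal


# Illustrative values only; the claims leave every threshold unspecified.
CONFIDENCE_THRESHOLD = 0.8
OUTPUT_LOUD_DB = -20.0
INPUT_QUIET_DB = -40.0
MIN_ECHO_REDUCTION_DB = 10.0


def is_self_generated(confidence: float, p: Parameter) -> bool:
    """Return True if a detection is attributed to the device's own speaker."""
    # Claim 1: the confidence level must exceed the predetermined threshold.
    if confidence <= CONFIDENCE_THRESHOLD:
        return False
    # Assumption of this sketch: a silent speaker cannot be the source.
    if not p.output_active:
        return False
    # Claim 3: the output audio is known to contain the expression.
    if p.output_has_expression:
        return True
    # Claim 4: output loudness above a threshold, or input loudness below one.
    if p.output_loudness_db is not None and p.output_loudness_db > OUTPUT_LOUD_DB:
        return True
    if p.input_loudness_db is not None and p.input_loudness_db < INPUT_QUIET_DB:
        return True
    # Claim 5: less echo reduction than a threshold amount was applied.
    if p.echo_reduction_db is not None and p.echo_reduction_db < MIN_ECHO_REDUCTION_DB:
        return True
    return False
```

When the function returns True, a device following claim 2 would refrain from receiving the input audio that follows the detection. The sketch treats each indicator as independently sufficient, which mirrors the "at least one of" wording; a practical device might instead weigh several indicators together.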

6. An audio device comprising:
- one or more processors;
  
  a microphone;
  
  an audio speaker configured to produce output audio; and
  
  memory storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
  
  receiving, from the microphone, an audio signal generated by the microphone to represent an input audio received at the microphone;
  
  generating a parameter that indicates at least one of:
  
  whether the output audio is currently being produced by the audio speaker, whether the output audio contains speech, whether the output audio contains a predefined expression, loudness of the output audio, loudness of the input audio, or an echo characteristic of the audio signal; and
  
  determining, based at least in part on the parameter, that an occurrence of the predefined expression in the input audio is a result of the predefined expression occurring in the output audio from the audio speaker.
- - 7. The audio device of claim 6, wherein the computer-executable instructions further comprise instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
    - determining an amplitude of an audio signal representing the output audio;
      
      wherein the generating the parameter comprises, based at least in part on the amplitude of the audio signal representing the output audio, generating the parameter that indicates the loudness of the output audio; and
      
      wherein the determining that the occurrence of the predefined expression in the input audio is the result of the predefined expression occurring in the output audio from the audio speaker includes determining that the loudness of the output audio is above a threshold.
  - 8. The audio device of claim 6, wherein the computer-executable instructions further comprise instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
    - determining an amplitude of the audio signal;
      
      wherein the generating the parameter comprises, based at least in part on the amplitude of the audio signal, generating the parameter that indicates the loudness of the input audio.
  - 9. The audio device of claim 6, wherein the microphone is a first microphone, the audio signal is a first audio signal, and the parameter is a first parameter that indicates the loudness of the input audio associated with the first audio signal, the computer-executable instructions further comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
    - receiving, from a second microphone, a second audio signal representing input audio; and
      
      generating a second parameter that indicates loudness of the input audio associated with the second audio signal;
      
      wherein the determining includes, based at least in part on the first parameter and the second parameter, determining that the loudness of the input audio associated with the first audio signal is above a threshold and that the loudness of the input audio associated with the second audio signal is above the threshold.
  - 10. The audio device of claim 6, wherein the computer-executable instructions further comprise instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
    - determining an amount of echo reduction that has been applied to the audio signal;
      
      wherein the generating the parameter comprises generating the parameter that indicates the amount of echo reduction that has been applied to the audio signal; and
      
      wherein the determining that the occurrence of the predefined expression in the input audio is the result of the predefined expression occurring in the output audio from the audio speaker includes determining that the amount of echo reduction that has been applied to the audio signal is less than a threshold amount.
  - 11. The audio device of claim 6, wherein the computer-executable instructions further comprise instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
    - based at least in part on determining that the occurrence of the predefined expression in the input audio is the result of the predefined expression occurring in the output audio from the audio speaker, refraining from interpreting other input audio that is received after the input audio represented by the audio signal.
  - 12. The audio device of claim 6, wherein the computer-executable instructions further comprise instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
    - determining that the parameter matches or is within a predetermined tolerance to a reference parameter for a reference signal, the reference signal being associated with a known occurrence of the predefined expression in the output audio;
      
      wherein the determining that the occurrence of the predefined expression in the input audio is the result of the predefined expression occurring in the output audio from the audio speaker is based at least in part on the determining that the parameter matches or is within the predetermined tolerance to the reference parameter.
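
Claims 7 through 9 derive a loudness parameter from signal amplitude, and claim 12 compares a parameter against a reference value within a tolerance. One way this could look is sketched below. Using the RMS level in dB relative to full scale as the loudness measure is an assumption, as are the function names and the sample format (floats in the range -1.0 to 1.0); the claims only say that loudness is based at least in part on amplitude.

```python
import math
from typing import Sequence


def loudness_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale for samples in [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def both_microphones_loud(first: Sequence[float],
                          second: Sequence[float],
                          threshold_db: float) -> bool:
    """Claim 9: loudness at both microphones is above the same threshold."""
    return (loudness_dbfs(first) > threshold_db
            and loudness_dbfs(second) > threshold_db)


def matches_reference(parameter: float, reference: float, tolerance: float) -> bool:
    """Claim 12: the parameter matches or is within a tolerance of a reference."""
    return abs(parameter - reference) <= tolerance
```

For example, a frame of constant amplitude 0.5 measures about -6 dBFS, so `both_microphones_loud` returns True for two such frames against a -10 dB threshold and False if either frame is much quieter.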

13. A method comprising:
- receiving, by an audio device, an audio signal generated by a microphone and representing input audio received at the microphone;
  
  generating, by the audio device, a parameter that indicates at least one of:
  
  whether output audio is currently being produced by an audio speaker, whether the output audio contains speech, whether the output audio contains a predefined expression, loudness of the output audio, loudness of the input audio, or an echo characteristic of the audio signal; and
  
  evaluating, by the audio device, the parameter to distinguish between utterance of the predefined expression by a user and production of the predefined expression by the audio speaker.
- - 14. The method of claim 13, further comprising:
    - determining an amplitude of an audio signal representing the output audio;
      
      wherein the generating the parameter comprises, based at least in part on the amplitude of the audio signal representing the output audio, generating the parameter that indicates the loudness of the output audio; and
      
      wherein the evaluating includes determining whether the loudness of the output audio is above a threshold.
  - 15. The method of claim 13, further comprising:
    - determining an amplitude of the audio signal;
      
      wherein the generating the parameter comprises, based at least in part on the amplitude of the audio signal, generating the parameter that indicates the loudness of the input audio;
      
      wherein the evaluating includes determining whether the loudness of the input audio is above a threshold.
  - 16. The method of claim 13, wherein the parameter indicates at least one of an amount of echo present in the audio signal or an amount of echo reduction that has been applied to the audio signal.
  - 17. The method of claim 13, further comprising:
    - identifying a plurality of directional audio signals corresponding to a plurality of directions, respectively;
      
      identifying a number of the plurality of directional audio signals that include the predefined expression; and
      
      determining that the number of the plurality of directional audio signals that include the predefined expression is more than a threshold number;
      
      wherein the evaluating includes determining production of the predefined expression by the audio speaker based at least in part on determining that the number of the plurality of directional audio signals that include the predefined expression is more than the threshold number.
  - 18. The method of claim 17, wherein the determining that the number of the plurality of directional audio signals that include the predefined expression is more than the threshold number includes determining that the number of the plurality of directional audio signals that include the predefined expression is more than half of the plurality of directional audio signals.
  - 19. The method of claim 13, further comprising:
    - identifying a plurality of directional audio signals corresponding to a plurality of directions, respectively;
      
      identifying a number of the plurality of directional audio signals that include the predefined expression; and
      
      determining that the number of the plurality of directional audio signals that include the predefined expression is less than a threshold number;
      
      wherein the evaluating includes determining utterance of the predefined expression by the user based at least in part on determining that the number of the plurality of directional audio signals that include the predefined expression is less than the threshold number.
  - 20. The method of claim 19, further comprising:
    - performing speech recognition with the input audio to detect that the input audio contains the predefined expression;
      
      wherein the evaluating includes determining that occurrence of the predefined expression in the input audio is a result of an utterance of the predefined expression by the user.
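
Claims 17 through 19 distinguish the two sources by counting how many directional audio signals contain the predefined expression: a count above a threshold number points to the audio speaker, and a count below a threshold number points to the user. A minimal sketch of that counting rule follows. The `Source` enum and function name are hypothetical, the per-direction detections are assumed to arrive as booleans from some upstream detector, and the single threshold of half the directions (taken from claim 18) is applied to both cases as a simplification. Treating a count of exactly half as a user utterance is also an assumption.

```python
from enum import Enum
from typing import Sequence


class Source(Enum):
    SPEAKER = "speaker"  # expression attributed to the device's own output
    USER = "user"        # expression attributed to a user utterance
    NONE = "none"        # expression not detected in any direction


def classify_by_direction(detections: Sequence[bool]) -> Source:
    """Classify the source from per-direction wake-expression detections."""
    hits = sum(1 for detected in detections if detected)
    if hits == 0:
        return Source.NONE
    # Claim 18: more than half of the directional signals contain the expression.
    if 2 * hits > len(detections):
        return Source.SPEAKER
    # Claim 19, simplified: a count at or below half is treated as a user utterance.
    return Source.USER
```

With six directional signals, five detections would be classified as speaker output and a single detection as a user utterance.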

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Pogue, Michael Alan; Hilmes, Philip Ryan
Primary Examiner(s): Jackson, Jakieda R
Application Number: US15/652,019
Publication Number: US 20180130468A1
Time in Patent Office: 1,100 Days
Field of Search: 704275, 704226
US Class Current
CPC Class Codes

G10L 15/22   Procedures used during a sp...

G10L 2015/088   Word spotting

G10L 2021/02087   the noise being separate sp...

G10L 2021/02166   Microphone arrays; Beamforming
