Automatic volume attenuation for speech enabled devices

US 9,324,322 B1
Filed: 06/18/2013
Issued: 04/26/2016
Est. Priority Date: 06/18/2013
Status: Active Grant

First Claim

1. A method of modifying operation of a device, the method comprising:

generating an audio output via a speaker at a first volume level;

receiving a first audio input via a microphone, the first audio input including a first sound and a first portion of the audio output;

performing echo cancellation on the first audio input to remove the first portion of the audio output received by the microphone to generate an isolated audio input signal;

determining that the isolated audio input signal comprises the first sound by comparing the isolated audio input signal with at least one stored acoustic model, wherein the first sound does not comprise speech directed to the device;

in response to determining that the isolated audio input signal comprises the first sound:

reducing the audio output to a second volume level that is less than the first volume level; and

suppressing at least one notification that would otherwise have been produced;

storing the at least one notification;

receiving a voice command; and

in response to receiving the voice command:

restoring the audio output to the first volume level;

stopping the suppressing of further notifications; and

producing the at least one notification that was previously suppressed and stored.
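
The notification-handling flow recited in claim 1 can be pictured as a small state machine. This is a minimal sketch, not Amazon's implementation: the class `PlaybackController`, its method names, and the volume values 0.8 and 0.2 are hypothetical. Interruption detection and voice-command recognition are assumed to happen elsewhere and simply call the two event methods.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlaybackController:
    """Hypothetical controller sketching the claim-1 flow (illustrative only)."""
    first_volume: float = 0.8    # assumed normal playback level (0.0-1.0)
    ducked_volume: float = 0.2   # assumed "second volume level"
    volume: float = field(init=False, default=0.0)
    suppressing: bool = field(init=False, default=False)
    pending: List[str] = field(init=False, default_factory=list)    # stored notifications
    delivered: List[str] = field(init=False, default_factory=list)  # produced notifications

    def __post_init__(self) -> None:
        if not self.ducked_volume < self.first_volume:
            raise ValueError("second volume level must be less than the first")
        self.volume = self.first_volume

    def on_interruption_detected(self) -> None:
        # Reduce output to the second volume level and start suppressing notifications.
        self.volume = self.ducked_volume
        self.suppressing = True

    def notify(self, message: str) -> None:
        # While suppressing, store the notification instead of producing it.
        if self.suppressing:
            self.pending.append(message)
        else:
            self.delivered.append(message)

    def on_voice_command(self) -> None:
        # Restore the first volume level, stop suppressing, and produce stored notifications.
        self.volume = self.first_volume
        self.suppressing = False
        self.delivered.extend(self.pending)
        self.pending.clear()
```

For example, a notification raised while a doorbell interruption is active lands in `pending` and is only moved to `delivered` once `on_voice_command()` runs.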

1 Assignment

0 Petitions

Accused Products

Abstract

A speech recognition system that also automatically recognizes and acts in response to significant audio interruptions. Received audio is compared with stored acoustic signatures of noises which may trigger a change in device operation, such as pausing, loudening, or attenuating content playback after hearing a certain audio interruption, such as a doorbell. If the received audio matches a stored acoustic model, the system alters an operational state of one or more devices, which may or may not include itself.
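
One simple way to picture comparing received audio with stored acoustic signatures is nearest-template matching over feature vectors. The sketch below assumes each sound has already been reduced to a fixed-length feature vector (for example, per-band energies) and uses cosine similarity with a threshold. The function names, the 0.9 threshold, and the feature representation are assumptions for illustration; the patent does not state that cosine matching is used.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_signature(features: Sequence[float],
                    signatures: Dict[str, Sequence[float]],
                    threshold: float = 0.9) -> Optional[str]:
    """Label of the closest stored signature at or above threshold, else None."""
    best_label, best_score = None, threshold
    for label, template in signatures.items():
        score = cosine_similarity(features, template)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label
```

A `None` result means no stored signature matched closely enough, so playback would continue unchanged.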

196 Citations

22 Claims

1. A method of modifying operation of a device, the method comprising:
- generating an audio output via a speaker at a first volume level;
  
  receiving a first audio input via a microphone, the first audio input including a first sound and a first portion of the audio output;
  
  performing echo cancellation on the first audio input to remove the first portion of the audio output received by the microphone to generate an isolated audio input signal;
  
  determining that the isolated audio input signal comprises the first sound by comparing the isolated audio input signal with at least one stored acoustic model, wherein the first sound does not comprise speech directed to the device;
  
  in response to determining that the isolated audio input signal comprises the first sound:
  
  reducing the audio output to a second volume level that is less than the first volume level; and
  
  suppressing at least one notification that would otherwise have been produced;
  
  storing the at least one notification;
  
  receiving a voice command; and
  
  in response to receiving the voice command:
  
  restoring the audio output to the first volume level;
  
  stopping the suppressing of further notifications; and
  
  producing the at least one notification that was previously suppressed and stored.
- Dependent Claims (2, 3)
- - 2. The method of claim 1, further comprising:
    - receiving a second audio input via the microphone, the second audio input including a second sound and a second portion of the audio output;
      
      performing echo cancellation on the second audio input to remove the second portion of the audio output received by the microphone to generate a second isolated audio input signal;
      
      generating a new acoustic model for the second sound from the second isolated audio input signal, wherein the second sound does not comprise speech directed to the device; and
      
      adding the new acoustic model to the at least one stored acoustic model.
  - 3. The method of claim 1, wherein the echo cancellation comprises beamforming.
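
Claims 1 and 2 rely on echo cancellation to strip the device's own playback from the microphone signal before matching. A common textbook technique for this is a normalized least-mean-squares (NLMS) adaptive filter, swapped in here for illustration; the patent does not specify the algorithm. The sketch assumes the loudspeaker reference signal is available sample-aligned with the microphone signal; the function name, tap count, and step size are illustrative choices.

```python
from typing import List, Sequence


def nlms_echo_cancel(mic: Sequence[float], ref: Sequence[float], taps: int = 8,
                     mu: float = 0.5, eps: float = 1e-6) -> List[float]:
    """Return e[n] = mic[n] - estimated echo, adapting an NLMS filter on the fly."""
    if len(mic) != len(ref):
        raise ValueError("mic and ref must be the same length")
    w = [0.0] * taps        # adaptive estimate of the speaker-to-microphone echo path
    history = [0.0] * taps  # most recent reference (playback) samples, newest first
    out: List[float] = []
    for m, r in zip(mic, ref):
        history = [r] + history[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, history))
        e = m - echo_est    # residual: ideally only the local sound remains
        norm = eps + sum(x * x for x in history)
        # Step size is normalized by the reference power, hence "normalized" LMS.
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
        out.append(e)
    return out
```

A practical echo canceller also needs double-talk handling so the filter does not diverge while a local sound (such as the doorbell) is present; that is omitted here.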

4. A computing device, comprising:
- at least one processor;
  
  a memory including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the at least one processor:
  
  to generate an audio output at a first volume level;
  
  to receive a first audio input as an audio signal;
  
  to identify a presence of a first audio interruption by comparing the audio signal with one or more stored models; and
  
  to alter the audio output of the computing device, to store at least one notification, and to suppress the at least one notification, in response to identifying the presence of the first audio interruption;
  
  to recognize a voice command; and
  
  in response to recognizing the voice command:
  
  to restore the audio output to the first volume level;
  
  to end the suppressing of further notifications; and
  
  to deliver the at least one notification that was previously suppressed and stored.
- Dependent Claims (5, 6, 7, 8, 9, 10, 18, 19, 20)
- - 5. The computing device of claim 4, wherein the at least one processor is further configured:
    - to receive a second audio input;
      
      to generate a new acoustic model for a second audio interruption from the second audio input; and
      
      to add the new acoustic model to the one or more stored models.
  - 6. The computing device of claim 4, wherein the first audio interruption is one or more of a doorbell, door knock, telephone ring, or voice of a non-user.
  - 7. The computing device of claim 4, wherein the at least one processor is configured to alter the audio output by pausing the audio output or adjusting a volume of the audio output from the first volume level to a second volume level.
  - 8. The computing device of claim 7, wherein:
    - the audio output comprises audio from streamed or stored media, and the at least one processor is configured to adjust the volume of the audio output by increasing the volume of the audio output based on a volume of the first audio interruption.
  - 9. The computing device of claim 4, wherein the altering of the audio output is based at least in part on a type of the first audio interruption.
  - 10. The computing device of claim 4, wherein the altering of the audio output is based at least in part on a type of the audio output.
  - 18. The computing device of claim 4, wherein:
    - the first audio interruption is a conversation, the one or more stored models include one or more text-independent voice prints or models, and the instructions to identify the presence of the first audio interruption further configure the at least one processor:
      
      to compare the audio signal with the one or more text-independent voice prints or models to perform speaker recognition;
      
      to determine a number of persons speaking based on the speaker recognition; and
      
      to identify the presence of the conversation in response to determining that at least two persons are speaking.
  - 19. The computing device of claim 18, the instructions to identify the presence of the first audio interruption further configuring the at least one processor:
    - to determine a direction of each of the persons speaking relative to microphones that receive the first audio input by performing beamforming, wherein determination of the number of persons speaking is further based on the directions determined by the beamforming.
  - 20. The computing device of claim 4, wherein:
    - the first audio interruption is detection of an unrecognized voice, the one or more stored models include one or more text-independent voice prints or models, and the instructions to identify the presence of the first audio interruption further configure the at least one processor:
      
      to compare the audio signal with the one or more text-independent voice prints or models to perform speaker recognition;
      
      to determine that the audio signal includes a person speaking based on the speaker recognition;
      
      to determine that the voice of the person speaking does not match any of the text-independent voice prints or models corresponding to a known voice; and
      
      to identify the presence of the unrecognized voice as the first audio interruption in response to determining that the voice of the person speaking is not a known voice.
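
Claims 18 and 20 describe speaker recognition against text-independent voice prints to count speakers and to flag an unrecognized voice. A minimal sketch, assuming each speech segment and each enrolled voice print is already a fixed-length embedding vector (how those embeddings are produced is out of scope), labels each segment with its best-matching known speaker and greedily groups unmatched segments into new unknown speakers. The names `identify_speaker` and `analyze_segments` and the 0.8 threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def _cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def identify_speaker(segment: Vector, voice_prints: Dict[str, Vector],
                     threshold: float) -> Optional[str]:
    """Best-matching enrolled speaker at or above threshold, else None (unknown)."""
    best, best_score = None, threshold
    for name, print_vec in voice_prints.items():
        score = _cosine(segment, print_vec)
        if score >= best_score:
            best, best_score = name, score
    return best


def analyze_segments(segments: List[Vector], voice_prints: Dict[str, Vector],
                     threshold: float = 0.8) -> dict:
    known = set()
    unknown_exemplars: List[Vector] = []  # one representative per unknown speaker
    for seg in segments:
        name = identify_speaker(seg, voice_prints, threshold)
        if name is not None:
            known.add(name)
        elif all(_cosine(seg, ex) < threshold for ex in unknown_exemplars):
            unknown_exemplars.append(seg)  # matches no earlier unknown voice
    count = len(known) + len(unknown_exemplars)
    return {
        "num_speakers": count,
        "conversation": count >= 2,                     # claim 18: two or more talkers
        "unrecognized_voice": bool(unknown_exemplars),  # claim 20: voice not enrolled
    }
```

The greedy grouping keeps the sketch short; a real system would more likely cluster embeddings over a sliding window.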

11. A non-transitory computer-readable storage medium storing processor-executable instructions for controlling a computing device, comprising:
- program code to generate an audio output at a first volume level;
  
  program code to receive a first audio input as an audio signal;
  
  program code to identify a presence of a first audio interruption by comparing the audio signal with one or more stored models; and
  
  program code to alter the audio output of the computing device, to store at least one notification, and to suppress the at least one notification, in response to identifying the presence of the first audio interruption;
  
  program code to recognize a voice command; and
  
  program code to, in response to recognizing the voice command:
  
  restore the audio output to the first volume level;
  
  end the suppressing of further notifications; and
  
  deliver the at least one notification that was previously suppressed and stored.
- Dependent Claims (12, 13, 14, 15, 16, 17, 21, 22)
- - 12. The non-transitory computer-readable storage medium of claim 11, further comprising:
    - program code to receive a second audio input;
      
      program code to generate a new acoustic model for a second audio interruption from the second audio input; and
      
      program code to add the new acoustic model to the one or more stored models.
  - 13. The non-transitory computer-readable storage medium of claim 11, wherein the first audio interruption is one or more of a doorbell, door knock, telephone ring, or voice of a non-user.
  - 14. The non-transitory computer-readable storage medium of claim 11, further comprising program code to alter the audio output by pausing the audio output or adjusting a volume of the audio output from the first volume level to a second volume level.
  - 15. The non-transitory computer-readable storage medium of claim 14, wherein the program code to generate the audio output is configured to generate output comprising audio from streamed or stored media, the storage medium further comprising program code to adjust the volume of the audio output by increasing the volume of the audio output based on a volume of the first audio interruption.
  - 16. The non-transitory computer-readable storage medium of claim 11, wherein the program code to alter the audio output is based at least in part on a type of the first audio interruption.
  - 17. The non-transitory computer-readable storage medium of claim 11, wherein the program code to alter the audio output is based at least in part on a type of the audio output.
  - 21. The non-transitory computer-readable storage medium of claim 11, wherein:
    - the first audio interruption is a conversation, the one or more stored models include one or more text-independent voice prints or models, and the program code to identify the presence of the first audio interruption comprises:
      
      program code to compare the audio signal with the one or more text-independent voice prints or models to perform speaker recognition;
      
      program code to determine a number of persons speaking based on the speaker recognition; and
      
      program code to identify the presence of the conversation in response to determining that at least two persons are speaking.
  - 22. The non-transitory computer-readable storage medium of claim 21, the program code to identify the presence of the first audio interruption further comprising:
    - program code to determine a direction of each of the persons speaking relative to microphones that receive the first audio input by performing beamforming, wherein determination of the number of persons speaking is further based on the directions determined by the beamforming.
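
Claims 19 and 22 use beamforming to determine the direction of each talker. A standard building block for two-microphone direction finding is time-difference-of-arrival (TDOA) estimation by cross-correlation, shown below as a stand-in; the patent does not say which method is used. The sketch assumes a far-field source, two microphones `mic_spacing` meters apart, a sample rate `fs`, and a speed of sound of 343 m/s; the function names are hypothetical.

```python
import math
from typing import Sequence


def estimate_delay(x: Sequence[float], y: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) maximizing sum(x[i] * y[i + lag]).

    A positive result means the sound reaches microphone y later than microphone x.
    """
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag, over the overlapping samples only.
        total = sum(x[i] * y[i + lag] for i in range(len(x)) if 0 <= i + lag < len(y))
        if total > best_val:
            best_lag, best_val = lag, total
    return best_lag


def direction_of_arrival(x: Sequence[float], y: Sequence[float], fs: float,
                         mic_spacing: float, speed_of_sound: float = 343.0) -> float:
    """Angle in degrees from broadside (0 = straight ahead), far-field assumption."""
    # Physically possible delays are bounded by the travel time across the array.
    max_lag = math.ceil(mic_spacing / speed_of_sound * fs)
    lag = estimate_delay(x, y, max_lag)
    # sin(theta) = c * tau / d, clamped to [-1, 1] to guard against rounding.
    s = speed_of_sound * (lag / fs) / mic_spacing
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))
```

Running this per speech segment gives one angle per talker; segments with clearly different angles could then count as distinct persons, which is the role the direction estimate plays in these claims.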

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Torok, Fred; Salvador, Stan Weidner
Primary Examiner(s)
Desir, Pierre-Louis
Assistant Examiner(s)
Tzeng, Forrest F

Application Number

US13/920,446
Time in Patent Office

1,043 Days
Field of Search

704/E21.002, 704/E21.004, 381/57, 381/72, 381/56
US Class Current

1/1
CPC Class Codes

G10L 15/22   Procedures used during a sp...

G10L 17/00   Speaker identification or v...

G10L 2015/223   Execution procedure of a sp...

G10L 25/51   for comparison or discrimin...

H03G 3/20   Automatic control H03G3/005...

H03G 3/342   Muting when some special ch...

H04M 9/082   using echo cancellers echo ...

H04R 2430/01   Aspects of volume control, ...

H04R 29/00   Monitoring arrangements; Te...

H04R 3/02   for preventing acoustic rea...
