Altering audio to improve automatic speech recognition

US 9,916,830 B1
Filed: 01/13/2016
Issued: 03/13/2018
Est. Priority Date: 09/26/2012
Status: Active Grant

Abstract

Techniques for altering audio being output by a voice-controlled device, or another device, to enable more accurate automatic speech recognition (ASR) by the voice-controlled device. For instance, a voice-controlled device may output audio within an environment using a speaker of the device. While outputting the audio, a microphone of the device may capture sound within the environment and may generate an audio signal based on the captured sound. The device may then analyze the audio signal to identify speech of a user within the signal, with the speech indicating that the user is going to provide a subsequent command to the device. Thereafter, the device may alter the output of the audio (e.g., attenuate the audio, pause the audio, switch from stereo to mono, etc.) to facilitate speech recognition of the user's subsequent command.
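
The control flow the abstract describes can be sketched as a two-state machine. This is only an illustration under stated assumptions: the class `PlaybackController`, the `Mode` names, and the placeholder wake word `"wake"` are hypothetical and do not come from the patent, and the "recognizer" is simulated by passing in plain text.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Mode(Enum):
    NORMAL = auto()     # audio plays unaltered
    LISTENING = auto()  # audio is altered while a command is expected


@dataclass
class PlaybackController:
    """Hypothetical controller; all names here are illustrative assumptions."""

    wake_word: str = "wake"  # assumed placeholder, not the patent's word
    mode: Mode = Mode.NORMAL

    def on_transcript(self, text: str) -> Mode:
        """Advance the state from a simulated recognizer transcript."""
        words = text.lower().split()
        if self.mode is Mode.NORMAL and self.wake_word in words:
            # Predefined word heard: alter output (attenuate, pause, downmix).
            self.mode = Mode.LISTENING
        elif self.mode is Mode.LISTENING and words:
            # A command was captured: restore the original output.
            self.mode = Mode.NORMAL
        return self.mode
```

In this sketch, ordinary audio content leaves the controller in `NORMAL`; only the predefined word moves it to `LISTENING`, and the next non-empty utterance returns it to `NORMAL`.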

13 Citations

19 Claims

1. An apparatus comprising:
- a speaker;
  
  a microphone;
  
  one or more processors; and
  
  computer-readable media storing computer-executable instructions that, when executed on the one or more processors, cause the one or more processors to:
  
  cause the speaker of the apparatus to output a first signal for a first period of time;
  
  detect a first representation of a predefined word within a first input audio signal, the first input audio signal representative of sound captured by the microphone of the apparatus, and wherein the predefined word indicates that a voice command is going to be provided;
  
  generate, based at least in part on detecting the first representation of the predefined word, a second signal that is different than the first signal;
  
  cause the speaker of the apparatus to output the second signal for a second period of time, the second period of time being after the first period of time;
  
  generate a second input audio signal during at least a portion of the second period of time; and
  
  detect a second representation of the voice command within the second input audio signal during at least a portion of the second period of time.
  - 2. The apparatus as recited in claim 1, wherein the second signal is a mono signal and the first signal is a stereo signal.
  - 3. The apparatus as recited in claim 1, wherein the instructions further cause the one or more processor to:
    - determine a completion of the second period of time; and
      
      cause, based at least in part on the completion of the second period of time, the speaker to output the first signal as sound.
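
Claims 2 and 3 together describe switching from a stereo first signal to a mono second signal and then returning to the first signal once the second period ends. A minimal sketch of that switch follows, assuming stereo audio is held as a list of `(left, right)` float pairs and that mono is produced by averaging the two channels. The averaging rule and the names `downmix_to_mono` and `select_output` are assumptions for illustration; the claims do not specify how the mono signal is derived.

```python
from typing import List, Tuple, Union

StereoFrame = Tuple[float, float]


def downmix_to_mono(frames: List[StereoFrame]) -> List[float]:
    """Average left and right channels (one common downmix; an assumption)."""
    return [(left + right) / 2.0 for left, right in frames]


def select_output(
    frames: List[StereoFrame], listening: bool
) -> Union[List[StereoFrame], List[float]]:
    """Return mono while a command is expected, else the untouched stereo."""
    return downmix_to_mono(frames) if listening else frames
```

Passing `listening=False` after the second period completes hands back the original stereo frames unchanged, which corresponds to resuming the first signal.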

4. An apparatus comprising:
- a speaker;
  
  a microphone unit to generate an input signal from sound;
  
  one or more processors; and
  
  computer-readable media storing computer-executable instructions that, when executed on the one or more processors, cause the one or more processors to:
  
  detect a first representation of a wakeup word within an input audio signal, the input audio signal representative of sound captured by the microphone unit;
  
  determine, based at least in part on detecting the first representation of the wakeup word, that a voice command is to be received;
  
  prevent the speaker of the apparatus from outputting an audio signal for a first period of time based at least in part on the determination that the voice command is to be received;
  
  determine the voice command from a second input signal during at least a portion of the first period of time; and
  
  cause the speaker of the apparatus to output the audio signal for a second period of time that is after the first period of time.
  - 5. The apparatus as recited in claim 4, further comprising a switch to decouple the speaker from a power source during the first period of time.
  - 6. The apparatus as recited in claim 4, wherein the instructions further cause the one or more processors to detect a first representation of a predefined word within the input signal prior to preventing the speaker from outputting the audio signal.
  - 7. The apparatus as recited in claim 6, wherein the first representation of the predefined word indicates to the apparatus that a user is going to speak the voice command.
  - 8. The apparatus as recited in claim 6, wherein the predefined word is related to the voice command.
  - 9. The apparatus as recited in claim 4, wherein the completion of the first period of time is based at least in part on a third period of time associated with determining the voice command.
  - 10. The apparatus as recited in claim 4, further comprising:
    - a second speaker; and
      
      wherein the instructions further cause the one or more processor to send the audio signal to the second speaker during the first period of time.
  - 11. The apparatus as recited in claim 4, wherein:
    - the audio signal is a stereo signal; and
      
      the instructions further cause the one or more processor to output the audio signal as a mono signal.
  - 12. The apparatus as recited in claim 4, further comprising:
    - a second speaker; and
      
      wherein the instructions further cause the one or more processor to prevent the second speaker from outputting the audio signal for the first period of time.
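
Claim 4 recites preventing speaker output for a first period and resuming it afterward. One software-only way to picture this is a gate that zeroes the samples falling inside the mute window. The sketch below assumes mono float samples at a fixed sample rate; `gate_output` and its parameters are hypothetical names, and zeroing samples is just one possible reading (claim 5, for example, recites a hardware switch instead).

```python
from typing import List


def gate_output(
    samples: List[float],
    mute_start: float,
    mute_end: float,
    sample_rate: int,
) -> List[float]:
    """Silence samples whose time lies in [mute_start, mute_end) seconds."""
    gated = []
    for index, sample in enumerate(samples):
        t = index / sample_rate  # playback time of this sample in seconds
        gated.append(0.0 if mute_start <= t < mute_end else sample)
    return gated
```

In a real device the end of the window would likely be set when the command has been determined rather than fixed in advance, in the spirit of claim 9.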

13. A method comprising:
- generating a first signal;
  
  causing a speaker of an apparatus to output the first signal for a first period of time;
  
  identifying from an input signal, received by the apparatus, an indication that a user is going to speak a voice command; and
  
  based at least in part on receiving the indication:
  
  generating a second signal for a second period of time;
  
  ceasing to cause the speaker of the apparatus to output the first signal;
  
  causing the speaker of the apparatus to output the second signal for the second period of time;
  
  detecting a first representation of a voice command from the input signal during at least a portion of the second period of time, the input signal representative of the sound captured by a microphone of the apparatus; and
  
  ceasing to cause the speaker of the apparatus to output the second signal.
  - 14. The method as recited in claim 13, wherein the generating the first signal further comprises generating a stereo signal.
  - 15. The method as recited in claim 13, wherein the generating the second signal further comprise generating a mono signal.
  - 16. The method as recited in claim 13, wherein the generating the second signal further comprise attenuating the first signal.
  - 17. The method as recited in claim 13, further comprising ceasing to send the first signal to the speaker.
  - 18. The method as recited in claim 13, further comprising determining the indication is a first representation of a predefined word or words.
  - 19. The method as recited in claim 13, further comprising causing a switch to decouple a second speaker from a power source during the period of time.
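
Claim 16 recites generating the second signal by attenuating the first. A minimal sketch of attenuation, assuming float samples and a gain given in decibels, is shown here. The function name `attenuate` and the use of a decibel parameter are illustrative assumptions; the claim does not state how much attenuation is applied or how it is expressed.

```python
from typing import List


def attenuate(samples: List[float], gain_db: float) -> List[float]:
    """Scale samples by a gain in dB (negative values make the audio quieter)."""
    factor = 10 ** (gain_db / 20)  # standard amplitude ratio for a dB gain
    return [sample * factor for sample in samples]
```

For instance, a gain of -20 dB scales every sample to one tenth of its original amplitude, while 0 dB leaves the signal unchanged.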

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Hart, Gregory Michael; Worley, III, William Spencer
Primary Examiner(s): Paul, Disler
Application Number: US14/994,926
Time in Patent Office: 790 Days
Field of Search: 381/56, 381/58, 381/104-109, 381/110, 704/275
US Class Current
CPC Class Codes

G10L 15/20   Speech recognition techniqu...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 17/00   Speaker identification or v...

G10L 2015/223   Execution procedure of a sp...

G11B 27/005   Reproducing at a different ...

H03G 3/32   the control being dependent...

H03G 5/02   Manually-operated control

H04R 3/12   for distributing signals to...
