Altering audio to improve automatic speech recognition

US 9,251,787 B1
Filed: 09/26/2012
Issued: 02/02/2016
Est. Priority Date: 09/26/2012
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Techniques for altering audio being output by a voice-controlled device, or another device, to enable more accurate automatic speech recognition (ASR) by the voice-controlled device. For instance, a voice-controlled device may output audio within an environment using a speaker of the device. While outputting the audio, a microphone of the device may capture sound within the environment and may generate an audio signal based on the captured sound. The device may then analyze the audio signal to identify speech of a user within the signal, with the speech indicating that the user is going to provide a subsequent command to the device. Thereafter, the device may alter the output of the audio (e.g., attenuate the audio, pause the audio, switch from stereo to mono, etc.) to facilitate speech recognition of the user's subsequent command.
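
The abstract lists three ways the playback can be altered once the wake-up speech is detected. The minimal Python sketch below shows those three alterations. It assumes that playback is held as a list of (left, right) float sample pairs and that wake-word detection happens elsewhere and is reported as a boolean. The function names, the `mode` strings and the -20 dB default are hypothetical illustrations, not taken from the patent.

```python
from typing import List, Tuple

# One stereo sample: (left, right), floats nominally in [-1.0, 1.0].
Frame = Tuple[float, float]


def attenuate(frames: List[Frame], gain_db: float) -> List[Frame]:
    """Scale both channels by gain_db decibels (negative values make it quieter)."""
    g = 10 ** (gain_db / 20)
    return [(left * g, right * g) for left, right in frames]


def pause(frames: List[Frame]) -> List[Frame]:
    """Replace the buffer with silence of the same length."""
    return [(0.0, 0.0) for _ in frames]


def to_mono(frames: List[Frame]) -> List[Frame]:
    """Downmix stereo to mono by averaging the channels, then feed both speakers."""
    out = []
    for left, right in frames:
        m = (left + right) / 2
        out.append((m, m))
    return out


def alter_output(frames: List[Frame], wake_word_detected: bool,
                 mode: str = "attenuate", gain_db: float = -20.0) -> List[Frame]:
    """Leave playback untouched until wake-up speech is reported, then alter it.

    mode and gain_db are hypothetical parameters for this sketch.
    """
    if not wake_word_detected:
        return frames
    if mode == "attenuate":
        return attenuate(frames, gain_db)
    if mode == "pause":
        return pause(frames)
    if mode == "mono":
        return to_mono(frames)
    raise ValueError(f"unknown mode: {mode!r}")
```

A real device would apply the alteration to the stream feeding the speaker rather than to a finished buffer. The list-based form is used here only for clarity.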

100 Citations

14 Claims

1. An apparatus comprising:
- a speaker to output audio in an environment;
  
  a microphone unit to capture sound from the environment;
  
  a processor; and
  
  computer-readable media storing computer-executable instructions that, when executed on the processor, cause the processor to perform acts comprising:
  
  receiving an audio signal generated by the microphone unit, the microphone unit having generated the audio signal based at least in part on the sound captured by the microphone unit, wherein the sound includes an utterance from a user in the environment, the utterance indicating that the user is going to provide a subsequent request to the apparatus;
  
  identifying one or more characteristics associated with at least one of the audio signal or the audio being output by the speaker, the one or more characteristics at least including information indicative of a distance between the user and the apparatus;
  
  determining, based at least in part on the one or more characteristics, an amount to attenuate the audio being output by the speaker to facilitate processing of the subsequent request, the amount to attenuate the audio increasing with increasing distance between the user and the apparatus; and
  
  attenuating the audio being output by the speaker by the determined amount.

2. An apparatus as recited in claim 1, wherein the one or more characteristics further comprise information indicative of whether the audio that is being output by the speaker comprises a song or an audio book.

3. An apparatus as recited in claim 1, wherein the one or more characteristics further comprise at least one of:
- information indicative of the identity of the user; or
  
  information indicative of a direction of the user relative to the apparatus.

4. An apparatus as recited in claim 1, wherein:
- the apparatus further comprises an additional speaker, the speaker and the additional speaker outputting the audio in stereo;
  
  the attenuating comprises, at least in part, altering a signal sent to the speaker and the additional speaker from a stereo signal to a mono signal.
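
Claim 1 requires only that the attenuation amount increase with the user's distance. It does not say how distance is measured or how steep the increase should be. The sketch below makes two hypothetical assumptions. First, distance is inferred from the received level of the wake-up utterance under a free-field inverse-square model, which loses roughly 6 dB per doubling of distance. Second, attenuation follows a clamped linear ramp. All function names and numeric defaults are illustrative assumptions.

```python
from typing import List, Sequence


def estimate_distance_m(speech_level_db: float,
                        ref_level_db: float = 60.0,
                        ref_distance_m: float = 1.0) -> float:
    """Rough distance estimate from the received speech level.

    Assumes free-field inverse-square spreading (about 6 dB lost per doubling
    of distance) and a hypothetical calibrated level ref_level_db measured at
    ref_distance_m.
    """
    return ref_distance_m * 10 ** ((ref_level_db - speech_level_db) / 20)


def attenuation_db(distance_m: float,
                   min_db: float = 6.0, max_db: float = 30.0,
                   near_m: float = 0.5, far_m: float = 5.0) -> float:
    """Attenuation (dB) that grows with distance: a linear ramp from min_db at
    near_m to max_db at far_m, clamped outside that range."""
    if distance_m <= near_m:
        return min_db
    if distance_m >= far_m:
        return max_db
    frac = (distance_m - near_m) / (far_m - near_m)
    return min_db + frac * (max_db - min_db)


def apply_attenuation(samples: Sequence[float], atten_db: float) -> List[float]:
    """Scale playback samples down by atten_db decibels."""
    gain = 10 ** (-atten_db / 20)
    return [s * gain for s in samples]
```

For example, an utterance arriving 6 dB below the 1 m reference gives `estimate_distance_m(54.0)` of about 2 m, and `attenuation_db` returns about 14 dB for that distance. The clamp bounds the attenuation at both ends. Any other curve that grows with distance would fit the claim language just as well.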

5. A computer-implemented method comprising:
- receiving, while a speaker of a device outputs audio, an indication that a user is going to provide a subsequent voice command to the device;
  
  determining a distance between the user and the device;
  
  determining an amount to attenuate the audio output by the speaker of the device, the amount to attenuate the audio increasing with increasing distance between the user and the device; and
  
  attenuating the audio by the determined amount to increase an accuracy of speech recognition performed on an audio signal that includes the subsequent voice command by increasing a signal-to-noise ratio of the audio signal.

6. The computer-implemented method as recited in claim 5, wherein the attenuating comprises, at least in part, altering a signal sent to the speaker from a stereo signal to a mono signal.

7. The computer-implemented method as recited in claim 5, further comprising determining an identity of the user, and wherein the determining the amount to attenuate the audio is further based at least in part on the identity of the user.

8. The computer-implemented method as recited in claim 5, further comprising identifying a class of content being output, and wherein the determining the amount to attenuate the audio is further based at least in part on the type of content being output.

9. The computer-implemented method as recited in claim 5, further comprising determining a frequency range of the audio to attenuate.

10. The computer-implemented method as recited in claim 9, further comprising determining an identity of the user, and wherein the determining the frequency range of the audio to attenuate is further based at least in part on the identity of the user.

11. The computer-implemented method as recited in claim 5, wherein the indication comprises the user speaking a predefined word or phrase.
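
Claims 9 and 10 add that the device may choose a frequency range of the audio to attenuate, optionally based on who is speaking. They do not say how the cut is made. One conventional way to cut a single band is a peaking-EQ biquad with negative gain, built from the formulas in Robert Bristow-Johnson's Audio EQ Cookbook. The sketch below applies one such filter to a list of float samples. The choice of filter, the function names and the defaults (-12 dB, Q of 1.0) are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def peaking_cut_coeffs(fs: float, f0: float, gain_db: float,
                       q: float) -> Tuple[float, float, float, float, float]:
    """Peaking-EQ biquad coefficients (Audio EQ Cookbook form).

    A negative gain_db cuts a band centred on f0; q sets its width.
    Returns (b0, b1, b2, a1, a2) normalised so that a0 == 1.
    """
    a = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha / a
    b0 = (1 + alpha * a) / a0
    b1 = (-2 * cos_w0) / a0
    b2 = (1 - alpha * a) / a0
    a1 = (-2 * cos_w0) / a0
    a2 = (1 - alpha / a) / a0
    return b0, b1, b2, a1, a2


def attenuate_band(x: Sequence[float], fs: float, f0: float,
                   gain_db: float = -12.0, q: float = 1.0) -> List[float]:
    """Cut the band around f0 by about |gain_db| dB (Direct Form I filter)."""
    b0, b1, b2, a1, a2 = peaking_cut_coeffs(fs, f0, gain_db, q)
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    y = []
    for s in x:
        out = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y
```

For claim 10, a hypothetical lookup from a recognised user to a centre frequency near that user's voice could supply `f0`. The claims do not state how the patent itself picks the range.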

12. A method comprising:
- under control of an electronic device that includes a microphone, a speaker and executable instructions, outputting audio via the speaker;
  
  determining that a user is going to provide a voice command to the device based at least in part on an utterance from the user represented in an audio signal generated by the microphone;
  
  determining a distance between the user and the electronic device;
  
  determining an amount to attenuate the audio, the amount to attenuate the audio increasing with increasing distance between the user and the electronic device; and
  
  attenuating the audio by the determined amount to increase an accuracy of speech recognition performed on an audio signal that includes the subsequent voice command by increasing a signal-to-noise ratio of the audio signal.

13. A method as recited in claim 12, wherein the attenuating comprises, at least in part, switching a signal sent to the speaker from a stereo signal to a mono signal.

14. A method as recited in claim 12, wherein the device includes the speaker and an additional speaker, and the attenuating comprises, at least in part, ceasing to send a signal to the speaker or the additional speaker.
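
Claims 5 and 12 tie the attenuation to a higher signal-to-noise ratio in the captured audio. If the playback that leaks into the microphone is treated as noise, the arithmetic is simple. Attenuating the playback by A dB scales its power by 10^(-A/10), so when playback dominates the noise, the SNR rises by nearly A dB. The sketch assumes that the speech, playback and background-noise powers at the microphone are known. It also assumes that playback and background noise are uncorrelated, so their powers add. The function names are hypothetical.

```python
import math


def snr_db(speech_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from linear power values."""
    return 10 * math.log10(speech_power / noise_power)


def snr_after_attenuation(speech_power: float, playback_power: float,
                          other_noise_power: float, atten_db: float) -> float:
    """SNR at the microphone after the playback is attenuated by atten_db.

    Playback and other noise are assumed uncorrelated, so their powers add.
    """
    attenuated_playback = playback_power * 10 ** (-atten_db / 10)
    return snr_db(speech_power, attenuated_playback + other_noise_power)
```

Take speech power 1.0, playback power 10.0 and background noise 0.01. The SNR is about -10.0 dB before attenuation and about 9.6 dB after a 20 dB cut. That is a gain of roughly 19.6 dB rather than the full 20 dB, because the background floor remains.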

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Hart, Gregory M., Worley, William Spencer III
Primary Examiner(s)
Paul, Disler

Application Number

US13/627,890
Time in Patent Office

1,224 Days
Field of Search

381/56-59, 381/110, 381/104-109, 381/94.1-94.9, 381/86, 704/273, 704/275-276, 704/226, 704/233
US Class Current

1/1
CPC Class Codes

G10L 15/20   Speech recognition techniqu...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 17/00   Speaker identification or v...

G10L 2015/223   Execution procedure of a sp...

G11B 27/005   Reproducing at a different ...

H03G 3/32   the control being dependent...

H03G 5/02   Manually-operated control

H04R 3/12   for distributing signals to...
