Altering audio to improve automatic speech recognition
Abstract
Techniques for altering audio being output by a voice-controlled device, or another device, to enable more accurate automatic speech recognition (ASR) by the voice-controlled device. For instance, a voice-controlled device may output audio within an environment using a speaker of the device. While outputting the audio, a microphone of the device may capture sound within the environment and may generate an audio signal based on the captured sound. The device may then analyze the audio signal to identify speech of a user within the signal, with the speech indicating that the user is going to provide a subsequent command to the device. Thereafter, the device may alter the output of the audio (e.g., attenuate the audio, pause the audio, switch from stereo to mono, etc.) to facilitate speech recognition of the user's subsequent command.
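The alterations named in the abstract (attenuate, pause, switch from stereo to mono) can be illustrated with a minimal Python sketch. It is not taken from the specification: the names `Alteration`, `alter_frame`, and `next_output`, the default gain of 0.25, and the choice to model a pause as silent samples are all assumptions made for illustration.

```python
from enum import Enum


class Alteration(Enum):
    """Hypothetical labels for the alterations listed in the abstract."""
    ATTENUATE = "attenuate"
    PAUSE = "pause"
    STEREO_TO_MONO = "stereo_to_mono"


def alter_frame(frame, alteration, gain=0.25):
    """Return an altered copy of a stereo frame.

    frame is a list of (left, right) float samples. The default gain is an
    assumed value, not one given in the source.
    """
    if not 0.0 <= gain <= 1.0:
        raise ValueError("gain must be between 0 and 1")
    if alteration is Alteration.ATTENUATE:
        # Scale both channels down by the same linear gain.
        return [(left * gain, right * gain) for left, right in frame]
    if alteration is Alteration.PAUSE:
        # Assumption: a pause is modelled as emitting silence.
        return [(0.0, 0.0) for _ in frame]
    if alteration is Alteration.STEREO_TO_MONO:
        # Average the channels and send the same sample to both outputs.
        return [((left + right) / 2.0,) * 2 for left, right in frame]
    raise ValueError(f"unsupported alteration: {alteration}")


def next_output(frame, wake_utterance_detected, alteration):
    """Alter the outgoing frame only after the indicating utterance is detected."""
    if wake_utterance_detected:
        return alter_frame(frame, alteration)
    return list(frame)
```

In this sketch, playback passes through unchanged until the utterance indicating a subsequent command is detected. How that detection is performed is outside the sketch.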
100 Citations
14 Claims
1. An apparatus comprising:
a speaker to output audio in an environment;
a microphone unit to capture sound from the environment;
a processor; and
computer-readable media storing computer-executable instructions that, when executed on the processor, cause the processor to perform acts comprising:
receiving an audio signal generated by the microphone unit, the microphone unit having generated the audio signal based at least in part on the sound captured by the microphone unit, wherein the sound includes an utterance from a user in the environment, the utterance indicating that the user is going to provide a subsequent request to the apparatus;
identifying one or more characteristics associated with at least one of the audio signal or the audio being output by the speaker, the one or more characteristics at least including information indicative of a distance between the user and the apparatus;
determining, based at least in part on the one or more characteristics, an amount to attenuate the audio being output by the speaker to facilitate processing of the subsequent request, the amount to attenuate the audio increasing with increasing distance between the user and the apparatus; and
attenuating the audio being output by the speaker by the determined amount.
Dependent claims: 2, 3, 4
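The determining and attenuating acts of claim 1 can be sketched as follows. The claim requires only that the amount of attenuation increase with distance. The linear mapping, the 6 dB and 30 dB bounds, the 8 m range, and the names `attenuation_db` and `apply_attenuation` are hypothetical choices made for this illustration.

```python
def attenuation_db(distance_m, min_db=6.0, max_db=30.0, max_distance_m=8.0):
    """Map an estimated user distance to an attenuation amount in decibels.

    Assumed mapping: linear between min_db at 0 m and max_db at
    max_distance_m, clamped beyond that range. All defaults are illustrative.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    fraction = min(distance_m / max_distance_m, 1.0)
    return min_db + fraction * (max_db - min_db)


def apply_attenuation(samples, amount_db):
    """Attenuate output samples by amount_db using a linear amplitude gain."""
    gain = 10 ** (-amount_db / 20.0)  # decibels to amplitude ratio
    return [sample * gain for sample in samples]


# Usage: a farther user receives a larger attenuation than a nearer one.
near = attenuation_db(1.0)
far = attenuation_db(6.0)
quieter = apply_attenuation([0.5, -0.5], far)
```

Any monotonically increasing mapping would satisfy the same condition; a linear ramp is simply the shortest one to write down.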
5. A computer-implemented method comprising:
receiving, while a speaker of a device outputs audio, an indication that a user is going to provide a subsequent voice command to the device;
determining a distance between the user and the device;
determining an amount to attenuate the audio output by the speaker of the device, the amount to attenuate the audio increasing with increasing distance between the user and the device; and
attenuating the audio by the determined amount to increase an accuracy of speech recognition performed on an audio signal that includes the subsequent voice command by increasing a signal-to-noise ratio of the audio signal.
Dependent claims: 6, 7, 8, 9, 10, 11
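The link between attenuation and signal-to-noise ratio in claim 5 can be checked with a short calculation. The sketch assumes that the device's own playback is the only noise reaching the microphone and that attenuating the speaker output reduces the captured playback power by the same number of decibels. Neither assumption comes from the claim, and the function names are hypothetical.

```python
import math


def snr_db(speech_power, noise_power):
    """Signal-to-noise ratio in decibels for two positive power values."""
    return 10.0 * math.log10(speech_power / noise_power)


def snr_after_attenuation(speech_power, playback_power, amount_db):
    """SNR at the microphone after the playback is attenuated by amount_db.

    Assumes playback is the sole noise source and that the attenuation
    carries through unchanged to the captured signal.
    """
    attenuated_power = playback_power * 10 ** (-amount_db / 10.0)
    return snr_db(speech_power, attenuated_power)


# Usage: with equal speech and playback power the SNR starts at 0 dB.
before = snr_db(1.0, 1.0)
after = snr_after_attenuation(1.0, 1.0, 20.0)
```

Under these assumptions each decibel of attenuation adds one decibel of SNR. A distant user's speech arrives at the microphone with less power, which is consistent with the claim's requirement that the attenuation grow with distance.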
12. A method comprising:
under control of an electronic device that includes a microphone, a speaker and executable instructions, outputting audio via the speaker;
determining that a user is going to provide a voice command to the device based at least in part on an utterance from the user represented in an audio signal generated by the microphone;
determining a distance between the user and the electronic device;
determining an amount to attenuate the audio, the amount to attenuate the audio increasing with increasing distance between the user and the electronic device; and
attenuating the audio by the determined amount to increase an accuracy of speech recognition performed on an audio signal that includes the subsequent voice command by increasing a signal-to-noise ratio of the audio signal.
Dependent claims: 13, 14
Specification