Altering audio to improve automatic speech recognition
First Claim
1. An apparatus comprising:
a speaker;
a microphone;
one or more processors; and
computer-readable media storing computer-executable instructions that, when executed on the one or more processors, cause the one or more processors to:
cause the speaker of the apparatus to output a first signal for a first period of time;
detect a first representation of a predefined word within a first input audio signal, the first input audio signal representative of sound captured by the microphone of the apparatus, and wherein the predefined word indicates that a voice command is going to be provided;
generate, based at least in part on detecting the first representation of the predefined word, a second signal that is different than the first signal;
cause the speaker of the apparatus to output the second signal for a second period of time, the second period of time being after the first period of time;
generate a second input audio signal during at least a portion of the second period of time; and
detect a second representation of the voice command within the second input audio signal during at least a portion of the second period of time.
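The two-period control flow recited in claim 1 can be illustrated with a minimal Python sketch. It assumes microphone audio has already been split into fixed-length frames and that a keyword spotter (not shown, hypothetical) labels each frame with the word it heard, if any. The wake-word string "wake" and the function name schedule_output are illustrative only, and because the claim does not say how long the second period lasts, it is modeled here as a fixed number of frames.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for the claim's "predefined word".
WAKE_WORD = "wake"


def schedule_output(
    labels: List[Optional[str]], second_period: int
) -> Tuple[List[str], List[int]]:
    """Decide, frame by frame, which signal the speaker plays.

    labels[i] is the word (or None) that an assumed keyword spotter reported
    for microphone frame i. Returns (plan, asr_frames): the signal played in
    each frame, and the frame indices forwarded to command recognition.
    """
    plan: List[str] = []
    asr_frames: List[int] = []
    remaining = 0  # frames left in the second period
    for i, word in enumerate(labels):
        if remaining > 0:
            # Second period: play the altered signal and capture the command.
            plan.append("second")
            asr_frames.append(i)
            remaining -= 1
        else:
            # First period: play the original signal and listen for the wake word.
            plan.append("first")
            if word == WAKE_WORD:
                # Switch to the second signal starting with the next frame.
                remaining = second_period
    return plan, asr_frames
```

For example, with labels [None, None, "wake", "turn", "off", None, None] and a three-frame second period, the speaker plays the first signal for frames 0 to 2 and the second signal for frames 3 to 5, and frames 3, 4 and 5 are passed to command recognition.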
Abstract
Techniques for altering audio being output by a voice-controlled device, or another device, to enable more accurate automatic speech recognition (ASR) by the voice-controlled device. For instance, a voice-controlled device may output audio within an environment using a speaker of the device. While outputting the audio, a microphone of the device may capture sound within the environment and may generate an audio signal based on the captured sound. The device may then analyze the audio signal to identify speech of a user within the signal, with the speech indicating that the user is going to provide a subsequent command to the device. Thereafter, the device may alter the output of the audio (e.g., attenuate the audio, pause the audio, switch from stereo to mono, etc.) to facilitate speech recognition of the user's subsequent command.
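The abstract names three ways of altering the output audio. A minimal sketch of each, operating on plain lists of float samples, is shown below; the function names, the list-of-floats representation, and the decibel amount passed to attenuate are assumptions for illustration, not values given in the source.

```python
from typing import List, Tuple

Stereo = List[Tuple[float, float]]  # (left, right) sample pairs


def attenuate(samples: List[float], db: float) -> List[float]:
    """Reduce level by `db` decibels (e.g. db=20 scales amplitude by 0.1)."""
    gain = 10 ** (-db / 20)
    return [s * gain for s in samples]


def pause(samples: List[float]) -> List[float]:
    """Replace the block with silence of the same length."""
    return [0.0] * len(samples)


def stereo_to_mono(frames: Stereo) -> List[float]:
    """Downmix by averaging the left and right channels."""
    return [(left + right) / 2 for left, right in frames]
```

Here pause only produces a silent block; tracking where playback should resume once the command has been captured is a separate concern.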
19 Claims
4. An apparatus comprising:
a speaker;
a microphone unit to generate an input signal from sound;
one or more processors; and
computer-readable media storing computer-executable instructions that, when executed on the one or more processors, cause the one or more processors to:
detect a first representation of a wakeup word within an input audio signal, the input audio signal representative of sound captured by the microphone unit;
determine, based at least in part on detecting the first representation of the wakeup word, that a voice command is to be received;
prevent the speaker of the apparatus from outputting an audio signal for a first period of time based at least in part on the determination that the voice command is to be received;
determine the voice command from a second input signal during at least a portion of the first period of time; and
cause the speaker of the apparatus to output the audio signal for a second period of time that is after the first period of time.
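Claim 4 suppresses output during the first period and restores it afterwards. One way to model this, assuming the output is a single list of samples, the wakeup-word detection index is known, and the first period is a fixed number of samples (all hypothetical simplifications, as is the name pause_for_command), is to splice silence into the stream while holding the playback position.

```python
from typing import List


def pause_for_command(track: List[float], wake_at: int, pause_len: int) -> List[float]:
    """Speaker output when playback is held while a command is captured.

    track:     content samples that would otherwise play uninterrupted
    wake_at:   index at which the wakeup word was detected (assumed known)
    pause_len: length of the first period in samples (hypothetical fixed window)
    """
    before = track[:wake_at]      # played normally before detection
    silence = [0.0] * pause_len   # first period: speaker outputs nothing
    after = track[wake_at:]       # second period: resume from the held position
    return before + silence + after
```

Because after starts at wake_at, no content is skipped. The claim does not say whether playback resumes at the held position or at the live position, so this is one design choice among several.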
13. A method comprising:
generating a first signal;
causing a speaker of an apparatus to output the first signal for a first period of time;
identifying from an input signal, received by the apparatus, an indication that a user is going to speak a voice command; and
based at least in part on receiving the indication:
generating a second signal for a second period of time;
ceasing to cause the speaker of the apparatus to output the first signal;
causing the speaker of the apparatus to output the second signal for the second period of time;
detecting a first representation of a voice command from the input signal during at least a portion of the second period of time, the input signal representative of the sound captured by a microphone of the apparatus; and
ceasing to cause the speaker of the apparatus to output the second signal.
Specification