Audio signal transmission techniques
First Claim
1. A method comprising:
under control of one or more computing devices configured with executable instructions,
obtaining a first audio signal from a first audio transducer and a second audio signal from a second audio transducer, wherein the first audio transducer and the second audio transducer are included in an audio transducer array located within an environment;
determining a time offset between the first audio signal and the second audio signal based at least in part on a time-difference-of-arrival between the first audio signal and the second audio signal;
identifying, based at least in part on the time offset, at least one difference between at least a portion of the first audio signal and at least a portion of the second audio signal; and
sending, to one or more computing resources that are remote from the environment, the first audio signal, information indicative of the at least one difference between the first audio signal and the second audio signal, and an indication of the time offset between the first audio signal and the second audio signal, the one or more computing resources that are remote from the environment being configured to:
(i) construct a representation of the second audio signal using the first audio signal, the information indicative of the at least one difference between the first audio signal and the second audio signal, and the indication of the time offset between the first audio signal and the second audio signal, and
(ii) perform automatic speech recognition (ASR) to identify a voice command present in at least one of the first audio signal or the representation of the second audio signal.
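The device-side steps of claim 1 (determine a time offset, identify a difference, assemble what is sent) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes both signals are equal-length lists of samples, estimates the offset as an integer lag by brute-force cross-correlation, and takes the "difference" to be a sample-wise residual. The names `estimate_offset`, `shift`, `encode_pair`, and the `max_lag` parameter are hypothetical.

```python
def estimate_offset(first, second, max_lag):
    """Return the lag (in samples) at which `second` best matches `first`.

    A positive lag means `second` looks like a delayed copy of `first`.
    Assumption: plain, unnormalized cross-correlation is good enough here.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, sample in enumerate(second):
            m = n - lag  # index in `first` that lines up with second[n]
            if 0 <= m < len(first):
                score += first[m] * sample
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def shift(signal, lag):
    """Delay `signal` by `lag` samples, zero-padding to keep the length."""
    n = len(signal)
    return [signal[i - lag] if 0 <= i - lag < n else 0.0 for i in range(n)]


def encode_pair(first, second, max_lag=8):
    """Build the payload: first signal, time offset, and residual difference."""
    offset = estimate_offset(first, second, max_lag)
    aligned = shift(first, offset)
    difference = [s - a for s, a in zip(second, aligned)]
    return {"first": first, "offset": offset, "difference": difference}
```

When the two transducers capture nearly the same sound, the residual after alignment tends to be small, which is what makes sending it instead of the full second signal attractive.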
Abstract
A voice interaction architecture compiles multiple audio signals captured at different locations within an environment, determines a time offset between a primary audio signal and the other captured audio signals, and identifies differences between the primary signal and the other signal(s). The architecture may then provide the primary audio signal, an indication of the determined time offset(s), and the identified differences to remote computing resources for further processing. For instance, the architecture may send this information to a network-accessible distributed computing platform that performs beamforming and/or automatic speech recognition (ASR) on the received audio. The distributed computing platform may in turn determine a response to provide based upon the beamforming and/or ASR.
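The abstract does not say which beamforming technique the remote platform uses. As one common possibility, a delay-and-sum beamformer can reuse the time offsets directly: each signal is advanced by its offset relative to the primary signal and the aligned signals are averaged. The sketch below assumes integer sample offsets and list-of-float signals; `delay_and_sum` is a hypothetical name.

```python
def delay_and_sum(signals, offsets):
    """Average the signals after undoing each signal's delay.

    signals: list of sample lists; signals[0] is the primary signal.
    offsets: delay of each signal relative to the primary, in samples
             (assumed integer; offsets[0] is normally 0).
    """
    length = len(signals[0])
    output = [0.0] * length
    for signal, offset in zip(signals, offsets):
        for n in range(length):
            m = n + offset  # sample of `signal` that lines up with primary sample n
            if 0 <= m < len(signal):
                output[n] += signal[m]
    return [value / len(signals) for value in output]
```

Sound arriving from the direction implied by the offsets adds coherently, while uncorrelated noise is averaged down.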
30 Claims
13. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed on one or more processors, cause the one or more processors to perform acts comprising:
receiving during a time interval a first signal based on audio within an environment;
receiving during the time interval a second signal based on the audio within the environment;
identifying difference information between the first signal and the second signal; and
sending the first signal and the difference information to one or more computing resources that are remote from the environment, the one or more computing resources that are remote from the environment configured to:
(i) construct a representation of the second signal using the first signal and the difference information, and
(ii) perform automatic speech recognition (ASR) using at least one of the first signal or the representation of the second signal to identify a voice command uttered within the environment.
25. One or more computing resources comprising:
one or more processors; and
one or more computer-readable media storing computer-executable instructions that, when executed on the one or more processors, cause the one or more processors to perform acts comprising:
receiving a first signal based at least in part on audio captured by a first audio transducer of an audio transducer array located within an environment;
receiving information indicative of a difference between the first signal and a second signal based at least in part on the audio as captured by a second audio transducer of the audio transducer array;
constructing a representation of the second signal based at least in part on the received first signal and the received information indicative of the difference between the first signal and the second signal; and
performing, using at least one of the first signal or the constructed representation of the second signal, automatic speech recognition (ASR) to identify a voice command uttered within the environment.
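The constructing step on the remote side can be illustrated with a short sketch. It assumes, purely for illustration, that the difference information is a sample-wise residual computed against the first signal after any delay has been applied; the claim itself does not fix the encoding. The optional `offset` argument (in samples) and the name `reconstruct_second` are hypothetical, and the ASR step is outside the scope of this sketch.

```python
def reconstruct_second(first, difference, offset=0):
    """Rebuild an approximation of the second signal.

    first:      samples of the received first signal
    difference: residual samples (second minus the delayed first signal)
    offset:     delay of the second signal relative to the first, in samples
                (assumed integer; 0 when no offset was sent)
    """
    n = len(first)
    # Delay the first signal by `offset`, zero-padding to keep the length.
    aligned = [first[i - offset] if 0 <= i - offset < n else 0.0 for i in range(n)]
    # Adding the residual back yields the representation of the second signal.
    return [a + d for a, d in zip(aligned, difference)]
```

The first signal, the reconstructed representation, or both could then be handed to whatever ASR engine the platform uses.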
Specification