Hotword detection on multiple devices
First Claim
1. A computer-implemented method comprising:
receiving, by one or more processors of a computing device, audio data that corresponds to an utterance;
determining that the utterance likely includes a particular, predefined hotword;
in response to determining that the utterance likely includes the particular, predefined hotword, determining a score that reflects a loudness of the audio data;
determining a duration of a delay period, wherein the duration of the delay period is inversely proportional to the loudness of the audio data;
activating a mode in which the computing device temporarily listens, for the duration of the delay period, for a predetermined audio signal that indicates that another computing device is commencing speech recognition processing on the audio data;
after the duration of the delay period has elapsed without hearing the predetermined audio signal from another computing device, deactivating the mode and transmitting the predetermined audio signal that indicates that the computing device is commencing speech recognition processing on the audio data; and
after transmitting the predetermined audio signal, processing at least a portion of the audio data using an automated speech recognizer on the computing device.
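The steps of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the constant `k`, the function names, and the two callbacks (`heard_signal_within`, `transmit_signal`) are hypothetical, and the "stand down" branch is an assumption, since the claim only recites what happens when the delay elapses without the signal being heard.

```python
from typing import Callable


def delay_seconds(loudness: float, k: float = 1.0) -> float:
    """Delay inversely proportional to loudness: delay = k / loudness.

    k is an assumed tuning constant; the claim fixes only the inverse relation.
    """
    if loudness <= 0:
        raise ValueError("loudness must be positive")
    return k / loudness


def arbitrate(
    loudness: float,
    heard_signal_within: Callable[[float], bool],
    transmit_signal: Callable[[], None],
) -> bool:
    """Return True if this device should run speech recognition."""
    delay = delay_seconds(loudness)
    # Listening mode is active only for the duration of the delay period.
    if heard_signal_within(delay):
        # Assumed behavior: another device announced first, so stand down.
        return False
    # Delay elapsed in silence: announce, then proceed to recognition.
    transmit_signal()
    return True
```

A louder capture yields a shorter delay, so the device that heard the utterance best is the first to announce itself.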
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a computing device, audio data that corresponds to an utterance. The actions further include determining a likelihood that the utterance includes a hotword. The actions further include determining a loudness score for the audio data. The actions further include, based on the loudness score, determining an amount of delay time. The actions further include, after the amount of delay time has elapsed, transmitting a signal that indicates that the computing device will initiate speech recognition processing on the audio data.
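The abstract does not say how the loudness score is computed. One plausible choice, shown here purely as an assumption, is the root-mean-square amplitude of the captured samples, normalized to the range 0 to 1. The function name `loudness_score` and the 16-bit PCM input format are hypothetical.

```python
import math
from typing import Sequence


def loudness_score(samples: Sequence[int], full_scale: int = 32768) -> float:
    """RMS amplitude of signed 16-bit PCM samples, normalized to 0..1.

    RMS is an assumed metric; the source only requires some loudness score.
    """
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / full_scale
```

Any monotonic loudness measure would serve the same purpose, since the score is only used to order the devices' delay times.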
20 Claims
1. A computer-implemented method comprising:
receiving, by one or more processors of a computing device, audio data that corresponds to an utterance;
determining that the utterance likely includes a particular, predefined hotword;
in response to determining that the utterance likely includes the particular, predefined hotword, determining a score that reflects a loudness of the audio data;
determining a duration of a delay period, wherein the duration of the delay period is inversely proportional to the loudness of the audio data;
activating a mode in which the computing device temporarily listens, for the duration of the delay period, for a predetermined audio signal that indicates that another computing device is commencing speech recognition processing on the audio data;
after the duration of the delay period has elapsed without hearing the predetermined audio signal from another computing device, deactivating the mode and transmitting the predetermined audio signal that indicates that the computing device is commencing speech recognition processing on the audio data; and
after transmitting the predetermined audio signal, processing at least a portion of the audio data using an automated speech recognizer on the computing device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving, by a computing device, audio data that corresponds to an utterance;
determining that the utterance likely includes a particular, predefined hotword;
in response to determining that the utterance likely includes the particular, predefined hotword, determining a score that reflects a loudness of the audio data;
determining a duration of a delay period, wherein the duration of the delay period is inversely proportional to the loudness of the audio data;
activating a mode in which the computing device temporarily listens, for the duration of the delay period, for a predetermined audio signal that indicates that another computing device is commencing speech recognition processing on the audio data;
after the duration of the delay period has elapsed without hearing the predetermined audio signal from another computing device, deactivating the mode and transmitting the predetermined audio signal that indicates that the computing device is commencing speech recognition processing on the audio data; and
after transmitting the predetermined audio signal, processing at least a portion of the audio data using an automated speech recognizer on the computing device.
Dependent claims: 10, 11, 12, 13, 14, 15.
16. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving, by a computing device, audio data that corresponds to an utterance;
determining that the utterance likely includes a particular, predefined hotword;
in response to determining that the utterance likely includes the particular, predefined hotword, determining a score that reflects a loudness of the audio data;
determining a duration of a delay period, wherein the duration of the delay period is inversely proportional to the loudness of the audio data;
activating a mode in which the computing device temporarily listens, for the duration of the delay period, for a predetermined audio signal that indicates that another computing device is commencing speech recognition processing on the audio data;
after the duration of the delay period has elapsed without hearing the predetermined audio signal from another computing device, deactivating the mode and transmitting the predetermined audio signal that indicates that the computing device is commencing speech recognition processing on the audio data; and
after transmitting the predetermined audio signal, processing at least a portion of the audio data using an automated speech recognizer on the computing device.
Dependent claims: 17, 18, 19, 20.
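To see why the inverse relation between loudness and delay selects a single responder when several devices hear the same utterance, consider the toy model below. It is an assumption-laden sketch: the device names, the constant `k`, and the `simulate` function are hypothetical, the transmitted signal is assumed to reach every other device instantly, and ties between equal loudness values are not handled.

```python
from typing import Dict, List, Tuple


def simulate(
    loudness_by_device: Dict[str, float], k: float = 1.0
) -> Tuple[str, List[str]]:
    """Return (responding device, devices that stand down)."""
    # Each device computes its own delay as k / loudness.
    delays = {name: k / loud for name, loud in loudness_by_device.items()}
    # The shortest delay expires first; that device transmits the signal
    # while every other device is still in its listening mode.
    winner = min(delays, key=delays.get)
    stood_down = sorted(name for name in delays if name != winner)
    return winner, stood_down


winner, others = simulate({"phone": 0.8, "tablet": 0.3, "watch": 0.5})
print(winner, others)  # phone ['tablet', 'watch']
```

In this model the device with the highest loudness score always responds, which is the outcome the claimed delay rule is designed to favor.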
Specification