Methods and systems for detecting and processing speech signals
First Claim
1. A computer-implemented method comprising:
receiving, by a first computing device, audio data that corresponds to an utterance;
processing the audio data using a hotword data module that is configured to detect a particular, predefined hotword;
based on processing the audio data using the hotword data module, generating a first hotword confidence score that reflects a likelihood that the audio data received by the first computing device includes the particular, predefined hotword;
receiving, from a second computing device, a second hotword confidence score that reflects a likelihood that the audio data received by the second computing device includes the particular, predefined hotword;
receiving, from a third computing device, a third hotword confidence score that reflects a likelihood that the audio data received by the third computing device includes the particular, predefined hotword;
comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score;
based on comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score:
determining, by the first computing device, to process additional audio that corresponds to a subsequent utterance; and
selecting, from among the second computing device and the third computing device, one or more computing devices to process the additional audio data that corresponds to the subsequent utterance; and
providing, to the selected one or more computing devices, an instruction to process the additional audio data that corresponds to the subsequent utterance.
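The comparison, determination, selection, and instruction steps recited above can be illustrated with a minimal Python sketch. The claim does not state how the scores are compared or which rule picks the other devices, so everything concrete here is an assumption made for illustration: the names Decision and coordinate, the send callback, the message format, the margin parameter, and the rule itself (the first device processes the follow-up audio when its score is at least as high as every peer score, and it selects every peer whose score is within margin of the best peer score). Note that the claim covers only the case where the determination is affirmative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Decision:
    """Outcome of the comparison on the first device (hypothetical type)."""
    first_device_processes: bool
    selected_peers: List[str]


def coordinate(
    first_score: float,
    peer_scores: Dict[str, float],
    send: Callable[[str, dict], None],
    margin: float = 0.05,
) -> Decision:
    """Compare hotword confidence scores and instruct the selected peers.

    first_score: score generated locally by the first device.
    peer_scores: scores received from the other devices, keyed by device id.
    send:        assumed transport callback, called as send(device_id, message).
    margin:      assumed tolerance for treating peer scores as comparable.
    """
    if not peer_scores:
        raise ValueError("at least one peer score is required")

    best_peer = max(peer_scores.values())

    # Assumed rule: the first device handles the subsequent utterance
    # when no peer reported a higher score.
    first_processes = first_score >= best_peer

    # Assumed rule: select every peer within `margin` of the best peer score,
    # which always yields at least one device.
    selected = sorted(
        device for device, score in peer_scores.items()
        if best_peer - score <= margin
    )

    # Provide the instruction to each selected device.
    for device in selected:
        send(device, {"action": "process_additional_audio"})

    return Decision(first_processes, selected)
```

For example, calling coordinate(0.9, {"second": 0.6, "third": 0.58}, send) under these assumptions has the first device process the follow-up audio and sends the instruction to both peers, since their scores differ by less than the default margin.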
Abstract
Provided are methods, systems, and apparatuses for detecting, processing, and responding to audio signals, including speech signals, within a designated area or space. A platform for multiple media devices connected via a network is configured to process speech, such as voice commands, detected at the media devices, and respond to the detected speech by causing the media devices to simultaneously perform one or more requested actions. The platform is capable of scoring the quality of a speech request, handling speech requests from multiple end points of the platform using a centralized processing approach, a de-centralized processing approach, or a combination thereof, and also manipulating partial processing of speech requests from multiple end points into a coherent whole when necessary.
170 Citations
20 Claims
1. A computer-implemented method comprising:
receiving, by a first computing device, audio data that corresponds to an utterance;
processing the audio data using a hotword data module that is configured to detect a particular, predefined hotword;
based on processing the audio data using the hotword data module, generating a first hotword confidence score that reflects a likelihood that the audio data received by the first computing device includes the particular, predefined hotword;
receiving, from a second computing device, a second hotword confidence score that reflects a likelihood that the audio data received by the second computing device includes the particular, predefined hotword;
receiving, from a third computing device, a third hotword confidence score that reflects a likelihood that the audio data received by the third computing device includes the particular, predefined hotword;
comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score;
based on comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score:
determining, by the first computing device, to process additional audio that corresponds to a subsequent utterance; and
selecting, from among the second computing device and the third computing device, one or more computing devices to process the additional audio data that corresponds to the subsequent utterance; and
providing, to the selected one or more computing devices, an instruction to process the additional audio data that corresponds to the subsequent utterance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving, by a first computing device, audio data that corresponds to an utterance;
processing the audio data using a hotword data module that is configured to detect a particular, predefined hotword;
based on processing the audio data using the hotword data module, generating a first hotword confidence score that reflects a likelihood that the audio data received by the first computing device includes the particular, predefined hotword;
receiving, from a second computing device, a second hotword confidence score that reflects a likelihood that the audio data received by the second computing device includes the particular, predefined hotword;
receiving, from a third computing device, a third hotword confidence score that reflects a likelihood that the audio data received by the third computing device includes the particular, predefined hotword;
comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score;
based on comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score:
determining, by the first computing device, to process additional audio that corresponds to a subsequent utterance; and
selecting, from among the second computing device and the third computing device, one or more computing devices to process the additional audio data that corresponds to the subsequent utterance; and
providing, to the selected one or more computing devices, an instruction to process the additional audio data that corresponds to the subsequent utterance.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving, by a first computing device, audio data that corresponds to an utterance;
processing the audio data using a hotword data module that is configured to detect a particular, predefined hotword;
based on processing the audio data using the hotword data module, generating a first hotword confidence score that reflects a likelihood that the audio data received by the first computing device includes the particular, predefined hotword;
receiving, from a second computing device, a second hotword confidence score that reflects a likelihood that the audio data received by the second computing device includes the particular, predefined hotword;
receiving, from a third computing device, a third hotword confidence score that reflects a likelihood that the audio data received by the third computing device includes the particular, predefined hotword;
comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score;
based on comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score:
determining, by the first computing device, to process additional audio that corresponds to a subsequent utterance; and
selecting, from among the second computing device and the third computing device, one or more computing devices to process the additional audio data that corresponds to the subsequent utterance; and
providing, to the selected one or more computing devices, an instruction to process the additional audio data that corresponds to the subsequent utterance.
Dependent claims: 18, 19, 20
Specification