Methods and systems for detecting and processing speech signals
First Claim
1. A computer-implemented method comprising:
receiving, at a centralized processing device, a corresponding hotword confidence score from each of multiple media devices in communication with the centralized processing device via a network, each hotword confidence score indicating a likelihood that audio data corresponding to a first utterance of a user received by the corresponding media device includes a particular, predefined hotword;
determining, by the centralized processing device, that two or more of the received hotword confidence scores satisfy a hotword score threshold;
for each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold, receiving, at the centralized processing device, second audio data from the corresponding media device, the second audio data recorded by the corresponding media device and including a user speech command; and
generating, by the centralized processing device, a request associated with the user speech command based on the second audio data received from each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold.
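The four steps of claim 1 can be sketched in code as a reading aid. This is a minimal illustration, not the patented implementation: the names `DeviceReport`, `select_devices`, and `generate_request`, the 0.8 threshold value, the reading of "satisfy" as greater-than-or-equal, and the shape of the returned request are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class DeviceReport:
    """Hypothetical message a media device sends to the centralized device."""
    device_id: str
    hotword_score: float  # likelihood that the first utterance contains the hotword


# Hypothetical threshold; the claim does not specify a value.
HOTWORD_SCORE_THRESHOLD = 0.8


def select_devices(reports, threshold=HOTWORD_SCORE_THRESHOLD):
    """Steps 1-2: take one score per device and return the IDs of devices
    whose score satisfies the threshold (assumed here to mean >=).
    Returns an empty list unless two or more devices qualify."""
    qualifying = [r.device_id for r in reports if r.hotword_score >= threshold]
    return qualifying if len(qualifying) >= 2 else []


def generate_request(second_audio, qualifying):
    """Steps 3-4: given second audio data keyed by device ID, build a request
    based on the audio from each qualifying device only."""
    missing = [d for d in qualifying if d not in second_audio]
    if missing:
        raise ValueError(f"no second audio data received from: {missing}")
    return {
        "type": "speech_command",  # assumed request shape
        "sources": sorted(qualifying),
        "audio": {d: second_audio[d] for d in qualifying},
    }


reports = [
    DeviceReport("kitchen", 0.93),
    DeviceReport("living_room", 0.85),
    DeviceReport("bedroom", 0.40),
]
chosen = select_devices(reports)
request = generate_request(
    {"kitchen": b"\x01", "living_room": b"\x02", "bedroom": b"\x03"}, chosen
)
```

In this sketch the bedroom device falls below the threshold, so its audio is left out of the request even though it was supplied.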
Abstract
Provided are methods, systems, and apparatuses for detecting, processing, and responding to audio signals, including speech signals, within a designated area or space. A platform for multiple media devices connected via a network is configured to process speech, such as voice commands, detected at the media devices, and respond to the detected speech by causing the media devices to simultaneously perform one or more requested actions. The platform is capable of scoring the quality of a speech request, handling speech requests from multiple end points of the platform using a centralized processing approach, a de-centralized processing approach, or a combination thereof, and also manipulating partial processing of speech requests from multiple end points into a coherent whole when necessary.
20 Claims
1. A computer-implemented method comprising:
receiving, at a centralized processing device, a corresponding hotword confidence score from each of multiple media devices in communication with the centralized processing device via a network, each hotword confidence score indicating a likelihood that audio data corresponding to a first utterance of a user received by the corresponding media device includes a particular, predefined hotword;
determining, by the centralized processing device, that two or more of the received hotword confidence scores satisfy a hotword score threshold;
for each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold, receiving, at the centralized processing device, second audio data from the corresponding media device, the second audio data recorded by the corresponding media device and including a user speech command; and
generating, by the centralized processing device, a request associated with the user speech command based on the second audio data received from each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold.
8. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving a corresponding hotword confidence score from each of multiple media devices in communication with the one or more computers via a network, each hotword confidence score indicating a likelihood that audio data corresponding to a first utterance of a user received by the corresponding media device includes a particular, predefined hotword;
determining that two or more of the received hotword confidence scores satisfy a hotword score threshold;
for each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold, receiving second audio data from the corresponding media device, the second audio data recorded by the corresponding media device and including a user speech command; and
generating a request associated with the user speech command based on the second audio data received from each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold.
15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving a corresponding hotword confidence score from each of multiple media devices in communication with the one or more computers via a network, each hotword confidence score indicating a likelihood that audio data corresponding to a first utterance of a user received by the corresponding media device includes a particular, predefined hotword;
determining that two or more of the received hotword confidence scores satisfy a hotword score threshold;
for each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold, receiving second audio data from the corresponding media device, the second audio data recorded by the corresponding media device and including a user speech command; and
generating a request associated with the user speech command based on the second audio data received from each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold.
Specification