Post-speech recognition request surplus detection and prevention
First Claim
1. A method for preventing a backend system from providing an error message, comprising:
receiving, from a requesting device, audio data representing a phrase;
determining a temporal window during which the audio data was received;
generating text data from the audio data by executing speech-to-text functionality;
identifying a category that has been generated for the phrase, the category signifying that the text data represents the phrase;
adding a count to the category to indicate that another instance of the category has been identified;
determining that a number of additional instances of the text data have been recognized from additional outputs from speech recognition functionality corresponding to additional audio data that also represent the phrase has been received by the backend system from additional requesting devices;
adding additional counts to the category for each of the number of additional instances such that a total number of counts for the category is representative of how many of the additional requesting devices sent similar audio data representing the phrase to the backend system;
determining that the additional audio data was also received within the temporal window;
determining a threshold count value indicative of the phrase originating from a non-human source;
determining that the total number of counts is greater than the threshold count value;
based at least in part on determining that the total number of counts is greater than the threshold count value, causing the speech recognition functionality to stop prior to providing the text data to natural language understanding functionality;
generating an instruction for the requesting device to return to a sleep state; and
sending the instruction to the requesting device.
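The counting steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the class name `PhraseCounter`, the fixed-length window bucketing, the normalization of text into a category key, and the numeric defaults are all assumptions made for illustration.

```python
from collections import defaultdict


class PhraseCounter:
    """Hypothetical helper: counts recognized phrases per temporal window."""

    def __init__(self, window_seconds: float = 5.0, threshold: int = 3):
        self.window_seconds = window_seconds  # assumed window length
        self.threshold = threshold            # assumed threshold count value
        self.counts = defaultdict(int)        # (window id, category) -> count

    def _window_id(self, received_at: float) -> int:
        # Assumption: the temporal window is a fixed-length bucket of arrival time.
        return int(received_at // self.window_seconds)

    def handle(self, text: str, received_at: float) -> str:
        # Assumption: the category is the normalized text produced by speech-to-text.
        category = (self._window_id(received_at), text.strip().lower())
        self.counts[category] += 1
        if self.counts[category] > self.threshold:
            # Stop before natural language understanding; tell the device to sleep.
            return "stop_and_sleep"
        return "forward_to_nlu"


counter = PhraseCounter(window_seconds=5.0, threshold=2)
actions = [counter.handle("What is the weather", 100.0 + i * 0.1) for i in range(4)]
```

In this sketch only the requests that arrive after the count has passed the threshold are stopped; requests counted earlier in the same window have already been forwarded.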
Abstract
Systems and methods for determining that artificial commands, in excess of a threshold value, have been detected by multiple voice activated electronic devices are described herein. In some embodiments, numerous voice activated electronic devices may send audio data representing a phrase to a backend system at substantially the same time. Text data representing the phrase, and counts for instances of that text data, may be generated. If the number of counts exceeds a predefined threshold, the backend system may cause any remaining response generation functionality for the command that is in excess of the predefined threshold to be stopped, and those devices to be returned to a sleep state. In some embodiments, a sound profile unique to the phrase that caused the excess of the predefined threshold may be generated such that future instances of the same phrase may be recognized prior to text data being generated, conserving the backend system's resources.
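The sound-profile idea in the last sentence of the abstract can be sketched as a lookup that runs before speech-to-text. The abstract does not say how a profile is computed, so this sketch assumes a SHA-256 hash of the raw audio bytes as a stand-in. A real acoustic fingerprint would need to tolerate noise and timing differences, which an exact hash does not. The class and method names are hypothetical.

```python
import hashlib


class FlaggedProfiles:
    """Hypothetical registry of sound profiles for phrases that exceeded the threshold."""

    def __init__(self) -> None:
        self._profiles: set[str] = set()

    @staticmethod
    def profile(audio: bytes) -> str:
        # Assumption: an exact hash stands in for an acoustic fingerprint.
        return hashlib.sha256(audio).hexdigest()

    def flag(self, audio: bytes) -> None:
        # Record the profile of audio whose phrase exceeded the threshold.
        self._profiles.add(self.profile(audio))

    def should_skip_speech_to_text(self, audio: bytes) -> bool:
        # True when the incoming audio matches a previously flagged profile.
        return self.profile(audio) in self._profiles


flagged = FlaggedProfiles()
flagged.flag(b"broadcast-commercial-audio")
```

Checking the profile first lets the backend skip text generation entirely for a match, which is the resource saving the abstract refers to.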
22 Claims
1. A method for preventing a backend system from providing an error message (recited in full under First Claim above). Dependent claims: 2, 3, 4
5. A method for preventing a system error, comprising:
receiving, at a backend system, first audio data from a plurality of different voice-activated electronic devices;
determining, at the backend system and using the first audio data, that a first phrase was detected by more than a threshold number of the voice-activated electronic devices within a temporal window; and
at the backend system and based at least in part on the first phrase being detected by more than the threshold number of the voice-activated electronic devices within the temporal window, causing speech recognition functionality to stop for at least a first portion of the first audio data received from at least one of the voice-activated electronic devices.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 22
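Claim 5 turns on how many distinct devices detected the phrase within the temporal window. One way to picture that check is a sliding window over arrival times, sketched below. The class name `DeviceWindow`, the sliding (rather than fixed) window, the assumption that events arrive in time order, and the default values are all illustrative and do not come from the patent.

```python
from collections import defaultdict, deque


class DeviceWindow:
    """Hypothetical tracker of which devices detected a phrase recently."""

    def __init__(self, window_seconds: float = 5.0, threshold: int = 2):
        self.window_seconds = window_seconds  # assumed temporal window length
        self.threshold = threshold            # assumed threshold number of devices
        self._events = defaultdict(deque)     # phrase -> deque of (time, device id)

    def observe(self, phrase: str, device_id: str, received_at: float) -> bool:
        """Record a detection; return True if speech recognition should stop."""
        events = self._events[phrase]
        events.append((received_at, device_id))
        # Assumption: events arrive in time order, so expired ones are at the front.
        while events and received_at - events[0][0] > self.window_seconds:
            events.popleft()
        distinct_devices = {device for _, device in events}
        return len(distinct_devices) > self.threshold


window = DeviceWindow(window_seconds=5.0, threshold=2)
decisions = [
    window.observe("order a pizza", "device-a", 0.0),
    window.observe("order a pizza", "device-b", 1.0),
    window.observe("order a pizza", "device-b", 1.5),  # repeat from the same device
    window.observe("order a pizza", "device-c", 2.0),
    window.observe("order a pizza", "device-d", 20.0),  # earlier events have expired
]
```

Counting distinct device identifiers, not raw requests, keeps one device that repeats itself from tripping the threshold on its own.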
13. A system, comprising:
at least one processor; and
at least one computer-readable medium encoded with instructions which, when executed by the at least one processor, cause the system to:
receive first audio data from a plurality of different voice-activated electronic devices;
determine, using the first audio data, that a first phrase was detected by more than a threshold number of the voice-activated electronic devices within a temporal window; and
based at least in part on the first phrase being detected by more than the threshold number of the voice-activated electronic devices during the temporal window, cause speech recognition functionality to stop for at least a first portion of the first audio data received from at least one of the voice-activated electronic devices.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21
Specification