Sound profile generation based on speech recognition results exceeding a threshold
First Claim
1. A method, comprising:
receiving, at an electronic device, audio data representing a phrase;
generating text data representing the phrase by executing speech-to-text functionality;
identifying a category that has been generated for the phrase, the category signifying that the text data represents the phrase;
adding a count to the category to indicate that another instance of the category has been identified;
determining a total number of counts for the category;
determining, based on the total number of counts for the category, that multiple requesting devices have sent audio data representing the phrase to the electronic device during a same temporal window;
based at least in part on a determination that multiple requesting devices have sent audio data representing the phrase to the electronic device during the same temporal window, generating an audio fingerprint corresponding to the audio data;
storing the audio fingerprint on the electronic device;
receiving additional audio data also representing the phrase;
generating an additional audio fingerprint corresponding to the additional audio data;
determining a bit error rate of the additional audio fingerprint as compared to the audio fingerprint;
determining that the bit error rate is less than a bit error rate threshold value indicating that the audio data and the additional audio data both represent the phrase; and
based at least in part on a determination that the bit error rate is less than the bit error rate threshold value, refraining from performing at least some automatic speech recognition processing for the additional audio data.
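The fingerprint comparison recited in the last steps of claim 1 can be sketched as follows. This is only an illustrative sketch: it assumes fingerprints are equal-length byte strings and that the bit error rate is the fraction of differing bits. The names `bit_error_rate`, `should_skip_asr`, and the value of `BER_THRESHOLD` are hypothetical and do not come from the patent text.

```python
# Hypothetical threshold; the claim only requires that some threshold value exists.
BER_THRESHOLD = 0.35


def bit_error_rate(fp_a: bytes, fp_b: bytes) -> float:
    """Return the fraction of bit positions at which two fingerprints differ."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    if not fp_a:
        raise ValueError("fingerprints must not be empty")
    # XOR each byte pair and count the set bits (the Hamming distance).
    differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(fp_a, fp_b))
    return differing_bits / (8 * len(fp_a))


def should_skip_asr(stored_fp: bytes, new_fp: bytes,
                    threshold: float = BER_THRESHOLD) -> bool:
    """True when the new fingerprint is close enough to the stored one
    that at least some speech recognition processing can be skipped."""
    return bit_error_rate(stored_fp, new_fp) < threshold
```

A low bit error rate means the two fingerprints are nearly identical, so the additional audio data is treated as another instance of the already-profiled phrase.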
Abstract
Systems and methods for generating sound profiles of artificial commands detected by multiple voice activated electronic devices are described herein. In some embodiments, numerous voice activated electronic devices may send audio data representing a phrase to a backend system at substantially the same time. Text data representing the phrase, and counts for instances of that text data, may be generated. If the number of counts exceeds a predefined threshold, the backend system may cause any remaining response generation functionality for that particular command that is in excess of the predefined threshold to be stopped, and those devices returned to a sleep state. In some embodiments, a sound profile unique to the phrase that caused the excess of the predefined threshold may be generated such that future instances of the same phrase may be recognized prior to text data being generated, conserving the backend system's resources.
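The counting step described in the abstract can be illustrated with a small sketch. It assumes that requests arrive in non-decreasing timestamp order, that phrases are grouped by normalized text, and that the count of interest is the number of distinct devices within the window. The class name `PhraseCounter` and its parameters are hypothetical and are not taken from the patent.

```python
from collections import defaultdict, deque


class PhraseCounter:
    """Counts how many distinct devices sent the same phrase within a time window."""

    def __init__(self, window_seconds: float, threshold: int):
        self.window_seconds = window_seconds
        self.threshold = threshold
        # Normalized phrase text -> deque of (timestamp, device_id) events.
        self._events = defaultdict(deque)

    def record(self, text: str, device_id: str, timestamp: float) -> bool:
        """Record one request; return True if the count now exceeds the threshold."""
        key = " ".join(text.lower().split())  # simple normalization (assumption)
        events = self._events[key]
        events.append((timestamp, device_id))
        # Drop events that have fallen out of the temporal window.
        while events and timestamp - events[0][0] > self.window_seconds:
            events.popleft()
        distinct_devices = {device for _, device in events}
        return len(distinct_devices) > self.threshold
```

For example, with `PhraseCounter(window_seconds=5.0, threshold=2)`, the third distinct device to send the same phrase within five seconds causes `record` to return `True`, which is the point at which a backend might stop further response generation and build a sound profile.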
56 Citations
22 Claims
5. A method, comprising:
receiving a first instance of audio data representing a first sound;
determining that, within a temporal window, a plurality of additional instances of audio data representing the first sound are also received;
determining a number of the instances of audio data representing the first sound that are received within the temporal window;
determining that the number of the instances is greater than a threshold value;
based at least in part on a determination that the number of the instances is greater than the threshold value, generating a first sound profile of the first sound;
storing the first sound profile;
receiving second audio data representing a second sound;
generating a second sound profile of the second sound;
determining that a similarity value of the second sound profile and the first sound profile is greater than a similarity threshold value; and
based at least in part on a determination that the similarity value is greater than the similarity threshold value, refraining from performing at least some automated speech recognition processing for the second audio data.
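The final steps of claim 5 describe a gate placed in front of speech recognition. The sketch below shows one way such a gate might look, assuming sound profiles are numeric feature vectors and that the similarity value is cosine similarity. Both are assumptions, since the claim does not name a profile format or a similarity measure. The names `cosine_similarity`, `handle_audio`, and `run_asr` are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def handle_audio(audio: bytes,
                 profile: Sequence[float],
                 stored_profiles: Sequence[Sequence[float]],
                 run_asr: Callable[[bytes], str],
                 similarity_threshold: float = 0.9) -> Optional[str]:
    """Skip recognition when the profile matches a stored one; otherwise run it."""
    for stored in stored_profiles:
        if cosine_similarity(profile, stored) > similarity_threshold:
            return None  # matched a stored sound profile: refrain from ASR
    return run_asr(audio)
```

Because the comparison happens on the profile rather than on recognized text, a match lets the system avoid the more expensive recognition step entirely.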
14. An electronic system, comprising:
and at least one processor operable to:
receive a first instance of audio data representing a first sound;
determine that, within a temporal window, a plurality of additional instances of audio data representing the first sound are also received;
determine a number of the instances of audio data representing the first sound that are received within the temporal window;
determine that the number of instances is greater than a threshold value;
based at least in part on a determination that the number of instances is greater than the threshold value, generate a first sound profile of the first sound;
store the first sound profile;
receive second audio data representing a second sound;
generate a second sound profile of the second sound;
determine that a similarity value of the second sound profile and the first sound profile is greater than a similarity threshold value; and
based at least in part on a determination that the similarity value is greater than the similarity threshold value, refrain from performing at least some automated speech recognition processing for the second audio data.
Specification