Methods and devices for ignoring similar audio being received by a system
First Claim
1. A method, comprising:
receiving, at a backend system, first audio data;
receiving a first timestamp indicating a first time that the first audio data was sent to the backend system by a first user device;
receiving, at the backend system, second audio data;
receiving a second timestamp indicating a second time that the second audio data was sent to the backend system by a second user device;
determining that an amount of time between the first time and the second time is less than a predetermined period of time, which indicates that the first audio data and the second audio data were sent at a substantially same time;
generating a first audio fingerprint of the first audio data by performing a first fast Fourier transform (“FFT”) on the first audio data, the first audio fingerprint comprising first data representing a first time-frequency profile of the first audio data;
generating a second audio fingerprint of the second audio data by performing a second FFT on the second audio data, the second audio fingerprint comprising second data representing a second time-frequency profile of the second audio data;
determining a bit error rate between the first audio fingerprint and the second audio fingerprint by determining a number of different bits between the first audio fingerprint and the second audio fingerprint, and then dividing the number by a total number of bits;
determining that the bit error rate is less than a predefined bit error rate threshold value indicating that the first audio data and the second audio data both represent a same sound; and
storing the first audio fingerprint as a flagged audio fingerprint in memory on the backend system such that receipt of additional audio data that has a matching audio fingerprint is ignored by the backend system.
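The bit error rate step of the claim can be illustrated with a minimal Python sketch. It assumes fingerprints are equal-length byte strings, which the claim does not specify, and the threshold value and the names `bit_error_rate`, `same_sound`, and `BER_THRESHOLD` are hypothetical choices for illustration only.

```python
# Hypothetical threshold; the claim only says "predefined".
BER_THRESHOLD = 0.25


def bit_error_rate(fp_a: bytes, fp_b: bytes) -> float:
    """Number of differing bits divided by the total number of bits."""
    if len(fp_a) != len(fp_b) or not fp_a:
        raise ValueError("fingerprints must be non-empty and of equal length")
    # XOR leaves a 1 in every position where the two fingerprints differ.
    differing = sum(bin(a ^ b).count("1") for a, b in zip(fp_a, fp_b))
    return differing / (len(fp_a) * 8)


def same_sound(fp_a: bytes, fp_b: bytes, threshold: float = BER_THRESHOLD) -> bool:
    """True when the bit error rate is strictly below the threshold."""
    return bit_error_rate(fp_a, fp_b) < threshold
```

A rate exactly equal to the threshold is treated as a non-match here, following the "less than" wording of the claim.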
Abstract
Systems and methods for detecting similar audio being received by separate voice activated electronic devices, and ignoring those commands, are described herein. In some embodiments, a voice activated electronic device may be activated by a wakeword that is output by an additional electronic device, such as a television or radio; may capture audio of the sound following the wakeword; and may send audio data representing the sound to a backend system. Upon receipt, the backend system may, in parallel with performing automated speech recognition processing on the audio data, generate a sound profile of the audio data, and may compare that sound profile to sound profiles of recently received audio data and/or flagged sound profiles. If the generated sound profile is determined to match another sound profile, then the automated speech recognition processing may be stopped, and the voice activated electronic device may be instructed to return to a keyword spotting mode. If the matching sound profile is not already stored in a database of known sound profiles, it can be stored for future comparisons.
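The comparison flow described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: sound profiles are modeled as 32-bit integers, requests are assumed to arrive in timestamp order, and the class `Backend`, its fields, the threshold, the window length, and the return strings are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

PROFILE_BITS = 32  # assumed fixed profile width


def _ber(a: int, b: int) -> float:
    """Fraction of bit positions in which two profiles differ."""
    return bin(a ^ b).count("1") / PROFILE_BITS


@dataclass
class Backend:
    ber_threshold: float = 0.1   # hypothetical value
    window_seconds: float = 2.0  # hypothetical "recently received" window
    flagged: list = field(default_factory=list)
    recent: deque = field(default_factory=deque)  # (device_id, sent_at, profile)

    def handle(self, device_id: str, sent_at: float, profile: int) -> str:
        # Forget recent profiles that fall outside the time window.
        while self.recent and sent_at - self.recent[0][1] > self.window_seconds:
            self.recent.popleft()
        # 1) Compare against already flagged profiles.
        if any(_ber(profile, f) < self.ber_threshold for f in self.flagged):
            return "stop_asr"
        # 2) Compare against profiles recently received from other devices.
        for other_device, _, other_profile in self.recent:
            if other_device != device_id and _ber(profile, other_profile) < self.ber_threshold:
                self.flagged.append(other_profile)  # keep for future comparisons
                return "stop_asr"
        self.recent.append((device_id, sent_at, profile))
        return "continue_asr"
```

In this sketch, `"stop_asr"` stands for stopping speech recognition and returning the device to keyword spotting mode, while `"continue_asr"` lets processing proceed normally.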
20 Claims
1. A method, comprising:
receiving, at a backend system, first audio data;
receiving a first timestamp indicating a first time that the first audio data was sent to the backend system by a first user device;
receiving, at the backend system, second audio data;
receiving a second timestamp indicating a second time that the second audio data was sent to the backend system by a second user device;
determining that an amount of time between the first time and the second time is less than a predetermined period of time, which indicates that the first audio data and the second audio data were sent at a substantially same time;
generating a first audio fingerprint of the first audio data by performing a first fast Fourier transform (“FFT”) on the first audio data, the first audio fingerprint comprising first data representing a first time-frequency profile of the first audio data;
generating a second audio fingerprint of the second audio data by performing a second FFT on the second audio data, the second audio fingerprint comprising second data representing a second time-frequency profile of the second audio data;
determining a bit error rate between the first audio fingerprint and the second audio fingerprint by determining a number of different bits between the first audio fingerprint and the second audio fingerprint, and then dividing the number by a total number of bits;
determining that the bit error rate is less than a predefined bit error rate threshold value indicating that the first audio data and the second audio data both represent a same sound; and
storing the first audio fingerprint as a flagged audio fingerprint in memory on the backend system such that receipt of additional audio data that has a matching audio fingerprint is ignored by the backend system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A backend system, comprising:
memory;
communications circuitry; and
at least one processor operable to:
receive first audio data;
receive a first timestamp indicating a first time that the first audio data was sent to the backend system by a first user device;
receive second audio data;
receive a second time stamp indicating a second time that the second audio data was sent to the backend system by a second user device;
determine that an amount of time between the first time and the second time is less than a predetermined period of time, which indicates that the first audio data and the second audio data were sent at a substantially same time;
generate a first audio fingerprint of the first audio data by performing a first fast Fourier transform (“FFT”) on the first audio data, the first audio fingerprint comprising first data representing a first time-frequency profile of the first audio data;
generate a second audio fingerprint of the second audio data by performing a second FFT on the second audio data, the second audio fingerprint comprising second data representing a second time-frequency profile of the second audio data;
determine a bit error rate between the first audio fingerprint and the second audio fingerprint by determining a number of different bits between the first audio fingerprint and the second audio fingerprint, and then dividing the number by a total number of bits;
determine that the bit error rate is less than a predefined bit error rate threshold value indicating that the first audio data and the second audio data both represent a same sound; and
store the first audio fingerprint as a flagged audio fingerprint in the memory such that receipt of additional audio data that has a matching audio fingerprint is ignored.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
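One way an FFT could yield a bit-valued time-frequency fingerprint is sketched below. The claims do not say how bits are derived from the transform, so this sketch swaps in a generic band-energy-difference scheme as an assumption; the function names, frame size, and band count are hypothetical, and the pure-Python radix-2 FFT is used only to keep the example self-contained.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def band_energies(frame, n_bands):
    """Sum spectral power into n_bands equal-width frequency bands."""
    spectrum = fft(frame)[: len(frame) // 2]  # non-negative frequencies only
    power = [abs(c) ** 2 for c in spectrum]
    width = len(power) // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def fingerprint(samples, frame_size=64, n_bands=8):
    """Return a list of 0/1 bits describing how band energies change over time.

    Assumed scheme: one bit per adjacent band pair per consecutive frame pair,
    set when the energy difference between the bands grew since the last frame.
    """
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    energies = [band_energies(f, n_bands) for f in frames]
    bits = []
    for prev, cur in zip(energies, energies[1:]):
        for b in range(n_bands - 1):
            delta = (cur[b] - cur[b + 1]) - (prev[b] - prev[b + 1])
            bits.append(1 if delta > 0 else 0)
    return bits
```

With non-overlapping frames, a signal of F full frames produces (F - 1) × (n_bands - 1) bits; a practical system would more likely use overlapping, windowed frames.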
Specification