Audio fingerprinting based on audio energy characteristics
Abstract
Audio fingerprinting includes obtaining audio samples of a piece of audio, generating frequency representations of the audio samples, identifying increasing and decreasing energy regions in frequency bands of the frequency representations, and generating hashes of features of the piece of audio. Each hash of features corresponds to portions of the identified energy regions appearing in a respective time window. Each feature is defined as a numeric value that encodes information representing: a frequency band of an energy region appearing in the respective time window, whether the energy region appearing in the respective time window is an increasing energy region or whether the energy region appearing in the respective time window is a decreasing energy region, and a placement of the energy region appearing in the respective time window.
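The region-identification step summarized in the abstract can be illustrated with a minimal sketch. It assumes the energy of one frequency band is already available as one value per frame, and it treats every maximal run of strictly rising or strictly falling energy as a region. The names `EnergyRegion` and `find_energy_regions` are hypothetical, and the handling of flat stretches (they close the current run) is an assumption rather than something the abstract specifies.

```python
from typing import List, NamedTuple


class EnergyRegion(NamedTuple):
    band: int         # index of the frequency band (hypothetical numbering)
    start: int        # frame index at which the run begins
    end: int          # frame index at which the run ends
    increasing: bool  # True if energy rises from start to end


def find_energy_regions(band: int, energies: List[float]) -> List[EnergyRegion]:
    """Split one band's per-frame energies into maximal rising or falling runs."""
    regions: List[EnergyRegion] = []
    start = 0
    direction = 0  # +1 rising, -1 falling, 0 not yet known
    for i in range(1, len(energies)):
        delta = energies[i] - energies[i - 1]
        step = (delta > 0) - (delta < 0)
        if step == 0:
            # Assumption: a flat stretch ends the current run.
            if direction != 0:
                regions.append(EnergyRegion(band, start, i - 1, direction > 0))
            start, direction = i, 0
        elif direction == 0:
            direction = step
        elif step != direction:
            # The trend reversed: close the run at the turning point
            # and start the next run from that same frame.
            regions.append(EnergyRegion(band, start, i - 1, direction > 0))
            start, direction = i - 1, step
    if direction != 0:
        regions.append(EnergyRegion(band, start, len(energies) - 1, direction > 0))
    return regions
```

For example, the energy sequence `[1, 2, 3, 2, 1, 1, 4]` yields an increasing region over frames 0 to 2, a decreasing region over frames 2 to 4, and an increasing region over frames 5 to 6.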
49 Citations
16 Claims
1. A method of audio fingerprinting comprising:
obtaining audio samples of a piece of audio, each of the audio samples corresponding to a specific time;
generating frequency representations of the audio samples, the frequency representations being divided in frequency bands;
identifying energy regions in the frequency bands, each of the energy regions being one of an increasing energy region and a decreasing energy region, an increasing energy region defined as a time region within one of the frequency bands during which audio energy increases from a start time to an end time of the time region and a decreasing energy region defined as a time region within one of the frequency bands during which audio energy decreases from a start time to an end time of the time region;
analyzing portions of the identified energy regions appearing within time windows to generate hashes of features of the piece of audio, each hash of features corresponding to portions of the identified energy regions appearing in a respective time window, each feature defined as a numeric value that encodes information representing:
a frequency band of an energy region appearing in the respective time window,
whether the energy region appearing in the respective time window is an increasing energy region or whether the energy region appearing in the respective time window is a decreasing energy region, and
a placement of the energy region appearing in the respective time window, the placement of the energy region appearing in the respective time window corresponding to one of:
whether the energy region appearing in the respective time window starts before and ends after the respective time window,
whether the energy region appearing in the respective time window starts before and ends within the respective time window,
whether the energy region appearing in the respective time window starts within and ends after the respective time window, and
whether the energy region appearing in the respective time window starts within and ends within the respective time window; and
storing each hash of features together with the specific time,
wherein the frequency bands include forty four frequency bands whose bandwidth decrease logarithmically from a first frequency band that starts at 200 Hz to a forty fourth frequency band that ends at 3300 Hz.
Dependent claims: 2, 3, 4, 5

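The feature definition in claim 1 (frequency band, increasing or decreasing, and one of four placements relative to the time window) can be sketched as follows. The claim only requires that each feature be a numeric value encoding these three pieces of information; the bit layout used here, the placement codes 0 to 3, and the function names are all illustrative assumptions. Regions are given as plain `(band, start, end, increasing)` tuples.

```python
from typing import Iterable, Optional, Set, Tuple

Region = Tuple[int, int, int, bool]  # (band, start, end, increasing), hypothetical layout


def placement(start: int, end: int, win_start: int, win_end: int) -> Optional[int]:
    """Classify a region against a window; None if it does not appear in the window."""
    if end < win_start or start > win_end:
        return None
    starts_before = start < win_start
    ends_after = end > win_end
    if starts_before and ends_after:
        return 0  # starts before and ends after the window
    if starts_before:
        return 1  # starts before and ends within the window
    if ends_after:
        return 2  # starts within and ends after the window
    return 3      # starts within and ends within the window


def encode_feature(band: int, increasing: bool, place: int) -> int:
    """Pack band, direction, and placement into one integer (assumed bit layout)."""
    return (band << 3) | (int(increasing) << 2) | place


def window_features(regions: Iterable[Region], win_start: int, win_end: int) -> Set[int]:
    """Collect the encoded features of all regions appearing in one time window."""
    features = set()
    for band, start, end, increasing in regions:
        place = placement(start, end, win_start, win_end)
        if place is not None:
            features.add(encode_feature(band, increasing, place))
    return features
```

With two bits for placement and one for direction, 44 bands give at most 44 × 2 × 4 = 352 distinct feature values under this assumed encoding.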
6. A system for audio fingerprinting comprising:
a sampler configured to obtain audio samples of a piece of audio, each of the audio samples corresponding to a specific time;
a transformer configured to transform the audio samples into frequency representations of the audio samples, the frequency representations being divided in frequency bands;
an energy streamer configured to identify energy regions in the frequency bands, each of the energy regions being one of an increasing energy region and a decreasing energy region, an increasing energy region defined as a time region within a frequency band, of the frequency bands, during which audio energy increases from a start time to an end time of the time region and a decreasing energy region defined as a time region within a frequency band, of the frequency bands, during which audio energy decreases from a start time to an end time of the time region;
an energy hasher configured to analyze portions of the identified energy regions appearing within time windows to generate hashes of features of the piece of audio, each hash of features corresponding to portions of the identified energy regions appearing in a respective time window, each feature defined as a numeric value that encodes information representing:
a frequency band of an energy region appearing in the respective time window,
whether the energy region appearing in the respective time window is an increasing energy region or whether the energy region appearing in the respective time window is a decreasing energy region, and
a placement of the energy region appearing in the respective time window, the placement of the energy region appearing in the respective time window corresponding to one of:
whether the energy region appearing in the respective time window starts before and ends after the respective time window,
whether the energy region appearing in the respective time window starts before and ends within the respective time window,
whether the energy region appearing in the respective time window starts within and ends after the respective time window, and
whether the energy region appearing in the respective time window starts within and ends within the respective time window; and
a non-transitory storage medium configured to store each hash of features together with the specific time;
a MinHash hasher configured to convert each hash of features to a MinHash representation of the features having one hundred MinHash values;
a sharder configured to shard the one hundred MinHash values with a shard size of five to obtain twenty rows or groups of five MinHash shard values;
a combiner configured to combine the five MinHash shard values within a row or group into a 64 bit number to obtain a fingerprint hash having twenty 64 bit numbers; and
the non-transitory storage medium or another non-transitory storage medium configured to store the fingerprint hash and the specific time.
Dependent claims: 7, 8, 9, 10, 11

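The MinHash, sharding, and combining elements of claim 6 can be sketched as below: one hundred MinHash values, sharded in groups of five, each group combined into a 64 bit number, giving twenty 64 bit numbers per fingerprint hash. The claim does not say which hash functions produce the MinHash values or how five values are combined, so the seeded BLAKE2b hashing and the hash-of-the-group combiner used here are assumptions chosen only to make the sketch runnable. All function names are hypothetical.

```python
import hashlib
import struct
from typing import Iterable, List

NUM_HASHES = 100  # MinHash values per window, as recited in claim 6
SHARD_SIZE = 5    # shard size, as recited in claim 6


def _hash64(seed: int, value: int) -> int:
    """Seeded 64-bit hash of one feature value (BLAKE2b is an assumed stand-in)."""
    data = struct.pack("<IQ", seed, value)
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def minhash(features: Iterable[int]) -> List[int]:
    """For each of the 100 seeded hash functions, keep the minimum over the feature set."""
    feats = list(features)
    if not feats:
        raise ValueError("a window needs at least one feature")
    return [min(_hash64(seed, f) for f in feats) for seed in range(NUM_HASHES)]


def shard(values: List[int], size: int = SHARD_SIZE) -> List[List[int]]:
    """Split the MinHash values into consecutive groups of `size`."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def combine(group: List[int]) -> int:
    """Reduce one group of MinHash values to a single 64-bit number (assumed method)."""
    data = struct.pack("<%dQ" % len(group), *group)
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def fingerprint_hash(features: Iterable[int]) -> List[int]:
    """Twenty 64-bit numbers derived from one window's feature set."""
    return [combine(group) for group in shard(minhash(features))]
```

Grouping MinHash values this way is the usual locality-sensitive hashing banding trick: two windows produce the same 64 bit number in a given row only if all five MinHash values in that row agree, so rows act as candidate-match keys for similar feature sets.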
12. A device for audio fingerprinting comprising:
a processor; and
a non-transitory computer-readable medium,
the processor configured to receive audio samples of a piece of audio, each of the audio samples corresponding to a specific time, process the audio samples, and compare the processed audio samples to processed audio samples stored in the non-transitory computer-readable medium to at least one of identify or synchronize the piece of audio,
wherein the processor is configured to process the audio samples by:
transforming the audio samples into frequency representations of the audio samples, the frequency representations being divided in frequency bands;
identifying energy regions within the frequency bands, each of the energy regions being one of an increasing energy region and a decreasing energy region, an increasing energy region defined as a time region within one of the frequency bands during which audio energy increases from a start time to an end time of the time region and a decreasing energy region defined as a time region within one of the frequency bands during which audio energy decreases from a start time to an end time of the time region;
analyzing portions of the identified energy regions appearing within time windows to generate hashes of features of the piece of audio, each hash of features corresponding to portions of the identified energy regions appearing in a respective time window, each feature defined as a numeric value that encodes information representing:
a frequency band of an energy region appearing in the respective time window,
whether the energy region appearing in the respective time window is an increasing energy region or whether the energy region appearing in the respective time window is a decreasing energy region, and
a placement of the energy region appearing in the respective time window, the placement of the energy region appearing in the respective time window corresponding to one of:
whether the energy region appearing in the respective time window starts before and ends after the respective time window,
whether the energy region appearing in the respective time window starts before and ends within the respective time window,
whether the energy region appearing in the respective time window starts within and ends after the respective time window, and
whether the energy region appearing in the respective time window starts within and ends within the respective time window,
wherein the frequency bands include forty four frequency bands whose bandwidth decrease logarithmically from a first frequency band that starts at 200 Hz to a forty fourth frequency band that ends at 3300 Hz.
Dependent claims: 13, 14, 15, 16

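Claim 12 recites comparing processed audio to stored processed audio in order to identify or synchronize the piece of audio, without prescribing how the comparison is done. One common approach, shown here purely as an assumption, is to index stored hash values by their time and let query hashes vote for a `(content, time offset)` pair: the winning content identifies the audio and the winning offset synchronizes it. The data layout and the names `build_index` and `best_match` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[int, int]  # (row index within the hash, hash value)


def build_index(
    stored: Iterable[Tuple[str, int, List[int]]]
) -> Dict[Key, List[Tuple[str, int]]]:
    """Index stored entries of the form (content_id, time, hash values)."""
    index: Dict[Key, List[Tuple[str, int]]] = defaultdict(list)
    for content_id, time, values in stored:
        for row, value in enumerate(values):
            index[(row, value)].append((content_id, time))
    return index


def best_match(
    index: Dict[Key, List[Tuple[str, int]]],
    query: Iterable[Tuple[int, List[int]]],
) -> Optional[Tuple[str, int, int]]:
    """Vote over (content_id, stored_time - query_time); return the top candidate."""
    votes: Counter = Counter()
    for q_time, values in query:
        for row, value in enumerate(values):
            for content_id, s_time in index.get((row, value), ()):
                votes[(content_id, s_time - q_time)] += 1
    if not votes:
        return None
    (content_id, offset), count = votes.most_common(1)[0]
    return content_id, offset, count
```

A practical system would also need a minimum vote threshold before declaring a match; that threshold is left out here because the claim gives no basis for choosing one.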
Specification