Audio fingerprinting based on audio energy characteristics

US 10,540,993 B2
Filed: 08/10/2017
Issued: 01/21/2020
Est. Priority Date: 04/08/2016
Status: Active Grant

Abstract

Audio fingerprinting includes obtaining audio samples of a piece of audio, generating frequency representations of the audio samples, identifying increasing and decreasing energy regions in frequency bands of the frequency representations, and generating hashes of features of the piece of audio. Each hash of features corresponds to portions of the identified energy regions appearing in a respective time window. Each feature is defined as a numeric value that encodes information representing: a frequency band of an energy region appearing in the respective time window, whether the energy region appearing in the respective time window is an increasing energy region or whether the energy region appearing in the respective time window is a decreasing energy region, and a placement of the energy region appearing in the respective time window.
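
As an illustration only, a feature of the kind described above could be packed into a single integer. The Python sketch below uses the four placements enumerated in claim 1 and assumes a hypothetical bit layout (band index in the high bits, one direction bit, two placement bits). The text here does not say how the numeric value is laid out, and the names `Placement`, `classify_placement`, and `encode_feature` are invented for this example.

```python
from enum import IntEnum


class Placement(IntEnum):
    """The four placements enumerated in claim 1 (the numbering is an assumption)."""
    STARTS_BEFORE_ENDS_AFTER = 0
    STARTS_BEFORE_ENDS_WITHIN = 1
    STARTS_WITHIN_ENDS_AFTER = 2
    STARTS_WITHIN_ENDS_WITHIN = 3


def classify_placement(region_start, region_end, window_start, window_end):
    """Classify how a region that overlaps the window sits relative to it."""
    starts_before = region_start < window_start
    ends_after = region_end > window_end
    if starts_before and ends_after:
        return Placement.STARTS_BEFORE_ENDS_AFTER
    if starts_before:
        return Placement.STARTS_BEFORE_ENDS_WITHIN
    if ends_after:
        return Placement.STARTS_WITHIN_ENDS_AFTER
    return Placement.STARTS_WITHIN_ENDS_WITHIN


def encode_feature(band, increasing, placement):
    """Pack (band, direction, placement) into one integer.

    Hypothetical layout: bits 3 and up hold the band index, bit 2 the
    direction, bits 0-1 the placement.
    """
    return (band << 3) | (int(increasing) << 2) | int(placement)


# A rising region in band 12 spanning 900-1400 ms, seen through a 1000-2000 ms window.
placement = classify_placement(900, 1400, 1000, 2000)
feature = encode_feature(band=12, increasing=True, placement=placement)
print(placement.name, feature)
```

With 44 bands, two directions, and four placements, this layout yields 352 distinct feature values, so the set of features seen in one time window is a small set of small integers.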

49 Citations

16 Claims

1. A method of audio fingerprinting comprising:
- obtaining audio samples of a piece of audio, each of the audio samples corresponding to a specific time;
  
  generating frequency representations of the audio samples, the frequency representations being divided in frequency bands;
  
  identifying energy regions in the frequency bands, each of the energy regions being one of an increasing energy region and a decreasing energy region, an increasing energy region defined as a time region within one of the frequency bands during which audio energy increases from a start time to an end time of the time region and a decreasing energy region defined as a time region within one of the frequency bands during which audio energy decreases from a start time to an end time of the time region;
  
  analyzing portions of the identified energy regions appearing within time windows to generate hashes of features of the piece of audio, each hash of features corresponding to portions of the identified energy regions appearing in a respective time window, each feature defined as a numeric value that encodes information representing:
  
  a frequency band of an energy region appearing in the respective time window, whether the energy region appearing in the respective time window is an increasing energy region or whether the energy region appearing in the respective time window is a decreasing energy region, and a placement of the energy region appearing in the respective time window, the placement of the energy region appearing in the respective time window corresponding to one of:
  
  whether the energy region appearing in the respective time window starts before and ends after the respective time window, whether the energy region appearing in the respective time window starts before and ends within the respective time window, whether the energy region appearing in the respective time window starts within and ends after the respective time window, and whether the energy region appearing in the respective time window starts within and ends within the respective time window; and
  
  storing each hash of features together with the specific time, wherein the frequency bands include forty four frequency bands whose bandwidth decrease logarithmically from a first frequency band that starts at 200 Hz to a forty fourth frequency band that ends at 3300 Hz.
- - 2. The method of claim 1, comprising:
    - converting each hash of features to a MinHash representation of the features or MinHash values; and
      
      storing the MinHash values together with the specific time.
  - 3. The method of claim 1, comprising:
    - converting each hash of features to a MinHash representation of the features having one hundred MinHash values;
      
      sharding the one hundred MinHash values with a shard size of five to obtain twenty rows or groups of five MinHash shard values;
      
      combining the five MinHash shard values within a row or group into a 64 bit number to obtain a fingerprint hash having twenty 64 bit numbers; and
      
      storing the fingerprint hash together with the specific time.
  - 4. The method of claim 1, wherein the obtaining the audio samples of the piece of audio includes:
    - sampling the piece of audio at 8 kHz using a sampling window size of 4096 samples and a window overlap of 31/32.
  - 5. The method of claim 1, wherein each of the time windows has a window size of 1000 milliseconds and a window overlap of 950 milliseconds.
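
The MinHash and sharding steps of claim 3 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the claims do not specify the hash family or how five shard values are combined into a 64 bit number, so the linear hash family modulo a Mersenne prime and the BLAKE2b-based `combine_row` are choices made for this example, as are all function names.

```python
import hashlib
import random

NUM_HASHES = 100   # MinHash values per time window (claim 3)
SHARD_SIZE = 5     # MinHash values per row or group (claim 3)
PRIME = (1 << 61) - 1  # Mersenne prime for the assumed hash family

# Fixed seed so every fingerprint is computed with the same hash family.
_rng = random.Random(0)
COEFFS = [(_rng.randrange(1, PRIME), _rng.randrange(PRIME)) for _ in range(NUM_HASHES)]


def minhash(features):
    """Return NUM_HASHES MinHash values for a non-empty set of integer features."""
    return [min((a * f + b) % PRIME for f in features) for a, b in COEFFS]


def combine_row(row):
    """Fold one row of MinHash values into one 64-bit number (assumed: BLAKE2b)."""
    data = b"".join(v.to_bytes(8, "big") for v in row)
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def fingerprint(features):
    """Shard the MinHash values into rows of SHARD_SIZE and combine each row."""
    values = minhash(features)
    rows = [values[i:i + SHARD_SIZE] for i in range(0, NUM_HASHES, SHARD_SIZE)]
    return [combine_row(row) for row in rows]


fp = fingerprint({101, 7, 230, 344})
print(len(fp))
```

One hundred values in groups of five give the twenty 64 bit numbers named in the claim. In this sketch two windows share a row value only when all five MinHash values in that row agree, which resembles the banding used in locality-sensitive hashing.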

6. A system for audio fingerprinting comprising:
- a sampler configured to obtain audio samples of a piece of audio, each of the audio samples corresponding to a specific time;
  
  a transformer configured to transform the audio samples into frequency representations of the audio samples, the frequency representations being divided in frequency bands;
  
  an energy streamer configured to identify energy regions in the frequency bands, each of the energy regions being one of an increasing energy region and a decreasing energy region, an increasing energy region defined as a time region within a frequency band, of the frequency bands, during which audio energy increases from a start time to an end time of the time region and a decreasing energy region defined as a time region within a frequency band, of the frequency bands, during which audio energy decreases from a start time to an end time of the time region;
  
  an energy hasher configured to analyze portions of the identified energy regions appearing within time windows to generate hashes of features of the piece of audio, each hash of features corresponding to portions of the identified energy regions appearing in a respective time window, each feature defined as a numeric value that encodes information representing:
  
  a frequency band of an energy region appearing in the respective time window, whether the energy region appearing in the respective time window is an increasing energy region or whether the energy region appearing in the respective time window is a decreasing energy region, and a placement of the energy region appearing in the respective time window, the placement of the energy region appearing in the respective time window corresponding to one of:
  
  whether the energy region appearing in the respective time window starts before and ends after the respective time window, whether the energy region appearing in the respective time window starts before and ends within the respective time window, whether the energy region appearing in the respective time window starts within and ends after the respective time window, and whether the energy region appearing in the respective time window starts within and ends within the respective time window; and
  
  a non-transitory storage medium configured to store each hash of features together with the specific time,
  
  a MinHash hasher configured to convert each hash of features to a MinHash representation of the features having one hundred MinHash values;
  
  a sharder configured to shard the one hundred MinHash values with a shard size of five to obtain twenty rows or groups of five MinHash shard values;
  
  a combiner configured to combine the five MinHash shard values within a row or group into a 64 bit number to obtain a fingerprint hash having twenty 64 bit numbers; and
  
  the non-transitory storage medium or another non-transitory storage medium configured to store the fingerprint hash and the specific time.
- - 7. The system of claim 6, comprising:
    - a MinHash hasher configured to convert each hash of features to a MinHash representation of the features or MinHash values; and
      
      the non-transitory storage medium or another non-transitory storage medium is configured to store the MinHash values and the specific time.
  - 8. The system of claim 6, comprising:
    - a sampler configured to obtain the audio samples of the piece of audio by sampling the piece of audio at 8 kHz using a sampling window size of 4096 samples and a window overlap of 31/32.
  - 9. The system of claim 6, wherein the frequency bands include forty four frequency bands ranging from a first frequency band that starts at 200 Hz to a forty fourth frequency band that ends at 3300 Hz.
  - 10. The system of claim 6, wherein the frequency bands include forty four frequency bands whose bandwidth decrease logarithmically from a first frequency band that starts at 200 Hz to a forty fourth frequency band that ends at 3300 Hz.
  - 11. The system of claim 6, wherein each of the time windows has a window size of 1000 milliseconds and a window overlap of 950 milliseconds.
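
The windowing parameters recited in claims 8 and 11 imply concrete hop sizes. The short Python calculation below works them out, assuming that "window overlap of 31/32" means consecutive sampling windows share 31/32 of their samples. That reading is an interpretation, and the variable names are chosen for this example.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 8000          # claim 8
SAMPLING_WINDOW = 4096         # samples per sampling window (claim 8)
SAMPLING_OVERLAP = Fraction(31, 32)

# Each new sampling window advances by the non-overlapping 1/32 of the window.
hop_samples = SAMPLING_WINDOW * (1 - SAMPLING_OVERLAP)
hop_ms = hop_samples * 1000 / SAMPLE_RATE_HZ
window_ms = Fraction(SAMPLING_WINDOW * 1000, SAMPLE_RATE_HZ)

HASH_WINDOW_MS = 1000          # claim 11
HASH_OVERLAP_MS = 950          # claim 11
hash_hop_ms = HASH_WINDOW_MS - HASH_OVERLAP_MS
hashes_per_second = Fraction(1000, hash_hop_ms)

print(f"sampling window: {window_ms} ms, hop: {hop_samples} samples ({hop_ms} ms)")
print(f"hash window hop: {hash_hop_ms} ms, {hashes_per_second} hashes per second")
```

Under that reading, each sampling window covers 512 ms of audio and advances by 128 samples (16 ms), while a new 1000 ms time window starts every 50 ms.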

12. A device for audio fingerprinting comprising:
- a processor; and
  
  a non-transitory computer-readable medium, the processor configured to receive audio samples of a piece of audio, each of the audio samples corresponding to a specific time, process the audio samples, and compare the processed audio samples to processed audio samples stored in the non-transitory computer-readable medium to at least one of identify or synchronize the piece of audio, wherein the processor is configured to process the audio samples by:
  
  transforming the audio samples into frequency representations of the audio samples, the frequency representations being divided in frequency bands;
  
  identifying energy regions within the frequency bands, each of the energy regions being one of an increasing energy region and a decreasing energy region, an increasing energy region defined as a time region within one of the frequency bands during which audio energy increases from a start time to an end time of the time region and a decreasing energy region defined as a time region within one of the frequency bands during which audio energy decreases from a start time to an end time of the time region;
  
  analyzing portions of the identified energy regions appearing within time windows to generate hashes of features of the piece of audio, each hash of features corresponding to portions of the identified energy regions appearing in a respective time window, each feature defined as a numeric value that encodes information representing:
  
  a frequency band of an energy region appearing in the respective time window, whether the energy region appearing in the respective time window is an increasing energy region or whether the energy region appearing in the respective time window is a decreasing energy region, and a placement of the energy region appearing in the respective time window, the placement of the energy region appearing in the respective time window corresponding to one of:
  
  whether the energy region appearing in the respective time window starts before and ends after the respective time window, whether the energy region appearing in the respective time window starts before and ends within the respective time window, whether the energy region appearing in the respective time window starts within and ends after the respective time window, and whether the energy region appearing in the respective time window starts within and ends within the respective time window, wherein the frequency bands include forty four frequency bands whose bandwidth decrease logarithmically from a first frequency band that starts at 200 Hz to a forty fourth frequency band that ends at 3300 Hz.
- - 13. The device of claim 12, wherein the processor is configured to convert each hash of features to a MinHash representation of the features having MinHash values; and
    - the non-transitory storage medium or another non-transitory storage medium is configured to store the MinHash values and the specific time.
  - 14. The device of claim 12, wherein the processor is configured to:
    - convert each hash of features to a MinHash representation of the features having one hundred MinHash values;
      
      shard the one hundred MinHash values with a shard size of five to obtain twenty rows or groups of five MinHash shard values;
      
      combine the five MinHash shard values within a row or group into a 64 bit number to obtain a fingerprint hash having twenty 64 bit numbers; and
      
      the non-transitory storage medium or another non-transitory storage medium is configured to store the fingerprint hash and the specific time.
  - 15. The device of claim 12, wherein the processor is configured to:
    - obtain the audio samples of the piece of audio by sampling the piece of audio at 8 kHz using a sampling window size of 4096 samples and a window overlap of 31/32.
  - 16. The device of claim 12, wherein the processor sets each of the time windows to have a window size of 1000 milliseconds and a window overlap of 950 milliseconds.
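
The energy-region step shared by the independent claims can be illustrated with a small Python sketch. The claims define a region only as a span during which a band's energy increases (or decreases) from its start time to its end time. The sketch assumes the simplest reading, maximal runs of strictly rising or strictly falling frame energies, and the actual method may apply smoothing or thresholds not shown here. The function name `energy_regions` is hypothetical.

```python
def energy_regions(energies):
    """Split one band's energy series into maximal rising and falling runs.

    Returns (start_index, end_index, increasing) tuples. Indices are frame
    numbers, and end_index is the last frame of the run.
    """
    regions = []
    start = 0
    direction = 0  # +1 rising, -1 falling, 0 no run in progress
    for i in range(1, len(energies)):
        delta = energies[i] - energies[i - 1]
        step = (delta > 0) - (delta < 0)
        if step != direction:
            # The trend changed: close the current run, if any, and start a new one.
            if direction != 0:
                regions.append((start, i - 1, direction > 0))
            start = i - 1
            direction = step
    if direction != 0:
        regions.append((start, len(energies) - 1, direction > 0))
    return regions


band_energy = [1.0, 2.0, 3.5, 3.0, 2.0, 2.0, 4.0]
print(energy_regions(band_energy))
```

For the sample series this produces a rising region over frames 0-2, a falling region over frames 2-4, and a rising region over frames 5-6. The flat step between frames 4 and 5 belongs to neither region under this reading.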

Current Assignee: Source Digital, Inc.
Original Assignee: Source Digital, Inc.
Inventors: Greene, Patrick
Primary Examiner(s): Tsang, Fan S
Assistant Examiner(s): Siegel, David
Application Number: US15/674,343
Publication Number: US 20170365276A1
Time in Patent Office: 894 Days
CPC Class Codes

G06F 16/632   Query formulation

G06F 16/683   using metadata automaticall...

G10H 1/0008   Associated control or indic...

G10H 2210/031   Musical analysis, i.e. isol...

G10H 2240/141   Library retrieval matching,...

G10H 2250/235   Fourier transform; Discrete...

G10H 2250/261   Window, i.e. apodization fu...

G10L 25/21   the extracted parameters be...

G10L 25/54   for retrieval
