Feature-based audio content identification

US 6,604,072 B2
Filed: 03/09/2001
Issued: 08/05/2003
Est. Priority Date: 11/03/2000
Status: Active Grant

Abstract

An audio signal is sampled and a frequency transform is performed on a succession of sets of samples of the signal to obtain a time dependent power spectrum for the audio signal. Frequency components output by the frequency transform are collected in frequency bands. More than one running average is taken of each semitone frequency band. When the values of two running averages of the same semitone frequency band cross, time information is recorded. Information about average crossing events that have occurred at different times in a set of adjacent semitone frequency bands is combined to form a key. A set of keys obtained from a song provides a means for identifying the song and is stored in a database for use in identifying songs.
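
The band-collection step mentioned in the abstract can be illustrated with a short sketch. It assumes equal-tempered semitone bands counted upward from a hypothetical reference frequency `F_REF` of 55 Hz, and it sums the power of every transform bin that falls in a band. The abstract does not give a reference frequency, a sample rate, or a transform size, so those values and the function names are illustrative only.

```python
import math

F_REF = 55.0  # assumed lower edge of band 0 in Hz (illustrative, not from the patent)


def semitone_band(freq_hz, f_ref=F_REF):
    """Return the index of the semitone band containing freq_hz, or None below f_ref."""
    if freq_hz < f_ref:
        return None
    # Twelve equal-tempered semitones per octave (one octave = a factor of 2).
    return int(math.floor(12.0 * math.log2(freq_hz / f_ref)))


def collect_into_bands(power, sample_rate, n_fft, f_ref=F_REF):
    """Sum the power of transform bins into semitone bands.

    power[k] is the power of bin k, whose centre frequency is
    k * sample_rate / n_fft. Returns a dict mapping band index -> summed power.
    """
    bands = {}
    for k, p in enumerate(power):
        band = semitone_band(k * sample_rate / n_fft, f_ref)
        if band is not None:  # bins below the reference frequency are dropped
            bands[band] = bands.get(band, 0.0) + p
    return bands
```

Calling `collect_into_bands` once per transform frame would give one power value per band per frame, which is the kind of time dependent band signal the running averages are then taken over.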

94 Citations

26 Claims

1. A method for identifying audio content, said method comprising the steps of:
- obtaining an audio signal;
  
  analyzing the power spectrum of the audio signal so as to obtain a plurality of time dependent frequency components; and
  
  detecting a plurality of events, each of the events being a crossing of the value of a first running average and the value of a second running average, wherein the first running average is an average over a first averaging period of a first subset of the time dependent frequency components, and the second running average is an average over a second averaging period, which is different than the first averaging period, of the first subset of the time dependent frequency components.
2. The method according to claim 1, further comprising the steps of:
3. The method according to claim 1, further comprising the step of determining a time dependent frequency component power corresponding to each event.
4. The method according to claim 1, wherein the analyzing step includes the sub-steps of:
- sampling the audio signal to obtain a plurality of audio signal samples;
  
  taking a plurality of subsets from the plurality of audio signal samples; and
  
  performing a Fourier transform on each of the plurality of subsets to obtain a set of Fourier frequency components.
5. The method according to claim 4, wherein the analyzing step further includes the sub-step of averaging together corresponding Fourier frequency components obtained from two or more successive subsets selected from the plurality of subsets.
6. The method according to claim 5, wherein the analyzing step further includes the sub-step of collecting Fourier frequency components into a plurality of semitone frequency bands.
7. The method according to claim 1, wherein the detecting step includes the sub-steps of:
- keeping the first running average over the first averaging period of a first subset of the time dependent frequency components so as to obtain a first series of averages for the first averaging period;
  
  keeping the second running average over the second averaging period of the first subset of the time dependent frequency components so as to obtain a second series of averages for the first averaging period; and
  
  recording a plurality of event times, each of the event times being a time at which there occurs one of the detected events of the first running average crossing the second running average.
8. The method according to claim 1, wherein the first averaging period is between approximately 1/10 of a second and approximately 1 second, and the second averaging period is from approximately 2 to 8 times as long as the first averaging period.
9. The method according to claim 1, further comprising the step of collecting the plurality of events in a plurality of time groups each of which covers an interval of time.
10. The method according to claim 9, further comprising the step of:
- in response to detecting each event in each of the plurality of time dependent frequency components, selecting one or more combinations of events from a plurality of events that occurred within a number of time groups, and within a number of time dependent frequency components.
11. The method according to claim 10, wherein the selecting step includes the sub-step of selecting one or more combinations of events from a plurality of events that occurred within a number of time groups, and within a number of time dependent frequency components, taking only one event at a time from each time group.
12. The method according to claim 10, further comprising the step of forming a plurality of keys from the one or more combinations each of which comprises a time to be associated with the combination of events, and a key sequence including information about each event in the combination.
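
The detecting step recited in claims 1 and 7 can be sketched as follows for a single band. This is a minimal illustration, not the patented implementation: it assumes simple moving averages measured in frames, treats a crossing as a sign change in the difference between the two averages, and starts looking only once both windows are full. The names `running_average`, `detect_crossings`, `short_n`, and `long_n` are hypothetical.

```python
from collections import deque


def running_average(values, n):
    """Yield the mean of the most recent n values; None until n values are seen."""
    window = deque(maxlen=n)
    total = 0.0
    for v in values:
        if len(window) == n:
            total -= window[0]  # this value is evicted by the append below
        window.append(v)
        total += v
        yield total / n if len(window) == n else None


def detect_crossings(band_power, short_n, long_n):
    """Return frame indices at which the short and long running averages cross.

    band_power: sequence of power values for one band, one per frame.
    short_n, long_n: averaging periods in frames (assumed units).
    """
    band_power = list(band_power)  # both averages iterate over the same data
    short = running_average(band_power, short_n)
    long_ = running_average(band_power, long_n)
    events = []
    prev_diff = None
    for t, (s, l) in enumerate(zip(short, long_)):
        if s is None or l is None:
            continue  # at least one window is not full yet
        diff = s - l
        if prev_diff is not None and diff * prev_diff < 0:
            events.append(t)  # sign of (short - long) flipped: a crossing
        if diff != 0:
            prev_diff = diff  # remember the last non-zero sign
    return events
```

For example, `detect_crossings([10, 10, 10, 10, 0, 0, 0, 0, 10, 10, 10, 10, 0, 0, 0, 0], 2, 4)` returns `[8, 12]`. The ratio `long_n / short_n` of 2 used there sits at the low end of the 2 to 8 range recited in claim 8; converting frames to seconds depends on a frame rate that the claims do not fix.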

13. A method for forming an identifying feature of a portion of a recording of audio signals, said method comprising the steps of:
- performing a Fourier transformation of the audio signals of the portion into a time series of audio power dissipated over a first plurality of frequencies;
  
  grouping the frequencies into a smaller second plurality of bands that each include a range of neighboring frequencies;
  
  detecting power dissipation events in each of the bands; and
  
  grouping together the power dissipation events from mutually adjacent bands at a selected moment so as to form the identifying feature, wherein each of the power dissipation events is a crossing of the value of a first running average and the value of a second running average, the first running average is an average over a first averaging period of the audio power dissipated, and the second running average is an average over a second averaging period, which is different than the first averaging period, of the audio power dissipated.
14. The method according to claim 13, further comprising the step of integrating power dissipation in each of the bands over a predetermined period.
15. The method according to claim 14, wherein each of the power dissipation events is a crossover of rolling energy dissipation levels over time periods of different lengths.

16. A method of determining whether an audio stream includes at least a portion of a known recording of audio signals, said method comprising the steps of:
17. The method according to claim 16, wherein each of the power dissipation events is a crossover of rolling energy dissipation levels over time periods of different lengths.
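
The final grouping step of claim 13 (combining events from mutually adjacent bands at a selected moment) might be sketched as below. The representation is an assumption: events are `(time, band)` pairs, the "selected moment" is modelled as a half-open time window, and a feature is a tuple of `(band, time offset)` pairs covering a run of consecutive band indices. The function name and the `min_bands` parameter are hypothetical.

```python
def adjacent_band_features(events, t_start, t_end, min_bands=2):
    """Group events in [t_start, t_end) that fall in runs of adjacent bands.

    events: iterable of (time, band) pairs.
    Returns a list of features. Each feature is a tuple of
    (band, time - t_start) pairs, sorted by band, whose band indices
    form an unbroken run spanning at least min_bands distinct bands.
    """
    in_window = sorted(
        (band, time - t_start)
        for time, band in events
        if t_start <= time < t_end
    )
    features, run = [], []
    for band, offset in in_window:
        # A gap of more than one band index ends the current run.
        if run and band - run[-1][0] > 1:
            if len({b for b, _ in run}) >= min_bands:
                features.append(tuple(run))
            run = []
        run.append((band, offset))
    if len({b for b, _ in run}) >= min_bands:
        features.append(tuple(run))
    return features
```

Storing offsets relative to the window start, rather than absolute times, is a design choice made here so that the same passage yields the same feature wherever it occurs in a recording; the claim text itself does not prescribe it.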

18. A computer-readable medium encoded with a program for identifying audio content, said program containing instructions for performing the steps of:
- obtaining an audio signal;
  
  analyzing the power spectrum of the audio signal so as to obtain a plurality of time dependent frequency components; and
  
  detecting a plurality of events, each of the events being a crossing of the value of a first running average and the value of a second running average, wherein the first running average is an average over a first averaging period of a first subset of the time dependent frequency components, and the second running average is an average over a second averaging period, which is different than the first averaging period, of the first subset of the time dependent frequency components.
19. The computer-readable medium according to claim 18, wherein said program further contains instructions for performing the steps of:
20. The computer-readable medium according to claim 18, wherein the analyzing step includes the sub-steps of:
- sampling the audio signal to obtain a plurality of audio signal samples;
  
  taking a plurality of subsets from the plurality of audio signal samples; and
  
  performing a Fourier transform on each of the plurality of subsets to obtain a set of Fourier frequency components.
21. The computer-readable medium according to claim 18, wherein the detecting step includes the sub-steps of:
- keeping the first running average over the first averaging period of a first subset of the time dependent frequency components so as to obtain a first series of averages for the first averaging period;
  
  keeping the second running average over the second averaging period of the first subset of the time dependent frequency components so as to obtain a second series of averages for the first averaging period; and
  
  recording a plurality of event times, each of the event times being a time at which there occurs one of the detected events of the first running average crossing the second running average.
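
The analyzing sub-steps recited in claims 4 and 20 (take subsets of the samples, then Fourier-transform each subset) can be illustrated with the sketch below. To stay self-contained it uses a direct O(N^2) discrete Fourier transform rather than a fast Fourier transform, which a practical system would presumably prefer. The frame size, the hop between frames, and all function names are assumptions, not values from the claims.

```python
import cmath
import math


def frames(samples, size, hop):
    """Yield successive subsets (frames) of `size` samples, advancing by `hop`."""
    for start in range(0, len(samples) - size + 1, hop):
        yield samples[start:start + size]


def power_spectrum(frame):
    """Return |X[k]|^2 for k = 0 .. N/2 using a direct discrete Fourier transform."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(
            frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)
        )
        spectrum.append(abs(x_k) ** 2)
    return spectrum


def time_dependent_spectrum(samples, size, hop):
    """Return one power spectrum per frame: a list of lists indexed [frame][bin]."""
    return [power_spectrum(f) for f in frames(samples, size, hop)]
```

With a hop smaller than the frame size the subsets overlap; whether they overlap is left open by the claim language, so `hop` is exposed as a parameter here.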

22. A computer-readable medium encoded with a program for forming an identifying feature of a portion of a recording of audio signals, said program containing instructions for performing the steps of:
- performing a Fourier transformation of the audio signals of the portion into a time series of audio power dissipated over a first plurality of frequencies;
  
  grouping the frequencies into a smaller second plurality of bands that each include a range of neighboring frequencies;
  
  detecting power dissipation events in each of the bands; and
  
  grouping together the power dissipation events from mutually adjacent bands at a selected moment so as to form the identifying feature, wherein each of the power dissipation events is a crossing of the value of a first running average and the value of a second running average, the first running average is an average over a first averaging period of the audio power dissipated, and the second running average is an average over a second averaging period, which is different than the first averaging period, of the audio power dissipated.

23. A system for identifying a recording of an audio signal, said system comprising:
- an interface for receiving an audio signal to be identified;
  
  a spectrum analyzer for analyzing the power spectrum of the audio signal so as to produce a plurality of time dependent frequency components from the audio signal;
  
  an event detector for detecting a plurality of events in each of the time dependent frequency components; and
  
  a key generator for grouping the plurality of events by frequency and time, and assembling a plurality of keys based on the plurality of events, wherein each of the events detected by the event detector is a crossing of the value of a first running average and the value of a second running average, the first running average is an average over a first averaging period of a first subset of the time dependent frequency components, and the second running average is an average over a second averaging period, which is different than the first averaging period, of the first subset of the time dependent frequency components.
24. The system according to claim 23, wherein the event detector is a peak detector.
25. The system according to claim 23, further comprising a database of keys of known recordings of audio signals.
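
A database of keys of known recordings, as recited in claim 25, could be modelled very roughly as an inverted index from key to the recordings that produced it. The sketch below is an assumption throughout: it requires keys to be hashable values such as tuples, stores a time with each key but ignores it when matching, and identifies a query by a plain vote count. The claims shown here do not specify the matching procedure, and the class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class KeyDatabase:
    """Toy inverted index from key -> list of (recording_id, time)."""

    def __init__(self):
        self._index = defaultdict(list)

    def add_recording(self, recording_id, keys):
        """Store (key, time) pairs extracted from a known recording."""
        for key, time in keys:
            self._index[key].append((recording_id, time))

    def identify(self, query_keys):
        """Return (recording_id, match_count) for the best match, or None.

        query_keys: iterable of (key, time) pairs from the unknown audio.
        """
        votes = Counter()
        for key, _time in query_keys:
            for recording_id, _stored_time in self._index.get(key, ()):
                votes[recording_id] += 1
        return votes.most_common(1)[0] if votes else None
```

A usage example under the same assumptions:

```python
db = KeyDatabase()
db.add_recording("song_a", [((1, 2, 3), 0), ((4, 5, 6), 5)])
db.add_recording("song_b", [((7, 8, 9), 0), ((1, 2, 3), 3)])
best = db.identify([((4, 5, 6), 100), ((1, 2, 3), 95)])  # ("song_a", 2)
```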

26. A system for forming an identifying feature of a portion of a recording of audio signals, said system comprising:
- means for performing a Fourier transformation of the audio signals of the portion into a time series of audio power dissipated over a first plurality of frequencies;
  
  means for grouping the frequencies into a smaller second plurality of bands that each include a range of neighboring frequencies;
  
  means for detecting power dissipation events in each of the bands; and
  
  means for grouping together the power dissipation events from mutually adjacent bands at a selected moment so as to form the identifying feature, wherein each of the power dissipation events detected by the means for detecting is a crossing of the value of a first running average and the value of a second running average, the first running average is an average over a first averaging period of the audio power dissipated, and the second running average is an average over a second averaging period, which is different than the first averaging period, of the audio power dissipated.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Pitman, Michael C.; Fitch, Blake G.; Germain, Robert S.; Abrams, Steven
Primary Examiner(s): Knepper, David D.
Application Number: US09/803,298
Publication Number: US 20020143530A1
Time in Patent Office: 879 Days
Field of Search: 704/231, 704/236, 704/246, 704/270, 725/22, 702/73, 708/5
US Class Current: 704/231
CPC Class Codes:
G11B 20/00086   Circuits for prevention of ...
G11B 20/10527   Audio or video recording; D...
G11B 2020/10546   specifically adapted for au...
H04H 20/14   for monitoring programmes
H04N 21/4394   involving operations for an...
H04N 21/4627   Rights management associate...
