Feature-based audio content identification
1 Assignment
0 Petitions
Abstract
An audio signal is sampled and a frequency transform is performed on a succession of sets of samples of the signal to obtain a time dependent power spectrum for the audio signal. Frequency components output by the frequency transform are collected in semitone frequency bands. More than one running average is taken of each semitone frequency band. When the values of two running averages of the same semitone frequency band cross, time information is recorded. Information about average-crossing events that have occurred at different times in a set of adjacent semitone frequency bands is combined to form a key. A set of keys obtained from a song provides a means for identifying the song and is stored in a database for use in identifying songs.
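The crossing rule in the abstract can be stated compactly. The notation below is introduced here for illustration and is not the patent's own: $P_b(t)$ is the power collected in semitone band $b$ at frame $t$, $N_1 \neq N_2$ are the two averaging periods measured in frames, and each running average is assumed to be a trailing boxcar mean. The A4 = 440 Hz reference for the band centres is likewise an assumption.

```latex
\[
f_n = 440\,\mathrm{Hz}\cdot 2^{n/12}
\qquad \text{(assumed centre frequency of semitone band } n\text{)}
\]
\[
A^{(i)}_b(t) = \frac{1}{N_i}\sum_{j=0}^{N_i-1} P_b(t-j), \qquad i \in \{1, 2\}
\]
\[
D_b(t) = A^{(1)}_b(t) - A^{(2)}_b(t), \qquad
\text{event at } (b, t) \iff D_b(t-1)\,D_b(t) < 0
\]
```

The product test treats an exact tie ($D_b = 0$) as no event; an implementation may instead carry the last non-zero sign forward so that a crossing passing through a tie is still counted once.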
94 Citations
26 Claims
1. A method for identifying audio content, said method comprising the steps of:
obtaining an audio signal;
analyzing the power spectrum of the audio signal so as to obtain a plurality of time dependent frequency components; and
detecting a plurality of events, each of the events being a crossing of the value of a first running average and the value of a second running average, wherein the first running average is an average over a first averaging period of a first subset of the time dependent frequency components, and the second running average is an average over a second averaging period, which is different than the first averaging period, of the first subset of the time dependent frequency components.
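A minimal sketch of the event detection in claim 1, assuming the "first subset" is the power series of a single frequency band sampled once per frame, and that each running average is a trailing boxcar mean (the claim does not fix the averaging kernel). The function names `running_average` and `crossing_events` are hypothetical.

```python
from typing import List


def running_average(values: List[float], period: int) -> List[float]:
    """Trailing mean over the last `period` values (shorter window at the start)."""
    out: List[float] = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= period:
            total -= values[i - period]  # drop the value leaving the window
        out.append(total / min(i + 1, period))
    return out


def crossing_events(values: List[float], first_period: int, second_period: int) -> List[int]:
    """Frame indices at which the two running averages of `values` cross."""
    first = running_average(values, first_period)
    second = running_average(values, second_period)
    events: List[int] = []
    prev_sign = 0  # sign of the last non-zero difference seen
    for t, (a, b) in enumerate(zip(first, second)):
        diff = a - b
        sign = (diff > 0) - (diff < 0)
        if sign != 0:
            if prev_sign != 0 and sign != prev_sign:
                events.append(t)  # the difference changed sign: a crossing
            prev_sign = sign
    return events


# One band's power per frame, with averaging periods of 1 and 3 frames.
print(crossing_events([4, 3, 2, 1, 1, 5, 9, 9, 9, 5, 1, 1], 1, 3))  # [5, 9]
```

Because the shorter average reacts faster, it overtakes the longer one shortly after a rise in band power and drops below it after a fall, so events tend to mark onsets and decays. The power corresponding to each event (claim 3) could, under the same assumptions, be read off as `values[t]` for each returned `t`.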
2. The method according to claim 1, further comprising the steps of:
detecting a set of events occurring at nearby times in adjacent frequency bands; and
forming a key by combining at least a subset of the set of events.
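One way to read claim 2, sketched under assumptions the claim does not state: events are `(band, frame)` pairs, "nearby times" means within `time_window` frames of an anchor event, "adjacent frequency bands" means the `band_span` bands starting at the anchor's band, and the key stores band and time offsets relative to the anchor. All names and default values here are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[int, int]  # hypothetical (band, frame) representation


def keys_from_adjacent_events(events: List[Event], band_span: int = 3,
                              time_window: int = 3, min_events: int = 2
                              ) -> List[Tuple[int, Tuple[Event, ...]]]:
    """For each anchor event, combine nearby events in adjacent bands into a key."""
    keys = []
    ordered = sorted(events)
    for b0, t0 in ordered:
        # events in bands b0 .. b0 + band_span - 1 within +/- time_window frames
        group = [(b, t) for b, t in ordered
                 if b0 <= b < b0 + band_span and abs(t - t0) <= time_window]
        if len(group) >= min_events:
            relative = tuple((b - b0, t - t0) for b, t in group)
            keys.append((t0, relative))  # (anchor time, key contents)
    return keys


print(keys_from_adjacent_events([(0, 10), (1, 12), (2, 9), (5, 11)]))
# [(10, ((0, 0), (1, 2), (2, -1))), (12, ((0, 0), (1, -3)))]
```

Storing offsets rather than absolute frames makes the key independent of where in the recording an excerpt starts.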
3. The method according to claim 1, further comprising the step of determining a time dependent frequency component power corresponding to each event.
4. The method according to claim 1, wherein the analyzing step includes the sub-steps of:
sampling the audio signal to obtain a plurality of audio signal samples;
taking a plurality of subsets from the plurality of audio signal samples; and
performing a Fourier transform on each of the plurality of subsets to obtain a set of Fourier frequency components.
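A sketch of claim 4's analyzing step, assuming the subsets are fixed-length frames taken every `hop` samples (the claim does not say whether they overlap). It uses a direct DFT so the example needs only the standard library; in practice a fast Fourier transform such as NumPy's `numpy.fft.rfft` would be used, often after applying a window. Bin `k` of an `n`-sample frame corresponds to `k * sample_rate / n` Hz. The helper names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], size: int, hop: int) -> List[List[float]]:
    """Subsets of `size` consecutive samples, starting every `hop` samples."""
    return [list(samples[i:i + size]) for i in range(0, len(samples) - size + 1, hop)]


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """|X[k]|^2 for k = 0..n/2, via a direct O(n^2) discrete Fourier transform."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(x_k) ** 2)
    return spectrum


# A sine completing 4 cycles per 32 samples should peak in bin 4 of every frame.
samples = [math.sin(2 * math.pi * 4 * i / 32) for i in range(64)]
frames = split_into_frames(samples, size=32, hop=16)
peaks = [max(range(len(s)), key=s.__getitem__) for s in map(power_spectrum, frames)]
print(len(frames), peaks)  # 3 [4, 4, 4]
```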
5. The method according to claim 4, wherein the analyzing step further includes the sub-step of averaging together corresponding Fourier frequency components obtained from two or more successive subsets selected from the plurality of subsets.
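Claim 5's averaging of corresponding Fourier components can be sketched as follows, assuming non-overlapping runs of `group` successive frames (a sliding average would also fit the claim wording). Averaging lowers frame-to-frame variance at the cost of time resolution. The function name is hypothetical.

```python
from typing import List


def average_successive(spectra: List[List[float]], group: int) -> List[List[float]]:
    """Average each frequency component over non-overlapping runs of `group` frames."""
    averaged = []
    for start in range(0, len(spectra) - group + 1, group):
        block = spectra[start:start + group]
        # zip(*block) yields the same frequency component across the `group` frames
        averaged.append([sum(component) / group for component in zip(*block)])
    return averaged  # trailing frames that do not fill a whole group are dropped


print(average_successive([[1, 2], [3, 4], [5, 6], [7, 8]], 2))  # [[2.0, 3.0], [6.0, 7.0]]
```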
6. The method according to claim 5, wherein the analyzing step further includes the sub-step of collecting Fourier frequency components into a plurality of semitone frequency bands.
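A sketch of collecting Fourier components into semitone bands (claim 6). It assumes equal-tempered semitones referenced to A4 = 440 Hz and a piano-range cutoff of 27.5 Hz to 4186 Hz; neither the reference nor the range comes from the claims, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def semitone_index(freq_hz: float, ref_hz: float = 440.0) -> int:
    """Nearest equal-tempered semitone relative to `ref_hz` (A4 by default)."""
    return round(12 * math.log2(freq_hz / ref_hz))


def to_semitone_bands(power: Sequence[float], sample_rate: float, frame_size: int,
                      f_min: float = 27.5, f_max: float = 4186.0) -> Dict[int, float]:
    """Sum the power of every DFT bin whose centre frequency falls in each semitone band."""
    bands: Dict[int, float] = defaultdict(float)
    for k, p in enumerate(power):
        freq = k * sample_rate / frame_size  # centre frequency of bin k
        if f_min <= freq <= f_max:           # skip DC and out-of-range bins
            bands[semitone_index(freq)] += p
    return dict(sorted(bands.items()))


print(semitone_index(440.0), semitone_index(880.0))  # 0 12
```

Because semitones are spaced logarithmically while DFT bins are spaced linearly, high bands pool many bins, whereas at low frequencies the bin spacing (`sample_rate / frame_size`) can exceed a semitone and some bands receive no bins at all.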
7. The method according to claim 1, wherein the detecting step includes the sub-steps of:
keeping the first running average over the first averaging period of a first subset of the time dependent frequency components so as to obtain a first series of averages for the first averaging period;
keeping the second running average over the second averaging period of the first subset of the time dependent frequency components so as to obtain a second series of averages for the second averaging period; and
recording a plurality of event times, each of the event times being a time at which there occurs one of the detected events of the first running average crossing the second running average.
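The sub-steps of claim 7 suggest a streaming form: both running averages are kept up to date as each new value of a band arrives, and the time of each crossing is recorded. A sketch, assuming trailing boxcar averages measured in frames; the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, List


class CrossingDetector:
    """Keeps two running averages of one band and records the frames where they cross."""

    def __init__(self, first_period: int, second_period: int) -> None:
        self.first: Deque[float] = deque(maxlen=first_period)
        self.second: Deque[float] = deque(maxlen=second_period)
        self.prev_sign = 0
        self.frame = 0
        self.event_times: List[int] = []

    def push(self, value: float) -> None:
        # deque(maxlen=...) discards the oldest value once the window is full
        self.first.append(value)
        self.second.append(value)
        diff = sum(self.first) / len(self.first) - sum(self.second) / len(self.second)
        sign = (diff > 0) - (diff < 0)
        if sign != 0:
            if self.prev_sign != 0 and sign != self.prev_sign:
                self.event_times.append(self.frame)  # record the event time
            self.prev_sign = sign
        self.frame += 1


detector = CrossingDetector(1, 3)
for v in [4, 3, 2, 1, 1, 5, 9, 9, 9, 5, 1, 1]:
    detector.push(v)
print(detector.event_times)  # [5, 9]
```

Event times here are frame indices; dividing by the frame rate converts them to seconds.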
8. The method according to claim 1, wherein the first averaging period is between approximately 1/10 of a second and approximately 1 second, and the second averaging period is from approximately 2 to 8 times as long as the first averaging period.
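Claim 8 gives the averaging periods in seconds, while a detector works in spectral frames. A small conversion sketch, assuming a constant frame rate (for example 40 frames per second, a hypothetical value); the strict range checks are a simplification of the claim's "approximately".

```python
from typing import Tuple


def averaging_periods_in_frames(first_seconds: float, ratio: float,
                                frame_rate_hz: float) -> Tuple[int, int]:
    """Convert the two averaging periods of claim 8 into whole numbers of frames."""
    # Strict bounds simplify the claim's "approximately".
    if not 0.1 <= first_seconds <= 1.0:
        raise ValueError("first period should be about 0.1 s to 1 s")
    if not 2.0 <= ratio <= 8.0:
        raise ValueError("second period should be about 2 to 8 times the first")
    first = max(1, round(first_seconds * frame_rate_hz))
    # keep the second period strictly longer even after rounding
    second = max(first + 1, round(first_seconds * ratio * frame_rate_hz))
    return first, second


print(averaging_periods_in_frames(0.25, 4, 40.0))  # (10, 40)
```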
9. The method according to claim 1, further comprising the step of collecting the plurality of events in a plurality of time groups each of which covers an interval of time.
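Collecting events into time groups (claim 9) amounts to bucketing event times into fixed intervals. A sketch, assuming consecutive, non-overlapping groups of `group_frames` frames and events represented as hypothetical `(band, frame)` pairs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, int]  # hypothetical (band, frame) representation


def time_groups(events: List[Event], group_frames: int) -> Dict[int, List[Event]]:
    """Bucket events into consecutive, non-overlapping intervals of `group_frames` frames."""
    groups: Dict[int, List[Event]] = defaultdict(list)
    for band, frame in events:
        groups[frame // group_frames].append((band, frame))
    return dict(sorted(groups.items()))


print(time_groups([(0, 3), (1, 7), (2, 8), (0, 15)], group_frames=5))
# {0: [(0, 3)], 1: [(1, 7), (2, 8)], 3: [(0, 15)]}
```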
10. The method according to claim 9, further comprising the step of:
in response to detecting each event in each of the plurality of time dependent frequency components, selecting one or more combinations of events from a plurality of events that occurred within a number of time groups, and within a number of time dependent frequency components.
11. The method according to claim 10, wherein the selecting step includes the sub-step of selecting one or more combinations of events from a plurality of events that occurred within a number of time groups, and within a number of time dependent frequency components, taking only one event at a time from each time group.
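The selection in claims 10 and 11 (one event from each of several time groups, within a range of bands) is a Cartesian product over the per-group candidates. A sketch using `itertools.product`; it assumes every group in the window must contribute an event, which the claims do not require, and all parameter names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Event = Tuple[int, int]  # hypothetical (band, frame) representation


def combinations_one_per_group(groups: Dict[int, List[Event]], first_group: int,
                               n_groups: int, band_lo: int, band_hi: int
                               ) -> List[Tuple[Event, ...]]:
    """Every way to take exactly one event from each of `n_groups` consecutive
    time groups, restricted to bands in [band_lo, band_hi)."""
    candidates = []
    for g in range(first_group, first_group + n_groups):
        in_range = [e for e in groups.get(g, []) if band_lo <= e[0] < band_hi]
        if not in_range:
            return []  # assumption: each group in the window must contribute
        candidates.append(in_range)
    return list(product(*candidates))


groups = {0: [(0, 3)], 1: [(1, 7), (2, 8)], 3: [(0, 15)]}
print(combinations_one_per_group(groups, first_group=0, n_groups=2, band_lo=0, band_hi=3))
# [((0, 3), (1, 7)), ((0, 3), (2, 8))]
```

The number of combinations grows multiplicatively with the number of events per group, so in practice the window of groups and bands is kept small.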
12. The method according to claim 10, further comprising the step of forming a plurality of keys from the one or more combinations each of which comprises a time to be associated with the combination of events, and a key sequence including information about each event in the combination.
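A sketch of the key structure in claim 12: a time associated with the combination plus a key sequence describing each event. Here the time is assumed to be the earliest event's frame, and each event is described by its band and its time-group offset from the first event; both choices, and the names `Key` and `make_key`, are hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple

Event = Tuple[int, int]  # hypothetical (band, frame) representation


class Key(NamedTuple):
    time: int                              # frame associated with the combination
    sequence: Tuple[Tuple[int, int], ...]  # (band, time-group offset) per event


def make_key(combination: Sequence[Event], group_frames: int) -> Key:
    """Build a key from a combination of events, ordered by time."""
    ordered = sorted(combination, key=lambda e: e[1])
    first_time = ordered[0][1]
    first_group = first_time // group_frames
    sequence = tuple((band, t // group_frames - first_group) for band, t in ordered)
    return Key(first_time, sequence)


print(make_key([(2, 8), (0, 3)], group_frames=5))
# Key(time=3, sequence=((0, 0), (2, 1)))
```

Keeping absolute bands preserves pitch information, while relative time offsets let the same key sequence be found wherever the excerpt starts; the stored `time` is what allows matches to be checked for a consistent alignment.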
13. A method for forming an identifying feature of a portion of a recording of audio signals, said method comprising the steps of:
performing a Fourier transformation of the audio signals of the portion into a time series of audio power dissipated over a first plurality of frequencies;
grouping the frequencies into a smaller second plurality of bands that each include a range of neighboring frequencies;
detecting power dissipation events in each of the bands; and
grouping together the power dissipation events from mutually adjacent bands at a selected moment so as to form the identifying feature, wherein each of the power dissipation events is a crossing of the value of a first running average and the value of a second running average, the first running average is an average over a first averaging period of the audio power dissipated, and the second running average is an average over a second averaging period, which is different than the first averaging period, of the audio power dissipated.
forming at least a first identifying feature based on the portion of the known recording and at least a second identifying feature based on a portion of the audio stream using the method of claim 13;
storing the first identifying feature in a database; and
comparing the first and second identifying features to determine whether there is at least a selected degree of similarity.
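The claims leave the "selected degree of similarity" open. The sketch below uses a technique common in audio fingerprinting but not stated in the claims: an inverted index from key sequence to `(song, time)` entries, with each matching query key voting for a `(song, time offset)` pair. A song is reported only if enough query keys agree on one offset. The record layout, function names, and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Tuple

KeyRecord = Tuple[int, Hashable]  # hypothetical layout: (time, key sequence)


def build_index(songs: Dict[str, List[KeyRecord]]) -> Dict[Hashable, List[Tuple[str, int]]]:
    """Map each key sequence to every (song, time) at which it occurs."""
    index: Dict[Hashable, List[Tuple[str, int]]] = defaultdict(list)
    for song_id, keys in songs.items():
        for time, sequence in keys:
            index[sequence].append((song_id, time))
    return index


def identify(index: Dict[Hashable, List[Tuple[str, int]]], query: List[KeyRecord],
             min_fraction: float = 0.5) -> Optional[str]:
    """Return the song whose keys match the query at one consistent time offset."""
    votes: Counter = Counter()
    for q_time, sequence in query:
        for song_id, s_time in index.get(sequence, []):
            votes[(song_id, s_time - q_time)] += 1
    if not query or not votes:
        return None
    (song_id, _offset), count = votes.most_common(1)[0]
    return song_id if count / len(query) >= min_fraction else None


songs = {"a": [(10, ((0, 0), (1, 2))), (20, ((0, 0), (2, 1)))],
         "b": [(5, ((0, 0), (1, 2)))]}
index = build_index(songs)
print(identify(index, [(0, ((0, 0), (1, 2))), (10, ((0, 0), (2, 1)))]))  # a
```

Requiring a shared offset rejects coincidental key matches scattered across unrelated parts of a recording.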
17. The method according to claim 16, wherein each of the power dissipation events is a crossover of rolling energy dissipation levels over time periods of different lengths.
18. A computer-readable medium encoded with a program for identifying audio content, said program containing instructions for performing the steps of:
obtaining an audio signal;
analyzing the power spectrum of the audio signal so as to obtain a plurality of time dependent frequency components; and
detecting a plurality of events, each of the events being a crossing of the value of a first running average and the value of a second running average, wherein the first running average is an average over a first averaging period of a first subset of the time dependent frequency components, and the second running average is an average over a second averaging period, which is different than the first averaging period, of the first subset of the time dependent frequency components.
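19. The computer-readable medium according to claim 18, wherein said program further contains instructions for performing the steps of:
detecting a set of events occurring at nearby times in adjacent frequency bands; and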
forming a key by combining at least a subset of the set of events.
20. The computer-readable medium according to claim 18, wherein the analyzing step includes the sub-steps of:
sampling the audio signal to obtain a plurality of audio signal samples;
taking a plurality of subsets from the plurality of audio signal samples; and
performing a Fourier transform on each of the plurality of subsets to obtain a set of Fourier frequency components.
21. The computer-readable medium according to claim 18, wherein the detecting step includes the sub-steps of:
keeping the first running average over the first averaging period of a first subset of the time dependent frequency components so as to obtain a first series of averages for the first averaging period;
keeping the second running average over the second averaging period of the first subset of the time dependent frequency components so as to obtain a second series of averages for the second averaging period; and
recording a plurality of event times, each of the event times being a time at which there occurs one of the detected events of the first running average crossing the second running average.
22. A computer-readable medium encoded with a program for forming an identifying feature of a portion of a recording of audio signals, said program containing instructions for performing the steps of:
performing a Fourier transformation of the audio signals of the portion into a time series of audio power dissipated over a first plurality of frequencies;
grouping the frequencies into a smaller second plurality of bands that each include a range of neighboring frequencies;
detecting power dissipation events in each of the bands; and
grouping together the power dissipation events from mutually adjacent bands at a selected moment so as to form the identifying feature, wherein each of the power dissipation events is a crossing of the value of a first running average and the value of a second running average, the first running average is an average over a first averaging period of the audio power dissipated, and the second running average is an average over a second averaging period, which is different than the first averaging period, of the audio power dissipated.
23. A system for identifying a recording of an audio signal, said system comprising:
an interface for receiving an audio signal to be identified;
a spectrum analyzer for analyzing the power spectrum of the audio signal so as to produce a plurality of time dependent frequency components from the audio signal;
an event detector for detecting a plurality of events in each of the time dependent frequency components; and
a key generator for grouping the plurality of events by frequency and time, and assembling a plurality of keys based on the plurality of events, wherein each of the events detected by the event detector is a crossing of the value of a first running average and the value of a second running average, the first running average is an average over a first averaging period of a first subset of the time dependent frequency components, and the second running average is an average over a second averaging period, which is different than the first averaging period, of the first subset of the time dependent frequency components.
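The system of claim 23 names its parts by function: an interface, a spectrum analyzer, an event detector, and a key generator. A sketch of how they might be wired together, with each component injected as a callable. The class name and signatures are hypothetical, and the placeholder components in the usage lines (including a simple "value rose" detector) only demonstrate the data flow; they are not the running-average crossing the claim requires.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Event = Tuple[int, int]  # hypothetical (band, frame) representation


@dataclass
class IdentificationSystem:
    """Wires the claimed components together; each component is injected."""
    spectrum_analyzer: Callable[[Sequence[float]], List[List[float]]]  # audio -> [frame][band]
    event_detector: Callable[[List[float]], List[int]]                 # band series -> frames
    key_generator: Callable[[List[Event]], list]                       # events -> keys

    def keys_for(self, audio: Sequence[float]) -> list:
        """Interface: accept audio samples and return the keys assembled from them."""
        spectrum = self.spectrum_analyzer(audio)
        n_bands = len(spectrum[0]) if spectrum else 0
        events: List[Event] = []
        for band in range(n_bands):
            series = [frame[band] for frame in spectrum]  # one time dependent component
            events.extend((band, t) for t in self.event_detector(series))
        return self.key_generator(events)


# Placeholder components, only to show the data flow.
system = IdentificationSystem(
    spectrum_analyzer=lambda audio: [[x, -x] for x in audio],
    event_detector=lambda s: [t for t in range(1, len(s)) if s[t] > s[t - 1]],
    key_generator=sorted,
)
print(system.keys_for([0, 1, 0, 2]))  # [(0, 1), (0, 3), (1, 2)]
```

Injecting the components keeps each one replaceable, which mirrors how the claim defines the parts by what they do rather than how they do it.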
26. A system for forming an identifying feature of a portion of a recording of audio signals, said system comprising:
means for performing a Fourier transformation of the audio signals of the portion into a time series of audio power dissipated over a first plurality of frequencies;
means for grouping the frequencies into a smaller second plurality of bands that each include a range of neighboring frequencies;
means for detecting power dissipation events in each of the bands; and
means for grouping together the power dissipation events from mutually adjacent bands at a selected moment so as to form the identifying feature, wherein each of the power dissipation events detected by the means for detecting is a crossing of the value of a first running average and the value of a second running average, the first running average is an average over a first averaging period of the audio power dissipated, and the second running average is an average over a second averaging period, which is different than the first averaging period, of the audio power dissipated.
Specification