Intervalgram representation of audio for melody recognition
Abstract
A system, method, and computer readable storage medium generate an audio fingerprint for an input audio clip that is robust to differences in key, instrumentation, and other performance variations. The audio fingerprint includes a sequence of intervalgrams that represent a melody in an audio clip according to pitch intervals between different time points in the audio clip. The fingerprint for an input audio clip can be compared to a set of reference fingerprints in a reference database to determine a matching reference audio clip.
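The robustness to key described in the abstract can be illustrated with a small sketch. It assumes 12-bin pitch distributions and toy one-hot vectors; the function names `circular_cross_correlation` and `transpose` are hypothetical and do not come from the patent. Transposing a melody circularly shifts every pitch distribution by the same amount, so the correlation against an equally shifted reference is unchanged.

```python
def circular_cross_correlation(chroma, reference):
    """Return c[k] = sum_i reference[i] * chroma[(i + k) % n].

    A peak at k means the chroma vector sits k pitch classes above the reference.
    """
    n = len(chroma)
    return [
        sum(reference[i] * chroma[(i + k) % n] for i in range(n))
        for k in range(n)
    ]


def transpose(chroma, semitones):
    """Circularly shift a pitch distribution up by the given number of semitones."""
    n = len(chroma)
    return [chroma[(i - semitones) % n] for i in range(n)]


# Toy data (assumption): all energy at pitch class 0 for the reference
# time point and at pitch class 4 for a later time point.
reference = [1.0 if i == 0 else 0.0 for i in range(12)]
later = [1.0 if i == 4 else 0.0 for i in range(12)]

original = circular_cross_correlation(later, reference)
shifted = circular_cross_correlation(transpose(later, 3), transpose(reference, 3))

# Both results peak at an interval of 4 pitch classes, regardless of key.
print(original.index(max(original)), shifted.index(max(shifted)))
```

In this toy case the two correlation vectors are identical, which is the property that lets a fingerprint built from pitch intervals ignore the key of the performance.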
18 Claims
1. A computer-implemented method for matching audio clips, the method executed by a computer system, and comprising:
- receiving an audio chromagram representing an input audio clip, the audio chromagram comprising a sequence of vectors, each vector in the audio chromagram corresponding to a different time point of the input audio clip, and each vector representing a distribution of audio pitches at the corresponding time point of the input audio clip;
- selecting a sampling of different reference time points within the audio chromagram;
- for each of the selected reference time points, generating a chroma block having a plurality of vectors, each of the plurality of vectors in the chroma block corresponding to a different time sub-period of the input audio clip, and each vector representing a weighted average of distributions of audio pitches within the corresponding time sub-period;
- for each of the selected reference time points, generating a reference vector representing a reference distribution of pitches for the selected reference time point in the audio chromagram;
- applying a circular cross-correlation of the vectors of each chroma block against the reference vector to produce a sequence of intervalgram blocks for the input audio clip, wherein the sequence of intervalgram blocks comprises an intervalgram representation for the input audio clip, each intervalgram block associated with a different time period within the input audio clip, and each intervalgram block representing a distribution of pitch intervals occurring between different sub-periods within the time period;
- comparing the intervalgram representation for the input audio clip to stored intervalgram representations corresponding to reference audio clips in a reference database;
- selecting a reference audio clip from the reference database having an intervalgram representation best matching the intervalgram representation for the input audio clip; and
- generating a recognition result indicative of the selected reference audio clip.
Dependent claims: 2, 3, 4, 5, 6, 7.
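The steps of claim 1 can be sketched end to end as follows. This is a minimal illustration under several assumptions that the claim does not fix: sub-periods of equal length, uniform averaging weights, a reference vector taken as the average over the sub-period starting at the reference time point, and a sum of squared differences as the comparison measure. All function names and the toy melodies are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise average of equal-length vectors (uniform weights assumed)."""
    bins = len(vectors[0])
    return [sum(v[b] for v in vectors) / len(vectors) for b in range(bins)]


def circular_cross_correlation(chroma, reference):
    """c[k] = sum_i reference[i] * chroma[(i + k) % n]."""
    n = len(chroma)
    return [sum(reference[i] * chroma[(i + k) % n] for i in range(n))
            for k in range(n)]


def select_reference_points(num_frames, step, margin):
    """Sample reference time points, leaving room for a full block on each side."""
    return list(range(margin, num_frames - margin + 1, step))


def intervalgram(chromagram, ref_points, sub_len=2, subs_each_side=2):
    """Build one intervalgram block per reference time point."""
    blocks = []
    for t in ref_points:
        # Reference distribution of pitches at the reference time point.
        reference = mean_vector(chromagram[t:t + sub_len])
        block = []
        for s in range(-subs_each_side, subs_each_side):
            start = t + s * sub_len
            # One chroma-block vector: averaged pitches over a sub-period.
            chroma_vec = mean_vector(chromagram[start:start + sub_len])
            block.append(circular_cross_correlation(chroma_vec, reference))
        blocks.append(block)
    return blocks


def distance(a, b):
    """Sum of squared differences between two equally shaped intervalgrams."""
    return sum((x - y) ** 2
               for block_a, block_b in zip(a, b)
               for vec_a, vec_b in zip(block_a, block_b)
               for x, y in zip(vec_a, vec_b))


def best_match(query, reference_db):
    """Return the identifier of the stored intervalgram closest to the query."""
    return min(reference_db, key=lambda name: distance(query, reference_db[name]))


def fingerprint(pitch_classes, bins=12):
    """Toy front end: one-hot chromagram from a list of pitch classes."""
    chromagram = [[1.0 if b == p else 0.0 for b in range(bins)]
                  for p in pitch_classes]
    points = select_reference_points(len(chromagram), step=2, margin=4)
    return intervalgram(chromagram, points)


# Hypothetical melodies, two frames per note.
melody_a = [0, 0, 2, 2, 4, 4, 5, 5, 7, 7, 9, 9]
melody_b = [0, 0, 7, 7, 3, 3, 10, 10, 1, 1, 8, 8]
reference_db = {"song_a": fingerprint(melody_a), "song_b": fingerprint(melody_b)}

# Query: melody_a played five semitones higher.
query = fingerprint([(p + 5) % 12 for p in melody_a])
result = best_match(query, reference_db)
print(result)
```

The sketch compares intervalgrams block by block at fixed positions; it does not attempt any time alignment or tempo compensation between the query and the references.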
8. A computer-implemented method for generating a reference database of audio fingerprints representative of melodies in a corresponding set of reference audio clips, the method executed by a computer system, and comprising:
- receiving a reference audio clip;
- generating an audio chromagram representing the reference audio clip, the audio chromagram comprising a sequence of vectors, each vector in the audio chromagram corresponding to a different time point of the reference audio clip, and each vector representing a distribution of audio pitches at the corresponding time point of the reference audio clip;
- selecting a set of different reference time points within the audio chromagram;
- for each of the selected reference time points, generating a chroma block having a plurality of vectors, each of the plurality of vectors in the chroma block corresponding to a different time sub-period of the reference audio clip, and each vector representing a weighted average of distributions of audio pitches within the corresponding time sub-period;
- for each of the selected reference time points, generating a reference vector representing a reference distribution of pitches for the selected reference time point in the audio chromagram;
- applying a circular cross-correlation of the vectors of each chroma block against the reference vector to produce a sequence of intervalgram blocks for the reference audio clip, wherein the sequence of intervalgram blocks comprises an intervalgram representation for the reference audio clip, each intervalgram block associated with a different time period within the reference audio clip, and each intervalgram block representing a distribution of pitch intervals occurring between different sub-periods within the time period; and
- storing the intervalgram representation as a reference fingerprint in the reference database.
Dependent claims: 9.
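The chroma-block step recited above, in which each vector is a weighted average of pitch distributions within a time sub-period, might look like the following sketch. The triangular weighting and the unequal sub-period boundaries are assumptions chosen for illustration; the claim does not specify either, and the function names are hypothetical.

```python
def triangular_weights(length):
    """Weights that rise toward the centre of a sub-period and fall off symmetrically."""
    return [min(i + 1, length - i) for i in range(length)]


def weighted_average(vectors, weights):
    """Weighted element-wise average of equal-length pitch distributions."""
    total = float(sum(weights))
    bins = len(vectors[0])
    return [sum(w * v[b] for v, w in zip(vectors, weights)) / total
            for b in range(bins)]


def chroma_block(chromagram, sub_periods):
    """Return one averaged pitch distribution per (start, end) frame range."""
    block = []
    for start, end in sub_periods:
        frames = chromagram[start:end]
        block.append(weighted_average(frames, triangular_weights(len(frames))))
    return block


# Toy one-hot chromagram (assumption): six frames with these pitch classes.
pitch_classes = [0, 0, 7, 7, 7, 9]
chromagram = [[1.0 if b == p else 0.0 for b in range(12)] for p in pitch_classes]

# Two sub-periods of different lengths: frames 0-1 and frames 2-5.
block = chroma_block(chromagram, [(0, 2), (2, 6)])
print(block[1][7], block[1][9])
```

With weights 1, 2, 2, 1 over the second sub-period, pitch class 7 receives 5/6 of the averaged energy and pitch class 9 receives 1/6, so a brief passing note contributes less than the sustained one.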
10. A non-transitory computer readable storage medium storing computer-executable program instructions for matching audio clips, the program instructions when executed cause a processor to perform steps of:
- receiving an audio chromagram representing an input audio clip, the audio chromagram comprising a sequence of vectors, each vector in the audio chromagram corresponding to a different time point of the input audio clip, and each vector representing a distribution of audio pitches at the corresponding time point of the input audio clip;
- selecting a sampling of different reference time points within the audio chromagram;
- for each of the selected reference time points, generating a chroma block having a plurality of vectors, each of the plurality of vectors in the chroma block corresponding to a different time sub-period of the input audio clip, and each vector representing a weighted average of distributions of audio pitches within the corresponding time sub-period;
- for each of the selected reference time points, generating a reference vector representing a reference distribution of pitches for the selected reference time point in the audio chromagram;
- applying a circular cross-correlation of the vectors of each chroma block against the reference vector to produce a sequence of intervalgram blocks for the input audio clip, wherein the sequence of intervalgram blocks comprises an intervalgram representation for the input audio clip, each intervalgram block associated with a different time period within the input audio clip, and each intervalgram block representing a distribution of pitch intervals occurring between different sub-periods within the time period;
- comparing the intervalgram representation for the input audio clip to stored intervalgram representations corresponding to reference audio clips in a reference database;
- selecting a reference audio clip from the reference database having an intervalgram representation best matching the intervalgram representation for the input audio clip; and
- generating a recognition result indicative of the selected reference audio clip.
Dependent claims: 11, 12, 13, 14, 15.
16. The computer-readable storage medium of claim 10, further comprising storing the intervalgram representation for the input audio clip to the reference database as an additional reference fingerprint.
17. A non-transitory computer readable storage medium storing computer-executable program instructions for generating a reference database of audio fingerprints representative of melodies in a corresponding set of reference audio clips, the program instructions when executed cause a processor to perform steps of:
- receiving a reference audio clip;
- generating an audio chromagram representing the reference audio clip, the audio chromagram comprising a sequence of vectors, each vector in the audio chromagram corresponding to a different time point of the reference audio clip, and each vector representing a distribution of audio pitches at the corresponding time point of the reference audio clip;
- selecting a set of different reference time points within the audio chromagram;
- for each of the selected reference time points, generating a chroma block having a plurality of vectors, each of the plurality of vectors in the chroma block corresponding to a different time sub-period of the reference audio clip, and each vector representing a weighted average of distributions of audio pitches within the corresponding time sub-period;
- for each of the selected reference time points, generating a reference vector representing a reference distribution of pitches for the selected reference time point in the audio chromagram;
- applying a circular cross-correlation of the vectors of each chroma block against the reference vector to produce a sequence of intervalgram blocks for the reference audio clip, wherein the sequence of intervalgram blocks comprises an intervalgram representation for the reference audio clip, each intervalgram block associated with a different time period within the reference audio clip, and each intervalgram block representing a distribution of pitch intervals occurring between different sub-periods within the time period; and
- storing the intervalgram representation as a reference fingerprint in the reference database.
Dependent claims: 18.
Specification