Feature extraction for identification and classification of audio signals
First Claim
1. A method for extracting audio features from a plurality of audio frames to classify said plurality of audio frames into acoustically similar groups, the method comprising:
- transforming each audio frame of said plurality of audio frames into a plurality of frequency sub-bands to obtain a transformation result;
- storing the transformation results for the plurality of audio frames as sub-band coefficients in S, wherein each sub-band coefficient correlates to an audio frame at a time slice with a frequency sub-band;
- for each time slice of S, subtracting a mean value corresponding to the time slice from each of the sub-band coefficients within that time slice to obtain a set of new sub-band coefficients;
- storing the sets of new sub-band coefficients for all the time slices in Z;
- calculating a beat matrix from Z and storing the beat matrix in B, wherein a first axis of B corresponds to a time slice and a second axis of B corresponds to a frequency sub-band, and wherein each nonzero entry in B corresponds to a beat onset of the audio frames at each frequency sub-band;
- calculating a plurality of quantized coefficients from Z according to at least one quantization threshold and storing the quantized coefficients in A;
- calculating a plurality of intra-band features from B, wherein the intra-band features correlate to beat signatures of the audio frames at a frequency sub-band; and
- calculating a plurality of inter-band features from A, wherein the inter-band features correlate to changes among the frequency sub-bands of the audio frames.
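The claimed S → Z → (B, A) pipeline can be sketched as follows. The FFT-magnitude transform, the equal-width band grouping, and the onset and quantization criteria are illustrative assumptions; the claim requires only some sub-band transform, a per-time-slice mean removal, a beat matrix, and a thresholded quantization.

```python
import numpy as np

def extract_matrices(frames, n_subbands=8, quant_threshold=0.0):
    """Sketch of the claimed S -> Z -> (B, A) pipeline.

    `frames` is a 2-D array (n_frames x samples_per_frame).  The FFT
    magnitude transform, band grouping, and onset rule below are
    assumptions, not requirements of the claim.
    """
    # S: one row per time slice, one column per frequency sub-band.
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    bands = np.array_split(spectrum, n_subbands, axis=1)
    S = np.stack([b.mean(axis=1) for b in bands], axis=1)

    # Z: subtract each time slice's mean from its own coefficients.
    Z = S - S.mean(axis=1, keepdims=True)

    # B: nonzero where a sub-band's energy rises sharply between
    # consecutive time slices (an assumed beat-onset criterion).
    rise = np.diff(Z, axis=0, prepend=Z[:1])
    B = (rise > rise.std()).astype(int)

    # A: coefficients of Z quantized against a threshold.
    A = (Z > quant_threshold).astype(int)
    return S, Z, B, A
```

Note that every row of Z sums to zero by construction, which makes the quantized matrix A insensitive to the overall loudness of a time slice.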
Abstract
Characteristic features are extracted from an audio sample based on its acoustic content. The features can be coded as fingerprints, which can be used to identify the audio from a fingerprints database. The features can also be used as parameters to separate the audio into different categories.
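Identification against a fingerprints database can be as simple as a nearest-neighbor search over bit strings. A minimal sketch, assuming equal-length binary fingerprints and a Hamming-distance cutoff (both illustrative choices, not specified by the abstract):

```python
import numpy as np

def match_fingerprint(query_bits, database, max_distance=10):
    """Identify audio by the nearest fingerprint in a database.

    `database` maps track names to equal-length bit arrays.  The
    Hamming distance and `max_distance` cutoff are assumptions.
    """
    best_name, best_dist = None, max_distance + 1
    for name, bits in database.items():
        dist = int(np.count_nonzero(query_bits != bits))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name  # None if nothing is close enough
```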
109 Citations
20 Claims
1. A method for extracting audio features from a plurality of audio frames to classify said plurality of audio frames into acoustically similar groups (recited in full above as the First Claim). View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
10. A method for extracting an audio fingerprint from a plurality of audio frames, the method comprising:
- transforming each audio frame of said plurality of audio frames into a plurality of frequency sub-bands to obtain a transformation result;
- storing the transformation results for the plurality of audio frames as sub-band coefficients in S, wherein each sub-band coefficient correlates to an audio frame at a time slice with a frequency sub-band;
- for each time slice of S, subtracting a mean of elements of the time slice from each sub-band coefficient within the time slice to obtain a set of new sub-band coefficients for the time slice;
- storing sets of new sub-band coefficients for all the time slices in Z;
- calculating a beat matrix from Z and storing the beat matrix in B, wherein each nonzero entry in B corresponds to a beat onset of the audio frames at each frequency sub-band;
- calculating a plurality of quantized coefficients from Z according to at least one quantization threshold and storing the quantized coefficients in A;
- calculating the audio fingerprint from A and B, wherein A and B are matrices with a first axis corresponding to a time slice and a second axis corresponding to a frequency sub-band, wherein A and B have the same number of rows and columns, and the calculating comprises:
  - multiplying each element in A with a corresponding element in B to generate a matrix C; and
  - concatenating all elements in C into a sequence of bits to generate the fingerprint.

View Dependent Claims (11, 12, 13, 14, 15, 16, 17)
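The final steps of this claim reduce to an element-wise matrix product followed by bit concatenation. A minimal sketch, assuming binary-valued A and B and row-major flattening order (the claim requires only that all elements of C be concatenated):

```python
import numpy as np

def fingerprint_from_AB(A, B):
    """C = A (element-wise product) B, then flatten C into a bit string."""
    if A.shape != B.shape:
        raise ValueError("A and B must have the same rows and columns")
    C = A * B                                   # element-wise product
    return "".join(str(int(bit)) for bit in C.ravel())
```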
18. A system for extracting audio features from a plurality of audio frames, the system comprising:
- one or more storage units to store the audio features; and
- one or more processors with a non-transitory processor-readable medium bearing processor-executable program instructions, wherein the one or more processors are communicatively connected to the one or more storage units, and wherein the one or more processors are operable to carry out the executable program instructions to perform a method comprising:
  - transforming each audio frame of said plurality of audio frames into a plurality of frequency sub-bands to obtain a transformation result;
  - storing the transformation results for the plurality of audio frames as sub-band coefficients in S, wherein each sub-band coefficient correlates to an audio frame at a time slice with a frequency sub-band;
  - for each time slice of S, subtracting a mean value corresponding to the time slice from each of the sub-band coefficients within that time slice to obtain a set of new sub-band coefficients;
  - storing the sets of new sub-band coefficients for all the time slices in Z;
  - calculating a plurality of beat coefficients from Z and storing the beat coefficients in B, wherein each nonzero coefficient in B corresponds to a beat onset of the audio frames at each frequency sub-band;
  - calculating a plurality of quantized coefficients from Z according to at least one quantization threshold and storing the quantized coefficients in A;
  - calculating a plurality of intra-band features from B, wherein the intra-band features correlate to beat signatures of the audio frames at a frequency sub-band;
  - calculating a plurality of inter-band features from A, wherein the inter-band features correlate to changes among the frequency sub-bands of the audio frames;
  - multiplying each element in A with a corresponding element in B to generate a sequence of bits as an audio fingerprint for the plurality of audio frames;
  - storing the audio fingerprint for the plurality of audio frames into the one or more storage units; and
  - applying Support Vector Machines (SVMs) to the plurality of intra-band features and the plurality of inter-band features to classify the audio frames into acoustically similar groups and storing the output of the SVMs to the one or more storage units.

View Dependent Claims (19, 20)
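The classification step above can be sketched with scikit-learn's SVC. The RBF kernel, the library choice, and the toy feature layout are all assumptions; the claim requires only that SVMs group the intra-band and inter-band features into acoustically similar classes.

```python
import numpy as np
from sklearn.svm import SVC

def classify_audio(features, labels, new_features):
    """Train an SVM on labeled feature vectors, then classify new audio.

    `features` has one row per audio sample; its columns would hold the
    intra-band and inter-band features.  Kernel choice is an assumption.
    """
    clf = SVC(kernel="rbf")
    clf.fit(features, labels)
    return clf.predict(new_features)
```

In practice the intra-band and inter-band feature vectors would be concatenated into each row of `features`, and the SVM output stored back to the storage units as the claim recites.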
Specification