Feature extraction for identification and classification of audio signals

US 8,140,331 B2
Filed: 07/04/2008
Issued: 03/20/2012
Est. Priority Date: 07/06/2007
Status: Active Grant

First Claim

Patent Images

1. A method for extracting audio features from a plurality of audio frames to classify said plurality of audio frames into acoustically similar groups, the method comprising:

transforming each audio frame of said plurality of audio frames into a plurality of frequency sub-bands to obtain a transformation result;

Storing the transformation results for the plurality of audio frames as sub-band coefficients in S, wherein each sub-band coefficient correlates to an audio frame at a time slice with a frequency sub-band;

for each time slice of S, subtracting each of the sub-band coefficient within the same time slice from a mean value corresponding to the same time slice to obtain a set of new sub-band coefficient;

storing the sets of new sub-band coefficients for all the time slices in Z;

calculating a beat matrix from Z and storing the beat matrix in B, wherein a first axis of B corresponding to a time slice and a second axis of B corresponding to a frequency sub-band, and wherein each nonzero entry in B corresponds to a beat onset of the audio frames at each frequency sub-band;

calculating a plurality of quantized coefficients from Z according to at least one quantization threshold and storing the quantized coefficients in A;

calculating a plurality of intra-band features from B, wherein the intra-band features correlate to beat signatures of the audio frames at a frequency sub-band; and

calculating a plurality of inter-band features from A, wherein the inter-band features correlate to changes among the frequency sub-bands of the audio frames.

View all claims

0 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Characteristic features are extracted from an audio sample based on its acoustic content. The features can be coded as fingerprints, which can be used to identify the audio from a fingerprints database. The features can also be used as parameters to separate the audio into different categories.

109 Citations

View as Search Results

20 Claims

1. A method for extracting audio features from a plurality of audio frames to classify said plurality of audio frames into acoustically similar groups, the method comprising:
- transforming each audio frame of said plurality of audio frames into a plurality of frequency sub-bands to obtain a transformation result;
  
  Storing the transformation results for the plurality of audio frames as sub-band coefficients in S, wherein each sub-band coefficient correlates to an audio frame at a time slice with a frequency sub-band;
  
  for each time slice of S, subtracting each of the sub-band coefficient within the same time slice from a mean value corresponding to the same time slice to obtain a set of new sub-band coefficient;
  
  storing the sets of new sub-band coefficients for all the time slices in Z;
  
  calculating a beat matrix from Z and storing the beat matrix in B, wherein a first axis of B corresponding to a time slice and a second axis of B corresponding to a frequency sub-band, and wherein each nonzero entry in B corresponds to a beat onset of the audio frames at each frequency sub-band;
  
  calculating a plurality of quantized coefficients from Z according to at least one quantization threshold and storing the quantized coefficients in A;
  
  calculating a plurality of intra-band features from B, wherein the intra-band features correlate to beat signatures of the audio frames at a frequency sub-band; and
  
  calculating a plurality of inter-band features from A, wherein the inter-band features correlate to changes among the frequency sub-bands of the audio frames.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The method of claim 1, wherein calculating a plurality of inter-band features from A comprises:
    - calculating a plurality of zero crossing rates, linear prediction coefficients, means, variances, derivatives and second order moments of A; and
      
      grouping the above said calculating results into a plurality of inter-band features.
  - 3. The method of claim 1, wherein calculating a plurality of intra-band features from B comprises:
    - calculating a plurality of zero crossing rates, linear prediction coefficients, means, variances, derivatives and second order moments of B; and
      
      grouping the above said calculating results into a plurality of intra-band features.
  - 4. The method of claim 1, wherein the sub-band coefficients in S are in logarithmic scale.
  - 5. The method of claim 1, wherein transforming each audio frame of said plurality of audio frames into a plurality of frequency sub-bands comprises performing a short-time Fast Fourier Transform.
  - 6. The method of claim 1, wherein transforming each audio frame of said plurality of audio frames into a plurality of frequency sub-bands comprises applying filter banks to the audio frame.
  - 7. The method of claim 1, wherein Z is a matrix with a first axis corresponding to a time slice and a second axis corresponding to a frequency sub-band, and wherein calculating a beat matrix from Z comprises differential encoding each coefficient in Z against a beat threshold, wherein the beat threshold is derived from a plurality of sequential coefficients within a same frequency sub-band of Z, and wherein the said deriving comprises:
    - filtering the plurality of sequential coefficients to generate a filter output, wherein the filter impulse response tapers; and
      
      assigning a beat weight to the filter output to derive the threshold.
  - 8. The method of claim 1, wherein Z is a matrix with a first axis corresponding to a time slice and a second axis corresponding to a frequency sub-band, and wherein the at least one quantization threshold is derived from a plurality of coefficients of Z within a same time slice, and wherein the said deriving comprises:
    - for each time slice in Z, calculating a sum of all the sub-band coefficients within the same time slice to obtain a plurality of total band energy and storing the plurality of total band energy in vector V;
      
      filtering a plurality of sequential elements of V to generate a filtered output, wherein the filter impulse response tapers; and
      
      applying a set of weighting coefficients to the filtered output to derive the at least one quantization threshold.
  - 9. The method of claim 1, further comprising grouping the inter-band features and the intra-band features into a dataset in sequential order and applying a Support Vector Machines (SVMs) to the dataset to classify the audio frames into acoustically similar groups.

10. A method for extracting an audio fingerprint from a plurality of audio frames, the method comprising:
- transforming each audio frame of said plurality of audio frames into a plurality of frequency sub-bands to obtain a transformation result;
  
  storing the transformation results for the plurality of audio frames as sub-band coefficients in S, wherein each sub-band coefficient correlates to an audio frame at a time slice with a frequency sub-band;
  
  for each time slice of S, subtracting a mean of elements of the time slice from each sub-band coefficient within the time slice to obtain a set of new sub-band coefficients for the time slice;
  
  storing sets of new sub-band coefficients for all the time slices in Z;
  
  calculating a beat matrix from Z and storing the beat matrix in B, wherein each nonzero entry in B corresponds to a beat onset of the audio frames at each frequency sub-band;
  
  calculating a plurality of quantized coefficients from Z according to at least one quantization threshold and storing the quantized coefficients in A;
  
  calculating the audio fingerprint from A and B, wherein A and B are matrices with a first axis corresponding to a time slice and a second axis corresponding to a frequency sub-band, wherein A and B have the same number of rows and columns, and the calculating comprises;
  
  multiplying each element in A with a corresponding element in B to generate a matrix C; and
  
  concatenating all elements in C into a sequence of bits to generate the fingerprint.
- View Dependent Claims (11, 12, 13, 14, 15, 16, 17)
- - 11. A method for identifying a plurality of unknown audio frames by employing audio fingerprints of the audio frames, the method comprising:
    - calculating a set of audio fingerprints for the said audio frames according to the method in claim 10;
      
      obtaining a set of same length audio fingerprints from an audio fingerprints database, wherein the fingerprints in the said database are calculated according to the method of claim 10; and
      
      comparing the said two sets of fingerprints to determine the similarity.
  - 12. The method of claim 10, wherein transforming each audio frame of said plurality of audio frames into a plurality of frequency sub-bands comprises performing a short-time Fast Fourier Transform.
  - 13. The method of claim 10, wherein transforming each audio frame of said plurality of audio frames into a plurality of frequency sub-bands comprises applying filter banks to the audio frame.
  - 14. The method of claim 10, wherein Z is a matrix with a first axis corresponding to a time slice and a second axis corresponding to a frequency sub-band, and wherein calculating a beat matrix from Z comprises differential encoding each coefficient in Z against a beat threshold, wherein the beat threshold is derived from a plurality of sequential coefficients within a same frequency sub-band of Z, and wherein the said deriving comprises:
    - filtering the plurality of sequential coefficients to generate a filter output, wherein the filter impulse response tapers; and
      
      assigning a beat weight to the filter output to derive the threshold.
  - 15. The method of claim 10, wherein Z is a matrix with a first axis corresponding to a time slice and a second axis corresponding to a frequency sub-band, wherein the at least one quantization threshold is derived from a plurality of coefficients of Z within a same time slice, and wherein the said deriving comprises:
    - for each time slice in Z, calculating a sum of all the sub-band coefficients within the same time slice to obtain a plurality of total band energy and storing the plurality of total band energy in vector V;
      
      filtering a plurality of sequential elements of V to generate a filtered output, wherein the filter impulse response tapers; and
      
      applying a set of weighting coefficients to the filtered output to derive the at least one quantization threshold.
  - 16. The method as in claim 11, wherein said comparing is based on bit error rate (BER), wherein the positive identification is found when the BER is less than a threshold.
  - 17. The method as in claim 11, wherein said comparing is based on Hamming distance, wherein the positive identification is found when the Hamming distance is less than a threshold.

18. A system for extracting audio features from a plurality of audio frames, the system comprising:
- one or more storage units to store the audio features; and
  
  one or more processors with non-transitory processor readable medium bearing processor executable program instructions, wherein the one or more processors are communicatively connected to the one or more storage units, and wherein the processor is operable to carry out the executable program instructions to perform the method comprising;
  
  transforming each audio frame of said plurality of audio frames into a plurality of frequency sub-bands to obtain a transformation result;
  
  Storing the transformation results for the plurality of audio frames as sub-band coefficients in S, wherein each sub-band coefficient correlates to an audio frame at a time slice with a frequency sub-band;
  
  for each time slice of S, subtracting each of the sub-band coefficient within the same time slice from a mean value corresponding to the same time slice to obtain a set of new sub-band coefficient;
  
  storing the sets of new sub-band coefficients for all the time slices in Z;
  
  calculating a plurality of beat coefficients from Z and storing the beat coefficients in B, wherein each nonzero coefficient in B corresponds to a beat onset of the audio frames at each frequency sub-band;
  
  calculating a plurality of quantized coefficients from Z according to at least one quantization threshold and storing the quantized coefficients in A;
  
  calculating a plurality of inter-band features from B, wherein the inter-band features correlate to beat signatures of the audio frames at a frequency sub-band;
  
  calculating a plurality of intra-band features from A, wherein the intra-band features correlate to changes among the frequency sub-bands of the audio frames;
  
  multiplying each element in A with a corresponding element in B to generate a sequence of bits as an audio fingerprint for the plurality of audio frames;
  
  storing the audio fingerprint for the plurality of audio frames into the one or more storage units; and
  
  applying Support Vector Machines (SVMs) to the plurality of intra-band features and the plurality of inter-band features to classify the audio frames into acoustically similar groups and storing the output of the SVMs to the one or more storage units.
- View Dependent Claims (19, 20)
- - 19. The system of claim 18, wherein the one or more storage units are to store a plurality of reference fingerprint of a plurality of reference audio frames, and wherein the processor is to identify the unidentified audio frames based on a comparison of the candidate fingerprints to the reference fingerprints.
  - 20. The system of claim 18, wherein transforming each audio frame of said plurality of audio frames into a plurality of frequency sub-bands comprises performing a short-time Fast Fourier Transform.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Xia Lou
Original Assignee
Xia Lou
Inventors
Lou, Xia
Primary Examiner(s)
Smits, Talivaldis Ivars
Assistant Examiner(s)
Sirjani, Fariba

Application Number

US12/168,103
Publication Number

US 20090012638A1
Time in Patent Office

1,355 Days
Field of Search

704/205, 704/206, 704/230, 704/231, 704/243, 704/246, 704/251, 704/500
US Class Current

704/243
CPC Class Codes

G10L 15/02 Feature extraction for spee...

G11B 27/28 by using information signal...

Feature extraction for identification and classification of audio signals

First Claim

0 Assignments

0 Petitions

Accused Products

Abstract

109 Citations

20 Claims

Specification

Solutions

Use Cases

Quick Links

Feature extraction for identification and classification of audio signals

First Claim

0 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

109 Citations

20 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links