Music summarization system and method

US 6,633,845 B1
Filed: 04/07/2000
Issued: 10/14/2003
Est. Priority Date: 04/07/2000
Status: Expired due to Fees

Abstract

The invention provides a method and apparatus for automatically generating a summary or key phrase for a song. The song, or a portion thereof, is digitized and converted into a sequence of feature vectors, such as mel-frequency cepstral coefficients (MFCCs). The feature vectors are then processed in order to decipher the song's structure. Those sections that correspond to different structural elements are then marked with corresponding labels. Once the song is labeled, various heuristics are applied to select a key phrase corresponding to the song's summary. For example, the system may identify the label that appears most frequently within the song, and then select the longest duration of that label as the summary.
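
The example heuristic in the last sentence of the abstract can be illustrated with a minimal Python sketch. It assumes the song has already been labeled with one label per frame and that "most frequently" is counted per frame. The function name `select_key_phrase` and the sample label sequence are hypothetical and do not come from the patent.

```python
from collections import Counter
from itertools import groupby


def select_key_phrase(labels):
    """Return (label, start, end) for the longest run of the most frequent label.

    `labels` holds one structural label per frame; `end` is exclusive.
    """
    if not labels:
        raise ValueError("labels must not be empty")

    # Rule 1 (assumed reading): choose the label that marks the most frames.
    chosen, _ = Counter(labels).most_common(1)[0]

    # Rule 2: among contiguous runs of that label, keep the longest one.
    best = None
    position = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label == chosen and (best is None or length > best[1] - best[0]):
            best = (position, position + length)
        position += length

    return chosen, best[0], best[1]


# Hypothetical labeling: label B covers 10 frames, and its longest run is frames 10..15.
frame_labels = list("AABBBBAACCBBBBBBAA")
print(select_key_phrase(frame_labels))  # ('B', 10, 16)
```

The returned frame range would then be mapped back to time offsets using whatever frame length and hop size the feature extraction used.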

28 Claims

1. A method for producing a key phrase for a song having words and music and a plurality of elements organized into a song structure, the method comprising the steps of:
- dividing at least a portion of the song into a plurality of frames;
  
  generating a feature vector for each frame, each feature vector having a plurality of parameters whose values are characteristic of that portion of the song contained within the respective frame;
  
  processing the feature vectors of each frame so as to identify the song's structure;
  
  marking those feature vectors associated with different structural elements of the song with different labels; and
  
  applying one or more predetermined rules to the marked set of feature vectors in order to select a single occurrence of a chosen label as the key phrase of the song.
2. The method of claim 1 wherein the key phrase is appended to the song.
3. The method of claim 2 wherein the chosen label corresponds to the most frequently occurring label.
5. The method of claim 1 wherein the parameters of the feature vectors are mel-frequency cepstral coefficients (MFCCs).
6. The method of claim 5 wherein the processing step comprises the steps of:
7. The method of claim 6 wherein the comparing step comprises the steps of:
- computing the distortion between the means and covariances of the segments;
  
  identifying the two feature vectors whose distortion is the lowest;
  
  if the lowest distortion is less than a pre-defined threshold, combining the feature vectors of the two segments into a cluster;
  
  calculating a mean and covariance for the cluster based on the feature vectors from the two segments; and
  
  repeating the steps of computing, identifying, combining and calculating until the distortion between all remaining clusters and segments, if any, is equal to or greater than the pre-defined threshold.
8. The method of claim 7 wherein the distortion computation is based upon the Kullback-Leibler (KL) distance measure, modified so as to be symmetric.
9. The method of claim 8 wherein the frames of all segments combined to form a single cluster are considered to be part of the same structural element of the song.
10. The method of claim 7 wherein the frames of all segments combined to form a single cluster are considered to be part of the same structural element of the song.
11. The method of claim 1 wherein the chosen label corresponds to the most frequently occurring label.
12. The method of claim 5 wherein the processing step comprises the steps of:
- selecting a number of connected Hidden Markov Model (HMM) states to model the song being summarized;
  
  training the HMM with at least a portion of the song being summarized; and
  
  applying the trained HMM to the song portion so as to associate each frame with a single HMM state.
13. The method of claim 12 wherein each HMM state has a corresponding set of parameters, and the step of training comprises the steps of:
- initializing the parameters of each HMM state to predetermined values; and
  
  optimizing the HMM state parameters by using the Baum-Welch re-estimation algorithm.
14. The method of claim 13 wherein each HMM state is modeled by a Gaussian Distribution, and the step of initializing comprises the steps of:
- setting a mean of each HMM state to a randomly selected value; and
  
  setting a covariance of each HMM state to a global covariance based on a covariance associated with each of the feature vectors.
15. The method of claim 14 wherein the step of applying comprises the steps of:
- building a matrix of HMM states versus frames; and
  
  identifying a single path through the matrix having a highest probability.
16. The method of claim 15 wherein the highest probability path is identified using the Viterbi decoding algorithm.
17. The method of claim 12 wherein the frames associated with the same HMM state are considered to be part of the same structural element of the song.
18. The method of claim 12 wherein the step of applying comprises the steps of:
- building a matrix of HMM states versus frames; and
  
  identifying a single path through the matrix having a highest probability.
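
The clustering steps of claims 7 through 10 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: each segment is assumed to be a list of feature vectors, covariances are simplified to diagonal form, the symmetric distance is taken as KL(a||b) + KL(b||a), and all function names and the threshold value are hypothetical.

```python
def gaussian_stats(vectors, floor=1e-6):
    """Per-dimension mean and variance (diagonal covariance, an assumption)."""
    n = len(vectors)
    dims = range(len(vectors[0]))
    mean = [sum(v[d] for v in vectors) / n for d in dims]
    var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, floor) for d in dims]
    return mean, var


def symmetric_kl(stats_a, stats_b):
    """KL(a||b) + KL(b||a) for two diagonal Gaussians; the log terms cancel."""
    (mean_a, var_a), (mean_b, var_b) = stats_a, stats_b
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        total += va / vb + vb / va - 2.0 + (ma - mb) ** 2 * (1.0 / va + 1.0 / vb)
    return 0.5 * total


def cluster_segments(segments, threshold):
    """Merge the closest pair of clusters until no pair is closer than threshold.

    Returns one cluster id per input segment.
    """
    clusters = [{"members": [i], "vectors": list(seg)} for i, seg in enumerate(segments)]
    while len(clusters) > 1:
        # Recompute mean and variance for every current cluster.
        stats = [gaussian_stats(c["vectors"]) for c in clusters]
        pairs = [(symmetric_kl(stats[i], stats[j]), i, j)
                 for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        distortion, i, j = min(pairs)
        if distortion >= threshold:
            break  # every remaining pair is at or above the threshold
        merged = {"members": clusters[i]["members"] + clusters[j]["members"],
                  "vectors": clusters[i]["vectors"] + clusters[j]["vectors"]}
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    labels = [None] * len(segments)
    for cluster_id, cluster in enumerate(clusters):
        for member in cluster["members"]:
            labels[member] = cluster_id
    return labels


# Toy two-dimensional "feature vectors": segments 0 and 2 are alike, as are 1 and 3.
segments = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[10.0, 10.0], [11.0, 11.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[10.0, 11.0], [11.0, 10.0]],
]
print(cluster_segments(segments, threshold=1.0))  # [0, 1, 0, 1]
```

Segments that end up with the same cluster id would be treated as the same structural element, in the sense of claims 9 and 10. A full-covariance variant would replace the per-dimension ratios with matrix traces and inverses.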

4. A method for producing a key phrase for a song having a plurality of elements organized into a song structure, the method comprising the steps of:
- dividing at least a portion of the song into a plurality of frames;
  
  generating a feature vector for each frame, each feature vector having a plurality of parameters whose values are characteristic of that portion of the song contained within the respective frame;
  
  processing the feature vectors of each frame so as to identify the song's structure;
  
  marking those feature vectors associated with different structural elements of the song with different labels; and
  
  applying one or more predetermined rules to the marked set of feature vectors in order to select a single occurrence of a chosen label as the key phrase of the song, wherein the key phrase is appended to the song, the chosen label corresponds to the most frequently occurring label, and the single occurrence corresponds to at least a portion of the longest duration of the chosen label.

19. A system for producing a key phrase for a song having words and music and a plurality of elements organized into a song structure, the system comprising:
- a signal processor configured to receive a signal that corresponds to at least a portion of the song, and for dividing the song signal into a plurality of frames;
  
  a feature vector extraction engine coupled to the signal processor, the extraction engine configured to generate a feature vector for each frame, each feature vector having a plurality of parameters whose values are characteristic of that portion of the song signal contained within respective frame;
  
  a labeling engine coupled to the feature vector extraction engine, the labeling engine configured to process the feature vectors so as to identify the song's structure, and to mark those feature vectors associated with different structural elements of the song with different labels; and
  
  a key phrase identifier logic coupled to the labeling engine, the identifier logic configured to apply one or more predetermined rules to the marked set of feature vectors in order to select a single occurrence of a chosen label as the key phrase of the song.
20. The system of claim 19 wherein the key phrase is appended to the song.
21. The system of claim 19 wherein the chosen label corresponds to the most frequently occurring label.

22. A computer readable medium containing program instructions for producing a key phrase for a song having words and music and a plurality of elements organized into a song structure, the executable program instructions comprising program instructions for:
- dividing at least a portion of the song into a plurality of frames;
  
  generating a feature vector for each frame, each feature vector having a plurality of parameters whose values are characteristic of that portion of the song contained within the respective frame;
  
  processing the feature vectors of each frame so as to identify the song's structure;
  
  marking those feature vectors associated with different structural elements of the song with different labels; and
  
  applying one or more predetermined rules to the marked set of feature vectors in order to select a single occurrence of a chosen label as the key phrase of the song.
23. The computer readable medium of claim 22 wherein the program instructions for processing comprise program instructions for:
24. The computer readable medium of claim 22 wherein the program instructions for processing comprise program instructions for:
- selecting a number of connected Hidden Markov Model (HMM) states to model the song being summarized;
  
  training the HMM with at least a portion of the song being summarized; and
  
  applying the trained HMM to the song portion so as to associate each frame with a single HMM state.
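
The step of applying a trained HMM so that each frame is associated with a single state (claims 12, 15, 16 and 24) can be illustrated with a minimal Viterbi decoder. The sketch assumes the model has already been trained, so the start, transition, and per-frame emission log probabilities are supplied as inputs. Baum-Welch training is not shown, and the function names and toy numbers are hypothetical.

```python
import math


def viterbi(log_start, log_trans, log_emit):
    """Most probable state path through a states-by-frames lattice.

    log_start[s]      log probability of starting in state s
    log_trans[r][s]   log probability of moving from state r to state s
    log_emit[s][t]    log likelihood of frame t under state s
    """
    n_states = len(log_start)
    n_frames = len(log_emit[0])

    # score[s][t]: best log probability of any path ending in state s at frame t.
    score = [[-math.inf] * n_frames for _ in range(n_states)]
    back = [[0] * n_frames for _ in range(n_states)]
    for s in range(n_states):
        score[s][0] = log_start[s] + log_emit[s][0]

    for t in range(1, n_frames):
        for s in range(n_states):
            prev = max(range(n_states), key=lambda r: score[r][t - 1] + log_trans[r][s])
            score[s][t] = score[prev][t - 1] + log_trans[prev][s] + log_emit[s][t]
            back[s][t] = prev

    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: score[s][-1])
    path = [state]
    for t in range(n_frames - 1, 0, -1):
        state = back[state][t]
        path.append(state)
    path.reverse()
    return path


def logs(row):
    return [math.log(p) for p in row]


# Toy two-state model with "sticky" transitions; the numbers are illustrative only.
log_start = logs([0.5, 0.5])
log_trans = [logs([0.9, 0.1]), logs([0.1, 0.9])]
log_emit = [logs([0.8, 0.8, 0.8, 0.2, 0.2, 0.2]),
            logs([0.2, 0.2, 0.2, 0.8, 0.8, 0.8])]
print(viterbi(log_start, log_trans, log_emit))  # [0, 0, 0, 1, 1, 1]
```

The returned list gives one state index per frame, which can serve directly as the per-frame label sequence used for key phrase selection.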

25. A method for producing a key phrase for a musical piece having a plurality of elements organized into a structure, the method comprising the steps of:
- dividing at least a portion of the musical piece into a plurality of frames;
  
  generating a feature vector for each frame, each feature vector having a plurality of parameters whose values are characteristic of that portion of the musical piece contained within the respective frame;
  
  processing the feature vectors of each frame so as to identify the musical piece's structure;
  
  marking those feature vectors associated with different structural elements of the musical piece with different labels; and
  
  applying one or more predetermined rules to the marked set of feature vectors in order to select a single occurrence of a chosen label as the key phrase of the musical piece.
26. The method of claim 25 wherein the musical piece is one of a song having words and music and an instrumental having music but being free of words.

27. A system for producing a key phrase for a musical piece having a plurality of elements organized into a structure, the system comprising:
- a signal processor configured to receive a signal that corresponds to at least a portion of the musical piece, and for dividing the musical piece into a plurality of frames;
  
  a feature vector extraction engine coupled to the signal processor, the extraction engine configured to generate a feature vector for each frame, each feature vector having a plurality of parameters whose values are characteristic of that portion of the musical piece signal contained within respective frame;
  
  a labeling engine coupled to the feature vector extraction engine, the labeling engine configured to process the feature vectors so as to identify the musical piece's structure, and to mark those feature vectors associated with different structural elements of the musical piece with different labels; and
  
  a key phrase identifier logic coupled to the labeling engine, the identifier logic configured to apply one or more predetermined rules to the marked set of feature vectors in order to select a single occurrence of a chosen label as the key phrase of the musical piece.
28. The system of claim 27 wherein the musical piece is one of a song having words and music and an instrumental having music but being free of words.

Current Assignee
Compaq Information Technologies Group LP (HP Inc.)
Original Assignee
Hewlett-Packard Development Company, L.P. (HP Inc.)
Inventors
Chu, Stephen Mingyu; Logan, Beth Teresa
Primary Examiner(s)
To, Doris H.
Assistant Examiner(s)
Nolan, Daniel A.

Application Number
US09/545,893
Time in Patent Office
1,285 Days
Field of Search
704/231-245, 704/251-256, 704/272, 846/12--16, 846/09--11, 846/49, 345/840, 700/214, 400/116
US Class Current
704/255
CPC Class Codes

G10H 1/0008   Associated control or indic...

G10H 2210/041   based on mfcc [mel -frequen...

G10H 2210/061   for extraction of musical p...

G10H 2240/135   Library retrieval index, i....

G10H 2250/015   Markov chains, e.g. hidden ...

G10H 2250/235   Fourier transform; Discrete...

G10H 2250/281   Hamming window
