Music summarization system and method
Abstract
The invention provides a method and apparatus for automatically generating a summary or key phrase for a song. The song, or a portion thereof, is digitized and converted into a sequence of feature vectors, such as mel-frequency cepstral coefficients (MFCCs). The feature vectors are then processed in order to decipher the song's structure. Those sections that correspond to different structural elements are then marked with corresponding labels. Once the song is labeled, various heuristics are applied to select a key phrase corresponding to the song's summary. For example, the system may identify the label that appears most frequently within the song, and then select the longest duration of that label as the summary.
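The example heuristic in the abstract (most frequent label, then its longest duration) can be illustrated with a minimal sketch. It assumes the labeling step has already produced one label per frame, and it assumes "most frequently" is counted in frames; the function name `select_key_phrase` and the label letters are hypothetical and do not come from the patent.

```python
from collections import Counter
from itertools import groupby


def select_key_phrase(labels):
    """Return (label, start_frame, end_frame) for the chosen key phrase.

    `labels` holds one label per frame, e.g. ["A", "A", "B", ...].
    Heuristic: pick the label covering the most frames, then return its
    longest contiguous run. `end_frame` is exclusive.
    """
    if not labels:
        raise ValueError("labels must not be empty")

    # Label that covers the most frames (assumption: frequency = frame count).
    chosen, _ = Counter(labels).most_common(1)[0]

    # Walk the contiguous runs and keep the longest run of the chosen label.
    best = None
    pos = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label == chosen and (best is None or length > best[1] - best[0]):
            best = (pos, pos + length)
        pos += length
    return chosen, best[0], best[1]


if __name__ == "__main__":
    frame_labels = list("AABBBBAACCBBBBBBAA")
    print(select_key_phrase(frame_labels))  # ('B', 10, 16)
```

Other rules could be substituted at the same point, for instance counting occurrences of a label rather than frames.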
28 Claims
1. A method for producing a key phrase for a song having words and music and a plurality of elements organized into a song structure, the method comprising the steps of:
dividing at least a portion of the song into a plurality of frames;
generating a feature vector for each frame, each feature vector having a plurality of parameters whose values are characteristic of that portion of the song contained within the respective frame;
processing the feature vectors of each frame so as to identify the song's structure;
marking those feature vectors associated with different structural elements of the song with different labels; and
applying one or more predetermined rules to the marked set of feature vectors in order to select a single occurrence of a chosen label as the key phrase of the song.
Dependent claims: 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
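The dividing and feature-generation steps of claim 1 might look like the sketch below. The frame length, hop length, and the two toy features (mean energy and zero-crossing rate) are assumptions chosen to keep the example dependency-free; the abstract names MFCCs as one possible feature, which this sketch does not compute.

```python
def split_into_frames(samples, frame_len, hop_len):
    """Split a sequence of audio samples into fixed-length frames.

    Frames overlap when hop_len < frame_len. A trailing partial frame
    is dropped.
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop_len)]


def toy_feature_vector(frame):
    """Stand-in feature vector: [mean energy, zero-crossing rate].

    Assumes the frame holds at least two samples.
    """
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [energy, crossings / (len(frame) - 1)]


if __name__ == "__main__":
    samples = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]
    frames = split_into_frames(samples, frame_len=4, hop_len=2)
    features = [toy_feature_vector(f) for f in frames]
    print(len(frames), features[0])
```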
combining the feature vectors of a predetermined number of contiguous frames into corresponding segments;
calculating a mean and a covariance for a Gaussian Distribution model of each segment;
comparing the respective means and covariances of the segments; and
grouping together those segments whose respective means and covariances are similar, thereby revealing the song's structure.
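The first two steps above (combining contiguous frames into segments and fitting a Gaussian to each) can be sketched as follows. The helper names are hypothetical. Two choices here are assumptions, not taken from the claims: the covariance is the population form (divided by n), and a shorter trailing segment is kept.

```python
def segment_frames(features, frames_per_segment):
    """Group per-frame feature vectors into contiguous segments.

    The last segment may be shorter than frames_per_segment.
    """
    if frames_per_segment <= 0:
        raise ValueError("frames_per_segment must be positive")
    return [features[i:i + frames_per_segment]
            for i in range(0, len(features), frames_per_segment)]


def gaussian_stats(segment):
    """Return (mean vector, covariance matrix) of one segment."""
    n = len(segment)
    dim = len(segment[0])
    mean = [sum(v[d] for v in segment) / n for d in range(dim)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in segment) / n
            for j in range(dim)]
           for i in range(dim)]
    return mean, cov


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 6.0], [0.0, 0.0], [2.0, 2.0]]
    for seg in segment_frames(feats, 2):
        print(gaussian_stats(seg))
```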
7. The method of claim 6 wherein the comparing step comprises the steps of:
computing the distortion between the means and covariances of the segments;
identifying the two feature vectors whose distortion is the lowest;
if the lowest distortion is less than a pre-defined threshold, combining the feature vectors of the two segments into a cluster;
calculating a mean and covariance for the cluster based on the feature vectors from the two segments; and
repeating the steps of computing, identifying, combining and calculating until the distortion between all remaining clusters and segments, if any, is equal to or greater than the pre-defined threshold.
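The loop recited in claim 7 resembles threshold-stopped bottom-up clustering, and a minimal sketch is given below. The distortion function is a deliberately simple placeholder (squared distance between means), and the statistics use per-dimension variances rather than full covariances; both are assumptions made to keep the sketch short, and all names are hypothetical.

```python
from itertools import combinations


def mean_var(vectors):
    """Per-dimension mean and variance (diagonal-covariance simplification)."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n for d in range(dim)]
    return mean, var


def mean_distance(stats_a, stats_b):
    """Placeholder distortion: squared Euclidean distance between means."""
    return sum((a - b) ** 2 for a, b in zip(stats_a[0], stats_b[0]))


def cluster_segments(segments, threshold, distortion=mean_distance):
    """Merge the closest pair until no pair is below the threshold.

    Returns a sorted list of clusters, each a sorted list of segment indices.
    """
    # Each cluster is (member segment indices, pooled feature vectors).
    clusters = [([i], list(seg)) for i, seg in enumerate(segments)]
    while len(clusters) > 1:
        # Statistics are recomputed from the pooled vectors of each cluster.
        stats = [mean_var(vectors) for _, vectors in clusters]
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: distortion(stats[p[0]], stats[p[1]]))
        if distortion(stats[a], stats[b]) >= threshold:
            break  # every remaining pair is at or above the threshold
        merged = (clusters[a][0] + clusters[b][0],
                  clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(members) for members, _ in clusters)


if __name__ == "__main__":
    segs = [[[0.0], [0.2]], [[5.0], [5.2]], [[0.1], [0.3]], [[5.1], [4.9]]]
    print(cluster_segments(segs, threshold=1.0))  # [[0, 2], [1, 3]]
```

Segments that end up in the same cluster would then share a label.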
8. The method of claim 7 wherein the distortion computation is based upon the Kullback-Leibler (KL) distance measure, modified so as to be symmetric.
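One common way to make the KL distance symmetric is to sum the two directed divergences; the claim does not say which modification is used, so that choice is an assumption. The sketch below further assumes Gaussians with diagonal covariance and strictly positive variances. Under those assumptions the log-determinant terms cancel, leaving a closed form.

```python
def symmetric_kl_diag(mean_p, var_p, mean_q, var_q):
    """Symmetrized KL distance KL(p||q) + KL(q||p) for diagonal Gaussians.

    Per dimension the sum reduces to
        0.5 * (vp/vq + vq/vp - 2 + (mp - mq)^2 * (1/vp + 1/vq)).
    All variances must be strictly positive.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        diff_sq = (mp - mq) ** 2
        total += vp / vq + vq / vp - 2.0 + diff_sq * (1.0 / vp + 1.0 / vq)
    return 0.5 * total


if __name__ == "__main__":
    print(symmetric_kl_diag([0.0], [1.0], [1.0], [1.0]))  # 1.0
    print(symmetric_kl_diag([0.0], [1.0], [0.0], [2.0]))  # 0.25
```

The result is zero for identical distributions and unchanged when the two arguments are swapped.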
9. The method of claim 8 wherein the frames of all segments combined to form a single cluster are considered to be part of the same structural element of the song.
10. The method of claim 7 wherein the frames of all segments combined to form a single cluster are considered to be part of the same structural element of the song.
11. The method of claim 1 wherein the chosen label corresponds to the most frequently occurring label.
12. The method of claim 5 wherein the processing step comprises the steps of:
selecting a number of connected Hidden Markov Model (HMM) states to model the song being summarized;
training the HMM with at least a portion of the song being summarized; and
applying the trained HMM to the song portion so as to associate each frame with a single HMM state.
13. The method of claim 12 wherein each HMM state has a corresponding set of parameters, and the step of training comprises the steps of:
initializing the parameters of each HMM state to predetermined values; and
optimizing the HMM state parameters by using the Baum-Welch re-estimation algorithm.
14. The method of claim 13 wherein each HMM state is modeled by a Gaussian Distribution, and the step of initializing comprises the steps of:
setting a mean of each HMM state to a randomly selected value; and
setting a covariance of each HMM state to a global covariance based on a covariance associated with each of the feature vectors.
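The initialization in claim 14 might be sketched as below. Two assumptions are made that the claim does not state: the "randomly selected value" for each mean is taken to be a randomly drawn feature vector from the song, and the global covariance is simplified to per-dimension variances computed over all feature vectors. The function name and the dictionary layout are hypothetical.

```python
import random


def init_hmm_states(features, n_states, seed=None):
    """Return initial Gaussian parameters for each HMM state.

    Each state gets a randomly drawn feature vector as its mean and a copy
    of the global per-dimension variance. Requires n_states <= len(features).
    """
    rng = random.Random(seed)
    n = len(features)
    dim = len(features[0])
    global_mean = [sum(v[d] for v in features) / n for d in range(dim)]
    global_var = [sum((v[d] - global_mean[d]) ** 2 for v in features) / n
                  for d in range(dim)]
    # Assumption: means are sampled (without replacement) from the data.
    means = [list(v) for v in rng.sample(features, n_states)]
    return [{"mean": m, "var": list(global_var)} for m in means]


if __name__ == "__main__":
    feats = [[0.0], [2.0], [4.0]]
    for state in init_hmm_states(feats, n_states=2, seed=0):
        print(state)
```

These values would only be a starting point; claim 13 recites Baum-Welch re-estimation to optimize them.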
15. The method of claim 14 wherein the step of applying comprises the steps of:
building a matrix of HMM states versus frames; and
identifying a single path through the matrix having a highest probability.
16. The method of claim 15 wherein the highest probability path is identified using the Viterbi decoding algorithm.
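Viterbi decoding over a matrix of states versus frames can be sketched in a few lines. The sketch works in the log domain and assumes the start, transition, and per-frame emission log-probabilities have already been obtained from a trained HMM; the argument names are hypothetical.

```python
import math


def viterbi(log_start, log_trans, log_emit):
    """Return the most probable state sequence, one state index per frame.

    log_start[s]    : log probability of starting in state s
    log_trans[r][s] : log probability of moving from state r to state s
    log_emit[t][s]  : log likelihood of frame t under state s
    """
    if not log_emit:
        return []
    n_frames = len(log_emit)
    n_states = len(log_start)
    # score[t][s]: best log probability of any path ending in s at frame t.
    score = [[0.0] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    for s in range(n_states):
        score[0][s] = log_start[s] + log_emit[0][s]
    for t in range(1, n_frames):
        for s in range(n_states):
            prev = max(range(n_states),
                       key=lambda r: score[t - 1][r] + log_trans[r][s])
            back[t][s] = prev
            score[t][s] = score[t - 1][prev] + log_trans[prev][s] + log_emit[t][s]
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[-1][s])
    path = [state]
    for t in range(n_frames - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    log = math.log
    start = [log(0.5), log(0.5)]
    trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
    emit = [[log(0.9), log(0.1)], [log(0.8), log(0.2)],
            [log(0.2), log(0.8)], [log(0.1), log(0.9)]]
    print(viterbi(start, trans, emit))  # [0, 0, 1, 1]
```

The returned state index for each frame can then serve as that frame's label.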
17. The method of claim 12 wherein the frames associated with the same HMM state are considered to be part of the same structural element of the song.
18. The method of claim 12 wherein the step of applying comprises the steps of:
building a matrix of HMM states versus frames; and
identifying a single path through the matrix having a highest probability.
4. A method for producing a key phrase for a song having a plurality of elements organized into a song structure, the method comprising the steps of:
dividing at least a portion of the song into a plurality of frames;
generating a feature vector for each frame, each feature vector having a plurality of parameters whose values are characteristic of that portion of the song contained within the respective frame;
processing the feature vectors of each frame so as to identify the song's structure;
marking those feature vectors associated with different structural elements of the song with different labels; and
applying one or more predetermined rules to the marked set of feature vectors in order to select a single occurrence of a chosen label as the key phrase of the song, wherein the key phrase is appended to the song, the chosen label corresponds to the most frequently occurring label, and the single occurrence corresponds to at least a portion of the longest duration of the chosen label.
19. A system for producing a key phrase for a song having words and music and a plurality of elements organized into a song structure, the system comprising:
a signal processor configured to receive a signal that corresponds to at least a portion of the song, and for dividing the song signal into a plurality of frames;
a feature vector extraction engine coupled to the signal processor, the extraction engine configured to generate a feature vector for each frame, each feature vector having a plurality of parameters whose values are characteristic of that portion of the song signal contained within respective frame;
a labeling engine coupled to the feature vector extraction engine, the labeling engine configured to process the feature vectors so as to identify the song's structure, and to mark those feature vectors associated with different structural elements of the song with different labels; and
a key phrase identifier logic coupled to the labeling engine, the identifier logic configured to apply one or more predetermined rules to the marked set of feature vectors in order to select a single occurrence of a chosen label as the key phrase of the song.
Dependent claims: 20, 21
22. A computer readable medium containing program instructions for producing a key phrase for a song having words and music and a plurality of elements organized into a song structure, the executable program instructions comprising program instructions for:
dividing at least a portion of the song into a plurality of frames;
generating a feature vector for each frame, each feature vector having a plurality of parameters whose values are characteristic of that portion of the song contained within the respective frame;
processing the feature vectors of each frame so as to identify the song's structure;
marking those feature vectors associated with different structural elements of the song with different labels; and
applying one or more predetermined rules to the marked set of feature vectors in order to select a single occurrence of a chosen label as the key phrase of the song.
Dependent claims: 23, 24
combining the feature vectors of a predetermined number of contiguous frames into corresponding segments;
calculating a mean and a covariance for a Gaussian Distribution model of each segment;
comparing the respective means and covariances of the segments; and
grouping together those segments whose respective means and covariances are similar, thereby revealing the song's structure.
24. The computer readable medium of claim 22 wherein the program instructions for processing comprise program instructions for:
selecting a number of connected Hidden Markov Model (HMM) states to model the song being summarized;
training the HMM with at least a portion of the song being summarized; and
applying the trained HMM to the song portion so as to associate each frame with a single HMM state.
25. A method for producing a key phrase for a musical piece having a plurality of elements organized into a structure, the method comprising the steps of:
dividing at least a portion of the musical piece into a plurality of frames;
generating a feature vector for each frame, each feature vector having a plurality of parameters whose values are characteristic of that portion of the musical piece contained within the respective frame;
processing the feature vectors of each frame so as to identify the musical piece's structure;
marking those feature vectors associated with different structural elements of the musical piece with different labels; and
applying one or more predetermined rules to the marked set of feature vectors in order to select a single occurrence of a chosen label as the key phrase of the musical piece.
Dependent claims: 26
27. A system for producing a key phrase for a musical piece having a plurality of elements organized into a structure, the system comprising:
a signal processor configured to receive a signal that corresponds to at least a portion of the musical piece, and for dividing the musical piece into a plurality of frames;
a feature vector extraction engine coupled to the signal processor, the extraction engine configured to generate a feature vector for each frame, each feature vector having a plurality of parameters whose values are characteristic of that portion of the musical piece signal contained within respective frame;
a labeling engine coupled to the feature vector extraction engine, the labeling engine configured to process the feature vectors so as to identify the musical piece's structure, and to mark those feature vectors associated with different structural elements of the musical piece with different labels; and
a key phrase identifier logic coupled to the labeling engine, the identifier logic configured to apply one or more predetermined rules to the marked set of feature vectors in order to select a single occurrence of a chosen label as the key phrase of the musical piece.
Dependent claims: 28
Specification