Extracting classifying data in music from an audio bitstream
3 Assignments
0 Petitions
Abstract
The method of the present invention utilizes machine-learning techniques, particularly Support Vector Machines in combination with a neural network, to process a unique machine-learning-enabled representation of the audio bitstream. Using this method, a classifying machine can autonomously detect characteristics of a piece of music, such as the artist or genre, and classify it accordingly. The method includes transforming a digital time-domain representation of music into a frequency-domain representation, dividing that frequency data into time slices, and compressing it into frequency bands to form multiple learning representations of each song. The resulting learning representations are processed first by a group of Support Vector Machines and then by a neural network, both previously trained to distinguish among a given set of characteristics, to determine the classification.
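The pipeline the abstract describes (frequency-domain transform, time slices, frequency-band compression) can be sketched in a few lines. This is a minimal illustration, not the patented implementation; the window size, band count, slice length, and function names below are all assumptions.

```python
import numpy as np

def learning_representations(samples, window=1024, n_bands=16, n_slices=8):
    """Sketch: FFT each time-sampled window, compress the magnitude
    spectrum into averaged frequency bands, then group consecutive
    windows into fixed-length time slices (the learning representations).
    All parameter values here are illustrative assumptions."""
    banded = []
    for start in range(0, len(samples) - window + 1, window):
        # frequency-domain transform of one window
        mags = np.abs(np.fft.rfft(samples[start:start + window]))
        # compress the spectrum into n_bands averaged bands
        banded.append([band.mean() for band in np.array_split(mags, n_bands)])
    banded = np.asarray(banded)
    # arrange consecutive windows into time slices, one representation each
    return np.asarray([banded[i:i + n_slices].ravel()
                       for i in range(0, len(banded) - n_slices + 1, n_slices)])

# one second of a 440 Hz tone at 44.1 kHz as a stand-in for a song
tone = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
reps = learning_representations(tone)
```

Each row of `reps` would then be scored by the first-stage Support Vector Machines before being passed to the neural-network stage.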
68 Citations
48 Claims
1. A method of extracting classifying data from an audio signal, the method comprising the steps of:
(a) processing said audio signal into a perceptual representation of its constituent frequencies;
(b) processing said perceptual representation into at least one learning representation of said audio data stream;
(c) inputting at least one said learning representation into a multi-stage classifier, whereby said multi-stage classifier extracts classifying data from said learning representations and outputs the classification of said audio signal. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17)
10. The method of extracting classifying data from an audio signal according to claim 10, wherein said first stage of said multi-stage classifier comprises at least one Support Vector Machine per category of classification.
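Claim 10's first stage, at least one Support Vector Machine per classification category, can be illustrated with a toy one-vs-rest arrangement. The sketch below is a stand-in under stated assumptions: it trains a minimal linear SVM by hinge-loss subgradient descent rather than using a full SVM solver, and the two-genre data and labels are synthetic. The per-category margin scores would form the input vector for the second-stage neural network.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    # minimal linear SVM: hinge loss with L2 penalty, subgradient descent
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:          # margin violated
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:
                w -= lr * lam * w
    return w, b

rng = np.random.default_rng(0)
# toy learning representations for two hypothetical genres
X = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(3, 1, (20, 4))])
labels = np.array([0] * 20 + [1] * 20)

# first stage: one binary (one-vs-rest) SVM per category
svms = [train_linear_svm(X, np.where(labels == c, 1, -1)) for c in (0, 1)]
# the per-category margin scores form the second-stage input vector
scores = np.stack([X @ w + b for w, b in svms], axis=1)
pred = scores.argmax(axis=1)
```

Taking the argmax here stands in for the trained second-stage network, which in the claimed method learns how to combine the SVM outputs.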
18. A computer readable storage medium, storing therein a program of instructions for causing a computer to execute a process of extracting classifying data from an audio signal, said process comprising the steps of:
(a) processing said audio signal into a perceptual representation of its constituent frequencies;
(b) processing said perceptual representation into at least one learning representation;
(c) inputting said learning representations of said audio data stream into a multi-stage classifier, whereby said multi-stage classifier extracts classifying data from said learning representations and outputs the classification of said audio signal.
19. A method of representing an audio signal for machine learning comprising:
(a) creating a perceptual representation of said audio signal by performing a frequency domain transform on at least one time-sampled window of a digital representation of said audio signal, said perceptual representation comprising component magnitudes of constituent frequency vectors that comprise said audio signal;
(b) calculating a magnitude of each constituent frequency vector within said audio signal;
(c) grouping each of said constituent frequency vectors into a number of frequency bands;
(d) calculating an average magnitude of said constituent frequency vectors within each of said frequency bands; and
(e) arranging said magnitudes into a learning representation. - View Dependent Claims (20, 21, 22, 23, 24, 25)
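Steps (a) through (e) of claim 19 map naturally onto a short numerical sketch. The window length and band count below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def window_representation(window_samples, n_bands=8):
    # (a) frequency-domain transform of one time-sampled window
    spectrum = np.fft.rfft(window_samples)
    # (b) magnitude of each constituent frequency vector
    mags = np.abs(spectrum)
    # (c) group the frequency vectors into n_bands frequency bands
    bands = np.array_split(mags, n_bands)
    # (d) average magnitude of the vectors within each band
    averages = [band.mean() for band in bands]
    # (e) arrange the averaged magnitudes into a learning representation
    return np.asarray(averages)

# a pure tone landing at FFT bin 100 of a 256-sample window
window = np.sin(2 * np.pi * 100 * np.arange(256) / 256)
rep = window_representation(window)
```

For this tone, nearly all the energy falls in a single band, so `rep` peaks there while the other bands stay near zero.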
26. A computer readable storage medium, storing therein a program of instructions for causing a computer to execute a process of representing an audio signal for machine learning, said process comprising the steps of:
(a) creating a perceptual representation of said audio signal by performing a frequency domain transform on at least one time-sampled window of a digital representation of said audio signal, said perceptual representation comprising component magnitudes of constituent frequency vectors that comprise said audio signal;
(b) calculating a magnitude of each constituent frequency vector within said audio signal;
(c) grouping each of said constituent frequency vectors into a number of frequency bands;
(d) calculating an average magnitude of said constituent frequency vectors within each of said frequency bands; and
(e) arranging said magnitudes into a learning representation.
27. An apparatus for classifying an audio data stream comprising:
(a) a means for converting an audio data stream into a perceptual representation of its constituent frequencies;
(b) a means for dividing said perceptual representation into learning representations; and
(c) a multi-stage classifying means trained to distinguish among classifying categories of said audio data stream, wherein said multi-stage classifying means outputs the classification of said audio signal. - View Dependent Claims (28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43)
36. The apparatus according to claim 36, wherein said first stage of said multi-stage classifier comprises at least one Support Vector Machine per category of classification.
44. An apparatus for representing an audio signal for machine learning comprising:
(a) a means for creating a perceptual representation of said audio signal by performing a frequency domain transform on at least one time-sampled window of a digital representation of said audio signal, said perceptual representation comprising component magnitudes of constituent frequency vectors that comprise said audio signal;
(b) a means for calculating a magnitude of each constituent frequency vector;
(c) a means for grouping each of said constituent frequency vectors into a number of frequency bands;
(d) a means for calculating an average magnitude of said constituent frequency vectors within each of said frequency bands; and
(e) a means for arranging said magnitudes into a learning representation. - View Dependent Claims (45, 46, 47, 48)
Specification