Method and apparatus for automatically recognizing audio data
First Claim
1. A method of identifying, among a plurality of music audio files in digital format generated by machine, a first one of the music audio files, the method employing a segment of audio data which is derived from the first music audio file and comprising the steps of:
- (a) inputting the segment of audio data generated by the machine into three different extraction processes, the three different extraction processes including (1) an IMFCC1 (first improved mel frequency cepstrum coefficients) extraction process, the IMFCC1 extraction process performing a conventional MFCC (mel frequency cepstrum coefficients) algorithm but not performing a logarithmic step of the conventional MFCC algorithm, wherein IMFCC1 audio features are output, (2) an IMFCC2 (second improved mel frequency cepstrum coefficients) extraction process, the IMFCC2 extraction process performing the conventional MFCC algorithm but not performing both the logarithmic step and a discrete cosine transform step of the conventional MFCC algorithm and instead performing an ICA (independent component analysis) process, wherein IMFCC2 audio features are output, and (3) an ICA1 (improved independent component analysis) extraction process performing a conventional ICA (independent component analysis) process but subjecting the segment of audio data to pre-emphasis preprocessing and windowing preprocessing, wherein ICA1 features are output;
(b) creating an observation vector containing the IMFCC1 audio features, the IMFCC2 audio features and the ICA1 audio features; and
(c) recognizing the machine generated first music audio file using the observation vector and a database trained using only observation vectors containing IMFCC1, IMFCC2 and ICA1 audio features for each respective target music audio file;
wherein the audio features comprise features obtained by analyzing the audio data, or a transformed version of the audio data, to derive a transform based on its audio features, and applying the transform to the audio data, or the transformed version of the audio data respectively, to obtain amplitudes of the audio features.
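The IMFCC1 chain in element (a)(1) is the conventional MFCC pipeline (pre-emphasis, windowing, power spectrum, mel filter bank, DCT) with the logarithmic step left out. A minimal numpy sketch, for illustration only — the sample rate, filter count, and coefficient count are assumptions, not values taken from the claim:

```python
import numpy as np
from scipy.fft import dct

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale (standard construction)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz_points = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * hz_points / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)  # rising slope
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)  # falling slope
    return fbank

def imfcc1(frame, sr=16000, n_filters=26, n_coeffs=13):
    """IMFCC1 sketch: the conventional MFCC chain with the log step omitted."""
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])  # pre-emphasis
    frame = frame * np.hamming(len(frame))                      # windowing
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2 / n_fft             # power spectrum
    energies = mel_filterbank(n_filters, n_fft, sr) @ power     # mel filter bank
    # Conventional MFCC would take np.log(energies) here; per claim element
    # (a)(1), IMFCC1 skips the logarithmic step and goes straight to the DCT.
    return dct(energies, type=2, norm='ortho')[:n_coeffs]
```

Applied per frame, this yields one IMFCC1 feature vector per windowed segment of the input audio.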
Abstract
A method and apparatus are proposed for automatically recognizing observed audio data. An observation vector is created of audio features extracted from the observed audio data, and the observed audio data is recognized from the observation vector. The audio features are selected from a group of three types of features obtained from the observed audio data: (i) ICA features obtained by processing the observed audio data, (ii) first MFCC features obtained by removing a logarithm step from the conventional MFCC process, or (iii) second MFCC features obtained by applying the ICA process to results of a mel scale filter bank.
32 Citations
18 Claims
1. A method of identifying, among a plurality of music audio files in digital format generated by machine, a first one of the music audio files, the method employing a segment of audio data which is derived from the first music audio file and comprising the steps of:
(a) inputting the segment of audio data generated by the machine into three different extraction processes, the three different extraction processes including (1) an IMFCC1 (first improved mel frequency cepstrum coefficients) extraction process, the IMFCC1 extraction process performing a conventional MFCC (mel frequency cepstrum coefficients) algorithm but not performing a logarithmic step of the conventional MFCC algorithm, wherein IMFCC1 audio features are output, (2) an IMFCC2 (second improved mel frequency cepstrum coefficients) extraction process, the IMFCC2 extraction process performing the conventional MFCC algorithm but not performing both the logarithmic step and a discrete cosine transform step of the conventional MFCC algorithm and instead performing an ICA (independent component analysis) process, wherein IMFCC2 audio features are output, and (3) an ICA1 (improved independent component analysis) extraction process performing a conventional ICA (independent component analysis) process but subjecting the segment of audio data to pre-emphasis preprocessing and windowing preprocessing, wherein ICA1 features are output; (b) creating an observation vector containing the IMFCC1 audio features, the IMFCC2 audio features and the ICA1 audio features; and (c) recognizing the machine generated first music audio file using the observation vector and a database trained using only observation vectors containing IMFCC1, IMFCC2 and ICA1 audio features for each respective target music audio file;
wherein the audio features comprise features obtained by analyzing the audio data, or a transformed version of the audio data, to derive a transform based on its audio features, and applying the transform to the audio data, or the transformed version of the audio data respectively, to obtain amplitudes of the audio features. - View Dependent Claims (2, 3, 4)
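Step (b) amounts to concatenating the three feature sets into a single observation vector. A one-function sketch (the function name and feature dimensions are hypothetical, not specified by the claim):

```python
import numpy as np

def observation_vector(imfcc1_feats, imfcc2_feats, ica1_feats):
    """Step (b): one observation vector holding the IMFCC1, IMFCC2 and
    ICA1 feature sets, flattened and joined end to end."""
    return np.concatenate([np.ravel(imfcc1_feats),
                           np.ravel(imfcc2_feats),
                           np.ravel(ica1_feats)])
```

The same concatenated layout would be used both when training the database and when recognizing, so that observation vectors are comparable.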
5. A method of identifying, among a plurality of music audio files in digital format generated by machine, a first one of the music audio files, the method employing a segment of audio data which is derived from the first music audio file and comprising the steps of:
(a) inputting the segment of audio data generated by the machine into three different extraction processes, the three different extraction processes including (1) an IMFCC1 (first improved mel frequency cepstrum coefficients) extraction process, the IMFCC1 extraction process performing a conventional MFCC (mel frequency cepstrum coefficients) algorithm but not performing a logarithmic step of the conventional MFCC algorithm, wherein IMFCC1 audio features are output, (2) an IMFCC2 (second improved mel frequency cepstrum coefficients) extraction process, the IMFCC2 extraction process performing the conventional MFCC algorithm but not performing both the logarithmic step and a discrete cosine transform step of the conventional MFCC algorithm and instead performing an ICA (independent component analysis) process, wherein IMFCC2 audio features are output, and (3) an ICA1 (improved independent component analysis) extraction process performing a conventional ICA (independent component analysis) process but subjecting the segment of audio data to pre-emphasis preprocessing and windowing preprocessing, wherein ICA1 audio features are output; (b) creating an observation vector containing the IMFCC1 audio features, the IMFCC2 audio features and the ICA1 audio features; and (c) recognizing the machine generated first music audio file using the observation vector and a database trained using only observation vectors containing IMFCC1, IMFCC2 and ICA1 audio features for each respective target music audio file. - View Dependent Claims (6, 7)
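Elements (a)(2) and (a)(3) both rely on an ICA process: IMFCC2 replaces the logarithm and DCT steps with ICA applied to the mel filter-bank output, and ICA1 applies ICA after pre-emphasis and windowing. The claims do not name a particular ICA algorithm; the sketch below uses symmetric FastICA with a tanh nonlinearity, a common choice, to learn an unmixing matrix that could then be applied to filter-bank energies or windowed frames:

```python
import numpy as np

def fastica_unmixing(X, n_components, n_iter=200, seed=0):
    """Minimal symmetric FastICA (tanh nonlinearity), numpy only.
    X: (n_features, n_samples) data matrix. Returns an
    (n_components, n_features) unmixing matrix U such that the rows of
    U @ (X - mean) are approximately statistically independent."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=1, keepdims=True)            # center
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])    # sample covariance
    idx = np.argsort(d)[::-1][:n_components]
    K = (E[:, idx] / np.sqrt(d[idx])).T               # whitening matrix
    Z = K @ Xc                                        # whitened data
    W = rng.standard_normal((n_components, n_components))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)                            # nonlinearity g
        # FastICA fixed-point update: E[g(WZ)Z^T] - diag(E[g'(WZ)]) W
        W = G @ Z.T / Z.shape[1] - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        u, _, vt = np.linalg.svd(W)                   # symmetric decorrelation:
        W = u @ vt                                    # W <- (W W^T)^(-1/2) W
    return W @ K                                      # combined unmixing matrix
```

In practice a library implementation (e.g. scikit-learn's FastICA) would be used; this self-contained version only illustrates the transform the claim refers to.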
8. A method of identifying, among a plurality of music audio files in digital format generated by machine, a first one of the music audio files, the method employing a segment of audio data which is derived from the first music audio file and comprising the steps of:
(a) inputting the segment of audio data generated by the machine into three different extraction processes, the three different extraction processes including (1) an IMFCC1 (first improved mel frequency cepstrum coefficients) extraction process, the IMFCC1 extraction process performing a conventional MFCC (mel frequency cepstrum coefficients) algorithm but not performing a logarithmic step of the conventional MFCC algorithm, wherein IMFCC1 audio features are output, (2) an IMFCC2 (second improved mel frequency cepstrum coefficients) extraction process, the IMFCC2 extraction process performing the conventional MFCC algorithm but not performing both the logarithmic step and a discrete cosine transform step of the conventional MFCC algorithm and instead performing an ICA (independent component analysis) process, wherein IMFCC2 audio features are output, and (3) an ICA1 (improved independent component analysis) extraction process performing a conventional ICA (independent component analysis) process but subjecting the segment of audio data to pre-emphasis preprocessing and windowing preprocessing, wherein ICA1 audio features are output; (b) creating an observation vector containing the IMFCC1, IMFCC2 and ICA1 audio features; (c) recognizing the machine generated first music audio file using the observation vector;
wherein step (c) is performed by determining, within a database containing HMM models trained using only observation vectors containing IMFCC1, IMFCC2 and ICA1 audio features for each respective target music audio file, the HMM model for which probability of the observation vector being obtained given the target music audio file is maximal. - View Dependent Claims (9)
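The recognition step of claim 8 selects, among per-song HMMs, the model for which the probability of the observation sequence is maximal. For discrete observations this is the forward algorithm followed by an argmax; a log-domain sketch (the toy models and names below are hypothetical, not taken from the patent):

```python
import numpy as np
from scipy.special import logsumexp

def log_likelihood(obs, pi, A, B):
    """Forward algorithm in the log domain: log P(obs | HMM(pi, A, B)).
    obs: sequence of discrete symbol indices; pi: initial state
    distribution; A: state transition matrix; B: emission matrix."""
    alpha = np.log(pi) + np.log(B[:, obs[0]])
    for o in obs[1:]:
        alpha = logsumexp(alpha[:, None] + np.log(A), axis=0) + np.log(B[:, o])
    return logsumexp(alpha)

def recognize(obs, models):
    """Step (c): return the name of the HMM whose likelihood of the
    observation sequence is maximal."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))
```

In the claimed system the observations would be (quantized) observation vectors of IMFCC1, IMFCC2 and ICA1 features, and `models` the trained per-song database.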
10. An apparatus for identifying, among a plurality of music audio files in digital format generated by machine, a first one of the music audio files, based on a segment of audio data which is derived from the first music audio file, the apparatus comprising:
(a) an input unit inputting the segment of audio data generated by the machine into three different extraction processes, the three different extraction processes including (1) an IMFCC1 (first improved mel frequency cepstrum coefficients) extraction process, the IMFCC1 extraction process performing a conventional MFCC (mel frequency cepstrum coefficients) algorithm but not performing a logarithmic step of the conventional MFCC algorithm, wherein IMFCC1 audio features are output, (2) an IMFCC2 (second improved mel frequency cepstrum coefficients) extraction process, the IMFCC2 extraction process performing the conventional MFCC algorithm but not performing both the logarithmic step and a discrete cosine transform step of the conventional MFCC algorithm and instead performing an ICA (independent component analysis) process, wherein IMFCC2 audio features are output, and (3) an ICA1 (improved independent component analysis) extraction process performing a conventional ICA (independent component analysis) process but subjecting the segment of audio data to pre-emphasis preprocessing and windowing preprocessing, wherein ICA1 audio features are output; (b) a creation unit creating an observation vector containing the IMFCC1, IMFCC2 and ICA1 audio features output by the three different extraction processes respectively; and (c) a recognition unit recognizing the machine generated first music audio file using the observation vector and a database trained using only observation vectors containing IMFCC1, IMFCC2 and ICA1 audio features for each respective target music audio file; wherein the audio features comprise features obtained by analyzing the audio data, or a transformed version of the audio data, to derive a transform based on its audio features, and applying the transform to the audio data, or the transformed version of the audio data respectively, to obtain amplitudes of the audio features. - View Dependent Claims (11, 12)
13. An apparatus for identifying, among a plurality of music audio files in digital format generated by machine, a first one of the music audio files, based on a segment of audio data which is derived from the first music audio file, the apparatus comprising:
(a) an input unit inputting the segment of audio data generated by the machine into three different extraction processes, the three different extraction processes including (1) an IMFCC1 (first improved mel frequency cepstrum coefficients) extraction process, the IMFCC1 extraction process performing a conventional MFCC (mel frequency cepstrum coefficients) algorithm but not performing a logarithmic step of the conventional MFCC algorithm, wherein IMFCC1 audio features are output, (2) an IMFCC2 (second improved mel frequency cepstrum coefficients) extraction process, the IMFCC2 extraction process performing the conventional MFCC algorithm but not performing both the logarithmic step and a discrete cosine transform step of the conventional MFCC algorithm and instead performing an ICA (independent component analysis) process, wherein IMFCC2 audio features are output, and (3) an ICA1 (improved independent component analysis) extraction process performing a conventional ICA (independent component analysis) process but subjecting the segment of audio data to pre-emphasis preprocessing and windowing preprocessing, wherein ICA1 audio features are output; (b) a creation unit creating an observation vector containing the IMFCC1, IMFCC2 and ICA1 audio features output by the three different extraction processes respectively; and (c) a recognition unit recognizing the machine generated first music audio file using the observation vector and a database trained using only observation vectors containing IMFCC1, IMFCC2 and ICA1 audio features for each respective target music audio file. - View Dependent Claims (14)
15. An apparatus for identifying, among a plurality of music audio files in digital format generated by machine, a first one of the music audio files, based on a segment of audio data which is derived from the first music audio file, the apparatus comprising:
(a) an input unit inputting the segment of audio data generated by the machine into three different extraction processes, the three different extraction processes including (1) an IMFCC1 (first improved mel frequency cepstrum coefficients) extraction process, the IMFCC1 extraction process performing a conventional MFCC (mel frequency cepstrum coefficients) algorithm but not performing a logarithmic step of the conventional MFCC algorithm, wherein IMFCC1 audio features are output, (2) an IMFCC2 (second improved mel frequency cepstrum coefficients) extraction process, the IMFCC2 extraction process performing the conventional MFCC algorithm but not performing both the logarithmic step and a discrete cosine transform step of the conventional MFCC algorithm and instead performing an ICA (independent component analysis) process, wherein IMFCC2 audio features are output, and (3) an ICA1 (improved independent component analysis) extraction process performing a conventional ICA (independent component analysis) process but subjecting the segment of audio data to pre-emphasis preprocessing and windowing preprocessing, wherein ICA1 audio features are output; (b) a creation unit creating an observation vector containing the IMFCC1, IMFCC2 and ICA1 audio features output by the three different extraction processes respectively; (c) a database containing HMM models trained using only observation vectors containing IMFCC1, IMFCC2 and ICA1 audio features for each respective target machine generated music audio file; and (d) a determination unit determining the HMM model in the database for which probability of the observation vector being obtained given the target music audio file is maximal. - View Dependent Claims (16)
17. A method of identifying, among a plurality of audio files in digital format generated by machine, a first one of the audio files, the method employing a segment of audio data which is derived from the first audio file and comprising the steps of:
(a) inputting the segment of audio data generated by the machine into different extraction processes, at least one of the different extraction processes including an IMFCC (improved mel frequency cepstrum coefficients) extraction process, the IMFCC extraction process performing a conventional MFCC (mel frequency cepstrum coefficients) algorithm but not performing both a logarithmic step of the conventional MFCC algorithm and a discrete cosine transform step of the conventional MFCC algorithm and instead performing an ICA (independent component analysis) process, wherein IMFCC audio features are output; (b) creating an observation vector containing at least the IMFCC audio features extracted from the segment of audio data; and (c) recognizing the machine generated first audio file using the observation vector;
wherein the audio features comprise features obtained by analyzing the audio data, or a transformed version of the audio data, to derive a transform based on its audio features, and applying the transform to the audio data, or the transformed version of the audio data respectively, to obtain amplitudes of the audio features.
18. An apparatus for identifying, among a plurality of audio files in digital format generated by machine, a first one of the audio files, based on a segment of audio data which is derived from the first audio file, the apparatus comprising:
(a) an input unit inputting the segment of audio data generated by the machine into different extraction processes, at least one of the different extraction processes including an IMFCC (improved mel frequency cepstrum coefficients) extraction process, the IMFCC extraction process performing a conventional MFCC (mel frequency cepstrum coefficients) algorithm but not performing both a logarithmic step of the conventional MFCC algorithm and a discrete cosine transform step of the conventional MFCC algorithm and instead performing an ICA (independent component analysis) process, wherein IMFCC audio features are output; (b) a creation unit creating an observation vector containing at least the IMFCC audio features extracted from the segment of audio data; and (c) a recognition unit recognizing the machine generated first audio file using the observation vector; wherein the audio features comprise features obtained by analyzing the audio data, or a transformed version of the audio data, to derive a transform based on its audio features, and applying the transform to the audio data, or the transformed version of the audio data respectively, to obtain amplitudes of the audio features.
Specification