Neural network classifier for separating audio sources from a monophonic audio signal
Abstract
A neural network classifier provides the ability to separate and categorize multiple arbitrary and previously unknown audio sources down-mixed to a single monophonic audio signal. This is accomplished by breaking the monophonic audio signal into baseline frames (possibly overlapping), windowing the frames, extracting a number of descriptive features from each frame, and employing a pre-trained nonlinear neural network as a classifier. Each neural network output indicates the presence of a pre-determined type of audio source in each baseline frame of the monophonic audio signal. The neural network classifier is well suited to address widely changing parameters of the signal and sources, time- and frequency-domain overlapping of the sources, and reverberation and occlusions in real-life signals. The classifier outputs can be used as a front-end to create multiple audio channels for a source separation algorithm (e.g., ICA) or as parameters in a post-processing algorithm (e.g., to categorize music, track sources, and generate audio indexes for the purposes of navigation, re-mixing, security and surveillance, telephone and wireless communications, and teleconferencing).
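The framing and windowing steps described in the abstract can be illustrated with a short, dependency-free Python sketch. It assumes a Hann analysis window and a caller-chosen hop size; the source does not specify the window type, frame length, or amount of overlap, so `frame_signal`, `hann_window`, and `window_frames` are hypothetical helpers rather than the patented implementation.

```python
import math
from typing import List


def hann_window(n: int) -> List[float]:
    """Symmetric Hann window of length n (an assumed window choice)."""
    if n <= 0:
        raise ValueError("window length must be positive")
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(x: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a mono signal into baseline frames; hop < frame_len gives overlap.

    The final partial frame is zero-padded so that no samples are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames: List[List[float]] = []
    start = 0
    while start < len(x):
        chunk = list(x[start:start + frame_len])
        chunk += [0.0] * (frame_len - len(chunk))  # pad the tail frame
        frames.append(chunk)
        if start + frame_len >= len(x):
            break  # this frame already reaches the end of the signal
        start += hop
    return frames


def window_frames(frames: List[List[float]], window: List[float]) -> List[List[float]]:
    """Multiply every frame sample-by-sample by the analysis window."""
    return [[s * w for s, w in zip(frame, window)] for frame in frames]


# Usage: 50% overlap (hop = frame_len / 2) on a toy 10-sample signal.
signal = [float(i) for i in range(10)]
baseline = window_frames(frame_signal(signal, 4, 2), hann_window(4))
```

Choosing a hop smaller than the frame length produces the "possibly overlapping" baseline frames the abstract mentions; with a hop equal to the frame length the frames simply tile the signal.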
27 Claims
1. A method for separating audio sources from a monophonic audio signal, comprising:
(a) providing a monophonic audio signal comprising a down-mix of a plurality of unknown audio sources;
(b) separating the audio signal into a sequence of baseline frames;
(c) windowing each frame;
(d) extracting a plurality of audio features from each baseline frame that tend to distinguish the audio sources; and
(e) applying the audio features to a neural network (NN) classifier trained on a representative set of audio sources with said audio features, said neural network classifier outputting at least one measure of an audio source included in each said baseline frame of the monophonic audio signal.
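Step (d) does not enumerate which audio features are extracted. Purely as an illustration, the sketch below computes three common frame-level descriptors (RMS energy, zero-crossing rate, and spectral centroid from a naive DFT) for one windowed baseline frame. The choice of features and the helper names, including `extract_features`, are assumptions, not the feature set of the claimed method.

```python
import math
from typing import List


def rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame: List[float], sample_rate: float) -> float:
    """Magnitude-weighted mean frequency (Hz) over DFT bins 0..N/2.

    Uses a naive O(N^2) DFT to stay dependency-free; a real system would use an FFT.
    """
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2.0 * math.pi * k * t / n) for t, s in enumerate(frame))
        im = -sum(s * math.sin(2.0 * math.pi * k * t / n) for t, s in enumerate(frame))
        mags.append(math.hypot(re, im))
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: centroid is undefined, report 0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def extract_features(frame: List[float], sample_rate: float) -> List[float]:
    """Hypothetical per-frame feature vector: [RMS, ZCR, spectral centroid]."""
    return [rms(frame), zero_crossing_rate(frame), spectral_centroid(frame, sample_rate)]
```

Each baseline frame thus yields one fixed-length vector, which is the form of input a neural network classifier such as the one in step (e) expects.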
24. A method for separating audio sources from a monophonic audio signal, comprising:
(a) providing a monophonic audio signal comprising a down-mix of a plurality of unknown audio sources;
(b) separating the audio signal into a sequence of baseline frames;
(c) windowing each frame;
(d) extracting a plurality of audio features from each baseline frame that tend to distinguish the audio sources;
(e) repeating steps (b) through (d) with a different frame size to extract features at multiple resolutions;
(f) scaling the extracted audio features at the different resolutions to the baseline frame; and
(g) applying the audio features to a neural network (NN) classifier trained on a representative set of audio sources with said audio features, said neural network classifier having a plurality of output neurons that each signal the presence of a certain audio source in the monophonic audio signal for each baseline frame.
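Claim 24 does not state how features extracted at other resolutions are scaled to the baseline frame. One plausible rule, assumed here only for illustration, is to average the finer-resolution frames whose centers fall inside a baseline frame and to reuse the nearest frame when the resolution is coarser; the per-resolution vectors are then concatenated into a single classifier input per baseline frame. `scale_to_baseline` and `combine` are hypothetical names.

```python
from typing import List


def scale_to_baseline(feats: List[List[float]], frame_len: int, hop: int,
                      base_count: int, base_len: int, base_hop: int) -> List[List[float]]:
    """Map feature vectors from one resolution onto the baseline frame grid.

    Assumed rule: average every frame whose center lies inside a baseline frame;
    if none does (coarser resolution), copy the frame with the nearest center.
    """
    if not feats:
        raise ValueError("no feature frames to scale")
    centers = [i * hop + frame_len / 2.0 for i in range(len(feats))]
    scaled = []
    for b in range(base_count):
        start = b * base_hop
        end = start + base_len
        inside = [f for f, c in zip(feats, centers) if start <= c < end]
        if inside:
            dim = len(inside[0])
            scaled.append([sum(v[d] for v in inside) / len(inside) for d in range(dim)])
        else:
            mid = start + base_len / 2.0
            nearest = min(range(len(feats)), key=lambda i: abs(centers[i] - mid))
            scaled.append(list(feats[nearest]))
    return scaled


def combine(*per_resolution: List[List[float]]) -> List[List[float]]:
    """Concatenate the per-resolution vectors for each baseline frame into one NN input."""
    return [[x for part in parts for x in part] for parts in zip(*per_resolution)]


# Usage: baseline frames of 4 samples (hop 4), a finer grid (2/2) and a coarser one (8/8).
fine = scale_to_baseline([[1.0], [3.0], [5.0], [7.0]], 2, 2, 2, 4, 4)
coarse = scale_to_baseline([[10.0]], 8, 8, 2, 4, 4)
inputs = combine([[0.5], [0.6]], fine, coarse)
```

Averaging keeps short-term detail from the finer grid without changing the number of baseline frames, while the nearest-frame fallback lets a long analysis window contribute slowly varying context to every baseline frame it covers.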
25. An audio source classifier, comprising:
a framer for separating a monophonic audio signal comprising a down-mix of a plurality of unknown audio sources into a sequence of windowed baseline frames;
a feature extractor for extracting a plurality of audio features from each baseline frame that tend to distinguish the audio sources; and
a neural network (NN) classifier trained on a representative set of audio sources with said audio features, said neural network classifier receiving the extracted audio features and outputting at least one measure of an audio source included in each said baseline frame of the monophonic audio signal.
Specification