Method of segmenting an audio stream
First Claim
1. A method of segmentation of an audio stream, comprising:
receiving the audio stream;
calculating a first-grade characteristic of the audio stream;
calculating a second-grade characteristic of the audio stream; and
performing a decision-making analysis, wherein the segmentation includes a division of the audio stream into segments containing different homogeneous signals based on the first-grade characteristic and the second-grade characteristic of the audio stream;
wherein calculating the first-grade characteristic is performed by a division of the audio stream into frames, for each of which an audio feature vector is calculated;
wherein the audio feature vector includes five formant frequencies, first and second reflection coefficients, an energy of a prediction error coefficient, and a pre-emphasized energy ratio coefficient;
wherein calculating the second-grade characteristic is performed in a sequence of predefined, non-overlapping windows, each of the windows including a definite number of said frames with said audio feature vectors calculated during the calculating of the first-grade characteristic;
wherein calculating the second-grade characteristic includes calculating a statistical feature vector for each said window;
wherein the statistical feature vector includes two sub-vectors, a first one of said two sub-vectors includes mean values of the formant frequencies and dispersions of the formant frequencies, and a second one of said two sub-vectors includes
a difference between maximal and minimal values of the second reflection coefficient multiplied by the mean value of the second reflection coefficient,
a product of the mean value and the dispersion of the energy of the prediction error coefficient,
a sum of modules of differences between said energies of the prediction error coefficients for said neighboring frames divided by the sum of the modules of differences between said energies of the prediction error coefficients,
a difference between maximal and minimal values of said pre-emphasized energy ratio coefficients, and
a number of said frames in the window in which the first reflection coefficients outnumber a predefined threshold value.
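As an informal illustration only, and not the claimed implementation, the statistical feature vector for one window might be sketched as follows. The sketch reads "dispersion" as variance, "module" as absolute value, and "outnumber" as "exceed". The function name `window_statistics`, the dictionary keys, and the default threshold are hypothetical. The claim wording for the third element of the second sub-vector literally divides a quantity by the same quantity, so the denominator used here (the sum of absolute energies) is an assumption, not something the claim states.

```python
from statistics import fmean, pvariance

def window_statistics(frames, k1_threshold=0.5):
    """Return (first_subvector, second_subvector) for one window.

    `frames` is a list of per-frame feature dicts with the hypothetical keys
    'formants' (five values), 'k1', 'k2', 'err_energy' and 'pre_ratio'.
    """
    # First sub-vector: mean and variance of each of the five formant tracks.
    tracks = list(zip(*(f["formants"] for f in frames)))
    first = [fmean(t) for t in tracks] + [pvariance(t) for t in tracks]

    k2 = [f["k2"] for f in frames]
    energy = [f["err_energy"] for f in frames]
    ratio = [f["pre_ratio"] for f in frames]

    # (max - min) of the second reflection coefficient, times its mean.
    k2_term = (max(k2) - min(k2)) * fmean(k2)
    # Mean times variance of the prediction error energy.
    energy_term = fmean(energy) * pvariance(energy)
    # Sum of absolute differences between neighboring frames ...
    numerator = sum(abs(b - a) for a, b in zip(energy, energy[1:]))
    # ... normalized by the sum of absolute energies (ASSUMPTION, see lead-in).
    denominator = sum(abs(e) for e in energy)
    variation_term = numerator / denominator if denominator else 0.0
    # Range of the pre-emphasized energy ratio.
    ratio_range = max(ratio) - min(ratio)
    # Number of frames whose first reflection coefficient exceeds the threshold.
    count = sum(1 for f in frames if f["k1"] > k1_threshold)

    second = [k2_term, energy_term, variation_term, ratio_range, count]
    return first, second
```

With five formants the first sub-vector has ten entries (five means followed by five variances) and the second has five entries, one per listed element.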
Abstract
Disclosed herein is a segmentation method that divides an input audio stream into segments containing different homogeneous signals. The main objective of the method is to localize segments with stationary properties. The method finds all non-stationary points or intervals in the audio stream and creates a list of segments. The resulting list can be used as input data for subsequent procedures such as classification or speech/music/noise attribution. The proposed method is based on analysis of the variation of statistical features of the audio signal and comprises three main stages: calculation of first-grade characteristics, calculation of second-grade characteristics, and decision-making.
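The three stages can be pictured with a minimal sketch. It is illustrative only: the names `split` and `segment`, the caller-supplied feature functions, and the decision rule (a boundary wherever any component of adjacent window vectors differs by more than a threshold) are assumptions made for this example, since the abstract does not specify the decision-making stage.

```python
def split(seq, size):
    """Cut seq into consecutive non-overlapping chunks of `size`; drop a short tail."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]

def segment(samples, frame_len, frames_per_window,
            frame_features, window_features, threshold):
    """Return a list of (start, end) sample ranges, one per segment."""
    # Stage 1: first-grade characteristics, one value or vector per frame.
    frames = split(samples, frame_len)
    first_grade = [frame_features(f) for f in frames]

    # Stage 2: second-grade characteristics, one vector per window of frames.
    windows = split(first_grade, frames_per_window)
    if not windows:
        return []
    second_grade = [window_features(w) for w in windows]

    # Stage 3: decision-making (HYPOTHETICAL rule, not taken from the source).
    window_len = frame_len * frames_per_window
    boundaries = [0]
    for i in range(1, len(second_grade)):
        prev, cur = second_grade[i - 1], second_grade[i]
        if max(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            boundaries.append(i * window_len)

    end = len(windows) * window_len
    return list(zip(boundaries, boundaries[1:] + [end]))

# Toy input: silence followed by a constant signal, with frame energy as the
# only first-grade feature and its window mean as the only second-grade one.
samples = [0.0] * 8 + [1.0] * 8
segments = segment(samples, 2, 2,
                   lambda f: sum(x * x for x in f),
                   lambda w: [sum(w) / len(w)],
                   0.5)
print(segments)  # [(0, 8), (8, 16)]
```

Passing the feature functions as arguments keeps the sketch independent of any particular first-grade or second-grade feature set.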
28 Citations
2 Claims
Specification