Identification of the presence of speech in digital audio data
First Claim
1. Method for determining speech related audio data within a record of digital audio data, the method comprising steps for
- extracting audio features from the record of digital audio data,
- classifying the record of digital audio data based on the extracted audio features and with respect to one or more predetermined audio classes, and
- marking at least a part of the record of digital audio data classified as speech,
characterised in that the extraction of at least one audio feature comprises the following steps:
- partitioning the record of digital audio data into adjoining frames,
- for each frame defining a window being formed by a sequence of adjoining frames containing the frame under consideration,
- determining for the frame under consideration and at least one further frame of the window a spectral-emphasis-value which is related to the frequency distribution contained in the digital audio data of the respective frame, and
- assigning a presence-of-speech indicator value to the frame under consideration based on an evaluation of the differences between the spectral-emphasis-values determined for the frame under consideration and the at least one further frame of the window.
Abstract
The present invention provides a method, a computer-software-product and an apparatus for enabling a determination of speech related audio data within a record of digital audio data. The method comprises steps for extracting audio features from the record of digital audio data, for classifying one or more subsections of the record of digital audio data, and for marking at least a part of the record of digital audio data classified as speech. The classification of the digital audio data record is performed on the basis of the extracted audio features and with respect to at least one predetermined audio class. The extraction of the at least one audio feature as used by a method according to the invention comprises steps for partitioning the record of digital audio data into adjoining frames, defining a window for each frame which is formed by a sequence of adjoining frames containing the frame under consideration, determining for the frame under consideration and at least one further frame of the window a spectral-emphasis-value which is related to the frequency distribution contained in the digital audio data of the respective frame, and assigning a presence-of-speech indicator value to the frame under consideration based on an evaluation of the differences between the spectral-emphasis-values determined for the frame under consideration and at least one further frame of the window.
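The feature extraction steps described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how the spectral-emphasis-value or the evaluation of differences is computed, so the sketch assumes the spectral centroid (the magnitude-weighted mean frequency) as the spectral-emphasis-value and the mean absolute difference to the other frames of the window as the presence-of-speech indicator value. The function names, the frame length and the window size are likewise hypothetical.

```python
import cmath
import math


def split_frames(samples, frame_len):
    """Partition the record into adjoining, non-overlapping frames."""
    count = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(count)]


def spectral_emphasis(frame, sample_rate):
    """Assumed definition: spectral centroid of the frame in Hz."""
    n = len(frame)
    total = 0.0
    weighted = 0.0
    for k in range(n // 2 + 1):
        # Naive DFT coefficient for frequency bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        magnitude = abs(coeff)
        total += magnitude
        weighted += magnitude * k * sample_rate / n
    return weighted / total if total > 1e-12 else 0.0


def speech_indicators(samples, sample_rate, frame_len=64, half_window=2):
    """Return one indicator value per frame.

    Assumed evaluation: mean absolute difference between the emphasis
    value of the frame under consideration and those of the other
    frames in its window.
    """
    frames = split_frames(samples, frame_len)
    emphasis = [spectral_emphasis(f, sample_rate) for f in frames]
    indicators = []
    for i in range(len(frames)):
        # The window is a run of adjoining frames containing frame i.
        start = max(0, i - half_window)
        stop = min(len(frames), i + half_window + 1)
        diffs = [abs(emphasis[i] - emphasis[j])
                 for j in range(start, stop) if j != i]
        indicators.append(sum(diffs) / len(diffs) if diffs else 0.0)
    return indicators


if __name__ == "__main__":
    rate = 8000
    # A steady tone: the spectral emphasis barely changes between frames.
    steady = [math.sin(2 * math.pi * 500 * t / rate) for t in range(64 * 6)]
    # Frames alternating between a low and a high tone: the emphasis
    # shifts from frame to frame, so the indicator values are large.
    alternating = []
    for f in range(6):
        freq = 500 if f % 2 == 0 else 2000
        alternating += [math.sin(2 * math.pi * freq * t / rate)
                        for t in range(64)]
    print(speech_indicators(steady, rate))
    print(speech_indicators(alternating, rate))
```

Under these assumptions, a signal whose frequency distribution stays constant yields indicator values near zero, while a signal whose spectral emphasis shifts between neighbouring frames yields large values. A classifier as described in the abstract would take such values as one of its input features.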
49 Citations
9 Claims
Specification