Method and system for the automatic segmentation of an audio stream into semantic or syntactic units
Abstract
A digitized speech signal (600) is input to an F0 (fundamental frequency) processor that computes (610) continuous F0 data from the speech signal. Using voicing state transitions (voiced/unvoiced transitions) as the criterion, the speech signal is presegmented (620) into segments. For each segment (630), it is evaluated (640) whether F0 is defined or not defined, i.e. whether F0 is ON or OFF. If F0 = OFF, a candidate segment boundary is assumed and, starting from that boundary, prosodic features are computed (650). The feature values are input into a classification tree, and each candidate segment is classified, revealing the existence or non-existence of a semantic or syntactic speech unit.
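The presegmentation and candidate-boundary steps of the abstract can be illustrated with a minimal sketch. It assumes the F0 processor has already produced a frame-wise contour in which unvoiced frames (F0 = OFF) are represented as `None`. That representation, the function names `presegment` and `candidate_boundaries`, and the `min_frames` parameter are illustrative assumptions, not taken from the patent.

```python
from typing import List, Optional, Tuple

# (start_frame, end_frame_exclusive, f0_on); a hypothetical representation.
Segment = Tuple[int, int, bool]


def presegment(f0: List[Optional[float]]) -> List[Segment]:
    """Split a frame-wise F0 contour at voiced/unvoiced transitions."""
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(f0) + 1):
        # Close the current segment at end of input or when voicing flips.
        if i == len(f0) or (f0[i] is None) != (f0[start] is None):
            segments.append((start, i, f0[start] is not None))
            start = i
    return segments


def candidate_boundaries(
    segments: List[Segment], min_frames: int = 1
) -> List[Tuple[int, int]]:
    """Treat every F0=OFF segment of at least min_frames as a candidate.

    min_frames is an assumed tuning knob, not a value from the source.
    """
    return [(s, e) for s, e, on in segments if not on and e - s >= min_frames]


if __name__ == "__main__":
    contour = [120.0, 118.0, None, None, None, 95.0, 97.0, None]
    segs = presegment(contour)
    print(segs)
    print(candidate_boundaries(segs, min_frames=2))
```

Each candidate returned here is only a hypothesis. According to the abstract, the decision whether it marks a semantic or syntactic unit is made later from prosodic features.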
14 Claims
1. A method for the segmentation of an audio stream into semantic or syntactic units wherein the audio stream is provided in a digitized format, comprising the steps of:
determining a fundamental frequency for the digitized audio stream;
detecting changes of the fundamental frequency in the audio stream;
determining candidate boundaries for the semantic or syntactic units depending on the detected changes of the fundamental frequency;
extracting at least one prosodic feature in the neighborhood of the candidate boundaries;
determining boundaries for the semantic or syntactic units depending on the at least one prosodic feature.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. An article of manufacture comprising a computer usable medium having computer readable program code means embodied therein for causing segmentation of an audio stream into semantic or syntactic units, wherein the audio stream is provided in a digitized format, the computer readable program code means in the article of manufacture comprising computer readable program code means for causing a computer to effect:
determining a fundamental frequency for the digitized audio stream;
detecting changes of the fundamental frequency in the audio stream;
determining candidate boundaries for the semantic or syntactic units depending on the detected changes of the fundamental frequency;
extracting at least one prosodic feature in the neighborhood of the candidate boundaries;
determining boundaries for the semantic or syntactic units depending on the at least one prosodic feature.
12. A digital audio processing system for segmentation of a digitized audio stream into semantic or syntactic units comprising:
means for determining a fundamental frequency for the digitized audio stream,
means for detecting changes of the fundamental frequency in the audio stream,
means for determining candidate boundaries for the semantic or syntactic units depending on the detected changes of the fundamental frequency,
means for extracting at least one prosodic feature in the neighborhood of the candidate boundaries, and
means for determining boundaries for the semantic or syntactic units depending on the at least one prosodic feature.
Dependent claims: 13, 14
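The last two claimed steps, extracting prosodic features in the neighborhood of a candidate boundary and deciding on the boundary from those features, can be sketched as follows. This is an illustration under stated assumptions only. The chosen features (pause duration and mean F0 before and after the pause), the window size, the 10 ms frame length, and the hand-written thresholds standing in for a trained classification tree are all hypothetical and are not taken from the claims.

```python
from typing import Dict, List, Optional, Tuple


def mean_voiced(f0: List[Optional[float]], lo: int, hi: int) -> Optional[float]:
    """Mean F0 over voiced frames in [lo, hi), or None if there are none."""
    vals = [v for v in f0[max(lo, 0):max(hi, 0)] if v is not None]
    return sum(vals) / len(vals) if vals else None


def prosodic_features(
    f0: List[Optional[float]],
    boundary: Tuple[int, int],
    window: int = 3,        # assumed neighborhood size in frames
    frame_ms: float = 10.0,  # assumed frame length
) -> Dict[str, Optional[float]]:
    """Compute illustrative features around an F0=OFF span (start, end)."""
    start, end = boundary
    return {
        "pause_ms": (end - start) * frame_ms,
        "f0_before": mean_voiced(f0, start - window, start),
        "f0_after": mean_voiced(f0, end, end + window),
    }


def is_unit_boundary(
    feat: Dict[str, Optional[float]],
    min_pause_ms: float = 200.0,  # hypothetical threshold
    min_reset_hz: float = 10.0,   # hypothetical threshold
) -> bool:
    """Tiny hand-written decision tree; a stand-in for a trained one."""
    if feat["pause_ms"] >= min_pause_ms:
        return True  # a long pause alone is accepted
    if feat["f0_before"] is None or feat["f0_after"] is None:
        return False  # no voiced context to compare
    # Short pause: require an upward F0 reset across the boundary.
    return feat["f0_after"] - feat["f0_before"] >= min_reset_hz


if __name__ == "__main__":
    contour = [130.0, 120.0, 110.0, None, None, 140.0, 138.0, 136.0,
               None, 135.0, 134.0]
    for cand in [(3, 5), (8, 9)]:
        feat = prosodic_features(contour, cand)
        print(cand, feat, is_unit_boundary(feat))
```

In practice the thresholds would be replaced by a tree trained on labeled data. The fixed rules here only show how feature values computed near a candidate boundary feed a final accept or reject decision.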
Specification