Voice labeling error detecting system, voice labeling error detecting method and program
First Claim
1. A voice labeling error detecting system comprising:
data acquisition means for acquiring waveform data representing a waveform of a unit voice and labeling data for identifying a kind of said unit voice;
classification means for classifying the waveform data acquired by said data acquisition means into the kinds of unit voice, based on the labeling data acquired by said data acquisition means;
evaluation value decision means for specifying a frequency of a formant of each unit voice represented by the waveform data acquired by said data acquisition means and determining an evaluation value of said waveform data based on the specified frequency; and
error detection means for detecting the waveform data from among a set of waveform data classified into a same kind, for which a deviation of evaluation value within said set reaches a predetermined amount, and outputting the data representing said detected waveform data, as waveform data having a labeling error, and
wherein said evaluation value H is calculated by the following formula representing a linear combination of values {|f(k)−F(k)|};
wherein F(k) is a frequency of the k-th formant of a unit voice indicated by the waveform data to calculate the evaluation value, and f(k) is an average value of the frequency of the k-th formant of the unit voice indicated by each waveform data classified into the same kind as said waveform data, W(k) is a weighting factor and n is the order of formant of the phoneme having the highest frequency.
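The formula that the claim refers to is not reproduced in this text. A reconstruction consistent with the stated definitions (a linear combination of the values |f(k)−F(k)|, weighted by W(k), with k running from 1 to n) would presumably be the following; the summation limits are an assumption drawn from the definition of n:

```latex
H = \sum_{k=1}^{n} W(k)\,\bigl| f(k) - F(k) \bigr|
```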
Abstract
A labeling part 3 analyzes character string data to produce a phoneme label and a prosody label, partitions the voice data stored in a voice database 1 into phonemic data, and labels the phonemic data using the phoneme label and the like. A phoneme segmenting part 4 connects the voice data labeled with the same kind of phonemic data, and a formant extracting part 5 specifies the frequency of a formant of each piece of phonemic data. A processing part 6 decides an evaluation value for each piece of phonemic data based on the formant frequency, and an error detection part 7 detects the phonemic data for which the deviation of the evaluation value within a set of phonemic data reaches a predetermined amount.
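The flow described in the abstract (group by label, compute an evaluation value per item, flag the outliers) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes formant frequencies have already been extracted, takes H as a weighted sum of |f(k)−F(k)|, and interprets "a deviation reaching a predetermined amount" as a distance from the group mean of H measured in standard deviations. The function names, the `threshold` parameter, and the sample data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def evaluation_values(formants, weights):
    """Return H for every item of one label group.

    formants: list of [F(1), ..., F(n)] per item (Hz).
    weights:  [W(1), ..., W(n)].
    """
    n = len(weights)
    # f(k): average k-th formant frequency over the whole group
    avg = [mean(item[k] for item in formants) for k in range(n)]
    return [
        sum(w * abs(f_k - F_k) for w, f_k, F_k in zip(weights, avg, item))
        for item in formants
    ]


def detect_label_errors(samples, weights, threshold=2.0):
    """Return ids of items whose H deviates strongly within their label group.

    samples: iterable of (sample_id, label, [F(1), ..., F(n)]).
    threshold: assumed cut-off, in standard deviations of H within the group.
    """
    groups = defaultdict(list)
    for sample_id, label, formants in samples:
        groups[label].append((sample_id, formants))

    suspects = []
    for members in groups.values():
        hs = evaluation_values([f for _, f in members], weights)
        mu, sigma = mean(hs), pstdev(hs)
        if sigma == 0:
            continue  # every item has the same H; nothing stands out
        for (sample_id, _), h in zip(members, hs):
            if abs(h - mu) / sigma >= threshold:
                suspects.append(sample_id)
    return suspects


# Hypothetical data: "a6" carries the label "a" but has /i/-like formants.
samples = [
    ("a1", "a", [700, 1200]), ("a2", "a", [710, 1190]),
    ("a3", "a", [690, 1210]), ("a4", "a", [705, 1195]),
    ("a5", "a", [695, 1205]), ("a6", "a", [300, 2300]),
    ("i1", "i", [300, 2300]), ("i2", "i", [310, 2290]),
    ("i3", "i", [290, 2310]),
]
print(detect_label_errors(samples, weights=[1.0, 1.0]))
```

With a standard-deviation threshold, very small groups can never exceed the cut-off, so an absolute threshold on H may be a better fit where few items share a label.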
7 Claims
1. A voice labeling error detecting system comprising:
data acquisition means for acquiring waveform data representing a waveform of a unit voice and labeling data for identifying a kind of said unit voice;
classification means for classifying the waveform data acquired by said data acquisition means into the kinds of unit voice, based on the labeling data acquired by said data acquisition means;
evaluation value decision means for specifying a frequency of a formant of each unit voice represented by the waveform data acquired by said data acquisition means and determining an evaluation value of said waveform data based on the specified frequency; and
error detection means for detecting the waveform data from among a set of waveform data classified into a same kind, for which a deviation of evaluation value within said set reaches a predetermined amount, and outputting the data representing said detected waveform data, as waveform data having a labeling error, and
wherein said evaluation value H is calculated by the following formula representing a linear combination of values {|f(k)−F(k)|};
wherein F(k) is a frequency of the k-th formant of a unit voice indicated by the waveform data to calculate the evaluation value, and f(k) is an average value of the frequency of the k-th formant of the unit voice indicated by each waveform data classified into the same kind as said waveform data, W(k) is a weighting factor and n is the order of formant of the phoneme having the highest frequency.
(Dependent claims: 2, 3, 4, 5, 6)
7. A voice labeling error detecting method comprising the steps of:
acquiring waveform data representing a waveform of a unit voice and labeling data for identifying a kind of said unit voice;
classifying said acquired waveform data into the kinds of unit voice, based on said acquired labeling data;
specifying a frequency of a formant of each unit voice represented by the waveform data and deciding an evaluation value of said waveform data based on the specified frequency; and
detecting the waveform data having a labeling error, from among a set of waveform data classified into a same kind, in which a deviation of evaluation value within said set reaches a predetermined amount and outputting data representing said detected waveform data,
wherein said evaluation value H is calculated by the following formula representing a linear combination of values {|f(k)−F(k)|};
wherein F(k) is a frequency of the k-th formant of a unit voice indicated by the waveform data to calculate the evaluation value, and f(k) is an average value of the frequency of the k-th formant of the unit voice indicated by each waveform data classified into the same kind as said waveform data, W(k) is a weighting factor and n is the order of formant of the phoneme having the highest frequency.
Specification