Automatic speech segmentation and verification using segment confidence measures
Abstract
An automatic speech segmentation and verification system and method are disclosed, which use a known text script and a recorded speech corpus corresponding to the known text script. A speech unit segmentor segments the recorded speech corpus into N test speech unit segments by referring to the phonetic information of the known text script. Then, a segmental verifier is applied to obtain a confidence measure of syllable segmentation, which verifies the correctness of the cutting points of the test speech unit segments. A phonetic verifier obtains a confidence measure of syllable verification by using verification models to verify whether the recorded speech corpus is correctly recorded. Finally, a speech unit inspector integrates the confidence measure of syllable segmentation and the confidence measure of syllable verification to determine whether each test speech unit segment is accepted.
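The inspector's final accept/reject decision can be sketched as follows. This is a minimal illustration, not the patented method: it assumes both confidence scores are already normalized to [0, 1], and the weighted sum, the `weight` and `threshold` values, and the names `TestSegment` and `inspect` are all hypothetical, since the abstract does not say how the two measures are integrated.

```python
from dataclasses import dataclass


@dataclass
class TestSegment:
    """One test speech unit segment with its two confidence scores."""
    label: str          # phonetic unit taken from the known text script
    segment_cm: float   # confidence that the cutting points are correct (assumed in [0, 1])
    phonetic_cm: float  # confidence that the unit was recorded correctly (assumed in [0, 1])


def inspect(segments, weight=0.5, threshold=0.5):
    """Return the labels of segments whose combined confidence exceeds the threshold.

    The weighted sum is a hypothetical combination rule chosen for illustration.
    """
    accepted = []
    for seg in segments:
        combined = weight * seg.segment_cm + (1.0 - weight) * seg.phonetic_cm
        if combined > threshold:  # strictly greater, as claim 1 states
            accepted.append(seg.label)
    return accepted


if __name__ == "__main__":
    units = [
        TestSegment("ba", 0.9, 0.8),  # combined 0.85 -> accepted
        TestSegment("ma", 0.3, 0.4),  # combined 0.35 -> rejected
        TestSegment("da", 0.6, 0.5),  # combined 0.55 -> accepted
    ]
    print(inspect(units))  # prints ['ba', 'da']
```

A segment sitting exactly on the threshold is rejected, which mirrors the "greater than" wording of the claims.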
30 Citations
18 Claims
1. An automatic speech segmentation and verification method for segmenting into speech unit segments and verifying said speech unit segments by determining which phonetic units defined by a known text script are to be accepted for output, said phonetic units accepted for output being used for speech synthesis, comprising:
a retrieving step, for retrieving the recorded speech corpus, the recorded speech corpus corresponding to the known text script, the known text script defining phonetic information with N said phonetic units;
a segmenting step, for segmenting the recorded speech corpus into N test speech unit segments referring to the phonetic information of the N phonetic units in the known text script;
a segment-confidence-measure verifying step, for verifying segment confidence measures of all N cutting points of the N test speech unit segments to determine if the cutting points of the N test speech unit segments are correct;
a phonetic-confidence-measure verifying step, for verifying phonetic confidence measures of the test speech unit segments to determine if the test speech unit segments correspond to the known text script; and
a determining step, for determining acceptance of the phonetic unit by comparing a combination of the segment confidence measures and the phonetic confidence measures of the test speech unit segments to a predetermined threshold value;
wherein if the combined confidence measure is greater than the predetermined threshold value, the phonetic unit is accepted for output.
10. An automatic speech segmentation and verification system comprising:
a database for storing a known text script and a recorded speech corpus corresponding to the known text script, the known text script having phonetic information with N speech unit segments, wherein N is a positive integer;
a speech unit segmentor for segmenting the recorded speech corpus into N test speech unit segments referring to the phonetic information of the known text script;
a segmental verifier for verifying segment confidence measures of all cutting points of the N test speech unit segments to determine whether the cutting points of the N test speech unit segments are correct;
a phonetic verifier for obtaining a confidence measure of segment verification by using verification models to verify whether the recorded speech corpus is correctly recorded; and
a speech unit inspector for integrating the confidence measure of speech unit segmentation and the confidence measure of segment verification to determine whether the test speech unit segment is accepted.
17. An automatic speech segmentation and verification system comprising:
a database for storing a known text script and a recorded speech corpus corresponding to the known text script, the known text script having phonetic information with N speech unit segments, wherein N is a positive integer;
a speech unit segmentor for segmenting the recorded speech corpus into N test speech unit segments referring to the phonetic information of the known text script;
a segmental verifier for verifying segment confidence measures of all cutting points of the N test speech unit segments to determine whether the cutting points of the N test speech unit segments are correct;
a phonetic verifier for obtaining a confidence measure of segment verification by using verification models to verify whether the recorded speech corpus is correctly recorded; and
a speech unit inspector for integrating the confidence measure of speech unit segmentation and the confidence measure of segment verification to determine whether the test speech unit segment is accepted,
wherein each segment confidence measure of the test speech unit segment is determined by:
where $D$ is the vector of multiple expert decisions of the cutting point, $d_i$ is the i-th decision of the cutting point in $D$, $\hat{d} = p(D)$ is the final decision of the cutting point, $K(x)$ is a monotonically increasing function that maps a non-negative variable $x$ into a value between 0 and 1, $g(c(s), f(s))$ is a cost function with values ranging from 0 to 1, $s$ is a segment, $c(s)$ is the type category of the segment $s$, and $f(s)$ is the acoustic feature of the segment $s$.
18. An automatic speech segmentation and verification system comprising:
a database for storing a known text script and a recorded speech corpus corresponding to the known text script, the known text script having phonetic information with N speech unit segments, wherein N is a positive integer;
a speech unit segmentor for segmenting the recorded speech corpus into N test speech unit segments referring to the phonetic information of the known text script;
a segmental verifier for verifying segment confidence measures of all cutting points of the N test speech unit segments to determine whether the cutting points of the N test speech unit segments are correct;
a phonetic verifier for obtaining a confidence measure of segment verification by using verification models to verify whether the recorded speech corpus is correctly recorded; and
a speech unit inspector for integrating the confidence measure of speech unit segmentation and the confidence measure of segment verification to determine whether the test speech unit segment is accepted,
wherein each phonetic confidence measure of the test speech unit segments is determined by:
$CM_V = \min\{LLR_I, LLR_F, 0\}$,
where $X_I$ is the initial segment of the test speech unit segment, $X_F$ is the final segment of the test speech unit segment, $H_0$ is the null hypothesis that the test speech unit segment was recorded correctly, $H_1$ is the alternative hypothesis that the test speech unit segment was recorded incorrectly, and LLR is a log likelihood ratio, $LLR_I$ and $LLR_F$ being the log likelihood ratios of $H_0$ against $H_1$ computed on $X_I$ and $X_F$, respectively.
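The phonetic confidence measure of claim 18 can be sketched as below, assuming the verification models already supply the log-likelihoods log p(X|H0) and log p(X|H1) for the initial and final parts of a segment. The LLR is taken in its textbook form, log p(X|H0) - log p(X|H1); any per-frame normalization the original may apply is not reproduced here, and the function names are hypothetical.

```python
def log_likelihood_ratio(log_p_h0, log_p_h1):
    """LLR of H0 (recorded correctly) against H1 (recorded incorrectly)."""
    return log_p_h0 - log_p_h1


def phonetic_confidence(initial, final):
    """CM_V = min{LLR_I, LLR_F, 0}.

    `initial` and `final` are (log p(X|H0), log p(X|H1)) pairs for the
    initial segment X_I and the final segment X_F. How the verification
    models produce these log-likelihoods is assumed, not specified here.
    """
    llr_i = log_likelihood_ratio(*initial)
    llr_f = log_likelihood_ratio(*final)
    return min(llr_i, llr_f, 0.0)


if __name__ == "__main__":
    # Both parts favour H0, so the measure saturates at 0 (no penalty).
    print(phonetic_confidence((-40.0, -45.0), (-30.0, -32.0)))  # prints 0.0
    # The final part favours H1 by 3 log units, which becomes the penalty.
    print(phonetic_confidence((-40.0, -45.0), (-35.0, -32.0)))  # prints -3.0
```

Capping at 0 means a segment is never rewarded for strong evidence of correct recording; it can only be penalized by whichever of its two parts looks worst.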
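The segment confidence equation of claim 17 is not reproduced in this text, so the sketch below illustrates only two of the ingredients the claim defines: fusing the expert decisions D into a final decision d̂ = p(D), and a monotonically increasing K(x) that maps a non-negative value into [0, 1). Using the median as p, 1 - exp(-x/scale) as K, and the mean absolute deviation of the d_i from d̂ as the value fed to K are all assumptions made for illustration; the cost function g(c(s), f(s)) is omitted.

```python
import math
import statistics


def fuse_decisions(decisions):
    """Final cutting-point decision d_hat = p(D); p is assumed to be the median."""
    return statistics.median(decisions)


def squash(x, scale=10.0):
    """A monotonically increasing K(x) mapping x >= 0 into [0, 1).

    1 - exp(-x / scale) is one hypothetical choice; `scale` is an assumed
    tuning constant expressed in the same units as x.
    """
    if x < 0:
        raise ValueError("K(x) is defined for non-negative x only")
    return 1.0 - math.exp(-x / scale)


def expert_disagreement(decisions):
    """Mean absolute deviation of the expert decisions d_i from d_hat (assumed input to K)."""
    d_hat = fuse_decisions(decisions)
    return sum(abs(d - d_hat) for d in decisions) / len(decisions)


if __name__ == "__main__":
    # Hypothetical cutting-point estimates (ms) from three segmentation experts.
    D = [412.0, 418.0, 415.0]
    d_hat = fuse_decisions(D)        # 415.0
    spread = expert_disagreement(D)  # (3 + 3 + 0) / 3 = 2.0
    print(d_hat, spread, round(squash(spread), 3))  # prints 415.0 2.0 0.181
```

The median is a common choice for fusing several boundary estimates because a single badly misplaced expert decision does not drag d̂ far; whether the original uses it is not stated here.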
Specification