Automatic speech segmentation and verification using segment confidence measures

US 7,472,066 B2
Filed: 02/23/2004
Issued: 12/30/2008
Est. Priority Date: 09/12/2003
Status: Active Grant

Abstract

An automatic speech segmentation and verification system and method is disclosed, which has a known text script and a recorded speech corpus corresponding to the known text script. A speech unit segmentor segments the recorded speech corpus into N test speech unit segments referring to the phonetic information of the known text script. Then, a segmental verifier is applied to obtain a confidence measure of syllable segmentation for verifying the correctness of the cutting points of test speech unit segments. A phonetic verifier obtains a confidence measure of syllable verification by using verification models for verifying whether the recorded speech corpus is correctly recorded. Finally, a speech unit inspector integrates the confidence measure of syllable segmentation and the confidence measure of syllable verification to determine whether the test speech unit segment is accepted or not.
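
As a rough illustration of the final integration step described above, the sketch below combines the two confidence measures for each segment and compares the result to a threshold. The abstract does not say how the measures are integrated, so the weighted sum, the weight `alpha`, the threshold value, and all names (`SegmentScores`, `combined_confidence`, `accept_segments`) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class SegmentScores:
    """Scores for one test speech unit segment (hypothetical container)."""
    label: str   # phonetic unit taken from the known text script
    cms: float   # confidence measure of segmentation
    cmv: float   # confidence measure of verification


def combined_confidence(scores: SegmentScores, alpha: float = 0.5) -> float:
    # Assumed combination: a simple weighted sum of the two measures.
    return alpha * scores.cms + (1.0 - alpha) * scores.cmv


def accept_segments(segments, threshold: float, alpha: float = 0.5):
    # A segment is accepted only if its combined score exceeds the threshold.
    return [s.label for s in segments
            if combined_confidence(s, alpha) > threshold]


if __name__ == "__main__":
    corpus = [
        SegmentScores("ba", cms=0.9, cmv=0.0),   # clean boundaries, verified
        SegmentScores("ma", cms=0.2, cmv=-3.0),  # doubtful on both measures
    ]
    print(accept_segments(corpus, threshold=0.3))
```

With these made-up scores only the first segment clears the threshold; in practice the weight and threshold would have to be tuned on held-out data.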

18 Claims

1. An automatic speech segmentation and verification method for segmenting into speech unit segments and verifying said speech unit segments by determining which phonetic units defined by a known text script are to be accepted for output, said phonetic units accepted for output being used for speech synthesis, comprising:
- a retrieving step, for retrieving the recorded speech corpus, the recorded speech corpus corresponding to the known text script, the known text script defining phonetic information with N said phonetic units;
  
  a segmenting step, for segmenting the recorded speech corpus into N test speech unit segments referring to the phonetic information of the N phonetic units in the known text script;
  
  a segment-confidence-measure verifying step, for verifying segment confidence measures of all N cutting points of the N test speech unit segments to determine if the cutting points of the N test speech unit segments are correct;
  
  a phonetic-confidence-measure verifying step, for verifying phonetic confidence measures of the test speech unit segments to determine if the test speech unit segments correspond to the known text script; and
  
  a determining step, for determining acceptance of the phonetic unit by comparing a combination of the segment confidence measures and the phonetic confidence measures of the test speech unit segments to a predetermined threshold value;
  
  wherein if the combined confidence measure is greater than the predetermined threshold value, the phonetic unit is accepted for output.
- - 2. The method as claimed in claim 1, wherein the segmenting step further comprises:
    - using a hidden Markov model (HMM) to cut the recorded speech corpus into N test speech unit segments referring to the phonetic information of the N phonetic units in the known text script, wherein each test speech unit segment is defined as correspondingly having an initial cutting point;
      
      performing a fine adjustment on the initial cutting point of the test speech unit segment according to at least one feature factor corresponding to each test speech unit segment and calculating at least one cutting point fine adjustment value corresponding to each test speech unit segment; and
      
      integrating the initial cutting point and the cutting point fine adjustment value of the test speech unit segment to obtain a cutting point of the test speech unit segment.
  - 3. The method as claimed in claim 2, wherein the feature factor of the test speech unit segment is a neighboring cutting point of the initial cutting point.
  - 4. The method as claimed in claim 2, wherein the feature factor of the test speech unit segment is a zero crossing rate (ZCR) of the test speech unit segment.
  - 5. The method as claimed in claim 2, wherein the feature factor of the test speech unit segment is an energy value of the test speech unit segment.
  - 6. The method as claimed in claim 5, wherein the energy value is an energy value of a band pass signal and a high pass signal retrieved from a speaker-dependent band.
  - 7. The method as claimed in claim 2, wherein each cutting point fine adjustment value has a weighted value, and the cutting point of the test speech unit segment is a weighted average of the initial cutting point and the cutting point fine adjustment value.
  - 8. The method as claimed in claim 1, wherein in the segment-confidence-measure step, each segment confidence measure of the test speech unit segment is:
    - $CMS = \max\left(1 - h(D) - \sum_{s, f} g(c(s), f(s)),\ 0\right)$, where $h(D) = K\left(\sum_{i} w_{i} \langle d_{i} - \overline{d} \rangle\right)$, $D$ is a vector of multiple expert decisions of the cutting point, $d_{i}$ is the cutting point, $\overline{d} = p(D)$ is a final decision of the cutting point, $K(x)$ is a monotonically increasing function that maps a non-negative variable $x$ into a value between 0 and 1, $g(c(s), f(s))$ is a cost function value ranging from 0 to 1, $s$ is a segment, $c(s)$ is a type category of the segment $s$, and $f(s)$ are acoustic features of the segment.
  - 9. The method as claimed in claim 1, wherein in the phonetic-confidence-measure step, each phonetic confidence measure of the test speech unit segments is:
    - $CMV = \min\{LLR_{I}, LLR_{F}, 0\}$, where $LLR_{I} = \log P(X_{I} \mid H_{0}) - \log P(X_{I} \mid H_{1})$ and $LLR_{F} = \log P(X_{F} \mid H_{0}) - \log P(X_{F} \mid H_{1})$, $X_{I}$ is an initial segment of the test speech unit segment, $X_{F}$ is a final segment of the test speech unit segment, $H_{0}$ is a null hypothesis of the test speech unit segment recorded correctly, $H_{1}$ is an alternative hypothesis of the test speech unit segment recorded incorrectly, and LLR is a log likelihood ratio.
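
Claims 4 and 5 above name the zero crossing rate and the energy value of a segment as feature factors, but do not define how they are computed. The sketch below uses common textbook definitions as an assumption: the fraction of adjacent sample pairs whose signs differ, and the mean squared amplitude. The function names are hypothetical, and the band-pass and high-pass filtering mentioned in claim 6 is not modelled.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def short_time_energy(samples):
    """Mean squared amplitude of the frame."""
    if not samples:
        return 0.0
    return sum(x * x for x in samples) / len(samples)


if __name__ == "__main__":
    frame = [0.5, -0.4, 0.3, -0.2, 0.1]  # made-up samples
    print(zero_crossing_rate(frame), short_time_energy(frame))
```

Unvoiced consonants tend to show a high zero crossing rate and low energy, while vowels show the opposite, which is why such features are plausible cues for adjusting a boundary between an initial and a final.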

10. An automatic speech segmentation and verification system comprising:
- a database for storing a known text script and a recorded speech corpus corresponding to the known text script, and the known text script has phonetic information with N speech unit segments, wherein N is a positive integer;
  
  a speech unit segmentor for segmenting the recorded speech corpus into N test speech unit segments referring to the phonetic information of the known text script;
  
  a segmental verifier for verifying segment confidence measures of all cutting points of the N test speech unit segments to determine whether the cutting points of the N test speech unit segments are correct;
  
  a phonetic verifier for obtaining a confidence measure of segment verification by using verification models for verifying whether the recorded speech corpus is correctly recorded; and
  
  a speech unit inspector for integrating the confidence measure of speech unit segmentation and the confidence measure of segment verification to determine whether the test speech unit segment is accepted.
- - 11. The system as claimed in claim 10, wherein the segmental verifier performs the following steps:
    - using a hidden Markov model (HMM) to cut the recorded speech corpus into N test speech unit segments referring to the phonetic information of the N phonetic units in the known text script, wherein each test speech unit segment is defined as correspondingly having an initial cutting point;
      
      performing a fine adjustment on the initial cutting point of the test speech unit segment according to at least one feature factor corresponding to each test speech unit segment and calculating at least one cutting point fine adjustment value corresponding to each test speech unit segment; and
      
      integrating the initial cutting point and the cutting point fine adjustment value of the test speech unit segment to obtain a cutting point of the test speech unit segment.
  - 12. The system as claimed in claim 11, wherein the feature factor of the test speech unit segment is a neighboring cutting point of the initial cutting point.
  - 13. The system as claimed in claim 11, wherein the feature factor of the test speech unit segment is a zero crossing rate (ZCR) of the test speech unit segment.
  - 14. The system as claimed in claim 11, wherein the feature factor of the test speech unit segment is an energy value of the test speech unit segment.
  - 15. The system as claimed in claim 14, wherein the energy value is an energy value of a band pass signal and a high pass signal retrieved from a speaker-dependent band.
  - 16. The system as claimed in claim 11, wherein each cutting point fine adjustment value has a weighted value, and the cutting point of the test speech unit segment is a weighted average of the initial cutting point and the cutting point fine adjustment value.
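
The weighted average described in claims 7 and 16 can be sketched as follows. This is an illustration under stated assumptions: each fine adjustment value is treated as a candidate cutting point position (for example, in milliseconds) proposed by one feature factor, the initial HMM cutting point also carries a weight, and the name `refine_cutting_point` and all numbers are hypothetical.

```python
def refine_cutting_point(initial, adjustments, initial_weight=1.0):
    """Weighted average of the initial cutting point and adjustment values.

    adjustments: list of (value, weight) pairs, one per feature factor.
    """
    total_weight = initial_weight + sum(w for _, w in adjustments)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = initial_weight * initial + sum(v * w for v, w in adjustments)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Initial HMM boundary at 100 ms; one feature suggests 120 ms with
    # three times the weight of the initial estimate.
    print(refine_cutting_point(100.0, [(120.0, 3.0)]))
```

Here the result is pulled toward the more heavily weighted proposal: (1 × 100 + 3 × 120) / 4 = 115.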

17. An automatic speech segmentation and verification system comprising:
- a database for storing a known text script and a recorded speech corpus corresponding to the known text script, and the known text script has phonetic information with N speech unit segments, wherein N is a positive integer;
  
  a speech unit segmentor for segmenting the recorded speech corpus into N test speech unit segments referring to the phonetic information of the known text script;
  
  a segmental verifier for verifying segment confidence measures of all cutting points of the N test speech unit segments to determine whether the cutting points of the N test speech unit segments are correct;
  
  a phonetic verifier for obtaining a confidence measure of segment verification by using verification models for verifying whether the recorded speech corpus is correctly recorded; and
  
  a speech unit inspector for integrating the confidence measure of speech unit segmentation and the confidence measure of segment verification to determine whether the test speech unit segment is accepted, wherein each segment confidence measure of the test speech unit segment is determined by:
  
  $CMS = \max\left(1 - h(D) - \sum_{s, f} g(c(s), f(s)),\ 0\right)$, where $h(D) = K\left(\sum_{i} w_{i} \langle d_{i} - \overline{d} \rangle\right)$, $D$ is the vector of multiple expert decisions of the cutting point, $d_{i}$ is the cutting point, $\overline{d} = p(D)$ is a final decision of the cutting point, $K(x)$ is a monotonically increasing function that maps a non-negative variable $x$ into a value between 0 and 1, $g(c(s), f(s))$ is a cost function value ranging from 0 to 1, $s$ is a segment, $c(s)$ is the type category of the segment $s$, and $f(s)$ is the acoustic feature of the segment.
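
A minimal numerical sketch of the CMS formula in claim 17 follows. Several choices are assumptions rather than anything the claim specifies: the bracketed term is read as the absolute deviation of each expert decision from the final decision, p(D) is taken to be the weighted mean of the decisions, K(x) is taken to be 1 − exp(−x / scale), and the cost values g(c(s), f(s)) are passed in already computed. All function and parameter names are hypothetical.

```python
import math


def k_map(x, scale=10.0):
    # Assumed K: monotonically increasing, maps x >= 0 into [0, 1).
    return 1.0 - math.exp(-x / scale)


def segment_confidence(decisions, weights, costs, scale=10.0):
    """CMS = max(1 - h(D) - sum of costs, 0) under the stated assumptions."""
    total_w = sum(weights)
    # Assumed p(D): weighted mean of the expert cutting point decisions.
    d_bar = sum(w * d for w, d in zip(weights, decisions)) / total_w
    # Weighted disagreement of the experts around the final decision.
    spread = sum(w * abs(d - d_bar) for w, d in zip(weights, decisions))
    h = k_map(spread, scale)
    return max(1.0 - h - sum(costs), 0.0)


if __name__ == "__main__":
    # Three experts agree exactly; one acoustic cost of 0.3 is charged.
    print(segment_confidence([100.0, 100.0, 100.0], [1, 1, 1], [0.3]))
```

When the experts agree, h(D) is zero and the score is reduced only by the cost terms; strong disagreement drives h(D) toward 1 and the score toward the floor of 0.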

18. An automatic speech segmentation and verification system comprising:
- a database for storing a known text script and a recorded speech corpus corresponding to the known text script, and the known text script has phonetic information with N speech unit segments, wherein N is a positive integer;
  
  a speech unit segmentor for segmenting the recorded speech corpus into N test speech unit segments referring to the phonetic information of the known text script;
  
  a segmental verifier for verifying segment confidence measures of all cutting points of the N test speech unit segments to determine whether the cutting points of the N test speech unit segments are correct;
  
  a phonetic verifier for obtaining a confidence measure of segment verification by using verification models for verifying whether the recorded speech corpus is correctly recorded; and
  
  a speech unit inspector for integrating the confidence measure of speech unit segmentation and the confidence measure of segment verification to determine whether the test speech unit segment is accepted, wherein each phonetic confidence measure of the test speech unit segments is determined by:
  
  $CMV = \min\{LLR_{I}, LLR_{F}, 0\}$, where $LLR_{I} = \log P(X_{I} \mid H_{0}) - \log P(X_{I} \mid H_{1})$ and $LLR_{F} = \log P(X_{F} \mid H_{0}) - \log P(X_{F} \mid H_{1})$, $X_{I}$ is the initial segment of the test speech unit segment, $X_{F}$ is the final segment of the test speech unit segment, $H_{0}$ is a null hypothesis of the test speech unit segment recorded correctly, $H_{1}$ is an alternative hypothesis of the test speech unit segment recorded incorrectly, and LLR is a log likelihood ratio.
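
The CMV formula in claim 18 can be sketched directly once the log-likelihoods are available. The sketch assumes they have already been computed by some verification models for the null hypothesis H0 and the alternative hypothesis H1; how those models are built is outside the claim text shown here, and the function names and input values are hypothetical.

```python
def log_likelihood_ratio(logp_h0, logp_h1):
    """LLR = log P(X | H0) - log P(X | H1)."""
    return logp_h0 - logp_h1


def phonetic_confidence(logp_initial, logp_final):
    """CMV = min(LLR_I, LLR_F, 0).

    Each argument is a (log P(X | H0), log P(X | H1)) pair, for the initial
    and the final part of the test speech unit segment respectively.
    """
    llr_i = log_likelihood_ratio(*logp_initial)
    llr_f = log_likelihood_ratio(*logp_final)
    return min(llr_i, llr_f, 0.0)


if __name__ == "__main__":
    # The initial part favours H0, but the final part favours H1.
    print(phonetic_confidence((-10.0, -12.0), (-9.0, -6.0)))
```

Because of the `min` with 0, the measure never rewards a segment for a strongly positive ratio; it only penalizes the weaker of the two parts when that part favours the "recorded incorrectly" hypothesis.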

Current Assignee: Industrial Technology Research Institute
Original Assignee: Industrial Technology Research Institute
Inventors: Kuo, Chih-Chung; Kuo, Chi-Shiang; Chen, Jau-Hung
Primary Examiner(s): Smits, Talivaldis Ivars
Assistant Examiner(s): Godbold, Douglas
Application Number: US10/782,955
Publication Number: US 20050060151A1
Time in Patent Office: 1,772 Days
Field of Search: 704/254, 704/258, 704/266, 704/9, 704/10, 704/260, 704/267
US Class Current: 704/266
CPC Class Codes:
G10L 13/06 Elementary speech units use...
G10L 15/04 Segmentation; Word boundary...
