Automatic segmentation in speech synthesis

US 8,131,547 B2
Filed: 08/20/2009
Issued: 03/06/2012
Est. Priority Date: 03/29/2002
Status: Expired due to Fees

Abstract

A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, and correcting the segmentation of the speech units. Correcting the segmentation includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform the steps of the method.
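
The correction loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the callables `reestimate`, `embedded_reestimate`, `update_labels`, and `improved`, the `Label` tuple layout, and the `max_iterations` cap are all hypothetical names introduced here. The `improved` test stands in for the "no perceptual improvement" stopping criterion of claims 3 and 19, which this record does not specify further.

```python
from typing import Callable, List, Tuple

# Hypothetical label layout: (phone symbol, start time, end time).
Label = Tuple[str, float, float]


def refine_segmentation(
    labels: List[Label],
    reestimate: Callable[[List[Label]], object],
    embedded_reestimate: Callable[[object], object],
    update_labels: Callable[[object, List[Label]], List[Label]],
    improved: Callable[[List[Label], List[Label]], bool],
    max_iterations: int = 10,  # assumed safety cap, not from the patent
) -> List[Label]:
    """Repeat re-estimation, embedded re-estimation, and label update."""
    for _ in range(max_iterations):
        # Re-estimate the HMM from the current version of the phone labels.
        model = reestimate(labels)
        # Embedded re-estimation of the HMM.
        model = embedded_reestimate(model)
        # Update the labels (e.g. by spectral boundary correction).
        candidate = update_labels(model, labels)
        # Stop as soon as the caller-supplied test reports no improvement.
        if not improved(labels, candidate):
            break
        labels = candidate
    return labels
```

Passing the steps in as callables keeps the sketch independent of any particular HMM toolkit; only the control flow is shown.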

20 Claims

1. A method for automatic segmentation of speech to generate a speech inventory, the method comprising:
- initializing, via a processor, a Hidden Markov Model (HMM) using seed input data;
  
  performing a segmentation of the HMM into speech units to generate phone labels;
  
  correcting, via the processor, the segmentation of the speech units by performing the steps;
  
  re-estimating the HMM based on a current version of the phone labels;
  
  embedded re-estimating of the HMM; and
  
  updating the current version of the phone labels using spectral boundary correction.
- - 2. The method of claim 1, further comprising concatenating the speech units to synthesize speech.
  - 3. The method of claim 2, further comprising iteratively performing the re-estimating, embedded re-estimating, and updating steps until no perceptual improvement of synthesis quality is detected between iterations.
  - 4. The method of claim 1, wherein the seed input data is selected from the group consisting of hand-labeled bootstrapped data, speaker-independent HMM bootstrapped data, and flat start data.
  - 5. The method of claim 1, further comprising adjusting boundaries of the phone labels within specified time windows.
  - 6. The method of claim 1, further comprising identifying context-dependent time windows around speech unit boundaries, wherein the speech unit boundaries include one or more of:
    - a vowel-to-vowel boundary;
      
      a vowel-to-nasal boundary;
      
      a vowel-to-voiced stop boundary;
      
      a vowel-to-liquid boundary;
      
      a vowel-to-unvoiced stop boundary;
      
      a vowel-to-voiced fricative boundary;
      
      an unvoiced stop-to-vowel boundary;
      
      a nasal-to-vowel boundary;
      
      a voiced stop-to-vowel boundary;
      
      a liquid-to-vowel boundary;
      
      an unvoiced fricative-to-vowel boundary; and
      
      a voiced fricative-to-vowel boundary.
  - 7. The method of claim 6, wherein the context-dependent time windows are empirically determined by adjacent phones.
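
Claims 5 to 7 describe adjusting boundaries only within time windows that depend on the adjacent phones. One way to picture this is a lookup table keyed by the pair of phone classes on either side of a boundary. Everything concrete in the sketch below is an assumption made for illustration: the phone symbols, the class names, the window widths in milliseconds, and the function names `window_for` and `clamp_boundary` do not come from the patent.

```python
from typing import Dict, Tuple

# Hypothetical phone-to-class map; a real inventory would cover the full phone set.
PHONE_CLASS: Dict[str, str] = {
    "aa": "vowel", "iy": "vowel",
    "m": "nasal", "n": "nasal",
    "b": "voiced_stop", "d": "voiced_stop",
    "p": "unvoiced_stop", "t": "unvoiced_stop",
    "l": "liquid", "r": "liquid",
    "v": "voiced_fricative", "z": "voiced_fricative",
    "f": "unvoiced_fricative", "s": "unvoiced_fricative",
}

# Placeholder half-widths in milliseconds. These are NOT values from the patent;
# claim 7 only says the windows are determined empirically.
WINDOW_MS: Dict[Tuple[str, str], float] = {
    ("vowel", "vowel"): 30.0,
    ("vowel", "nasal"): 20.0,
    ("unvoiced_stop", "vowel"): 10.0,
}
DEFAULT_WINDOW_MS = 15.0  # assumed fallback for pairs not listed above


def window_for(left_phone: str, right_phone: str) -> float:
    """Return the half-width of the window for the boundary between two phones."""
    key = (
        PHONE_CLASS.get(left_phone, "other"),
        PHONE_CLASS.get(right_phone, "other"),
    )
    return WINDOW_MS.get(key, DEFAULT_WINDOW_MS)


def clamp_boundary(
    hmm_ms: float, proposed_ms: float, left_phone: str, right_phone: str
) -> float:
    """Keep a proposed boundary inside the window around the HMM boundary."""
    half = window_for(left_phone, right_phone)
    return min(max(proposed_ms, hmm_ms - half), hmm_ms + half)
```

For example, with these placeholder values a vowel-to-nasal boundary placed by the HMM at 100 ms could move no further than 120 ms, however far away the proposed correction lies.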

8. A computer-readable storage medium storing a set of program instructions executable on a processor device and usable to reduce speech unit boundaries, the instructions causing the processing device to perform the steps:
- aligning a trained set of HMMs to produce phone labels that are segmented, wherein each phone label has a spectral boundary;
  
  performing a spectral boundary correction on the phone labels, wherein spectral boundary correction re-aligns each spectral boundary using bending points of spectral transitions; and
  
  synthesizing speech using the phone labels having spectral boundary correction.
- - 9. The computer-readable storage medium of claim 8, wherein the instructions further comprise bootstrapping the set of HMMs with at least one of speaker-dependent HMMs and speaker-independent HMMs.
  - 10. The computer-readable storage medium of claim 8, wherein the instructions further comprise:
    - initializing the set of HMMs;
      
      re-estimating the set of HMMs; and
      
      performing embedded re-estimation on the set of HMMs.
  - 11. The computer-readable storage medium of claim 10, wherein the instructions further comprise iteratively performing a first alignment on a trained set of HMMs to produce phone labels that are segmented and performing spectral boundary correction on the phone labels.
  - 12. The computer-readable storage medium of claim 11, wherein the instructions further comprise training the set of HMMs using phone labels having boundaries that have been re-aligned using spectral boundary correction.
  - 13. The computer-readable storage medium of claim 8, wherein the instructions further comprise performing a Viterbi alignment on the trained set of HMMs to produce phone labels that are segmented.
  - 14. The computer-readable storage medium of claim 8, wherein the instructions further comprise performing spectral boundary correction on the phone labels within a context-dependent time window.
  - 15. The computer-readable storage medium of claim 14, wherein the instructions further comprise determining empirically the context-dependent time window using adjacent phones.
  - 16. The computer-readable storage medium of claim 8, wherein each spectral boundary is between a first phone class and a second phone class.
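
Claim 8 recites re-aligning each spectral boundary using "bending points of spectral transitions", and claim 14 restricts the correction to a context-dependent time window. The claims do not define how a bending point is computed, so the sketch below substitutes a simple stand-in: it moves a boundary to the frame with the largest frame-to-frame Euclidean distance inside the window. The function names, the frame-index convention, and the distance measure are assumptions for illustration, not the patented procedure.

```python
import math
from typing import List, Sequence


def spectral_change(frames: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance between each pair of consecutive feature frames."""
    return [math.dist(a, b) for a, b in zip(frames, frames[1:])]


def correct_boundary(
    frames: Sequence[Sequence[float]], boundary: int, half_window: int
) -> int:
    """Move `boundary` to the largest spectral change within the window.

    A boundary index b means the boundary falls between frame b-1 and frame b.
    Ties are broken in favour of the candidate closest to the original boundary.
    """
    change = spectral_change(frames)
    lo = max(1, boundary - half_window)
    hi = min(len(frames) - 1, boundary + half_window)
    if lo > hi:
        return boundary  # window is empty; leave the boundary unchanged
    # change[b - 1] is the distance between frames b-1 and b.
    return max(
        range(lo, hi + 1),
        key=lambda b: (change[b - 1], -abs(b - boundary)),
    )
```

Here `frames` would hold one spectral feature vector per analysis frame, and `half_window` would be the context-dependent window expressed in frames.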

17. A system for automatic segmentation of speech to generate a speech inventory, the system comprising:
- a processor;
  
  a first module configured to control the processor to initialize a Hidden Markov Model (HMM) using seed input data;
  
  a second module configured to control the processor to perform a segmentation of the HMM into speech units to generate phone labels;
  
  a third module configured to control the processor to correct the segmentation of the speech units by performing the steps;
  
  re-estimating the HMM based on a current version of the phone labels;
  
  embedded re-estimating of the HMM; and
  
  updating the current version of the phone labels using spectral boundary correction.
- - 18. The system of claim 17, further comprising a module configured to control the processor to concatenate the speech units to synthesize speech.
  - 19. The system of claim 18, further comprising a module configured to control the processor to iteratively perform the re-estimating, embedded re-estimating, and updating steps until no perceptual improvement of synthesis quality is detected between iterations.
  - 20. The system of claim 17, wherein the seed input data is selected from the group consisting of hand-labeled bootstrapped data, speaker-independent HMM bootstrapped data, and flat start data.
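
The segmentation step in claims 1 and 17, and the Viterbi alignment named in claim 13, can be illustrated with a heavily simplified forced alignment. The sketch assumes one state per phone, no transition probabilities, and a precomputed table `loglik[t][k]` giving the log-likelihood of frame `t` under the `k`-th phone of a known transcription. A real HMM system would use multi-state models and transition probabilities; the function name `forced_align` and the input layout are hypothetical.

```python
from typing import List, Sequence

NEG_INF = float("-inf")


def forced_align(loglik: Sequence[Sequence[float]]) -> List[int]:
    """Return the start frame of each phone in a left-to-right alignment.

    loglik[t][k] is the log-likelihood of frame t under the k-th phone of the
    known transcription. Each phone must cover at least one frame.
    """
    n_frames, n_phones = len(loglik), len(loglik[0])
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phone")

    score = [[NEG_INF] * n_phones for _ in range(n_frames)]
    came_from_prev = [[False] * n_phones for _ in range(n_frames)]
    score[0][0] = loglik[0][0]  # the path must start in the first phone

    for t in range(1, n_frames):
        for k in range(n_phones):
            stay = score[t - 1][k]
            advance = score[t - 1][k - 1] if k > 0 else NEG_INF
            if advance > stay:
                score[t][k] = advance + loglik[t][k]
                came_from_prev[t][k] = True
            else:
                score[t][k] = stay + loglik[t][k]

    # Backtrace from the last phone at the last frame.
    starts = [0] * n_phones
    k = n_phones - 1
    for t in range(n_frames - 1, 0, -1):
        if came_from_prev[t][k]:
            starts[k] = t
            k -= 1
    return starts
```

The returned start frames play the role of the segmented phone labels that a later correction step would then adjust.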

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Conkie, Alistair D.; Kim, Yeon-Jun
Primary Examiner(s): Chawan, Vijay B
Application Number: US12/544,576
Publication Number: US 20090313025A1
Time in Patent Office: 929 Days
Field of Search: 704/256, 704/254, 704/246, 704/232, 704/258, 704/255, 704/245, 704/202, 704/240, 704/253, 704/256.1, 704/256.2
US Class Current: 704/256
CPC Class Codes: G10L 13/06 Elementary speech units use...
