Speech unit selection using HMM acoustic models

US 20080059190A1
Filed: 08/22/2006
Published: 03/06/2008
Est. Priority Date: 08/22/2006
Status: Abandoned Application

Abstract

A concatenating speech synthesizer concatenates selected speech units to obtain the desired synthesized speech. When desired speech units of phonetic and/or prosodic context are not available, the synthesizer selects replacement speech units based on measures representative of the difference between the HMM acoustic models of the desired speech unit and available speech units.

20 Claims

1. A method for selecting speech units in a concatenative speech synthesizer comprising:
- obtaining a representative measure indicative of a difference between HMM acoustic models of speech units; and
  
  selecting a speech unit to be used by a speech synthesizer based on the representative measure.
  - 2. The method of claim 1 wherein obtaining the representative measure comprises obtaining the representative measure indicative of the difference between acoustic models of speech units in different phonetic context.
  - 3. The method of claim 2 wherein the phonetic context is based on a preceding speech unit.
  - 4. The method of claim 2 wherein the phonetic context is based on a succeeding speech unit.
  - 5. The method of claim 1 wherein obtaining the representative measure comprises obtaining the representative measure indicative of the difference between acoustic models of speech units in different prosodic context.
  - 6. The method of claim 5 wherein the prosodic context is based on position of the speech unit in a word.
  - 7. The method of claim 5 wherein the prosodic context is based on position of the speech unit in a syllable of a word.
  - 8. The method of claim 5 wherein the prosodic context is based on accent status of the speech unit in a word.
  - 9. The method of claim 5 wherein the prosodic context is based on position of a word in a phrase.
  - 10. The method of claim 5 wherein the prosodic context is based on emphasis status of a word in a phrase.
  - 11. The method of claim 1 wherein obtaining the representative measure indicative of the difference between HMM acoustic models of speech units is based on calculating Kullback-Leibler Divergence between the HMM acoustic models.
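
Claim 11 names Kullback-Leibler Divergence (KLD) as one basis for the representative measure. The claims do not say how it is computed, and KLD between two HMMs has no closed form in general, so the following is only a minimal sketch under simplifying assumptions that are not taken from the application: both models have the same number of states, each state emits a single diagonal-covariance Gaussian, states are compared pairwise in order, and transition probabilities are ignored. The names `GaussianState`, `kl_diag_gaussian`, and `hmm_distance` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GaussianState:
    """One HMM state with a diagonal-covariance Gaussian output (assumed form)."""
    mean: Sequence[float]
    var: Sequence[float]  # per-dimension variances, each must be > 0


def kl_diag_gaussian(p: GaussianState, q: GaussianState) -> float:
    """Closed-form KL(p || q) in nats for two diagonal Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(p.mean, p.var, q.mean, q.var):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def hmm_distance(a: Sequence[GaussianState], b: Sequence[GaussianState]) -> float:
    """Symmetrized, state-by-state KLD used as a stand-in for an HMM-level measure.

    Assumption: states are aligned one-to-one; transitions are ignored.
    """
    if len(a) != len(b):
        raise ValueError("this sketch requires models with equal state counts")
    return sum(kl_diag_gaussian(sa, sb) + kl_diag_gaussian(sb, sa)
               for sa, sb in zip(a, b))


# Toy one-state, one-dimensional models, purely for illustration.
model_a = [GaussianState(mean=[0.0], var=[1.0])]
model_b = [GaussianState(mean=[1.0], var=[1.0])]
distance = hmm_distance(model_a, model_b)
```

A smaller value would indicate that a unit from one context is acoustically closer to the other, and so a more plausible replacement. Symmetrizing is a design choice made here so that the measure does not depend on argument order.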

12. A method of synthesizing speech comprising:
- receiving input text and parsing the input text to obtain phonetic and/or prosodic information;
  
  generating context vectors based on the phonetic and/or prosodic information;
  
  generating cost measures corresponding to the context vectors, the cost measures being based on a comparison of acoustic HMM models of speech units;
  
  selecting one or more speech units based on the context vectors and corresponding cost measures when speech units having desired context vectors are not available; and
  
  concatenating the one or more selected speech units to form a synthesized speech output representing the input text.
  - 13. The method of claim 12 wherein the cost measures are indicative of a comparison based on phonetic features.
  - 14. The method of claim 13 wherein the cost measures are indicative of a comparison based on prosodic features.
  - 15. The method of claim 12 wherein the cost measures are indicative of a comparison based on prosodic features.
  - 16. The method of claim 12 wherein the cost measures are speech unit dependent.
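
The selection and concatenation steps of claim 12 can be illustrated with a small sketch. Everything here is an assumption for illustration: the text front end is omitted, a context vector is modeled as a hypothetical `Context` tuple (phone plus preceding and succeeding phone), stored units are plain strings standing in for waveforms, and `toy_cost` simply counts mismatched context fields in place of a cost measure derived from comparing acoustic HMM models.

```python
from typing import Callable, Dict, List, NamedTuple, Sequence


class Context(NamedTuple):
    """Hypothetical context vector: a phone and its neighbouring phones."""
    phone: str
    prev_phone: str
    next_phone: str


CostFn = Callable[[Context, Context], float]


def select_unit(target: Context, inventory: Dict[Context, str], cost: CostFn) -> str:
    """Return the exact-context unit if stored, else the lowest-cost substitute."""
    if target in inventory:
        return inventory[target]
    # Only units of the same phone are considered as replacements.
    candidates = [ctx for ctx in inventory if ctx.phone == target.phone]
    if not candidates:
        raise LookupError(f"no unit stored for phone {target.phone!r}")
    best = min(candidates, key=lambda ctx: cost(target, ctx))
    return inventory[best]


def synthesize(targets: Sequence[Context], inventory: Dict[Context, str],
               cost: CostFn) -> List[str]:
    """Select one unit per target context and return them in output order."""
    return [select_unit(t, inventory, cost) for t in targets]


def toy_cost(desired: Context, available: Context) -> float:
    """Placeholder cost: number of mismatched neighbour fields (not HMM-based)."""
    return float((desired.prev_phone != available.prev_phone)
                 + (desired.next_phone != available.next_phone))


inventory = {
    Context("a", "k", "t"): "unit_a_kt",
    Context("a", "k", "s"): "unit_a_ks",
}
targets = [Context("a", "k", "t"), Context("a", "p", "s")]
selected = synthesize(targets, inventory, toy_cost)
```

In a real system the cost function would be looked up from measures prepared in advance, and the final step would join waveforms rather than collect labels.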

17. A speech synthesizer comprising:
- a store of speech units indicative of at least one of different phonetic and different prosodic contexts;
  
  a set of cost measures associated with the speech units of the store of speech units, the cost measures being indicative of a comparison of acoustic HMM models of speech units of said at least one of different phonetic and different prosodic contexts; and
  
  a speech unit locator configured to select speech units to be used for forming synthesized speech based on accessing the set of cost measures when desired speech units of at least one of phonetic and prosodic contexts are not available in the store of speech units.
  - 18. The synthesizer of claim 17 wherein each cost measure of the set of cost measures comprise a Kullback-Leibler Divergence between the HMM acoustic models.
  - 19. The synthesizer of claim 17 wherein the set of cost measures are speech unit dependent.
  - 20. The synthesizer of claim 18 wherein a sub-set of cost measures pertain to a plurality of speech units.
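
Claims 19 and 20 describe cost measures that are speech unit dependent and a sub-set of cost measures that pertain to a plurality of speech units. One possible reading, sketched below, is a table that first looks for a unit-specific entry and otherwise falls back to an entry shared across units. The class `CostTable`, its fields, the context labels, and the numeric values are all hypothetical and made up for illustration; they are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CostTable:
    """Precomputed substitution costs, keyed by desired and available context."""
    # (unit, desired_context, available_context) -> cost for that unit only
    per_unit: Dict[Tuple[str, str, str], float] = field(default_factory=dict)
    # (desired_context, available_context) -> cost shared by several units
    shared: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def lookup(self, unit: str, desired: str, available: str) -> float:
        """Return the substitution cost, preferring a unit-specific entry.

        Raises KeyError if neither a unit-specific nor a shared entry exists.
        """
        if desired == available:
            return 0.0  # the desired context is available, so nothing is substituted
        key = (unit, desired, available)
        if key in self.per_unit:
            return self.per_unit[key]
        return self.shared[(desired, available)]


# Illustrative values only.
table = CostTable(
    per_unit={("ae", "word-initial", "word-final"): 0.8},
    shared={("word-initial", "word-final"): 1.5},
)
cost_ae = table.lookup("ae", "word-initial", "word-final")  # unit-specific entry
cost_t = table.lookup("t", "word-initial", "word-final")    # shared fallback
```

A speech unit locator could call `lookup` for each candidate and pick the lowest value whenever the desired context is missing from the store.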

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Zhao, Yong; Liu, Peng; Chu, Min; Li, Yusheng
Application Number: US11/508,093
Publication Number: US 20080059190A1
US Class Current: 704/258
CPC Class Codes: G10L 13/06 Elementary speech units use...
