Defining atom units between phone and syllable for TTS systems

US 7,418,389 B2
Filed: 01/11/2005
Issued: 08/26/2008
Est. Priority Date: 01/11/2005
Status: Expired due to Fees

2 Assignments

0 Petitions

Abstract

A method for identifying common multiphone units to add to a unit inventory for a text-to-speech generator is disclosed. The common multiphone units are units that are larger than a phone, but smaller than a syllable. The method slices each syllable into a plurality of slices. These slices are then sorted and the frequency of each slice is determined. Those slices whose frequencies exceed a threshold are added to the unit inventory. The remaining slices are decomposed according to a predetermined set of rules to determine if they contain slices that should be added to the unit inventory.
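
The frequency step in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_by_frequency`, the toy slice strings, and the toy threshold of 2 are assumptions made for the example. Following the abstract's wording, a slice counts as common only when its frequency exceeds the threshold.

```python
from collections import Counter


def split_by_frequency(slices, threshold):
    """Return (common, remaining) slice lists, each ordered by descending count."""
    counts = Counter(slices)
    # Sort by descending frequency; ties are broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    common = [s for s, n in ranked if n > threshold]      # frequency exceeds the threshold
    remaining = [s for s, n in ranked if n <= threshold]  # left for rule-based decomposition
    return common, remaining


# Toy data (hypothetical): each string is one slice occurrence, phones separated by spaces.
toy_slices = ["s t", "s t", "s t", "ae", "ae", "n d", "s t r"]
common, remaining = split_by_frequency(toy_slices, threshold=2)
print(common)     # ['s t']
print(remaining)  # ['ae', 'n d', 's t r']
```

In this toy run only `s t` is added to the inventory directly; the other slices would go on to the rule-based decomposition.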

227 Citations

13 Claims

1. A method of developing a unit inventory for use by a text to speech system, comprising:
- identifying a list of phones for a target language;
- receiving a lexicon containing phonetic transcriptions of a plurality of words having a plurality of syllables;
- identifying a set of common multi-phone atom units for the lexicon by:
  - decomposing each syllable into a plurality of slices;
  - identifying non-common slices within the plurality of slices; and
  - decomposing the non-common slices according to predetermined set of rules;
- adding the set of common multi-phone atom units to the unit inventory for the target language; and
- wherein if the predetermined rules are unable to decompose the non-common slice, then:
  - adding the slice to the unit inventory.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1 wherein identifying the non-common slices within the plurality of slices comprises:
    - sorting the plurality of slices in order of frequency of occurrence;
    - selecting as the non-common slices those slices in the plurality of slices having a frequency of occurrence in the lexicon below a threshold value.
  - 3. The method of claim 2 wherein the threshold value is 12.
  - 4. The method of claim 1 wherein decomposing the non-common slices comprises:
    - removing at least one phone from the non-common slice to generate a first new slice; and
    - determining if the first new slice matches one of an existing phone or common multi-phone in the unit inventory.
  - 5. The method of claim 4 wherein if the first new slice does not match with an existing phone or common multi-phone in the unit inventory further executing the steps of:
    - decomposing the first new slice according the predetermined set of rules to generate a second new slice;
    - determining if the second new slice is the same as the first new slice;
    - if the second new slice is the same as the first new slice, then:
      - adding the second new slice to the unit inventory;
    - if the second new slice is not the same as the first new slice, then:
      - determining whether the second new slice matches one of the existing phones or common multi-phones in the lexicon; and
      - if the second new slice does not match one of the existing phones or common multi-phones in the lexicon, then:
        - repeating the decomposing step.
  - 6. The method of claim 4 further comprising:
    - after removing the phone from the slice, adding the removed phone to a neighboring slice.
  - 7. The method of claim 1 wherein decomposing the syllable into a plurality of slices comprises:
    - breaking the syllable into three slices.
  - 8. The method of claim 7 wherein the three slices represent an onset slice, a nucleus slice and a coda slice, and wherein at least one of the three slices is a multiphone slice that is sized between a phone and a syllable.
  - 9. The method of claim 1 wherein the predetermined rules are based upon phonetic and phonological statistics for the target language.
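
The slicing and decomposition steps in claims 4 through 8 can be sketched as below. This is a rough illustration under stated assumptions: the claims do not give the actual rules (claim 9 only says they are based on phonetic and phonological statistics), so `drop_last_phone` is a hypothetical stand-in rule, and the function names, the vowel set, and the tuple representation of slices are also invented for the example.

```python
def split_syllable(phones, vowels):
    """Break a syllable (sequence of phones) into (onset, nucleus, coda) tuples."""
    vowel_positions = [i for i, p in enumerate(phones) if p in vowels]
    if not vowel_positions:
        # No vowel found: treat the whole syllable as onset (an assumption).
        return tuple(phones), (), ()
    first, last = vowel_positions[0], vowel_positions[-1]
    return tuple(phones[:first]), tuple(phones[first:last + 1]), tuple(phones[last + 1:])


def drop_last_phone(slice_):
    """Hypothetical rule: remove the final phone of a multi-phone slice."""
    if len(slice_) > 1:
        return slice_[:-1], slice_[-1:]
    return slice_, ()


def decompose(slice_, inventory, rule):
    """Shrink a non-common slice until it matches the inventory or stops changing.

    Returns (final_slice, removed_phones, was_added_to_inventory).
    """
    current, removed = tuple(slice_), []
    while current not in inventory:
        shorter, dropped = rule(current)
        if shorter == current:
            # The rule cannot decompose the slice any further, so keep it as a unit.
            inventory.add(current)
            return current, removed, True
        removed.extend(dropped)
        current = shorter
    return current, removed, False


onset, nucleus, coda = split_syllable(["s", "t", "r", "ae", "n", "d"], vowels={"ae"})
inventory = {("s",), ("t",), ("r",), ("s", "t")}
print(decompose(onset, inventory, drop_last_phone))  # (('s', 't'), ['r'], False)
```

The returned `removed_phones` list is where a caller could hand the dropped phones to a neighboring slice, in the spirit of claim 6; that step is left out here to keep the sketch short.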

10. An apparatus for generating speech from text, comprising:
- a unit inventory for storing a set of phoneme based atom units for at least one Target speaker, said set of phoneme based atom units being a plurality of different sizes and including only units limited to sizes greater than a phone but less than a syllable;
- a text analyzer for obtaining a string of phonetic symbols representative of a text to be converted to speech; and
- a concatenation module for selecting stored phoneme-based atom units to generate speech corresponding to the text, wherein the set of atom units comprises atom units that are determined to be common multi-phonal units for the target language;
- wherein the set of atom units includes atom units that are not common to the target language, but were unable to be decomposed according to a predetermined set of rules to match an entry already in the unit inventory.
- Dependent claims (11, 12)
  - 11. The apparatus of claim 10 wherein the set of phoneme-based atom units includes a complete set of monophones for the target language.
  - 12. The apparatus of claim 10 wherein the set of phoneme-based atom units sized between a phone and a syllable are representative of common multiphone units in the target language.
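
One way a concatenation module might pick stored units for a phone string is a greedy longest-match cover, sketched below. The claims do not specify a selection strategy, so the greedy approach, the function name `select_units`, the `max_len` parameter, and the toy inventory are all assumptions. The sketch also ignores syllable boundaries and relies on monophones being present as a fallback, as in claim 11.

```python
def select_units(phones, inventory, max_len=3):
    """Cover a phone sequence with inventory units, preferring the longest match."""
    units, i = [], 0
    while i < len(phones):
        # Try the longest candidate first, then shorter ones down to a single phone.
        for n in range(min(max_len, len(phones) - i), 0, -1):
            candidate = tuple(phones[i:i + n])
            if candidate in inventory:
                units.append(candidate)
                i += n
                break
        else:
            # No unit of any length starts here, not even a monophone.
            raise KeyError(f"no unit covers phone {phones[i]!r}")
    return units


# Toy inventory (hypothetical): monophones plus two multi-phone atom units.
toy_inventory = {("s",), ("t",), ("r",), ("ae",), ("n",), ("d",), ("s", "t"), ("n", "d")}
print(select_units(["s", "t", "r", "ae", "n", "d"], toy_inventory))
# [('s', 't'), ('r',), ('ae',), ('n', 'd')]
```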

13. A unit inventory for use in text-to-speech generation, comprising:
- a set of monophone units for a target language;
- a set of atom units sized between a phone and a syllable, for the target language;
- wherein the set of atom units comprises atom units that are determined to be common multiphonal units for the target language;
- wherein the set of atom units includes atom units that are not common to the target language, but were unable to be decomposed according to a predetermined set of rules to match an entry already in the unit inventory.
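
The two-part inventory of claim 13 could be represented with a small container like the one below. This is only a sketch: the class name `UnitInventory`, its fields, and its methods are hypothetical, and the check covers only the lower size bound (more than one phone), since the "smaller than a syllable" bound cannot be verified without a lexicon.

```python
from dataclasses import dataclass, field


@dataclass
class UnitInventory:
    """Monophones plus multi-phone atom units for one target language."""
    monophones: set = field(default_factory=set)
    atom_units: set = field(default_factory=set)

    def add_atom_unit(self, phones):
        """Store a multi-phone unit built only from known monophones."""
        unit = tuple(phones)
        if len(unit) < 2:
            raise ValueError("an atom unit must span more than one phone")
        unknown = [p for p in unit if p not in self.monophones]
        if unknown:
            raise ValueError(f"unknown phones: {unknown}")
        self.atom_units.add(unit)

    def __contains__(self, phones):
        """True if the phone sequence is a stored monophone or atom unit."""
        unit = tuple(phones)
        if len(unit) == 1:
            return unit[0] in self.monophones
        return unit in self.atom_units


inv = UnitInventory(monophones={"s", "t", "r"})
inv.add_atom_unit(["s", "t"])
print(("s", "t") in inv, ("s", "t", "r") in inv)  # True False
```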

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Zhao, Yong; Chu, Min
Primary Examiner(s): Smits; Talivaldis Ivars
Assistant Examiner(s): HERNANDEZ, JOSIAH J
Application Number: US11/033,075
Publication Number: US 20060155544A1
Time in Patent Office: 1,323 Days
Field of Search: 704/267, 704/260
US Class Current: 704/267
CPC Class Codes: G10L 13/08 Text analysis or generation...
