Method for letter-to-sound in text-to-speech synthesis

US 6,029,132 A
Filed: 04/30/1998
Issued: 02/22/2000
Est. Priority Date: 04/30/1998
Status: Expired due to Fees

Abstract

A two-stage pronunciation generator utilizes mixed decision trees that includes a network of yes-no questions about letter, syntax, context, and dialect in a spelled word sequence. A second stage utilizes decision trees that includes a network of yes-no questions about adjacent phonemes in the phoneme sequence corresponding to the spelled word sequence. Leaf nodes of the mixed decision trees provide information about which phonetic transcriptions are most probable. Using the mixed trees, scores are developed for each of a plurality of possible pronunciations, and these scores can be used to select the best pronunciation as well as to rank pronunciations in order of probability. The pronunciations generated by the system can be used in speech synthesis and speech recognition applications as well as lexicography applications.
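
The first stage described in the abstract can be pictured as one decision tree per letter, where internal nodes hold yes-no questions and leaves hold phoneme probabilities. The following is only a minimal sketch of that idea: the class names, the feature keys (`next_letter`, `pos_tag`), the example tree for the letter "s", and every probability value are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Context = Dict[str, str]


@dataclass
class Leaf:
    # Probability of each candidate phoneme for the letter under test.
    probs: Dict[str, float]


@dataclass
class Node:
    # A yes-no question about the letter, its neighbors, or the word's syntax tag.
    question: Callable[[Context], bool]
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def lookup(tree: Union[Node, Leaf], ctx: Context) -> Dict[str, float]:
    """Walk from the root to a leaf, answering one question per internal node."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.question(ctx) else tree.no
    return tree.probs


def make_context(word: str, i: int, pos_tag: str) -> Context:
    """Collect the features the questions may ask about ('#' marks a word edge)."""
    return {
        "letter": word[i],
        "prev_letter": word[i - 1] if i > 0 else "#",
        "next_letter": word[i + 1] if i + 1 < len(word) else "#",
        "pos_tag": pos_tag,
    }


# Hypothetical tree for the letter "s"; all probabilities are invented.
TREE_S = Node(
    question=lambda c: c["next_letter"] == "h",        # letter-related question
    yes=Leaf({"sh": 0.9, "s": 0.1}),
    no=Node(
        question=lambda c: c["pos_tag"] == "VERB",     # syntax-related question
        yes=Leaf({"z": 0.7, "s": 0.3}),
        no=Leaf({"s": 0.8, "z": 0.2}),
    ),
)

# The same spelling reaches different leaves depending on the syntax tag.
noun_probs = lookup(TREE_S, make_context("use", 1, "NOUN"))
verb_probs = lookup(TREE_S, make_context("use", 1, "VERB"))
```

Mixing a syntax question into the same tree as the letter questions is what lets a spelling such as "use" lean toward a different phoneme as a noun than as a verb in this toy setup.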

317 Citations

34 Claims

1. An apparatus for generating at least one phonetic pronunciation for an input sequence of letters selected from a predetermined alphabet, said sequence of letters forming words which substantially adhere to a predetermined syntax, said apparatus comprising:
- an input device for receiving syntax data indicative of the syntax of said words in said input sequence;
  
  a computer storage device for storing a plurality of text-based decision trees having questions indicative of predetermined characteristics of said input sequence;
  
  said predetermined characteristics including letter-related questions about said input sequence, said predetermined characteristics also including characteristics selected from the group consisting of syntax-related questions, context-related questions, dialect-related questions or combinations thereof, said text-based decision trees having internal nodes representing questions about predetermined characteristics of said input sequence;
  
  said text-based decision trees further having leaf nodes representing probability data that associates each of said letters with a plurality of phoneme pronunciations; and
  
  a text-based pronunciation generator connected to said text-based decision trees for processing said input sequence of letters and generating a first set of phonetic pronunciations corresponding to said input sequence of letters based upon said text-based decision trees.
- Dependent claims 2 to 16:
  - 2. The apparatus of claim 1 further comprising:
    - a phoneme-mixed tree score estimator connected to said text-based pronunciation generator for processing said first set to generate a second set of scored phonetic pronunciations, the scored phonetic pronunciations representing at least one phonetic pronunciation of said input sequence.
  - 3. The apparatus of claim 2 further comprising:
    - a plurality of phoneme-mixed decision trees having a first plurality of internal nodes representing questions about said predetermined characteristics and having a second plurality of internal nodes representing questions about a phoneme and its neighboring phonemes in said given sequence, said phoneme-mixed decision trees further having leaf nodes representing probability data that associates said given letter with a plurality of phoneme pronunciations;
      
      said phoneme-mixed tree score estimator being connected to said phoneme-mixed decision trees for generating said second set of scored phonetic pronunciations.
  - 4. The apparatus of claim 3 wherein said second set includes a plurality of pronunciations each with an associated score derived from said probability data and further comprising a pronunciation selector receptive of said second set and operable to select one pronunciation from said second set based on said associated score.
  - 5. The apparatus of claim 3 wherein said phoneme-mixed tree score estimator rescores said n-best pronunciations based on said phoneme-mixed decision trees.
  - 6. The apparatus of claim 1 wherein said text-based pronunciation generator produces a predetermined number of different pronunciations corresponding to a given input sequence.
  - 7. The apparatus of claim 1 wherein said text-based pronunciation generator produces a predetermined number of different pronunciations corresponding to a given input sequence and representing the n-best pronunciations according to said probability data.
  - 8. The apparatus of claim 1 wherein said phoneme-mixed tree score estimator constructs a matrix of possible phoneme combinations representing different pronunciations.
  - 9. The apparatus of claim 8 wherein said phoneme-mixed tree score estimator selects the n-best phoneme combinations from said matrix using dynamic programming.
  - 10. The apparatus of claim 8 wherein said phoneme-mixed tree score estimator selects the n-best phoneme combinations from said matrix by iterative substitution.
  - 11. The apparatus of claim 3 further comprising a speech recognition system having a pronunciation dictionary used for recognizer training and wherein at least a portion of said second set populates said dictionary to supply pronunciations for words based on their spelling.
  - 12. The apparatus of claim 3 further comprising a speech synthesis system receptive of at least a portion of said second set for generating an audible synthesized pronunciation of words based on their spelling.
  - 13. The apparatus of claim 12 wherein said speech synthesis system is incorporated into an e-mail reader.
  - 14. The apparatus of claim 12 wherein said speech synthesis system is incorporated into a dictionary for providing a list of possible pronunciations in order of probability.
  - 15. The apparatus of claim 1 further comprising:
    - a language learning system that displays a spelled sentence and analyzes a speaker's attempt at pronouncing that sentence using at least one of said text-based trees and one of said phoneme-mixed decision trees to indicate to the speaker how probable the speaker's pronunciation was for that sentence.
  - 16. The apparatus of claim 1 further comprising:
    - a syntax tagger module connected to said input device for associating syntax-indicative data to the words of the input sequence in order to generate said syntax data.
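
Claims 6 to 9 mention a matrix of possible phoneme combinations and n-best selection by dynamic programming, without spelling out the procedure. One plausible reading is sketched below, under stated assumptions: each matrix column holds the candidate phonemes for one letter, a pronunciation's score is the product of its per-letter probabilities, and the function name `n_best` and the example values are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Scored = Tuple[float, Tuple[str, ...]]


def n_best(matrix: List[Dict[str, float]], n: int) -> List[Scored]:
    """Return up to n highest-scoring phoneme sequences, best first.

    The matrix is processed column by column, keeping only the n best
    partial sequences after each column. Because the score is a plain
    product of independent per-letter probabilities, the n best complete
    sequences can always be built from the n best prefixes.
    """
    partial: List[Scored] = [(1.0, ())]
    for column in matrix:
        expanded = [
            (score * prob, phones + (phoneme,))
            for score, phones in partial
            for phoneme, prob in column.items()
        ]
        partial = heapq.nlargest(n, expanded, key=lambda item: item[0])
    return partial


# Invented candidate phonemes for the letters of "cat".
matrix = [
    {"k": 0.9, "s": 0.1},
    {"ae": 0.6, "ey": 0.4},
    {"t": 1.0},
]
top_two = n_best(matrix, 2)
best_sequences = [phones for _, phones in top_two]
# best_sequences is [("k", "ae", "t"), ("k", "ey", "t")]
```

Pruning to n candidates per column keeps the work proportional to the word length times n times the column size, rather than enumerating every combination in the matrix.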

17. A method for generating at least one phonetic pronunciation for an input sequence of letters selected from a predetermined alphabet, said sequence of letters forming words which substantially adhere to a predetermined syntax, comprising the steps of:
- receiving syntax data indicative of the syntax of said words in said input sequence;
  
  storing a plurality of text-based decision trees having questions indicative of predetermined characteristics of said input sequence, said predetermined characteristics including letter-related questions about said input sequence, said predetermined characteristics also including characteristics selected from the group consisting of syntax-related questions, context-related questions, dialect-related questions or combinations thereof, said text-based decision trees having internal nodes representing questions about said predetermined characteristics of said input sequence;
  
  said text-based decision trees further having leaf nodes representing probability data that associates each of said letters with a plurality of phoneme pronunciations; and
  
  processing said input sequence of letters in order to generate a first set of phonetic pronunciations corresponding to said input sequence of letters based upon said text-based decision trees.
- Dependent claims 18 to 34:
  - 18. The method of claim 17 further comprising the step of:
    - generating rate data based upon context-related questions within said text-based decision trees, said rate data indicating the duration which words in a sentence are spoken.
  - 19. The method of claim 17 further comprising the step of:
    - processing said first set to generate a second set of scored phonetic pronunciations, said second set of scored phonetic pronunciations representing at least one phonetic pronunciation of said input sequence.
  - 20. The method of claim 19 further comprising the steps of:
    - providing a plurality of phoneme-mixed decision trees which have a first plurality of internal nodes representing questions about said predetermined characteristics and having a second plurality of internal nodes representing questions about a phoneme and its neighboring phonemes in said given sequence, said phoneme-mixed decision trees further having leaf nodes representing probability data that associates said given letter with a plurality of phoneme pronunciations;
      
      generating said second set of scored phonetic pronunciations using said phoneme-mixed decision trees.
  - 21. The method of claim 20 wherein said second set includes a plurality of pronunciations each with an associated score derived from said probability data, said method further comprising the step of:
    - selecting one pronunciation from said second set based on said associated score.
  - 22. The method of claim 20 further comprising the step of:
    - rescoring said n-best pronunciations based on said phoneme-mixed decision trees.
  - 23. The method of claim 17 further comprising the step of:
    - producing a predetermined number of different pronunciations corresponding to a given input sequence.
  - 24. The method of claim 17 further comprising the step of:
    - producing a predetermined number of different pronunciations corresponding to a given input sequence and representing the n-best pronunciations according to said probability data.
  - 25. The method of claim 17 further comprising the step of:
    - generating a matrix of possible phoneme combinations representing different pronunciations.
  - 26. The method of claim 25 further comprising the step of:
    - selecting the n-best phoneme combinations from said matrix using dynamic programming.
  - 27. The method of claim 25 further comprising the step of:
    - selecting the n-best phoneme combinations from said matrix by iterative substitution.
  - 28. The method of claim 20 further comprising the step of:
    - providing a speech recognition system having a pronunciation dictionary used for recognizer training and wherein at least a portion of said second set populates said dictionary to supply pronunciations for words based on their spelling.
  - 29. The method of claim 20 further comprising the step of:
    - providing a speech synthesis system receptive of at least a portion of said second set for generating an audible synthesized pronunciation of words based on their spelling.
  - 30. The method of claim 29 wherein said speech synthesis system is incorporated into an e-mail reader.
  - 31. The method of claim 29 wherein said speech synthesis system is incorporated into a dictionary for providing a list of possible pronunciations in order of probability.
  - 32. The method of claim 17 further comprising the step of:
    - providing a language learning system that displays a spelled sentence and analyzes a speaker's attempt at pronouncing that sentence using at least one of said text-based trees and one of said phoneme-mixed decision trees to indicate to the speaker how probable the speaker's pronunciation was for that sentence.
  - 33. The method of claim 17 further comprising the step of:
    - using a syntax tagger module for associating syntax-indicative data to the words of the input sequence in order to generate said syntax data.
  - 34. The method of claim 17 wherein said leaf nodes of said text-based decision trees includes stress indicative data associated with said phoneme pronunciations.
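
Claims 19 to 22 describe a second pass in which phoneme-mixed decision trees, whose questions can refer to neighboring phonemes, rescore the first-pass candidates so that one pronunciation can be selected. The sketch below illustrates that flow only. It assumes one phoneme per letter, stands in for each tree with a plain function from context to probabilities, and uses an invented `floor` value for unseen phonemes; none of these details are taken from the patent.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Context = Dict[str, str]
Tree = Callable[[Context], Dict[str, float]]


def rescore(
    word: str,
    candidates: Sequence[Sequence[str]],
    trees: Dict[str, Tree],
    floor: float = 1e-6,
) -> List[Tuple[float, Tuple[str, ...]]]:
    """Rescore candidate pronunciations using neighboring-phoneme context."""
    scored = []
    for phones in candidates:
        score = 1.0
        for i, (letter, phoneme) in enumerate(zip(word, phones)):
            ctx = {
                "letter": letter,
                "prev_phoneme": phones[i - 1] if i > 0 else "#",
                "next_phoneme": phones[i + 1] if i + 1 < len(phones) else "#",
            }
            # Phonemes missing from the leaf get a small floor probability.
            score *= trees[letter](ctx).get(phoneme, floor)
        scored.append((score, tuple(phones)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical stand-ins for phoneme-mixed trees; probabilities are invented.
trees: Dict[str, Tree] = {
    "c": lambda ctx: {"k": 0.9, "s": 0.1},
    "a": lambda ctx: (
        {"ae": 0.8, "ey": 0.2}            # question: is the next phoneme "t"?
        if ctx["next_phoneme"] == "t"
        else {"ae": 0.5, "ey": 0.5}
    ),
    "t": lambda ctx: {"t": 1.0},
}

first_pass = [("k", "ey", "t"), ("s", "ae", "t"), ("k", "ae", "t")]
ranked = rescore("cat", first_pass, trees)
selected = ranked[0][1]   # pronunciation with the highest rescored value
```

Because the second pass sees the whole candidate phoneme sequence, it can ask questions that the first pass could not, such as which phoneme follows the current one.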

Current Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Original Assignee: Matsushita Electric Industrial Co. (Panasonic Holdings Corporation)
Inventors: Junqua, Jean-Claude; Kuhn, Roland
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Abebe, Daniel Demelash
Application Number: US09/070,300
Time in Patent Office: 663 Days
Field of Search: 704/260, 704/258, 704/259
US Class Current: 704/260
CPC Class Codes: G10L 13/08 Text analysis or generation...
