Method for letter-to-sound in text-to-speech synthesis
Abstract
A two-stage pronunciation generator utilizes mixed decision trees that include a network of yes-no questions about the letter, syntax, context, and dialect features of a spelled word sequence. A second stage utilizes decision trees that include a network of yes-no questions about adjacent phonemes in the phoneme sequence corresponding to the spelled word sequence. Leaf nodes of the mixed decision trees provide information about which phonetic transcriptions are most probable. Using the mixed trees, scores are developed for each of a plurality of possible pronunciations, and these scores can be used to select the best pronunciation as well as to rank pronunciations in order of probability. The pronunciations generated by the system can be used in speech synthesis and speech recognition applications as well as in lexicography applications.
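The second-stage rescoring described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: `phoneme_tree_prob` is a hypothetical stand-in for a second-stage decision tree, its single question and its probabilities are invented, and the rule that a candidate's new score is its first-stage score multiplied by one second-stage probability per phoneme is an assumption made for the example. Phonemes are ARPAbet-style symbols, with `#` marking a word boundary.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[float, List[str]]  # (score, phoneme sequence)

# Partial ARPAbet vowel set, assumed for this toy example.
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "ow", "uh", "uw"}


def phoneme_tree_prob(prev: str, cur: str, nxt: str) -> float:
    """Toy second-stage tree asking a yes/no question about adjacent phonemes.

    Returns an invented probability that `cur` is correct in its context.
    Only `prev` is inspected here; `nxt` is accepted so the signature
    covers both neighbours.
    """
    if cur in VOWELS and prev in VOWELS:  # Q: two vowels in a row?
        return 0.1                        # judged unlikely in this toy tree
    return 0.9


def rescore(
    candidates: List[Candidate],
    tree: Callable[[str, str, str], float] = phoneme_tree_prob,
) -> List[Candidate]:
    """Multiply each first-stage score by one second-stage probability per
    phoneme (an assumed combination rule), then sort best first."""
    rescored = []
    for score, phones in candidates:
        padded = ["#"] + phones + ["#"]  # '#' marks the word boundary
        for i in range(1, len(padded) - 1):
            score *= tree(padded[i - 1], padded[i], padded[i + 1])
        rescored.append((score, phones))
    return sorted(rescored, key=lambda item: item[0], reverse=True)


# Invented first-stage output in which the wrong candidate scores higher.
first_stage = [(0.50, ["r", "iy", "ae", "d"]), (0.45, ["r", "iy", "d"])]
for score, phones in rescore(first_stage):
    print(f"{score:.3f} {' '.join(phones)}")
# 0.328 r iy d
# 0.036 r iy ae d
```

With these invented numbers the second stage reverses the first-stage order, because the vowel sequence /iy ae/ is penalized. Multiplying raw probabilities also favours shorter candidates; a practical system might normalize for length, which this sketch does not attempt.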
317 Citations
34 Claims
1. An apparatus for generating at least one phonetic pronunciation for an input sequence of letters selected from a predetermined alphabet, said sequence of letters forming words which substantially adhere to a predetermined syntax, said apparatus comprising:
an input device for receiving syntax data indicative of the syntax of said words in said input sequence;
a computer storage device for storing a plurality of text-based decision trees having questions indicative of predetermined characteristics of said input sequence;
said predetermined characteristics including letter-related questions about said input sequence, said predetermined characteristics also including characteristics selected from the group consisting of syntax-related questions, context-related questions, dialect-related questions or combinations thereof, said text-based decision trees having internal nodes representing questions about predetermined characteristics of said input sequence;
said text-based decision trees further having leaf nodes representing probability data that associates each of said letters with a plurality of phoneme pronunciations; and
a text-based pronunciation generator connected to said text-based decision trees for processing said input sequence of letters and generating a first set of phonetic pronunciations corresponding to said input sequence of letters based upon said text-based decision trees.
(Dependent claims: 2 through 16.)
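The tree structure recited in claim 1, with internal nodes that ask yes/no questions and leaf nodes that hold phoneme probabilities, might be represented as below. This is a hedged sketch: the class names, the `LetterContext` fields (including the `pos_tag` and `dialect` labels standing in for syntax and dialect data), and the toy tree for the letter `e` are all hypothetical, and the probabilities are invented. The example shows how a syntax question can separate the two pronunciations of the `e` in "read".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class LetterContext:
    """Hypothetical context for one letter: the word, the letter's position,
    and optional syntax and dialect labels."""
    word: str
    index: int
    pos_tag: str = ""   # assumed syntax label, e.g. "VERB_PAST"
    dialect: str = ""   # assumed dialect label, e.g. "en-US"

    def letter(self, offset: int) -> str:
        """Letter at index + offset, or '#' outside the word."""
        i = self.index + offset
        return self.word[i] if 0 <= i < len(self.word) else "#"


@dataclass
class Leaf:
    probs: Dict[str, float]  # phoneme -> probability for the current letter


@dataclass
class Node:
    question: Callable[[LetterContext], bool]  # a yes/no question
    yes: "Tree"
    no: "Tree"


Tree = Union[Node, Leaf]


def classify(tree: Tree, ctx: LetterContext) -> Dict[str, float]:
    """Answer questions from the root down until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.question(ctx) else tree.no
    return tree.probs


# Hypothetical mixed tree for the letter 'e'; "-" denotes a silent letter.
tree_e = Node(
    question=lambda c: c.letter(+1) == "a",           # letter question
    yes=Node(
        question=lambda c: c.pos_tag == "VERB_PAST",  # syntax question
        yes=Leaf({"eh": 0.8, "iy": 0.2}),
        no=Leaf({"iy": 0.9, "eh": 0.1}),
    ),
    no=Leaf({"eh": 0.6, "iy": 0.2, "-": 0.2}),
)

print(classify(tree_e, LetterContext("read", 1, pos_tag="VERB_PAST")))
# {'eh': 0.8, 'iy': 0.2}
```

Keeping each question as a plain predicate over a shared context object lets letter, syntax, context, and dialect questions coexist in one tree, which is the sense in which the trees are "mixed".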
17. A method for generating at least one phonetic pronunciation for an input sequence of letters selected from a predetermined alphabet, said sequence of letters forming words which substantially adhere to a predetermined syntax, comprising the steps of:
receiving syntax data indicative of the syntax of said words in said input sequence;
storing a plurality of text-based decision trees having questions indicative of predetermined characteristics of said input sequence, said predetermined characteristics including letter-related questions about said input sequence, said predetermined characteristics also including characteristics selected from the group consisting of syntax-related questions, context-related questions, dialect-related questions or combinations thereof, said text-based decision trees having internal nodes representing questions about said predetermined characteristics of said input sequence;
said text-based decision trees further having leaf nodes representing probability data that associates each of said letters with a plurality of phoneme pronunciations; and
processing said input sequence of letters in order to generate a first set of phonetic pronunciations corresponding to said input sequence of letters based upon said text-based decision trees.
(Dependent claims: 18 through 34.)
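The final step of claim 17, generating a set of pronunciations from the per-letter leaf data, can be sketched as an n-best search. The assumptions are loud ones: each letter's leaf distribution is supplied directly (standing in for the tree lookup), a pronunciation's score is taken to be the product of its per-letter probabilities, `-` marks a silent letter, and `n_best_pronunciations` is a hypothetical name. Under this product scoring the per-letter factors are independent, so a beam of width `n` returns a correct top `n` (ties broken arbitrarily).

```python
import heapq
import math
from typing import Dict, List, Tuple


def n_best_pronunciations(
    letter_probs: List[Dict[str, float]], n: int = 3
) -> List[Tuple[float, List[str]]]:
    """Return up to n (probability, phonemes) pairs, best first.

    Each entry of letter_probs stands in for the leaf distribution returned
    for one letter. Scores are accumulated as log probabilities.
    """
    beam: List[Tuple[float, List[str]]] = [(0.0, [])]  # (log prob, phonemes)
    for dist in letter_probs:
        expanded = [
            (logp + math.log(p), phones + [ph])
            for logp, phones in beam
            for ph, p in dist.items()
            if p > 0.0
        ]
        beam = heapq.nlargest(n, expanded, key=lambda item: item[0])
    # Drop silent-letter markers and convert back to probabilities. Distinct
    # letter alignments that collapse to the same phonemes are not merged here.
    return [(math.exp(lp), [ph for ph in phones if ph != "-"]) for lp, phones in beam]


# Invented leaf distributions for the letters of "read" (present tense).
read_present = [
    {"r": 1.0},
    {"iy": 0.9, "eh": 0.1},
    {"-": 0.95, "ae": 0.05},
    {"d": 1.0},
]
for prob, phones in n_best_pronunciations(read_present, n=3):
    print(f"{prob:.3f} {' '.join(phones)}")
# 0.855 r iy d
# 0.095 r eh d
# 0.045 r iy ae d
```

The ranked output is the kind of "first set of phonetic pronunciations" the claim refers to; a second stage could then rescore it.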
Specification