Rule-based learning of word pronunciations from training corpora
Abstract
A text-to-pronunciation system (11) includes a large training set of word pronunciations (19) and an extractor that extracts language-specific information from the training set to produce pronunciations for words not in the training set. A learner (13) forms pronunciation guesses for words in the training set and finds a transformation rule that improves the guesses. A rule applier (15) applies the rule found to the guesses. The learner (13) then finds another rule, which the rule applier (15) applies in turn; repeating this finds the rules that improve the guesses the most.
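Producing pronunciations for words outside the training set corresponds to the final step of claims 17 and 18: transcribe each letter with its stored initial phonemes, then apply the learned transformation rules in sequence. The sketch below is a minimal illustration under simplifying assumptions, not the patented implementation: each letter maps to at most one phoneme, rules are substitutions or deletions (an empty `new` phoneme), and the only context is an optional grapheme to the right. `Rule`, `pronounce`, and the ARPAbet-style symbols are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: rewrite phoneme `old` as `new` where it is aligned to
    `grapheme` and, if `right` is set, the next letter equals `right`.
    An empty `new` deletes the phoneme."""
    grapheme: str
    old: str
    new: str
    right: Optional[str] = None

def pronounce(word, initial, rules):
    # Step 1: transcribe each letter with its stored initial phoneme, keeping
    # the letter-to-phoneme alignment as (grapheme, phoneme) pairs.
    aligned = [(ch, initial.get(ch, "")) for ch in word]
    # Step 2: apply the learned rules in the order they were learned,
    # scanning each word from left to right.
    for rule in rules:
        for i, (g, p) in enumerate(aligned):
            nxt = word[i + 1] if i + 1 < len(word) else None
            if g == rule.grapheme and p == rule.old and (rule.right is None or rule.right == nxt):
                aligned[i] = (g, rule.new)
    # Empty strings mark letters that end up with no phoneme.
    return [p for _, p in aligned if p]

initial = {"c": "K", "i": "IH", "t": "T", "y": "IY", "a": "AE"}
print(pronounce("city", initial, [Rule("c", "K", "S", right="i")]))  # ['S', 'IH', 'T', 'IY']
```

Keeping the alignment as pairs matters because later rules test which grapheme a phoneme is aligned to, as the rule-application steps of claims 17 and 18 require.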
18 Claims
1. A text-to-pronunciation system comprising:
- a large training set of word pronunciations, and means coupled to said large training set of word pronunciations for extracting language specific information from said large training set of word pronunciations to produce pronunciations for words not in said large training set of word pronunciations;
said means includes a learner for creating pronunciation guesses for words in said large training set of word pronunciations and for finding a transformation rule that improves said word pronunciation guesses and a rule applier for applying said transformation rule found to improve said word pronunciation guesses.
15. A method of text-to-pronunciation comprising the steps of:
providing a training set of pronunciations;
extracting language specific information from said training set of pronunciations;
generating grapheme-phoneme guesses for every word in said training set using said language specific information;
finding a transformation rule that improves said guesses; and
applying said transformation rule to all guesses, finding another rule and applying it to all guesses, and repeating said finding and applying steps to find the transformation rule that improves said guesses the most.
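Claim 15 describes a greedy loop in the style of transformation-based learning: make initial guesses, find the rule that most improves them, apply it to all guesses, and repeat. The sketch below is one minimal way to realize that loop under stated assumptions. Each letter aligns to exactly one phoneme (or `""`), so guesses and targets compare position by position; the error is a count of mismatched positions rather than the weighted string distance of claim 18; candidate rules are proposed only where a guess is wrong; and ties are broken by a fixed sort order. `Rule`, `apply_rule`, `errors`, `learn`, and `max_rules` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: rewrite phoneme `old` as `new` where it is aligned to
    `grapheme` and, if `right` is set, the next letter equals `right`."""
    grapheme: str
    old: str
    new: str
    right: Optional[str] = None

def apply_rule(rule, word, phones):
    """Apply one rule left to right; phones[i] is the phoneme for word[i]."""
    out = list(phones)
    for i, g in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else None
        if g == rule.grapheme and out[i] == rule.old and (rule.right is None or rule.right == nxt):
            out[i] = rule.new
    return out

def errors(guesses, targets):
    """Count mismatched positions (a stand-in for a weighted string distance)."""
    return sum(g != t for gs, ts in zip(guesses, targets) for g, t in zip(gs, ts))

def learn(words, targets, initial, max_rules=50):
    """Greedily learn an ordered list of rules."""
    # Initial guesses: each letter's stored initial phoneme ("" if unknown).
    guesses = [[initial.get(ch, "") for ch in w] for w in words]
    learned = []
    for _ in range(max_rules):
        # Propose a rule, with and without right context, at every mismatch.
        candidates = set()
        for w, gs, ts in zip(words, guesses, targets):
            for i, (g, t) in enumerate(zip(gs, ts)):
                if g != t:
                    nxt = w[i + 1] if i + 1 < len(w) else None
                    candidates.add(Rule(w[i], g, t))
                    candidates.add(Rule(w[i], g, t, nxt))
        base = errors(guesses, targets)
        best, best_gain = None, 0
        # Sorted for deterministic tie-breaking; among otherwise identical
        # rules, the context-free one sorts first and keeps the tie.
        for rule in sorted(candidates, key=lambda r: (r.grapheme, r.old, r.new, r.right or "")):
            trial = [apply_rule(rule, w, gs) for w, gs in zip(words, guesses)]
            gain = base - errors(trial, targets)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule improves the guesses any further
            break
        guesses = [apply_rule(best, w, gs) for w, gs in zip(words, guesses)]
        learned.append(best)
    return learned
```

On a toy set such as "city", "cat", and "cent", this learner first picks the broad rule c: K to S, which fixes two words but breaks "cat", and then learns a context rule that restores K before "a". Later rules patching earlier ones in this way is characteristic of ordered transformation rules.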
17. A method of text-to-pronunciation comprising the steps of:
learning pronunciation rules by finding initial approximations for the grapheme-phoneme conditional probabilities using a slice algorithm, and then finding more precise grapheme-phoneme conditional probabilities by iterative realignment of grapheme and phoneme sequences;
aligning individual phonemes with graphemes in a training set;
proposing initial word pronunciation guesses for each word;
finding transformations that bring proposed word pronunciations closer to word pronunciations in said training set by initially aligning phonemes with the letter for which the phonemes were generated, aligning proposed pronunciation guesses with word pronunciations in said training set with the necessary corrections marked, and finally aligning the word pronunciations in the training set with the spelling by aligning individual phonemes with graphemes in the words;
finding pronunciation rules that bring the proposed pronunciation guesses closer to the word pronunciations in the training set;
applying one of the rules found to get new word pronunciation guesses;
comparing proposed word pronunciation guesses with the word pronunciations in the training set by aligning them, to find the necessary phoneme insertions, deletions, and substitutions, and proposing rules at each place where the pronunciations deviate from each other;
scoring every mismatch between pronunciations in said training set and proposed pronunciation guesses to produce a score representing error;
ranking rules by how much the rules improve the score;
selecting the rule that improves the score the most;
applying rules, whereby for substitutions the phoneme is changed and for deletions it is removed if the phoneme is aligned to the specified grapheme and has the specified phoneme or grapheme context; and for insertions, if the specified context is in the word, the inserted phoneme is assigned to the grapheme on the right, except when the insertion is at the end of a word; if the context was a phoneme context, the phoneme is assigned to the grapheme halfway between the graphemes aligned to the neighboring phonemes; and if the insertion is at one of the ends of the word, the phoneme is assigned to the grapheme at the edge;
storing the initial phoneme sequences for letters and the transformation rules; and
generating word pronunciations by initially transcribing each letter and applying the transformation rules in sequence.
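The comparison step of claim 17 aligns each proposed pronunciation with the training-set pronunciation to locate insertions, deletions, and substitutions, and scores the mismatches as error. The sketch below assumes a plain unit-cost edit-distance (Levenshtein) alignment with a backtrace; the claim does not fix the costs here, and `align` and the edit-tuple format are hypothetical. Each returned edit marks a place where a corrective rule could be proposed, and the distance serves as the error score for that word.

```python
def align(guess, ref):
    """Unit-cost edit-distance alignment of a guessed phoneme sequence against
    a reference. Returns (distance, edits), where each edit is
    ("sub", i, old, new), ("del", i, old) or ("ins", i, new), and i indexes guess
    (for "ins", the position before which the phoneme is inserted)."""
    n, m = len(guess), len(ref)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(guess[i - 1] != ref[j - 1])
            d[i][j] = min(d[i - 1][j] + 1,             # delete guess[i-1]
                          d[i][j - 1] + 1,             # insert ref[j-1]
                          d[i - 1][j - 1] + mismatch)  # match or substitute
    # Walk back from the corner to recover one optimal edit script.
    edits, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + int(guess[i - 1] != ref[j - 1]):
            if guess[i - 1] != ref[j - 1]:
                edits.append(("sub", i - 1, guess[i - 1], ref[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            edits.append(("del", i - 1, guess[i - 1]))
            i -= 1
        else:
            edits.append(("ins", i, ref[j - 1]))
            j -= 1
    edits.reverse()
    return d[n][m], edits

print(align(["K", "IH", "T", "IY"], ["S", "IH", "T", "IY"]))  # (1, [('sub', 0, 'K', 'S')])
```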
18. A method of text-to-pronunciation comprising the steps of:
learning pronunciation rules by finding initial approximations for the grapheme-phoneme conditional probabilities using a slice algorithm with contrastive word pairs;
finding more precise grapheme-phoneme conditional probabilities using iterative realignment of grapheme and phoneme sequences;
preparing for learning by aligning individual phonemes in a training set;
initializing for learning by proposing initial word pronunciation guesses for each word;
carrying out a learning process to find transformations that bring proposed word pronunciation guesses closer to word pronunciations in said training set by initially aligning phonemes with the letter for which the phonemes are generated, aligning pronunciation guesses with word pronunciations in said training set with the necessary corrections marked, and finally aligning the word pronunciations in the training set with the spelling by aligning individual phonemes with graphemes in the words;
finding pronunciation rules using a data-driven approach that brings the proposed pronunciation guesses closer to the word pronunciations in the training set, by looking for rules that bring the proposed pronunciation guesses closer to the word pronunciations in the training set and then applying one of the rules found to obtain new pronunciation guesses, wherein the proposed pronunciation guesses are compared with the word pronunciations in the training set by aligning them to find the necessary phoneme insertions, deletions, or substitutions, and rules are proposed at each place where the pronunciations deviate from each other to correct the mistake;
enumerating every mismatch between word pronunciations in the training set and proposed pronunciation guesses to produce a score representing the error between the pronunciations, the score being the weighted string distance computed by a dynamic aligner;
augmenting the scoring with weighted phonetic distances, where a weight is assigned to each feature deemed important and the resulting distances in feature space are added as error to the phoneme substitution penalties;
ranking pronunciation rules by how much they improve the scores;
selecting the rule that improves the score the most and, if several rules cause the same improvement, selecting the one that uses a smaller context, uses a phoneme context rather than a grapheme context, and has not been applied before;
applying rules from left to right, starting at the leftmost position, whereby for substitutions and deletions the phoneme is changed (for substitutions) or removed (for deletions) if it is aligned to the specified grapheme and has the specified phoneme or grapheme context; and for insertions a check is made to determine whether the specified context is in the word, and if the context was a grapheme context, the inserted phoneme is assigned to the grapheme on the right, except when the insertion is at the end of the word, in which case it is assigned to the last letter; if the context was a phoneme context, the phoneme is assigned to the grapheme halfway between the graphemes aligned to the neighboring phonemes; and if the insertion is at one of the ends of the word, the phoneme is assigned to the grapheme at the edge;
storing the initial phoneme sequences for letters and the transformation rules; and
generating word pronunciations by initially transcribing each letter and then applying the transformation rules in sequence.
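Claim 18 refines the error score in two ways: substitution penalties grow with weighted phonetic-feature distance, and ties between equally good rules are broken by context size, context type, and prior use. The sketch below illustrates both under loud assumptions. The feature table, the weights, `BASE_SUB`, and `INDEL` are invented for illustration; the claim states neither which features are used nor their values, and the priority order of the tie-breakers is assumed to follow the order in which the claim lists them. `sub_cost`, `weighted_distance`, `Candidate`, and `select_rule` are hypothetical names.

```python
from typing import NamedTuple

# Hypothetical articulatory features for a few phonemes (not from the patent).
PHONE_FEATURES = {
    "P": {"voiced": 0, "place": "labial", "manner": "stop"},
    "B": {"voiced": 1, "place": "labial", "manner": "stop"},
    "T": {"voiced": 0, "place": "alveolar", "manner": "stop"},
    "S": {"voiced": 0, "place": "alveolar", "manner": "fricative"},
}
FEATURE_WEIGHTS = {"voiced": 0.25, "place": 0.5, "manner": 0.5}  # assumed weights
BASE_SUB = 1.0  # assumed base substitution penalty
INDEL = 1.5     # assumed insertion/deletion penalty

def sub_cost(a, b):
    """Base substitution penalty plus weighted feature-space distance."""
    if a == b:
        return 0.0
    fa, fb = PHONE_FEATURES.get(a), PHONE_FEATURES.get(b)
    if fa is None or fb is None:
        # Unknown phoneme: charge every feature as different.
        return BASE_SUB + sum(FEATURE_WEIGHTS.values())
    return BASE_SUB + sum(w for f, w in FEATURE_WEIGHTS.items() if fa[f] != fb[f])

def weighted_distance(guess, ref):
    """Weighted string distance computed by dynamic programming."""
    n, m = len(guess), len(ref)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * INDEL
    for j in range(1, m + 1):
        d[0][j] = j * INDEL
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + INDEL,
                          d[i][j - 1] + INDEL,
                          d[i - 1][j - 1] + sub_cost(guess[i - 1], ref[j - 1]))
    return d[n][m]

class Candidate(NamedTuple):
    name: str
    gain: float            # score improvement if the rule is applied
    context_size: int      # number of context symbols the rule tests
    phoneme_context: bool  # True for phoneme context, False for grapheme context

def select_rule(candidates, applied):
    """Largest gain first; ties go to smaller context, then phoneme context,
    then a rule not applied before (this priority order is an assumption)."""
    return max(candidates, key=lambda c: (c.gain, -c.context_size,
                                          c.phoneme_context, c.name not in applied))

print(weighted_distance(["B", "T"], ["P", "T"]))  # 1.25
```

With these numbers the largest possible substitution cost (2.25) stays below a deletion plus an insertion (3.0), so the aligner still prefers a phonetically distant substitution over splitting it into two edits; raising the feature weights without raising `INDEL` would change that behavior.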
Specification