Rule-based learning of word pronunciations from training corpora

US 6,411,932 B1
Filed: 06/08/1999
Issued: 06/25/2002
Est. Priority Date: 06/12/1998
Status: Expired due to Term

Abstract

A text-to-pronunciation system (11) includes a large training set of word pronunciations (19) and an extractor for extracting language specific information from the training set to produce pronunciations for words not in the training set. A learner (13) forms pronunciation guesses for words in the training set and finds a transformation rule that improves the guesses. A rule applier (15) applies the transformation rule found to the guesses. The learner (13) then repeats the search for another rule, and the rule applier (15) applies each new rule, so that at every step the rule that improves the guesses the most is found.
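
The learner and rule applier loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: it assumes one phoneme per letter (with `_` as a hypothetical silent placeholder), substitution-only rules with an optional next-letter context, and a plain mismatch count as the score. The names `propose`, `apply_rule`, `errors`, `learn`, the toy lexicon `TRUTH`, and the letter map `DEFAULT` are all assumptions introduced for the example.

```python
# Toy training set: spelling -> one phoneme per letter ("_" = silent, assumed).
TRUTH = {
    "cat": ["k", "ae", "t"],
    "cot": ["k", "aa", "t"],
    "city": ["s", "ih", "t", "iy"],
    "cell": ["s", "eh", "l", "_"],
}
# Assumed initial letter-to-phoneme guesses.
DEFAULT = {"c": "k", "a": "ae", "o": "aa", "t": "t",
           "i": "ih", "y": "iy", "e": "eh", "l": "l"}


def following(word, i):
    """Next letter, or '#' at the end of the word."""
    return word[i + 1] if i + 1 < len(word) else "#"


def apply_rule(rule, word, guess):
    """Rule = (letter, old, new, next_letter); '*' matches any next letter."""
    letter, old, new, nxt = rule
    out = list(guess)
    for i, (g, p) in enumerate(zip(word, guess)):
        if g == letter and p == old and nxt in ("*", following(word, i)):
            out[i] = new
    return out


def errors(guesses, truth):
    """Score: number of positions where guess and reference differ."""
    return sum(p != t for w in truth for p, t in zip(guesses[w], truth[w]))


def propose(word, guess, target):
    """Propose rules at each place where the guess deviates."""
    out = set()
    for i, (g, p, t) in enumerate(zip(word, guess, target)):
        if p != t:
            out.add((g, p, t, "*"))
            out.add((g, p, t, following(word, i)))
    return out


def learn(truth, default, max_rules=10):
    guesses = {w: [default[ch] for ch in w] for w in truth}
    rules = []
    for _ in range(max_rules):
        base = errors(guesses, truth)
        candidates = set()
        for w in truth:
            candidates |= propose(w, guesses[w], truth[w])
        best, best_gain = None, 0
        for rule in sorted(candidates):  # deterministic order
            trial = {w: apply_rule(rule, w, guesses[w]) for w in truth}
            gain = base - errors(trial, truth)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule improves the score
            break
        guesses = {w: apply_rule(best, w, guesses[w]) for w in truth}
        rules.append(best)
    return rules, guesses


RULES, GUESSES = learn(TRUTH, DEFAULT)
```

On this toy data the context-free rule "c: k to s" gains nothing, because it breaks "cat" and "cot" as often as it fixes "city" and "cell", so the loop ends up choosing the context-dependent variants instead.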

18 Claims

1. A text-to-pronunciation system comprising:
- a large training set of word pronunciations, and means coupled to said large training set of word pronunciations for extracting language specific information from said large training set of word pronunciations to produce pronunciations for words not in said large training set of word pronunciations;
  
  said means includes a learner for creating pronunciation guesses for words in said large training set of word pronunciations and for finding a transformation rule that improves said word pronunciation guesses and a rule applier for applying said transformation rule found to improve said word pronunciation guesses.
- - 2. The system of claim 1, wherein said learner finds other transformation rules that improve the pronunciation guesses and said rule applier applies said other transformation rules to find a transformation rule that improves the pronunciation guesses the most.
  - 3. The system of claim 2, including a rule scorer for scoring pronunciation guesses based on the difference between the word pronunciation guesses and the word pronunciation in said large training set to produce a transformation rule score, and a rule selector for selection from the transformation rules the transformation rule that improves the score the most.
  - 4. The system of claim 3, wherein said learner includes a slice algorithm with contrastive word pairs for finding initial approximations for the grapheme-phoneme conditional probabilities.
  - 5. The system of claim 4, wherein said learner includes an aligner for after finding the initial approximations using the slice algorithm for finding more precise grapheme-phoneme conditional probabilities using iterative realignment of grapheme-phoneme and phoneme using said aligner.
  - 6. The system of claim 2, wherein said learner includes a slice algorithm with contrastive word pairs for finding initial approximations for the grapheme-phoneme conditional probabilities.
  - 7. The system of claim 6, wherein said learner includes an aligner for after finding the initial approximations using the slice algorithm for finding more precise grapheme-phoneme conditional probabilities using iterative realignment of grapheme-phoneme and phoneme using said aligner.
  - 8. The system of claim 7, wherein said rule applier applies rules from left to right starting from the left most position and the phonemes are changed for substitution or removed for deletions if the phonemes are aligned to specific graphemes and have the specified phoneme or grapheme context.
  - 9. The system of claim 8, wherein said rule applier applies a rule for an insertion wherein a check is made to determine if the specific context is in the word and if the context was a grapheme context, the inserted phoneme is assigned to the grapheme on the right except when the insertion is at the end of the word and in that case is assigned to the last letter and if the context is a phoneme context the phoneme is assigned to the grapheme which is halfway between the graphemes aligned to the neighboring phonemes and if the insertion is at one end of the ends of the word, the phoneme is assigned to the grapheme on the edge.
  - 10. The system of claim 9, including a storage for storing initial phoneme queries for letters and the transformation rules.
  - 11. The system of claim 1, wherein said learner expands the transformation rule using feature-phonemes, said feature-phonemes map to any phoneme such that when the learner proposes rules to correct mistakes the rules are expanded so that all equivalent rules using feature-phonemes are proposed.
  - 12. The system of claim 11, wherein the phonemes in the rule are substituted with feature phonemes that stand for given phonemes and this substitution is done in all possible ways with proposed results.
  - 13. The system of claim 1, wherein said learner includes means for compound phonemes which is a generalization of phonemes that enables having multiple pronunciations with a system that generates really one pronunciation.
  - 14. The system of claim 13, wherein said compound phonemes encode either an optional phoneme, or a choice between two possible phonemes.
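
As an illustration of the compound phonemes in claims 13 and 14, the sketch below shows one way a single generated pronunciation could stand for several concrete ones. The tuple encoding (`("opt", p)` for an optional phoneme, `("alt", p, q)` for a choice between two) and the function names `options` and `expand` are assumptions made for this example; the claims do not specify a representation.

```python
from itertools import product


def options(symbol):
    """Return the concrete phoneme sequences one symbol can stand for."""
    if isinstance(symbol, str):      # ordinary phoneme
        return [[symbol]]
    kind = symbol[0]
    if kind == "opt":                # optional phoneme: present or absent
        return [[symbol[1]], []]
    if kind == "alt":                # choice between two phonemes
        return [[symbol[1]], [symbol[2]]]
    raise ValueError(f"unknown compound phoneme: {symbol!r}")


def expand(pronunciation):
    """Enumerate every concrete pronunciation encoded by the sequence."""
    choices = [options(s) for s in pronunciation]
    return [[p for part in combo for p in part] for combo in product(*choices)]


# Hypothetical compound pronunciation with one choice and one optional phoneme.
VARIANTS = expand(["d", ("alt", "ey", "ae"), "t", ("opt", "ah")])
```

Here `VARIANTS` holds four phoneme sequences, one for each combination of the choice and the optional phoneme.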

15. A method of text-to-pronunciation comprising the steps of:
- providing a training set of pronunciations;
  
  extracting language specific information from said training set of pronunciations;
  
  generating grapheme-phoneme guesses for every word in said training set using said language specific information;
  
  finding a transformation rule that improves said guesses; and
  
  applying said transformation rule to all guesses, finding another rule to all guesses and repeating said finding and applying steps to find the transformation rule that improves said guesses the most.
- - 16. The method of claim 15, wherein said applying step includes enumerating every mismatch between word pronunciations in said training set and proposed pronunciations to produce a score representing the error between pronunciation and augmenting the scoring with weighted phonetic distances where weights are assigned to each feature deemed important and the distances in feature space as an error are added to the phoneme substitutional penalties.
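
The scoring in claim 16 can be pictured as a weighted string distance whose substitution cost grows with phonetic dissimilarity. The sketch below is a standard dynamic-programming edit distance under assumed values: the feature table `FEATURES`, the weights in `WEIGHTS`, the base substitution penalty, and the insertion/deletion cost are all hypothetical and chosen only to make the example concrete.

```python
# Hypothetical binary phonetic features for three consonants.
FEATURES = {
    "p": {"voiced": 0, "nasal": 0},
    "b": {"voiced": 1, "nasal": 0},
    "m": {"voiced": 1, "nasal": 1},
}
# Hypothetical weights for the features deemed important.
WEIGHTS = {"voiced": 0.5, "nasal": 1.0}


def sub_cost(a, b, base=1.0):
    """Base substitution penalty plus weighted distance in feature space."""
    if a == b:
        return 0.0
    fa, fb = FEATURES[a], FEATURES[b]
    return base + sum(w * abs(fa[f] - fb[f]) for f, w in WEIGHTS.items())


def distance(guess, target, indel=1.0):
    """Weighted edit distance between two phoneme sequences."""
    rows, cols = len(guess) + 1, len(target) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * indel
    for j in range(1, cols):
        d[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + indel,                                  # deletion
                d[i][j - 1] + indel,                                  # insertion
                d[i - 1][j - 1] + sub_cost(guess[i - 1], target[j - 1]),
            )
    return d[-1][-1]
```

With these assumed numbers, guessing "p" where the reference has "b" (a voicing difference only) is penalized less than guessing "p" for "m", so near misses score better than distant ones.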

17. A method of text-to-pronunciation comprising the steps of:
- learning pronunciation rules by finding initial approximations for the grapheme-phoneme conditional probabilities using a slice algorithm then finding more precise grapheme-phoneme conditional probabilities using iterative realignment of grapheme and phoneme sequences;
  
  aligning individual phonemes with graphemes in a training set;
  
  proposing initial word pronunciation guesses for each word;
  
  finding transformations that bring proposed word pronunciations closer to word pronunciations in said training set by initially aligning phonemes with the letter for which the phonemes were generated, aligning proposed pronunciations guesses with word pronunciations in said training set as well as necessary corrections marked and finally aligning the word pronunciations in the training set with the spelling by aligning individual phonemes with graphemes in the words;
  
  finding pronunciation rules that bring the proposed pronunciation guesses closer to the word pronunciations in the training set;
  
  applying one of the rules found to get new word pronunciation guesses;
  
  comparing proposed word pronunciation guesses by aligning them with the word pronunciations in the training set to find necessary phoneme insertions, deletions and substitutions and propose rules at each place where pronunciations deviate from each other;
  
  scoring every mismatch between pronunciations in said training set and proposed pronunciation guesses to produce a score representing error;
  
  ranking rules by how much the rules improve the score;
  
  selecting the rules that improves the score the most;
  
  applying rules whereby for substitutions the phoneme is changed and for deletions is removed if the phoneme is aligned to the specific grapheme and has the specified phoneme or grapheme context, for insertion the inserted phoneme is assigned to the grapheme on the right except when the insertion is at the end of a word if the specific context is in the word and the phoneme is assigned to the grapheme which is halfway between the grapheme aligned to the neighboring phonemes if the context was a phoneme context and if the insertion is at one of the ends of the word if the phoneme is assigned to grapheme at the edge;
  
  storing the initial phoneme sequences for letters and the transformation rules; and
  
  generating word pronunciations by initially transcribing each letter and applying the transformation rules in sequence.
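
The comparison step in claim 17, which aligns a guess with the reference pronunciation to find the needed insertions, deletions, and substitutions, can be approximated with Python's standard `difflib.SequenceMatcher`. This is a stand-in chosen for brevity, not the aligner the claims describe; the function name `mismatches` and the sample phoneme sequences are assumptions.

```python
from difflib import SequenceMatcher


def mismatches(guess, target):
    """List (operation, guessed phonemes, reference phonemes) for each deviation."""
    matcher = SequenceMatcher(None, guess, target, autojunk=False)
    return [
        (tag, guess[i1:i2], target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical guesses versus reference pronunciations.
SUBSTITUTION = mismatches(["k", "ih", "t", "iy"], ["s", "ih", "t", "iy"])
DELETION = mismatches(["k", "n", "ay", "t"], ["n", "ay", "t"])
INSERTION = mismatches(["b", "aa", "k"], ["b", "aa", "k", "s"])
```

Each returned tuple marks a place where a correcting rule could be proposed: a `replace` suggests a substitution rule, a `delete` a deletion rule, and an `insert` an insertion rule.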

18. A method of text-to-pronunciation comprising the steps of:
- learning pronunciation rules by finding initial approximations for the grapheme-phoneme conditional probabilities using a slice algorithm using contrastive word pairs;
  
  finding more precise grapheme-phoneme conditional probabilities using iterative realignment of grapheme and phoneme sequences;
  
  preparing for learning by aligning individual phonemes in a training set;
  
  initializing for learning by proposing initial word pronunciation guesses for each word;
  
  learning process to find transformations that bring proposed word pronunciation guesses closer to word pronunciations in said training set by initially aligning phonemes with the letter for which the phonemes are generated, aligning pronunciation guesses with word pronunciations in said training set as well as necessary corrections marked and finally aligning the word pronunciations in the training set with the spelling by aligning individual phonemes with graphemes in the words;
  
  finding pronunciation rules using a data-driven approach that brings the proposed pronunciation guesses closer to the word pronunciations in the training set by looking for rules that bring proposed pronunciation guesses closer to the word pronunciations in the training set and then applying one of the rules found to get to new pronunciation guesses, the proposed pronunciation guesses are compared by aligning proposed solutions with the word pronunciations in the training set to find necessary phoneme insertions, deletions or substitutions and propose rules at each place where pronunciations deviate from each other to correct the mistake;
  
  enumerating every mismatch between word pronunciations in the training set and proposed pronunciation guesses to produce a score representing the error between the pronunciations which is the weighted string distance using a dynamic aligner;
  
  augmenting the scoring with weighted phonetic distances where weights are assigned to each feature deemed important and the distances in feature space as an error are added to the phoneme substitutional penalties;
  
  ranking pronunciation rules by how much they improve the scores;
  
  selecting the rule that improves the score the most and if several rules cause the same improvement select the one that uses smaller context and uses phoneme context as opposed to grapheme context, and a rule not applied once before;
  
  applying rules going from left to right starting at the left most position and for substitutions and deletions the phoneme is changed for substitutions or removed for deletions if the phoneme is aligned to the specified grapheme and has the specified phoneme or grapheme context, and for insertion a check is made to determine if the specific context is in the word and if the context was a grapheme context, the inserted phoneme is assigned to the grapheme on the right except when the insertion is at the end of the word and in that case assigned to the last letter and if the context was a phoneme context the phoneme is assigned to the grapheme which is halfway between the graphemes aligned to the neighboring phonemes and if the insertion is at one end of the ends of the words, the phoneme is assigned to the grapheme on the edge;
  
  storing the initial phoneme sequences for letters and the transformation rules; and
  
  generating word pronunciations by initially transcribing each letter and then applying the transformation rules in sequence.
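
The rule selection step in claim 18 names several tie-breakers for rules with equal improvement. One way to express them is a single sort key, as in the sketch below. The `Candidate` fields and the precedence among the tie-breakers (context size first, then phoneme over grapheme context, then not previously applied) are assumptions; the claim lists the criteria but does not state how they rank against each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                    # label for the example only
    gain: int                    # score improvement if the rule is applied
    context_size: int            # number of context symbols the rule uses
    uses_phoneme_context: bool   # False means grapheme context
    applied_before: bool         # True if the rule was already applied once


def select(candidates):
    """Pick the highest-gain rule, breaking ties as assumed above."""
    return max(
        candidates,
        key=lambda c: (
            c.gain,
            -c.context_size,          # smaller context preferred
            c.uses_phoneme_context,   # phoneme context preferred
            not c.applied_before,     # unused rule preferred
        ),
    )


# Hypothetical candidates: three tie on gain, one is strictly worse.
CANDIDATES = [
    Candidate("A", gain=3, context_size=2, uses_phoneme_context=True, applied_before=False),
    Candidate("B", gain=3, context_size=1, uses_phoneme_context=False, applied_before=False),
    Candidate("C", gain=3, context_size=1, uses_phoneme_context=True, applied_before=False),
    Candidate("D", gain=2, context_size=0, uses_phoneme_context=True, applied_before=False),
]
CHOSEN = select(CANDIDATES)
```

In this example rule C is chosen: it ties with A and B on gain, beats A on context size, and beats B by using phoneme context.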

Current Assignee: Texas Instruments, Inc.
Original Assignee: Texas Instruments, Inc.
Inventors: Hemphill, Charles T.; Molnar, Lajos
Primary Examiner(s): Dorvil, Richemond
Application Number: US09/328,129
Time in Patent Office: 1,113 Days
Field of Search: 704/260, 704/254, 704/251, 704/258, 704/257, 704/255, 704/266, 704/231, 704/200, 704/252, 704/268
US Class Current: 704/260
CPC Class Codes:
G09B 19/04 Speaking with audible prese...
G10L 13/00 Speech synthesis; Text to s...
