ACCURACY OF TEXT-TO-SPEECH SYNTHESIS
First Claim
1. A method comprising:
detecting occurrence of an out-of-vocabulary word in a text sample;
detecting a likelihood that the out-of-vocabulary word will be mispronounced using a primary text-to-speech synthesizer;
receiving feedback from a source other than the primary text-to-speech synthesizer, the feedback indicating a conversion of the out-of-vocabulary word into a corresponding audio representation; and
storing the feedback in a repository.
4 Assignments
0 Petitions
Abstract
According to a first example configuration, a pair of text-to-speech synthesizers produces audio representations for each of multiple words. The outputs are compared to identify instances in which a lexicon lookup algorithm and a grapheme-to-phoneme algorithm produce different audio representations for the same words. Results of the analysis are used to train a classifier that subsequently determines the degree to which a grapheme-to-phoneme algorithm is likely to mispronounce a newly detected out-of-vocabulary word when converting it into an audio representation. According to a second example configuration, a text analyzer tags a non-standard word. A group of reviewers generates one or more proposed text-to-speech expansion rules for the detected non-standard word. When there is a high degree of agreement amongst the reviewers on how to expand the non-standard word, the proposed expansion rule is published for use by one or more text-to-speech synthesizers.
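The second configuration amounts to a consensus vote over reviewer proposals. The following is a minimal Python sketch, assuming each proposal is a plain expansion string (for example "doctor" for the token "Dr.") and assuming "a high degree of agreement" means the most common proposal reaches a fixed share of all votes. The 0.8 threshold and the names `select_expansion_rule` and `publish_if_agreed` are hypothetical and do not come from the patent.

```python
from collections import Counter
from typing import Dict, List, Optional


def select_expansion_rule(proposals: List[str], min_agreement: float = 0.8) -> Optional[str]:
    """Return the most common proposed expansion if its share of all
    reviewer votes meets the (assumed) agreement threshold, else None."""
    if not proposals:
        return None
    rule, votes = Counter(proposals).most_common(1)[0]
    return rule if votes / len(proposals) >= min_agreement else None


def publish_if_agreed(token: str, proposals: List[str],
                      published: Dict[str, str],
                      min_agreement: float = 0.8) -> bool:
    """Add the agreed rule to a shared rule table (a hypothetical stand-in
    for publishing to text-to-speech synthesizers); report whether it was added."""
    rule = select_expansion_rule(proposals, min_agreement)
    if rule is None:
        return False  # reviewers disagree too much; nothing is published
    published[token] = rule
    return True


rules: Dict[str, str] = {}
publish_if_agreed("Dr.", ["doctor"] * 9 + ["drive"], rules)  # 90% agreement
publish_if_agreed("St.", ["street", "saint"], rules)         # 50% agreement
print(rules)  # {'Dr.': 'doctor'}
```

Because abbreviations such as "St." are genuinely ambiguous, the threshold is what keeps a split vote from being published as a rule.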
35 Claims
1. A method comprising:
detecting occurrence of an out-of-vocabulary word in a text sample;
detecting a likelihood that the out-of-vocabulary word will be mispronounced using a primary text-to-speech synthesizer;
receiving feedback from a source other than the primary text-to-speech synthesizer, the feedback indicating a conversion of the out-of-vocabulary word into a corresponding audio representation; and
storing the feedback in a repository.
Dependent claims: 2, 3, 4, 5, 6, 7.
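The steps of claim 1 can be illustrated with a small Python sketch. It assumes a set-based lexicon defines "in vocabulary", that pronunciations are phoneme strings standing in for the claim's "audio representation", that the mispronunciation likelihood comes from some scoring function (for example, a classifier like the one described in the abstract), and that feedback is only requested above a threshold. The claim itself does not specify a threshold; `handle_oov_words`, `FeedbackRepository`, and the toy scorer and reviewer are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Pronunciations are modelled as phoneme strings, a stand-in for the
# claim's "audio representation" (an assumption of this sketch).


@dataclass
class FeedbackRepository:
    """Hypothetical repository mapping a word to its reviewed pronunciation."""
    entries: Dict[str, str] = field(default_factory=dict)

    def store(self, word: str, pronunciation: str) -> None:
        self.entries[word] = pronunciation


def handle_oov_words(text: str,
                     lexicon: Set[str],
                     mispronunciation_likelihood: Callable[[str], float],
                     feedback_source: Callable[[str], str],
                     repository: FeedbackRepository,
                     threshold: float = 0.5) -> List[str]:
    """Detect OOV words, score how likely the primary synthesizer is to
    mispronounce each one, and store outside feedback for risky words."""
    flagged: List[str] = []
    for raw in text.split():
        word = raw.strip(".,;:!?\"'()").lower()
        if not word or word in lexicon:
            continue  # empty after stripping punctuation, or in vocabulary
        if mispronunciation_likelihood(word) >= threshold:
            # Feedback comes from a source other than the primary synthesizer,
            # e.g. a human reviewer or a second synthesizer.
            repository.store(word, feedback_source(word))
            flagged.append(word)
    return flagged


def toy_likelihood(word: str) -> float:
    """Placeholder scorer: treats long unknown words as risky."""
    return 0.9 if len(word) > 6 else 0.1


def toy_reviewer(word: str) -> str:
    """Placeholder feedback source returning a tagged pronunciation."""
    return f"<reviewed:{word}>"


lexicon = {"the", "patient", "took", "a"}
repo = FeedbackRepository()
print(handle_oov_words("The patient took Xarelto.", lexicon,
                       toy_likelihood, toy_reviewer, repo))  # ['xarelto']
print(repo.entries)  # {'xarelto': '<reviewed:xarelto>'}
```

Short unknown words such as "cat" score 0.1 under the toy scorer and are left to the primary synthesizer, so only the risky word reaches the repository.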
8. A method comprising:
implementing a lexicon lookup algorithm in first text-to-speech hardware to produce an audio output representation for each word in a set of multiple words;
implementing a grapheme-to-phoneme algorithm in second text-to-speech hardware to produce an audio output representation for each word in the set of multiple words;
for each word in the set, performing a comparison of an audio output representation of the first text-to-speech hardware and an audio output representation of the second text-to-speech hardware; and
classifying each of the multiple words depending on the comparison.
Dependent claims: 9, 10, 11, 12, 13, 14, 15.
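Claim 8's compare-and-classify loop can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation. The two "hardware" paths are modelled as plain functions, and each "audio output representation" is a tuple of ARPAbet-style phonemes, so the comparison becomes tuple equality. The letter-to-phoneme table, `toy_g2p`, `classify_words`, and the label names are hypothetical. Labels of this kind could plausibly serve as training data for the classifier mentioned in the abstract.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Phonemes = Tuple[str, ...]

# Toy letter-to-phoneme table standing in for a real grapheme-to-phoneme model.
LETTER_TO_PHONEME: Dict[str, str] = {
    "a": "AE", "c": "K", "e": "EH", "h": "HH", "n": "N",
    "o": "OW", "p": "P", "t": "T",
}


def toy_g2p(word: str) -> Phonemes:
    """Naive one-phoneme-per-letter conversion (illustrative only)."""
    return tuple(LETTER_TO_PHONEME.get(ch, ch.upper()) for ch in word.lower())


def classify_words(words: Iterable[str],
                   lexicon: Dict[str, Phonemes],
                   g2p: Callable[[str], Phonemes]) -> Dict[str, str]:
    """Label each word by whether the G2P output matches the lexicon entry."""
    labels: Dict[str, str] = {}
    for word in words:
        reference: Optional[Phonemes] = lexicon.get(word)  # lexicon lookup path
        hypothesis = g2p(word)                             # G2P path
        if reference is None:
            labels[word] = "not_in_lexicon"  # no lexicon output to compare
        elif reference == hypothesis:
            labels[word] = "g2p_agrees"
        else:
            labels[word] = "g2p_disagrees"
    return labels


lexicon = {"cat": ("K", "AE", "T"), "phone": ("F", "OW", "N")}
print(classify_words(["cat", "phone", "zyx"], lexicon, toy_g2p))
# {'cat': 'g2p_agrees', 'phone': 'g2p_disagrees', 'zyx': 'not_in_lexicon'}
```

"phone" shows the kind of word the comparison is meant to surface: a letter-by-letter conversion reads "ph" as two sounds, while the lexicon records a single F.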
16-29. (canceled)
30. A method comprising:
detecting occurrence of an out-of-vocabulary word in a text sample to be converted into audio output;
estimating a probability that the out-of-vocabulary word will be mispronounced using a text-to-speech synthesizer; and
selecting amongst multiple sources from which to produce an audio rendition of the out-of-vocabulary word based at least in part on a magnitude of the probability.
Dependent claims: 31, 32, 33, 34.
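The selection step of claim 30 can be pictured as a tiered routing rule on the estimated probability. Below is a minimal Python sketch, assuming three hypothetical sources and two cut-off values (0.3 and 0.7). The claim names neither the sources nor any thresholds, and the probability estimator itself is taken as given.

```python
from enum import Enum


class AudioSource(Enum):
    """Hypothetical sources for rendering an out-of-vocabulary word."""
    PRIMARY_TTS = "primary text-to-speech synthesizer"
    SECONDARY_TTS = "secondary synthesizer or stored pronunciation"
    HUMAN_REVIEW = "human-provided recording"


def select_source(probability: float, low: float = 0.3, high: float = 0.7) -> AudioSource:
    """Pick where to get audio for an OOV word from the estimated probability
    that the primary synthesizer would mispronounce it."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < low:
        return AudioSource.PRIMARY_TTS    # low risk: synthesize normally
    if probability < high:
        return AudioSource.SECONDARY_TTS  # moderate risk: use a fallback
    return AudioSource.HUMAN_REVIEW       # high risk: request outside input


for p in (0.1, 0.5, 0.9):
    print(p, select_source(p).name)
```

Keeping the cut-offs as parameters makes the trade-off explicit: lowering `high` routes more words to the costlier source in exchange for fewer expected mispronunciations.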
35-37. (canceled)
Specification