Accuracy of text-to-speech synthesis
Abstract
According to a first example configuration, a pair of text-to-speech synthesizers produces audio representations for each of multiple words. The outputs are compared to identify instances in which a lexicon lookup algorithm and a grapheme-to-phoneme algorithm produce different audio representations for the same words. Results of the analysis are used to train a classifier that subsequently determines the degree to which a grapheme-to-phoneme algorithm is likely to correctly convert a newly detected out-of-vocabulary word into an audio representation. According to a second example configuration, a text analyzer tags a non-standard word. A group of reviewers generates one or more proposed text-to-speech expansion rules for the detected non-standard word. When there is a high degree of agreement among the reviewers on how to expand the non-standard word, the proposed expansion rule is published for use by one or more respective text-to-speech synthesizers.
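The second configuration, in which an expansion rule is published only when reviewers largely agree, can be illustrated with a minimal Python sketch. The function name `select_expansion`, the 0.8 agreement threshold, and the sample proposals are assumptions made for illustration; the abstract does not state how agreement is measured.

```python
from collections import Counter
from typing import Optional


def select_expansion(proposals: list[str], min_agreement: float = 0.8) -> Optional[str]:
    """Return the most common proposed expansion if enough reviewers agree.

    `min_agreement` is a hypothetical threshold, not a value from the patent.
    """
    if not proposals:
        return None
    expansion, votes = Counter(proposals).most_common(1)[0]
    if votes / len(proposals) >= min_agreement:
        return expansion  # agreement is high: publish this expansion rule
    return None  # agreement is too low: publish nothing


# Hypothetical reviewers expanding the non-standard word "Dr."
published = select_expansion(["doctor", "doctor", "doctor", "doctor", "drive"])
rejected = select_expansion(["doctor", "drive", "doctor", "drive"])
```

Here four of five reviewers agree in the first call, so the expansion would be published, while the evenly split second call yields no rule.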
20 Claims
1. A method comprising:
detecting, by at least one processor, occurrence of an out-of-vocabulary word in a text sample;
detecting a likelihood that the out-of-vocabulary word will be mispronounced using a primary text-to-speech synthesizer associated with a primary language;
receiving feedback from a source other than the primary text-to-speech synthesizer, the feedback indicating a conversion in accordance with a secondary language of the out-of-vocabulary word into a corresponding audio output;
storing the feedback in a repository;
generating, based on the feedback and by a secondary text-to-speech synthesizer associated with the secondary language, a first audio pronunciation of the out-of-vocabulary word pronounced in accordance with a native secondary language speaking person speaking the secondary language; and
generating, in accordance with a native primary language speaking person speaking the primary language, a second audio pronunciation of the out-of-vocabulary word.
Dependent claims: 2, 3, 4, 5, 6, 7
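The feedback and dual-pronunciation steps of claim 1 can be sketched roughly as follows. This is an illustration only: `FeedbackRepository`, `pronunciation_requests`, and the language codes are hypothetical names, and the sketch returns (language, word) synthesis requests as a stand-in for real audio output.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeedbackRepository:
    """Stores, per word, the secondary language indicated by feedback."""
    entries: dict[str, str] = field(default_factory=dict)

    def store(self, word: str, secondary_language: str) -> None:
        self.entries[word.lower()] = secondary_language

    def lookup(self, word: str) -> Optional[str]:
        return self.entries.get(word.lower())


def pronunciation_requests(
    word: str, primary_language: str, repository: FeedbackRepository
) -> list[tuple[str, str]]:
    """Return the (language, word) requests to hand to each synthesizer."""
    requests = []
    secondary = repository.lookup(word)
    if secondary is not None:
        # First pronunciation: secondary-language synthesizer, driven by feedback.
        requests.append((secondary, word))
    # Second pronunciation: rendered in the primary language.
    requests.append((primary_language, word))
    return requests


repo = FeedbackRepository()
repo.store("Bordeaux", "fr")  # hypothetical feedback from an outside source
with_feedback = pronunciation_requests("bordeaux", "en", repo)
without_feedback = pronunciation_requests("hello", "en", repo)
```

A word with stored feedback yields two requests, one per language, whereas a word without feedback falls back to the primary language alone.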
8. A method comprising:
implementing, by at least one processor, a lexicon lookup algorithm via first text-to-speech hardware to produce a first audio output for each word in a set of multiple words comprising one or more words from a base language and one or more words from a foreign language;
implementing a grapheme-to-phoneme algorithm comprising one or more grapheme-to-phoneme rules via second text-to-speech hardware to produce a second audio output for each word in the set of multiple words;
comparing the first audio output and the second audio output by analyzing instances in which the lexicon lookup algorithm produces a different audio output than the grapheme-to-phoneme algorithm for respective text; and
generating a set of predictors based on the comparing, the set of predictors indicating circumstances in which use of the one or more grapheme-to-phoneme rules results in identifying one or more audio output representations that correspond to one or more words from the foreign language.
Dependent claims: 9, 10, 11, 12, 13, 14, 15
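The comparison in claim 8 might look something like the toy Python sketch below. Everything concrete here is an assumption for illustration: phoneme strings stand in for audio output, the lexicon entries and letter rules are invented, and character bigrams are just one possible form a "predictor" could take.

```python
from collections import Counter

# Toy lexicon: word -> phoneme string (hypothetical, ARPAbet-like entries).
LEXICON = {
    "cat": "K AE T",
    "ten": "T EH N",
    "jalapeno": "HH AA L AH P EY N Y OW",
    "junta": "HH UH N T AH",
}

# Toy letter-to-phoneme rules for the base language (hypothetical).
G2P_RULES = {
    "a": "AE", "c": "K", "e": "EH", "j": "JH", "l": "L", "n": "N",
    "o": "OW", "p": "P", "t": "T", "u": "AH",
}


def g2p(word: str) -> str:
    """Apply the letter rules one character at a time."""
    return " ".join(G2P_RULES[ch] for ch in word)


def bigrams(word: str) -> set[str]:
    """Character bigrams of the word, with '#' marking word boundaries."""
    padded = f"#{word}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def build_predictors(lexicon: dict[str, str]) -> set[str]:
    """Collect bigrams seen only in words where the two outputs differ."""
    agree, differ = Counter(), Counter()
    for word, lexicon_output in lexicon.items():
        bucket = agree if g2p(word) == lexicon_output else differ
        bucket.update(bigrams(word))
    return {gram for gram in differ if gram not in agree}


predictors = build_predictors(LEXICON)
```

In this toy data the word-initial bigram `#j` ends up as a predictor because it occurs only in words where the rules and the lexicon disagree, while `en` is excluded because it also appears in a word where they agree.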
16. A method comprising:
detecting, by at least one processor, occurrence of an out-of-vocabulary word in a text sample to be converted into audio output by detecting that the out-of-vocabulary word is not located in a lexicon associated with a default language;
determining a probability that the out-of-vocabulary word will be mispronounced using a text-to-speech synthesizer;
in response to the probability that the out-of-vocabulary word will be mispronounced being below a first threshold probability, producing, via a first text-to-speech synthesizer configured to generate audio in accordance with the default language, a first audio output of the entire out-of-vocabulary word and any words in the text sample that are located in the lexicon associated with the default language; and
in response to the probability that the out-of-vocabulary word will be mispronounced meeting a second threshold probability, producing, via a second text-to-speech synthesizer configured to generate audio in accordance with a foreign language, a second audio output of the out-of-vocabulary word.
Dependent claims: 17, 18, 19, 20
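The threshold logic of claim 16 can be sketched as a simple routing function. The threshold values, the lexicon contents, and the stub probability function are assumptions chosen for illustration; the claim does not specify them, nor does it say what happens when the probability falls between the two thresholds.

```python
from typing import Callable

# Hypothetical default-language lexicon.
DEFAULT_LEXICON = {"i", "would", "like", "a"}


def route_words(
    words: list[str],
    mispronunciation_prob: Callable[[str], float],
    first_threshold: float = 0.3,   # assumed value
    second_threshold: float = 0.7,  # assumed value
) -> list[tuple[str, str]]:
    """Label each word with the synthesizer that would speak it."""
    routed = []
    for word in words:
        if word.lower() in DEFAULT_LEXICON:
            routed.append((word, "default"))  # in-lexicon word
            continue
        prob = mispronunciation_prob(word)  # word is out of vocabulary
        if prob < first_threshold:
            routed.append((word, "default"))
        elif prob >= second_threshold:
            routed.append((word, "foreign"))
        else:
            routed.append((word, "unspecified"))  # band not covered by the claim
    return routed


def stub_probability(word: str) -> float:
    """Stand-in for a trained classifier (hypothetical scores)."""
    return 0.9 if word == "croissant" else 0.1


routing = route_words(["I", "would", "like", "a", "croissant"], stub_probability)
```

With the stub scores above, only "croissant" is sent to the foreign-language synthesizer; every other word stays with the default one.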
Specification