Pronunciation correction of text-to-speech systems between different spoken languages
First Claim
1. A method of correcting pronunciation generation of a language pronunciation system, comprising:
- receiving a word according to an incoming language requiring electronic pronunciation according to a target language;
determining whether the word requiring electronic pronunciation is a word of the target language;
if the word requiring electronic pronunciation is not a word of the target language, retrieving a language locale for the word;
determining whether the language locale for the word matches a language locale for a pronunciation system responsible for converting the word to speech or recognizing a spoken form of the word;
generating a number of phoneme mapping tables, the number of phoneme mapping tables being governed by $N = L^2 - L$, wherein N comprises the number of phoneme mapping tables and L comprises a number of the language locales between which translation is accomplished, each of the language locales comprising a country known to speak a foreign language;
if the language locale for the word does not match the language locale for a pronunciation system responsible for converting the word to speech or for recognizing an audible form of the word, mapping phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language, wherein mapping the phonemes comprises mapping at least one diphone from the incoming language to at least one diphone in the target language, the at least one diphone comprising two adjacent speech segments, the two adjacent speech segments comprising two adjacent letters in an actual spelling of the word according to the incoming language, wherein mapping the phonemes further comprises utilizing contextual data, the contextual data comprising at least one of;
at least one of a starting phoneme and a next phoneme before a subject phoneme in the incoming language word, wherein the at least one of the starting phoneme and the next phoneme contributes to the determination of a phoneme in the target language selected for mapping to the subject phoneme in the incoming language word; and
at least one of a starting phoneme and a next phoneme after a subject phoneme in the starting language word, wherein the at least one of the starting phoneme and the next phoneme contributes to the determination of a phoneme in the target language selected for mapping to the subject phoneme in the incoming language word; and
passing an output of the mapping of phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language to the pronunciation system for converting the word to speech or for recognizing an audible form of the word.
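The context-dependent mapping step of claim 1 can be illustrated with a minimal sketch. It assumes a rule table keyed by (previous phoneme, subject phoneme, next phoneme); the `RULES` table, the `map_phonemes` function, and every phoneme symbol below are hypothetical and chosen only for illustration, and the sketch does not cover the diphone mapping the claim also recites.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule key: (previous phoneme, subject phoneme, next phoneme).
# None in a context slot acts as a wildcard.
ContextKey = Tuple[Optional[str], str, Optional[str]]

# Made-up source-to-target rules; the symbols are illustrative only.
RULES: Dict[ContextKey, str] = {
    (None, "x", None): "k",    # default target for "x"
    ("i", "x", None): "sh",    # a different target when "x" follows "i"
    (None, "a", None): "aa",
    (None, "i", None): "ih",
    (None, "b", None): "b",
}


def map_phonemes(source: List[str]) -> List[str]:
    """Map each source phoneme to a target phoneme using neighbor context."""
    out: List[str] = []
    for i, ph in enumerate(source):
        prev_ph = source[i - 1] if i > 0 else None
        next_ph = source[i + 1] if i + 1 < len(source) else None
        # Try the most specific context first, then fall back to wildcards.
        candidates = (
            (prev_ph, ph, next_ph),
            (prev_ph, ph, None),
            (None, ph, next_ph),
            (None, ph, None),
        )
        for key in candidates:
            if key in RULES:
                out.append(RULES[key])
                break
        else:
            out.append(ph)  # no rule found: pass the phoneme through unchanged
    return out
```

In this sketch the same subject phoneme "x" maps to "k" in `["b", "a", "x"]` but to "sh" in `["i", "x"]`, which is the kind of neighbor-dependent selection the contextual-data limitation describes.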
Abstract
Pronunciation correction for text-to-speech (TTS) systems and speech recognition (SR) systems between different languages is provided. If a word requiring pronunciation by a target language TTS or SR is from a same language as the target language, but is not found in a lexicon of words from the target language, a letter-to-speech (LTS) rules set of the target language is used to generate a letter-to-speech output for the word for use by the TTS or SR configured according to the target language. If the word is from a different language as the target language, phonemes comprising the word according to its native language are mapped to phonemes of the target language. The phoneme mapping is used by the TTS or SR configured according to the target language for generating or recognizing an audible form of the word according to the target language.
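The decision flow in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the function `pronounce`, its parameters, and the toy lexicon, letter-to-speech stand-in, and mapping table are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Phonemes = List[str]


def pronounce(
    word: str,
    word_locale: str,
    target_locale: str,
    target_lexicon: Dict[str, Phonemes],
    target_lts: Callable[[str], Phonemes],
    native_phonemes: Callable[[str, str], Phonemes],
    mapping_tables: Dict[Tuple[str, str], Dict[str, str]],
) -> Phonemes:
    """Return target-language phonemes to hand to the TTS or SR engine."""
    if word_locale == target_locale:
        # Same language: prefer the lexicon, else fall back to LTS rules.
        if word in target_lexicon:
            return target_lexicon[word]
        return target_lts(word)
    # Different language: map the word's native phonemes to target phonemes.
    table = mapping_tables[(word_locale, target_locale)]
    return [table.get(p, p) for p in native_phonemes(word, word_locale)]


# Toy stand-ins (assumptions for illustration only).
toy_lexicon = {"house": ["hh", "aw", "s"]}
toy_tables = {("de-DE", "en-US"): {"x": "k", "a": "aa"}}


def toy_lts(word: str) -> Phonemes:
    return list(word)  # pretend each letter is one phoneme


def toy_native(word: str, locale: str) -> Phonemes:
    return {"Bach": ["b", "a", "x"]}[word]
```

A call such as `pronounce("Bach", "de-DE", "en-US", toy_lexicon, toy_lts, toy_native, toy_tables)` takes the foreign-word branch, while an English word missing from `toy_lexicon` falls through to `toy_lts`.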
19 Claims
1. A method of correcting pronunciation generation of a language pronunciation system, comprising:
- receiving a word according to an incoming language requiring electronic pronunciation according to a target language;
determining whether the word requiring electronic pronunciation is a word of the target language;
if the word requiring electronic pronunciation is not a word of the target language, retrieving a language locale for the word;
determining whether the language locale for the word matches a language locale for a pronunciation system responsible for converting the word to speech or recognizing a spoken form of the word;
generating a number of phoneme mapping tables, the number of phoneme mapping tables being governed by $N = L^2 - L$, wherein N comprises the number of phoneme mapping tables and L comprises a number of the language locales between which translation is accomplished, each of the language locales comprising a country known to speak a foreign language;
if the language locale for the word does not match the language locale for a pronunciation system responsible for converting the word to speech or for recognizing an audible form of the word, mapping phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language, wherein mapping the phonemes comprises mapping at least one diphone from the incoming language to at least one diphone in the target language, the at least one diphone comprising two adjacent speech segments, the two adjacent speech segments comprising two adjacent letters in an actual spelling of the word according to the incoming language, wherein mapping the phonemes further comprises utilizing contextual data, the contextual data comprising at least one of;
at least one of a starting phoneme and a next phoneme before a subject phoneme in the incoming language word, wherein the at least one of the starting phoneme and the next phoneme contributes to the determination of a phoneme in the target language selected for mapping to the subject phoneme in the incoming language word; and
at least one of a starting phoneme and a next phoneme after a subject phoneme in the starting language word, wherein the at least one of the starting phoneme and the next phoneme contributes to the determination of a phoneme in the target language selected for mapping to the subject phoneme in the incoming language word; and
passing an output of the mapping of phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language to the pronunciation system for converting the word to speech or for recognizing an audible form of the word.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A tangible computer readable storage medium containing computer executable instructions which when executed by a computer perform a method of correcting pronunciation generation of a language pronunciation system, comprising:
- receiving a word according to an incoming language requiring electronic pronunciation according to a target language;
determining whether the word requiring electronic pronunciation is a word of the target language;
if the word requiring electronic pronunciation is not a word of the target language, retrieving language locale for the word;
determining whether a language locale for the word matches a language locale for a pronunciation system responsible for converting the word to speech or recognizing a spoken form of the word;
if a language locale for the word matches a language locale for a pronunciation system responsible for converting the word to speech or for recognizing an audible form of the word, applying a letter-to-speech (LTS) rules system associated with the target language to the word for generating an audible form of the word according to the LTS rules system;
passing an output of the application of the LTS rules associated with the target language to the word to the pronunciation system for converting the word to speech or for recognizing an audible form of the word;
generating a number of phoneme mapping tables, the phoneme mapping tables having dimensions m by n, where m is a number of phonemes in a source language and n is a number of phonemes in the target language;
if a language locale for the word does not match a language locale for a pronunciation system responsible for converting the word to speech or for recognizing an audible form of the word, mapping phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language; and
passing an output of the mapping of phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language to the pronunciation system for converting the word to speech or for recognizing an audible form of the word.
Dependent claims: 10, 11
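Claim 9 recites mapping tables of dimensions m by n, where m is the number of source-language phonemes and n is the number of target-language phonemes. The claim does not say what each cell holds; the sketch below assumes each cell is a similarity score and that the best-scoring column is selected. The phoneme inventories, scores, and the `best_target` helper are hypothetical.

```python
from typing import List

# Hypothetical inventories: m = 3 source phonemes, n = 4 target phonemes.
source_phonemes: List[str] = ["a", "x", "r"]
target_phonemes: List[str] = ["aa", "k", "r", "hh"]

# table[i][j] is an assumed similarity score between source phoneme i
# and target phoneme j; the values are invented for illustration.
table: List[List[float]] = [
    [0.9, 0.0, 0.0, 0.1],
    [0.0, 0.6, 0.0, 0.4],
    [0.0, 0.0, 1.0, 0.0],
]


def best_target(source: str) -> str:
    """Pick the target phoneme with the highest score in the source's row."""
    row = table[source_phonemes.index(source)]
    j = max(range(len(row)), key=row.__getitem__)
    return target_phonemes[j]
```

Storing the table as a dense m-by-n grid keeps every source/target pairing addressable, at the cost of holding many near-zero cells; a sparse dictionary would be an equally reasonable choice.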
12. A tangible computer readable storage medium containing computer executable instructions which when executed by a computer perform a method of correcting pronunciation generation of a language pronunciation system, comprising:
- receiving a word according to an incoming language requiring electronic pronunciation according to a target language;
determining whether the word requiring electronic pronunciation is a word of the target language;
if the word requiring electronic pronunciation is not a word of the target language, retrieving language locale for the word;
determining whether a language locale for the word matches a language locale for a pronunciation system responsible for converting the word to speech or recognizing a spoken form of the word;
generating a number of phoneme mapping tables, the number of phoneme mapping tables being governed by $N = L^2 - L$, wherein N comprises the number of phoneme mapping tables and L comprises a number of the language locales between which translation is accomplished, each of the language locales comprising a country known to speak a foreign language;
if a language locale for the word does not match a language locale for a pronunciation system responsible for converting the word to speech or for recognizing an audible form of the word, mapping phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language; and
passing an output of the mapping of phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language to the pronunciation system for converting the word to speech or for recognizing an audible form of the word.
Dependent claims: 13, 14, 15, 16, 17, 18, 19
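The table count recited in claim 12, N = L squared minus L, equals L(L - 1), the number of ordered pairs of distinct locales. One natural reading, assumed here rather than stated in the claim, is one table per (source locale, target locale) direction. The locale codes and the `mapping_table_pairs` helper are hypothetical.

```python
from itertools import permutations
from typing import List, Tuple


def mapping_table_pairs(locales: List[str]) -> List[Tuple[str, str]]:
    """List every ordered (source, target) pair of distinct locales."""
    return list(permutations(locales, 2))


locales = ["en-US", "de-DE", "fr-FR"]
pairs = mapping_table_pairs(locales)

# With L = 3 locales the formula gives 3**2 - 3 = 6 directed tables.
assert len(pairs) == len(locales) ** 2 - len(locales)
```

Under this reading, adding a fourth locale raises the count from 6 to 12, since each new locale needs a table to and from every existing one.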
Specification