Systems and methods for building a native language phoneme lexicon having native pronunciations of non-native words derived from non-native pronunciations
Abstract
Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.
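The derivation step described in the abstract can be illustrated with a minimal sketch, assuming the mapping from non-native to native phonetic units is a hand-written lookup table. The table `PHONE_MAP`, the function `map_to_native`, and every phone symbol below are hypothetical and chosen only for illustration; the source does not specify how acoustic similarity is determined.

```python
# Hypothetical table: non-native phone -> acoustically closest native phone(s).
# The symbols and pairings are illustrative assumptions, not taken from the patent.
PHONE_MAP = {
    "x": ["K"],        # a fricative with no native counterpart, approximated by a stop
    "y:": ["UW"],      # a front rounded vowel approximated by a back rounded vowel
    "ts": ["T", "S"],  # one non-native unit may expand to two native units
    "a": ["AA"],
    "b": ["B"],
}


def map_to_native(non_native_phones, phone_map=PHONE_MAP):
    """Rewrite a non-native transcription using only native phonetic units."""
    native = []
    for phone in non_native_phones:
        if phone not in phone_map:
            # Fail loudly rather than silently dropping an unmapped phone.
            raise KeyError(f"no native mapping for phone {phone!r}")
        native.extend(phone_map[phone])
    return native


# Base form for a non-native word whose non-native transcription is b-a-x.
baseform = map_to_native(["b", "a", "x"])
```

A static table keeps the example short; a real system might instead derive the pairings from acoustic model distances, which this sketch does not attempt.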
24 Claims
1. A system for generating base forms for a non-native language in a speech-based system trained for processing a native language, the system comprising:
- a text processing system configured to receive input textual data containing both native language and non-native language words, the text processing system configured to identify the native language and non-native language words within the textual data, to generate a native phonetic transcription of the native language words using phonetic units of the native language, and to generate a non-native phonetic transcription of the non-native language words using phonetic units of the non-native language;
- a pronunciation generator configured to generate a native pronunciation of the non-native language words using phonetic units of the native language by mapping the phonetic units of the non-native phonetic transcription to acoustically similar phonetic units of the native language; and
- a memory configured to store the input textual data with the corresponding native phonetic transcription of the native language words and the mapped native pronunciation of the non-native language words in a native phonetic lexicon.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method for generating base forms for non-native language in a speech-based system for processing a native language, the method comprising acts performed by at least one processor, of:
- receiving input textual data containing both native language and non-native language words;
- identifying the native language and non-native language words within the textual data;
- tagging the native language words with a tag indicating that the words belong to the native language and tagging the non-native language words with a tag indicating that the words belong to the non-native language;
- generating, by the at least one processor, a native phonetic transcription of the native language words using phonetic units of the native language;
- generating a non-native phonetic transcription of the non-native language words using phonetic units of the non-native language;
- generating a native pronunciation of the non-native language words using phonetic units of the native language by mapping the phonetic units of the non-native phonetic transcription to acoustically similar phonetic units of the native language; and
- storing the input textual data with the corresponding native phonetic transcription of the native language words and the mapped native pronunciation of the non-native language words in a native phonetic lexicon.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
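The acts recited in claim 9 can be read as a short pipeline. The following is a minimal sketch under stated assumptions, not the claimed implementation: the two pronunciation dictionaries, the phone table, the `TaggedWord` type, and the functions `tag_words` and `build_lexicon` are all hypothetical, and a dictionary lookup stands in for real language identification.

```python
from dataclasses import dataclass

# All word lists, transcriptions, and phone symbols are made-up stand-ins.
NATIVE_G2P = {"play": ["P", "L", "EY"], "by": ["B", "AY"]}
NON_NATIVE_G2P = {"bach": ["b", "a", "x"]}
PHONE_MAP = {"b": ["B"], "a": ["AA"], "x": ["K"]}  # non-native -> native


@dataclass
class TaggedWord:
    text: str
    language: str  # "native" or "non-native"


def tag_words(text):
    """Identify and tag each word; a lookup stands in for a language identifier."""
    tagged = []
    for word in text.lower().split():
        language = "non-native" if word in NON_NATIVE_G2P else "native"
        tagged.append(TaggedWord(word, language))
    return tagged


def build_lexicon(text):
    """Return word -> pronunciation, expressed only in native phonetic units."""
    lexicon = {}
    for word in tag_words(text):
        if word.language == "native":
            # Native words are transcribed directly with native units.
            lexicon[word.text] = NATIVE_G2P[word.text]
        else:
            # Non-native words are transcribed with non-native units first,
            # then each unit is mapped to its native counterpart(s).
            foreign = NON_NATIVE_G2P[word.text]
            lexicon[word.text] = [p for ph in foreign for p in PHONE_MAP[ph]]
    return lexicon


lexicon = build_lexicon("Play by Bach")
```

In this sketch a word missing from both dictionaries raises a `KeyError`; how out-of-vocabulary words are handled is not stated in the claim and is left open here.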
17. At least one program storage device having encoded thereon executable program code that, when executed by at least one processor, performs a method for generating base forms for non-native language in a speech-based system for processing a native language, the method comprising acts of:
- receiving input textual data containing both native language and non-native language words;
- identifying the native language and non-native language words within the textual data;
- generating a native phonetic transcription of the native language words using phonetic units of the native language;
- generating a non-native phonetic transcription of the non-native language words using phonetic units of the non-native language;
- generating a native pronunciation of the non-native language words using phonetic units of the native language by mapping the phonetic units of the non-native phonetic transcription to acoustically similar phonetic units of the native language; and
- storing the input textual data with the corresponding native phonetic transcription of the native language words and the mapped native pronunciation of the non-native language words in a native phonetic lexicon.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
Specification