Method and apparatus for generating acoustic models for speaker independent speech recognition of foreign words uttered by non-native speakers
Abstract
Acoustic models for speech recognition are automatically generated utilizing trained acoustic models from a native language and a foreign language. A phoneme-to-phoneme mapping is utilized to enable the description of foreign language words with native language phonemes. The phoneme-to-phoneme mapping is used for training foreign language words, described by native language phonemes, on foreign language speech material. A new phonetic lexicon is created containing foreign language words and native language words transcribed by native language phonemes. Robust native language acoustic models can be derived utilizing foreign language and native language training material. The mapping may be used for training a grapheme-to-phoneme transducer (i.e., foreign language to native language) to generate native language pronunciations for new foreign language words.
20 Claims
1. A method for automatically generating acoustic models for speech recognition, comprising the steps of:
utilizing stored training material, which includes recorded utterances in a first language, in conjunction with trained acoustic models of the first language;
deriving a time alignment between analysis frames, for the recorded utterances, and phonemes of the first language utilizing a phoneme-based training algorithm;
deriving associations between the analysis frames and phonemes of a second language utilizing a second language phoneme recognizer on the recorded utterances in the first language;
retrieving the associations between the analysis frames and individual phoneme symbols for both the first language and the second language, wherein such retrieved associations are stored;
statistically analyzing the associations to determine a best phoneme to phoneme (P2P) mapping between the first and second language phonetic inventories;
transcribing all words in the first language phonetic inventory, utilizing the best P2P mapping, utilizing phonemes from the second language phonetic inventory; and
generating a new phonetic inventory utilizing the transcribed words of the first language phonetic inventory.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
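The statistical analysis step of claim 1 can be illustrated with a minimal sketch. It assumes the stored associations are available as two equal-length sequences of per-frame phoneme labels (one from the first-language time alignment, one from the second-language phoneme recognizer), and it assumes that the "best" mapping is the second-language phoneme that co-occurs most often with each first-language phoneme. The claim does not name the statistic, so this choice, the function name `best_p2p_mapping`, and the toy phoneme symbols are all hypothetical.

```python
from collections import Counter, defaultdict


def best_p2p_mapping(l1_frames, l2_frames):
    """Map each first-language phoneme to the second-language phoneme
    that shares the largest number of analysis frames with it.

    Assumption: "best" means most frequent frame-level co-occurrence;
    the claim itself does not specify the statistic.
    """
    if len(l1_frames) != len(l2_frames):
        raise ValueError("frame label sequences must have equal length")

    # counts[p1][p2] = number of frames labeled p1 (language 1) and p2 (language 2)
    counts = defaultdict(Counter)
    for p1, p2 in zip(l1_frames, l2_frames):
        counts[p1][p2] += 1

    # Pick the most frequent second-language phoneme for each first-language phoneme.
    return {p1: c.most_common(1)[0][0] for p1, c in counts.items()}


# Toy per-frame labels (hypothetical symbols, six analysis frames).
l1_frames = ["T", "T", "IH", "IH", "IH", "N"]
l2_frames = ["t", "t", "i", "i", "e", "n"]

mapping = best_p2p_mapping(l1_frames, l2_frames)
print(mapping)
```

In this toy input, the first-language phoneme `IH` overlaps `i` on two frames and `e` on one, so it maps to `i`. A real system would likely accumulate counts over the whole training corpus before choosing.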
11. An apparatus for automatically generating acoustic models for speech recognition, comprising:
a phonetic inventory of a first language;
a phonetic inventory of a second language;
a controller for:
utilizing stored training material, which includes recorded utterances in a first language, in conjunction with trained acoustic models of the first language;
deriving a time alignment between recorded utterance analysis frames and phonemes of the first language utilizing a phoneme-based training algorithm;
deriving associations between the analysis frames and phonemes of a second language utilizing a second language phoneme recognizer on the recorded utterances in the first language;
retrieving the associations between the analysis frames and individual phonemes for both the first language and the second language, wherein such retrieved associations are stored;
statistically analyzing corresponding associations between the analysis frames to determine a best phoneme to phoneme (P2P) mapping between the first and second language phonetic inventories;
transcribing all words in the first language phonetic inventory, utilizing the best P2P mapping, with phonemes from the second language phonetic inventory; and
generating a new phonetic inventory utilizing the transcribed words of the first language phonetic inventory.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20.
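The transcribing and generating steps performed by the controller can be sketched as a lookup over a pronunciation lexicon. This is only an illustration under stated assumptions: the first-language inventory is assumed to be a dictionary from words to phoneme lists, the P2P mapping is assumed to be one-to-one per phoneme, and the names `transcribe_lexicon`, `p2p`, and the toy entries are hypothetical rather than taken from the patent.

```python
def transcribe_lexicon(lexicon, p2p):
    """Rewrite every pronunciation in `lexicon` with second-language phonemes.

    lexicon: dict mapping word -> list of first-language phonemes
    p2p:     dict mapping first-language phoneme -> second-language phoneme
    Returns a new dict; the input lexicon is left unchanged.
    """
    new_lexicon = {}
    for word, phonemes in lexicon.items():
        # Fail loudly if the mapping does not cover a phoneme in this word.
        missing = [p for p in phonemes if p not in p2p]
        if missing:
            raise KeyError(f"no P2P mapping for {missing} in word {word!r}")
        new_lexicon[word] = [p2p[p] for p in phonemes]
    return new_lexicon


# Hypothetical mapping and first-language lexicon.
p2p = {"T": "t", "IH": "i", "N": "n"}
lexicon = {"tin": ["T", "IH", "N"], "nit": ["N", "IH", "T"]}

new_lexicon = transcribe_lexicon(lexicon, p2p)
print(new_lexicon)
```

Raising on an unmapped phoneme is a design choice for the sketch; a practical system might instead fall back to a default symbol or skip the word.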
Specification