STATE MAPPING FOR CROSS-LANGUAGE SPEAKER ADAPTATION
Abstract
Creation of sub-phonemic Hidden Markov Model (HMM) states and the mapping of those states result in improved cross-language speaker adaptation. The smaller sub-phonemic mapping provides improvements in usability and intelligibility, particularly between languages with few common phonemes. HMM states of different languages may be mapped to one another using a distance between the HMM states in acoustic space. This distance may be calculated using Kullback-Leibler divergence and multi-space probability distribution. By combining distance mapping and context mapping for different speakers of the same language, improved cross-language speaker adaptation is possible.
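The abstract's distance-based state mapping can be illustrated in code. The sketch below assumes each HMM state's emission is a single diagonal Gaussian and maps each source state to the acoustically closest target state by closed-form KL divergence; the patent's formulation additionally uses multi-space probability distributions, which are omitted here, and all function names are hypothetical:

```python
import numpy as np

def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """Closed-form KL divergence KL(p || q) between diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def map_states(src_states, tgt_states):
    """Map each source HMM state to the closest target state in acoustic space.

    Each state is a (mean, variance) pair of NumPy arrays."""
    mapping = {}
    for i, (mu_p, var_p) in enumerate(src_states):
        dists = [kl_gaussian(mu_p, var_p, mu_q, var_q)
                 for mu_q, var_q in tgt_states]
        mapping[i] = int(np.argmin(dists))
    return mapping
```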
20 Claims
1. One or more computer-readable storage media storing instructions for cross-language speaker adaptation in speech-to-speech language translation that when executed instruct a processor to perform acts comprising:
sampling a source speaker's voice in a speaker's language (VSLS);
sampling an auxiliary speaker's voice in the source speaker's language (VALS);
sampling the auxiliary speaker's voice in a listener's language (VALL);
sampling a listener's voice in the listener's language (VLLL);
recognizing VSLS into text of the source speaker's language (TLS);
translating the TLS to text of the listener's language (TLL);
generating a Hidden Markov Model (HMM) model for the VALS;
mapping VSLS samples to VALS HMM states using context mapping;
generating a HMM model for the VALL;
mapping VALS HMM model states to VALL HMM model states, wherein the HMM states of the VALS model are mapped to the HMM states of the VALL model which are closest in an acoustic space using distortion measure mapping;
generating a HMM model for the VLLL;
mapping states of the VALL HMM model to states of the VLLL HMM model using context mapping; and
modifying VLLL using the VSLS samples to form a source speaker's voice speaking the listener's language (VOLL).
(Dependent claims: 2, 3, 4, 5, 6)
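Claim 1 chains three mappings (context mapping, distortion-measure mapping, context mapping) into an end-to-end correspondence from source-speaker samples to listener-language HMM states. A toy sketch of that chaining, with integer state IDs and hypothetical variable names:

```python
def compose(m1, m2):
    """Compose two state mappings: apply m1, then m2."""
    return {s: m2[t] for s, t in m1.items()}

# Toy state inventories standing in for claim 1's three mapping stages.
ctx_map_ls = {0: 10, 1: 11, 2: 11}  # VSLS samples -> VALS states (context mapping)
dist_map   = {10: 20, 11: 21}       # VALS -> VALL states (distortion measure mapping)
ctx_map_ll = {20: 30, 21: 31}       # VALL -> VLLL states (context mapping)

# End-to-end: which VLLL state each VSLS sample ultimately adapts.
end_to_end = compose(compose(ctx_map_ls, dist_map), ctx_map_ll)
print(end_to_end)  # {0: 30, 1: 31, 2: 31}
```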
7. A method comprising:
sampling first speech from a speaker in a first language (VALS);
decomposing the first speech into first speech sub-phoneme samples;
generating a Hidden Markov Model (HMM) model of the VALS comprising HMM states, wherein each state represents a distinctive sub-phonemic acoustic-phonetic event derived from the first speech sub-phoneme samples;
training the first state model VALS using the sub-phoneme samples;
sampling second speech from the speaker in a second language (VALL);
decomposing the second speech into second speech sub-phoneme samples;
generating a Hidden Markov Model (HMM) model of the VALL comprising HMM states, wherein each state represents a distinctive sub-phonemic acoustic-phonetic event derived from the second speech sub-phoneme samples;
training the second state model VALL using the sub-phoneme samples; and
determining corresponding states between VALS HMM model states and VALL HMM model states using Kullback-Leibler Divergence with multi-space probability distribution (KLD).
(Dependent claims: 8, 9, 10, 11, 12, 13, 14)
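Claim 7's KLD with multi-space probability distribution handles streams (such as F0) that mix a continuous voiced space with a discrete unvoiced space. The sketch below uses a common two-space decomposition, KL of the space weights plus the voiced weight times the KL of the voiced Gaussians; this is an illustrative approximation, not the patent's exact formula, and the function names are hypothetical:

```python
import math

def kl_gauss_1d(mu_p, var_p, mu_q, var_q):
    """Closed-form KL divergence between two 1-D Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def kl_msd(wp, mu_p, var_p, wq, mu_q, var_q):
    """Approximate KLD between two 2-space MSD states.

    wp, wq are the voiced-space weights; (mu, var) parameterize the
    voiced Gaussian. The unvoiced space carries only its weight."""
    kl_weights = (wp * math.log(wp / wq)
                  + (1 - wp) * math.log((1 - wp) / (1 - wq)))
    return kl_weights + wp * kl_gauss_1d(mu_p, var_p, mu_q, var_q)
```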
15. A system of speech-to-speech translation with cross-language speaker adaptation, the system comprising:
a processor;
a memory coupled to the processor;
a speaker adaptation module, stored in memory and configured to execute on the processor, the speaker adaptation module configured to map a first Hidden Markov Model (HMM) model of speech in a first language to a second HMM model of speech in a second language using Kullback-Leibler Divergence (KLD) with multi-space probability distribution (MSD).
(Dependent claims: 16, 17, 18, 19, 20)
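The claimed speaker adaptation module could be sketched as a small component parameterized by a state-distance function (for example, KLD with MSD); the class and its API below are hypothetical illustrations, not the patent's implementation:

```python
class SpeakerAdaptationModule:
    """Minimal sketch of a speaker adaptation module.

    Holds a distance function over HMM states and maps each state of a
    first-language model to the closest state of a second-language model."""

    def __init__(self, distance):
        self.distance = distance  # distance(state_a, state_b) -> float

    def map_models(self, hmm_first, hmm_second):
        """Return {first-model state index: closest second-model state index}."""
        return {
            i: min(range(len(hmm_second)),
                   key=lambda j: self.distance(hmm_first[i], hmm_second[j]))
            for i in range(len(hmm_first))
        }
```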
Specification