Multilingual speech recognition
8 Assignments
Abstract
A method for speech recognition. The method uses a single pronunciation estimator to train acoustic phoneme models and recognize utterances from multiple languages. The method includes accepting text spellings of training words in a plurality of sets of training words, each set corresponding to a different one of a plurality of languages. The method also includes, for each of the sets of training words in the plurality, receiving pronunciations for the training words in the set, the pronunciations being characteristic of native speakers of the language of the set, the pronunciations also being in terms of subword units at least some of which are common to two or more of the languages. The method also includes training a single pronunciation estimator using data comprising the text spellings and the pronunciations of the training words.
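The patent text does not say how the single pronunciation estimator is built. As one hypothetical illustration, the sketch below pools letter-aligned lexicons from several languages into one letter-to-phone estimator. It predicts each letter's phone from a one-letter context window and backs off to the letter alone. The names `train_estimator` and `estimate`, the `"_"` silent-letter marker, and the requirement that every pronunciation be pre-aligned one phone per letter are all assumptions made to keep the example short. Practical grapheme-to-phoneme systems learn the alignments and use richer models.

```python
from collections import Counter, defaultdict

# Hypothetical toy estimator, not the patent's method. Assumes every
# pronunciation is pre-aligned to its spelling: one phone per letter,
# with "_" marking a silent letter.


def contexts(spelling):
    """Yield (left, letter, right) windows, padding word edges with '#'."""
    padded = "#" + spelling + "#"
    for i in range(1, len(padded) - 1):
        yield padded[i - 1], padded[i], padded[i + 1]


def train_estimator(lexicons):
    """Train ONE estimator from per-language lexicons pooled together.

    lexicons maps a language name to a list of (spelling, phones) pairs.
    """
    by_context = defaultdict(Counter)  # (left, letter, right) -> phone counts
    by_letter = defaultdict(Counter)   # letter -> phone counts, used as backoff
    for pairs in lexicons.values():     # the language label is not kept
        for spelling, phones in pairs:
            if len(phones) != len(spelling):
                raise ValueError(f"{spelling!r} is not letter-aligned")
            for ctx, phone in zip(contexts(spelling), phones):
                by_context[ctx][phone] += 1
                by_letter[ctx[1]][phone] += 1
    return by_context, by_letter


def estimate(estimator, spelling):
    """Predict phones letter by letter, backing off to the bare letter."""
    by_context, by_letter = estimator
    phones = []
    for ctx in contexts(spelling):
        counts = by_context.get(ctx) or by_letter.get(ctx[1])
        if not counts:
            raise KeyError(f"no training data for letter {ctx[1]!r}")
        phone = counts.most_common(1)[0][0]
        if phone != "_":  # silent letters produce no phone
            phones.append(phone)
    return phones


lexicons = {
    "english": [("cat", ["k", "ae", "t"]), ("city", ["s", "ih", "t", "iy"])],
    "spanish": [("casa", ["k", "a", "s", "a"]), ("cine", ["s", "i", "n", "e"])],
}
estimator = train_estimator(lexicons)
print(estimate(estimator, "cat"))  # ['k', 'ae', 't']
```

In this toy data "k" and "s" are subword units shared by both languages. An unseen spelling such as "cita" is estimated as `['s', 'ih', 't', 'a']`, which draws on evidence from both lexicons at once.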
48 Claims
1. A computer-implemented method in which a computer system initiates execution of software instructions stored in memory, the computer-implemented method comprising:
accepting text spellings of training words in a plurality of sets of training words, each set corresponding to a different one of a plurality of languages;
for each of the sets of training words in the plurality, receiving pronunciations for the training words in the set, the pronunciations being characteristic of native speakers of the language of the set, the pronunciations also being in terms of subword units at least some of which are common to two or more of the languages;
training a single pronunciation estimator using data comprising the text spellings and the pronunciations of the training words; and
calculating a single acoustic subword model for each subword unit, based on the pronunciations in the plurality of sets of training words, by mixing distributions of acoustic parameters representing the sounds of the subword unit in multiple languages when a subword unit is common to two or more languages.
Dependent claims: 2-18.
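Claim 1 (like claims 21, 41, and 48) builds one acoustic model per subword unit by mixing distributions from every language in which the unit occurs. The claims do not fix the form of the distributions or the weighting rule. The following is a hypothetical minimal sketch that assumes a single scalar acoustic parameter per frame (real systems use feature vectors), one Gaussian per language, and mixture weights proportional to each language's frame count. `build_subword_models` and `SubwordModel` are illustrative names, and the frame values are toy numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    """1-D normal density over a single acoustic parameter (a simplification)."""
    mean: float
    var: float

    def pdf(self, x: float) -> float:
        return math.exp(-((x - self.mean) ** 2) / (2 * self.var)) / math.sqrt(
            2 * math.pi * self.var
        )


def fit_gaussian(samples: list) -> Gaussian:
    """Maximum-likelihood mean and variance, with a small variance floor."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return Gaussian(mean, max(var, 1e-6))


@dataclass
class SubwordModel:
    """One acoustic model per subword unit: a weighted blend of densities."""
    weights: list
    components: list

    def pdf(self, x: float) -> float:
        return sum(w * g.pdf(x) for w, g in zip(self.weights, self.components))


def build_subword_models(frames: dict) -> dict:
    """frames maps unit -> {language: [acoustic parameter values]}.

    A unit seen in one language gets a single component; a unit common to
    several languages gets one component per language, weighted by that
    language's share of the unit's frames (a hypothetical weighting rule).
    """
    models = {}
    for unit, per_lang in frames.items():
        langs = [lang for lang, xs in per_lang.items() if xs]
        if not langs:
            continue  # no acoustic data for this unit
        total = sum(len(per_lang[lang]) for lang in langs)
        models[unit] = SubwordModel(
            weights=[len(per_lang[lang]) / total for lang in langs],
            components=[fit_gaussian(per_lang[lang]) for lang in langs],
        )
    return models


frames = {
    "r": {"french": [1.0, 1.2, 0.8], "spanish": [3.0, 3.2]},  # shared unit
    "a": {"spanish": [0.4, 0.6]},                              # one language
}
models = build_subword_models(frames)
print(models["r"].weights)  # [0.6, 0.4]
```

Because the weights sum to one and each component is a density, the blend is itself a valid probability density. A unit seen in only one language reduces to a single Gaussian with weight 1.0.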
19. A computer-implemented method in which a computer system initiates execution of software instructions stored in memory for multilingual speech recognition, the computer-implemented method comprising:
accepting a recognition vocabulary that includes words from multiple languages;
determining a pronunciation of each of the words in the recognition vocabulary using a pronunciation estimator that is common to the multiple languages;
determining an acoustic word model for each of the words in the recognition vocabulary by mapping subword units in the estimated pronunciation to acoustic subword models, at least some of which comprise a mix of distributions of acoustic parameters representing the sounds of the subword unit in multiple languages, and combining the acoustic subword models; and
configuring a speech recognizer using the determined acoustic word models of the words in the recognition vocabulary.
Dependent claim: 20.
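Claim 19 (like claims 39 and 46) builds each word model by mapping the units of an estimated pronunciation to acoustic subword models and combining them. Left-to-right concatenation is the usual way subword models are combined into word models, and the sketch below assumes it. It is a hypothetical illustration: placeholder strings stand in for trained acoustic models, a lookup table stands in for the shared pronunciation estimator, `build_word_model` and `configure_recognizer` are illustrative names, and the recognizer's search is not shown.

```python
from typing import Callable, Dict, List


def build_word_model(pronunciation: List[str], subword_models: Dict[str, object]) -> List[object]:
    """Concatenate, in order, the acoustic models of the word's subword units."""
    missing = [unit for unit in pronunciation if unit not in subword_models]
    if missing:
        raise KeyError(f"no acoustic subword model for units {missing}")
    return [subword_models[unit] for unit in pronunciation]


def configure_recognizer(
    vocabulary: List[str],
    estimate_pronunciation: Callable[[str], List[str]],
    subword_models: Dict[str, object],
) -> Dict[str, List[object]]:
    """Map every vocabulary word, whatever its language, to a word model.

    The same estimate_pronunciation function is used for every word,
    mirroring a pronunciation estimator common to all the languages.
    """
    return {
        word: build_word_model(estimate_pronunciation(word), subword_models)
        for word in vocabulary
    }


# Toy inputs: a lookup table stands in for the shared pronunciation estimator.
toy_pronunciations = {"taxi": ["t", "a", "k", "s", "i"], "casa": ["k", "a", "s", "a"]}
toy_models = {unit: f"<{unit}-model>" for unit in ["t", "a", "k", "s", "i"]}
recognizer = configure_recognizer(["taxi", "casa"], toy_pronunciations.__getitem__, toy_models)
print(recognizer["casa"])  # ['<k-model>', '<a-model>', '<s-model>', '<a-model>']
```

Because the subword models are shared, the same `<a-model>` object serves words from different languages. This sharing is what lets a single recognizer cover a mixed-language vocabulary.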
21. A computer program product, tangibly embodied in a storage medium, the computer program product being operable to cause data processing apparatus to:
accept text spellings of training words in a plurality of sets of training words, each set corresponding to a different one of a plurality of languages;
for each of the sets of training words in the plurality, receive pronunciations for the training words in the set, the pronunciations being characteristic of native speakers of the language of the set, the pronunciations also being in terms of subword units at least some of which are common to two or more of the languages;
train a pronunciation estimator using data comprising the text spellings and the pronunciations of the training words; and
calculate a single acoustic subword model for each subword unit, based on the pronunciations in the plurality of sets of training words, by mixing distributions of acoustic parameters representing the sounds of the subword unit in multiple languages when a subword unit is common to two or more languages.
Dependent claims: 22-37.
38. A computer program product, tangibly embodied in a storage medium, for recognizing words spoken by native speakers of multiple languages, the computer program product being operable to cause data processing apparatus to:
generate a set of estimated pronunciations, using a single pronunciation estimator, from text spellings of a set of acoustic training words, each pronunciation comprising a grouping of subword units, the set of acoustic training words comprising at least a first word and a second word, the first and second words having identical text spelling, the first word having a pronunciation based on utterances of native speakers of a first language, the second word having a pronunciation based on utterances of native speakers of a second language;
map sequences of sound associated with utterances of each of the acoustic training words against the estimated pronunciation associated with each of the acoustic training words; and
use the mapping of sequences of sound to estimated pronunciations to generate a single acoustic subword model for each of the subword units in the grouping of subwords, by mixing distributions of acoustic parameters representing the sounds of the subword unit in multiple languages when a subword unit is common to two or more languages, the acoustic subword model comprising a sound model and a subword unit.
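Claims 38, 45, and 47 map the sequences of sound in each training utterance against the word's estimated pronunciation. The patent text does not name an alignment method. A common choice in speech recognition is Viterbi forced alignment, and the following is a hypothetical minimal version of it. It assumes one scalar acoustic parameter per frame, a provisional (mean, variance) per subword unit in `params`, and exactly one contiguous run of frames per unit; `force_align` and `pool_frames` are illustrative names. Because the single estimator yields one pronunciation for a spelling shared by two languages, utterances of that word from both groups of native speakers align to the same units. `pool_frames` keeps their frames separated by language so that per-language distributions can later be blended.

```python
import math
from collections import defaultdict


def log_gauss(x, mean, var):
    """Log density of a 1-D normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def force_align(frames, pronunciation, params):
    """Viterbi segmentation of frames into one contiguous run per unit.

    frames: 1-D acoustic values; pronunciation: list of subword units;
    params: unit -> (mean, var). Returns [(unit, start, end)], end exclusive.
    """
    T, N = len(frames), len(pronunciation)
    if T < N:
        raise ValueError("need at least one frame per subword unit")
    neg = float("-inf")
    score = [[neg] * N for _ in range(T)]  # best log-likelihood with frame t in unit n
    prev = [[0] * N for _ in range(T)]     # unit index of frame t-1 on that best path

    def ll(t, n):
        mean, var = params[pronunciation[n]]
        return log_gauss(frames[t], mean, var)

    score[0][0] = ll(0, 0)  # the first frame must belong to the first unit
    for t in range(1, T):
        for n in range(min(t + 1, N)):
            stay = score[t - 1][n]
            advance = score[t - 1][n - 1] if n > 0 else neg
            best, prev[t][n] = (advance, n - 1) if advance > stay else (stay, n)
            score[t][n] = best + ll(t, n)

    labels = [N - 1]  # the last frame must belong to the last unit
    for t in range(T - 1, 0, -1):
        labels.append(prev[t][labels[-1]])
    labels.reverse()

    segments = []  # collapse per-frame unit indices into [index, start, end]
    for t, n in enumerate(labels):
        if segments and segments[-1][0] == n:
            segments[-1][2] = t + 1
        else:
            segments.append([n, t, t + 1])
    return [(pronunciation[n], start, end) for n, start, end in segments]


def pool_frames(utterances, params):
    """utterances: [(language, frames, pronunciation)] -> unit -> language -> frames."""
    pooled = defaultdict(lambda: defaultdict(list))
    for language, frames, pronunciation in utterances:
        for unit, start, end in force_align(frames, pronunciation, params):
            pooled[unit][language].extend(frames[start:end])
    return pooled


params = {"a": (0.0, 1.0), "b": (5.0, 1.0)}
print(force_align([0.0, 0.1, 5.0, 5.2, 4.9, 0.0], ["a", "b", "a"], params))
# [('a', 0, 2), ('b', 2, 5), ('a', 5, 6)]
```

In practice alignment and model estimation are usually iterated: frames pooled this way re-estimate the subword models, which then re-align the data.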
39. A computer program product, tangibly embodied in a storage medium, for multilingual speech recognition, the computer program product being operable to cause data processing apparatus to:
accept a recognition vocabulary that includes words from multiple languages;
determine a pronunciation of each of the words in the recognition vocabulary using a pronunciation estimator that is common to the multiple languages;
determine an acoustic word model for each of the words in the recognition vocabulary by mapping subword units in the estimated pronunciation to acoustic subword models, at least some of which comprise a mix of distributions of acoustic parameters representing the sounds of the subword unit in multiple languages, and combining the acoustic subword models; and
configure a speech recognizer using the determined acoustic word models of the words in the recognition vocabulary.
Dependent claim: 40.
41. A computer system comprising:
a processor;
a memory coupled to the processor, the memory storing instructions that when executed by the processor cause the system to perform the operations of:
accepting text spellings of training words in a plurality of sets of training words, each set corresponding to a different one of a plurality of languages;
receiving, for each of the sets of training words in the plurality, pronunciations for the training words in the set, the pronunciations being characteristic of native speakers of the language of the set, the pronunciations also being in terms of subword units at least some of which are common to two or more of the languages;
training a single pronunciation estimator using data comprising the text spellings and the pronunciations of the training words; and
calculating a single acoustic subword model for each subword unit, based on pronunciations in the plurality of sets of training words, by mixing distributions of acoustic parameters representing the sounds of the subword unit in multiple languages when a subword unit is common to two or more languages.
Dependent claims: 42-44.
45. A computer system for recognizing words spoken by native speakers of multiple languages, the computer system comprising:
a processor;
a memory coupled to the processor, the memory storing instructions that when executed by the processor cause the system to perform the operations of:
generating a set of estimated pronunciations, using a pronunciation estimator, from text spellings of a set of acoustic training words, each pronunciation comprising a grouping of subword units, the set of acoustic training words comprising at least a first word and a second word, the first and second words having identical text spelling, the first word having a pronunciation based on utterances of native speakers of a first language, the second word having a pronunciation based on utterances of native speakers of a second language;
mapping sequences of sound associated with utterances of each of the acoustic training words against the estimated pronunciation associated with each of the acoustic training words; and
using the mapping of sequences of sound to estimated pronunciations to generate a single acoustic subword model for each of the subword units in the grouping of subwords, by mixing distributions of acoustic parameters representing the sounds of the subword unit in multiple languages when a subword unit is common to two or more languages, the acoustic subword model comprising a sound model and a subword unit.
46. A computer system for multilingual speech recognition, the computer system comprising:
a processor;
a memory coupled to the processor, the memory storing instructions that when executed by the processor cause the system to perform the operations of:
accepting a recognition vocabulary that includes words from multiple languages;
determining a pronunciation of each of the words in the recognition vocabulary using a pronunciation estimator that is common to the multiple languages;
determining an acoustic word model for each of the words in the recognition vocabulary by mapping subword units in the estimated pronunciation to acoustic subword models, at least some of which comprise a mix of distributions of acoustic parameters representing the sounds of the subword unit in multiple languages, and combining the acoustic subword models; and
configuring a speech recognizer using the determined acoustic word models of the words in the recognition vocabulary.
47. A computer-implemented method in which a computer system initiates execution of software instructions stored in memory for recognizing words spoken by native speakers of multiple languages, the computer-implemented method comprising:
generating a set of estimated pronunciations, using a single pronunciation estimator, from text spellings of a set of acoustic training words, each pronunciation comprising a grouping of subword units, the set of acoustic training words comprising at least a first word and a second word, the first and second words having identical text spelling, the first word having a pronunciation based on utterances of native speakers of a first language, the second word having a pronunciation based on utterances of native speakers of a second language;
mapping sequences of sound associated with utterances of each of the acoustic training words against the estimated pronunciation associated with each of the acoustic training words; and
using the mapping of sequences of sound to estimated pronunciations to generate a single acoustic subword model for each of the subword units in the grouping of subwords, by mixing distributions of acoustic parameters representing the sounds of the subword unit in multiple languages when a subword unit is common to two or more languages, the acoustic subword model comprising a sound model and a subword unit.
48. A computer-implemented method in which a computer system initiates execution of software instructions stored in memory, the computer-implemented method comprising:
accepting text spellings of training words in a plurality of sets of training words, each set corresponding to a different one of a plurality of languages;
for each of the sets of training words in the plurality, receiving pronunciations for the training words in the set, the pronunciations being characteristic of native speakers of the language of the set, the pronunciations also being in terms of subword units at least some of which are common to two or more of the languages;
training a pronunciation estimator using data comprising the text spellings and the pronunciations of the training words; and
calculating an acoustic subword model for each subword unit, based on the pronunciations in the plurality of sets of training words, by mixing distributions of acoustic parameters from multiple languages when a subword unit is common to two or more languages, wherein an acoustic subword model for a subword unit that is common to two or more languages comprises a probability distribution that is a weighted blend of probability distributions each corresponding to a different sound associated with the subword unit.
Specification