Multilingual speech recognition
Abstract
A method for speech recognition. The method uses a single pronunciation estimator to train acoustic phoneme models and recognize utterances from multiple languages. The method includes accepting text spellings of training words in a plurality of sets of training words, each set corresponding to a different one of a plurality of languages. The method also includes, for each of the sets of training words in the plurality, receiving pronunciations for the training words in the set, the pronunciations being characteristic of native speakers of the language of the set, the pronunciations also being in terms of subword units at least some of which are common to two or more of the languages. The method also includes training a single pronunciation estimator using data comprising the text spellings and the pronunciations of the training words.
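As an illustration of the training step summarized in the abstract, the following minimal Python sketch pools spelling/pronunciation pairs from several languages into one letter-to-phoneme model. It is not the estimator disclosed in the patent. The class name `PronunciationEstimator`, the context-count model with letter backoff, the one-symbol-per-letter alignment (with `_` marking a silent letter), and the toy English and Spanish entries are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class PronunciationEstimator:
    """Toy letter-to-phoneme model shared by all languages (illustrative only)."""

    def __init__(self):
        # (previous letter, letter, next letter) -> counts of observed phonemes
        self.context_counts = defaultdict(Counter)
        # letter -> counts of observed phonemes, used as a backoff
        self.letter_counts = defaultdict(Counter)

    @staticmethod
    def _contexts(spelling):
        padded = "#" + spelling + "#"  # "#" marks the word boundary
        for i in range(1, len(padded) - 1):
            yield padded[i - 1], padded[i], padded[i + 1]

    def train(self, training_sets):
        """training_sets maps a language name to (spelling, phonemes) pairs.

        Assumption: each pronunciation is pre-aligned, one symbol per letter.
        """
        for language, pairs in training_sets.items():
            for spelling, phonemes in pairs:
                if len(spelling) != len(phonemes):
                    raise ValueError(f"unaligned pair in {language}: {spelling}")
                for context, phoneme in zip(self._contexts(spelling), phonemes):
                    self.context_counts[context][phoneme] += 1
                    self.letter_counts[context[1]][phoneme] += 1

    def estimate(self, spelling):
        """Return the most frequent phoneme per letter, skipping silent letters."""
        result = []
        for context in self._contexts(spelling):
            counts = self.context_counts.get(context) or self.letter_counts.get(context[1])
            if not counts:
                continue  # letter never seen in training: emit nothing
            phoneme = counts.most_common(1)[0][0]
            if phoneme != "_":
                result.append(phoneme)
        return result


# Hypothetical training sets; "p", "n" and other units are shared across languages.
TRAINING_SETS = {
    "english": [("pin", ["p", "ih", "n"]), ("tip", ["t", "ih", "p"])],
    "spanish": [("pan", ["p", "a", "n"]), ("sal", ["s", "a", "l"])],
}

estimator = PronunciationEstimator()
estimator.train(TRAINING_SETS)
print(estimator.estimate("tin"))
print(estimator.estimate("nap"))
```

Because every language contributes to the same count tables, units such as `p` and `n` are learned from both sets, which is the sense in which the subword units here are "common to two or more of the languages".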
46 Claims
-
1. A method comprising:
-
accepting text spellings of training words in a plurality of sets of training words, each set corresponding to a different one of a plurality of languages;
for each of the sets of training words in the plurality, receiving pronunciations for the training words in the set, the pronunciations being characteristic of native speakers of the language of the set, the pronunciations also being in terms of subword units at least some of which are common to two or more of the languages; and
training a single pronunciation estimator using data comprising the text spellings and the pronunciations of the training words.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
-
-
18. A method for recognizing words spoken by native speakers of multiple languages, the method comprising:
-
generating a set of estimated pronunciations, using a single pronunciation estimator, from text spellings of a set of acoustic training words, each pronunciation comprising a grouping of subword units, the set of acoustic training words comprising at least a first word and a second word, the first and second words having identical text spelling, the first word having a pronunciation based on utterances of native speakers of a first language, the second word having a pronunciation based on utterances of native speakers of a second language;
mapping sequences of sound associated with utterances of each of the acoustic training words against the estimated pronunciation associated with each of the acoustic training words; and
using the mapping of sequences of sound to estimated pronunciations to generate acoustic subword models for the subword units in the grouping of subwords, the acoustic subword model comprising a sound model and a subword unit.
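The mapping and model-generation steps of claim 18 can be pictured with a rough Python sketch. It assumes each utterance is already a list of feature vectors paired with an estimated pronunciation, and it substitutes a uniform split of the frames across the subword units for whatever alignment procedure a real system would use (typically a forced alignment). The "sound model" here is just a per-unit mean vector. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def uniform_alignment(num_frames, num_units):
    """Assign each frame index to a subword-unit index by an even split.

    Stand-in for a real alignment step; assumed for illustration only.
    """
    return [min(i * num_units // num_frames, num_units - 1) for i in range(num_frames)]


def train_subword_models(utterances):
    """Return {subword unit: mean feature vector} from (frames, pronunciation) pairs."""
    sums = {}
    counts = defaultdict(int)
    for frames, pronunciation in utterances:
        if not frames or not pronunciation:
            continue  # nothing to align
        assignment = uniform_alignment(len(frames), len(pronunciation))
        for frame, unit_index in zip(frames, assignment):
            unit = pronunciation[unit_index]
            if unit not in sums:
                sums[unit] = [0.0] * len(frame)
            sums[unit] = [total + value for total, value in zip(sums[unit], frame)]
            counts[unit] += 1
    return {unit: [total / counts[unit] for total in vector] for unit, vector in sums.items()}


# Hypothetical one-dimensional features; the unit "b" occurs in both utterances,
# so its model pools frames from both.
utterances = [
    ([[0.0], [2.0], [10.0], [12.0]], ["a", "b"]),
    ([[4.0], [6.0]], ["b"]),
]
models = train_subword_models(utterances)
print(models)
```

Each entry of `models` pairs a subword unit with a sound model, and a unit shared by words from different languages receives a single pooled model.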
-
-
19. A method for multilingual speech recognition comprising:
-
accepting a recognition vocabulary that includes words from multiple languages;
determining a pronunciation of each of the words in the recognition vocabulary using a pronunciation estimator that is common to the multiple languages; and
configuring a speech recognizer using the determined pronunciations of the words in the recognition vocabulary.
Dependent claims: 20
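The three steps of claim 19 can be sketched in Python as follows. This is only an illustration under stated assumptions: `estimate` is a hypothetical stand-in for the common pronunciation estimator (here a trivial letter mapping), `ToyRecognizer` is an invented class that picks the vocabulary word whose pronunciation is closest in edit distance to an already decoded unit sequence, and the vocabulary is made up. A real recognizer would score audio against acoustic models instead.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of subword units."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def estimate(word):
    """Hypothetical common estimator: one unit per letter, with "c" read as "k"."""
    return ["k" if letter == "c" else letter for letter in word]


class ToyRecognizer:
    """Invented recognizer that matches unit sequences against a lexicon."""

    def __init__(self, lexicon):
        self.lexicon = lexicon  # word -> list of subword units

    def recognize(self, units):
        return min(self.lexicon, key=lambda word: edit_distance(self.lexicon[word], units))


def configure_recognizer(vocabulary, estimator):
    """Determine a pronunciation for every word, then build the recognizer."""
    lexicon = {word: estimator(word) for word in vocabulary}
    return ToyRecognizer(lexicon)


# Hypothetical recognition vocabulary mixing words from several languages.
vocabulary = ["casa", "haus", "pain"]
recognizer = configure_recognizer(vocabulary, estimate)
print(recognizer.recognize(["k", "a", "s", "a"]))
print(recognizer.recognize(["h", "a", "s"]))
```

The point of the sketch is that one `estimate` function serves the whole vocabulary, so no per-word language label is needed when the lexicon is built.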
-
-
21. A computer program product, tangibly embodied in an information carrier, the computer program product being operable to cause data processing apparatus to:
-
accept text spellings of training words in a plurality of sets of training words, each set corresponding to a different one of a plurality of languages;
for each of the sets of training words in the plurality, receive pronunciations for the training words in the set, the pronunciations being characteristic of native speakers of the language of the set, the pronunciations also being in terms of subword units at least some of which are common to two or more of the languages; and
train a single pronunciation estimator using data comprising the text spellings and the pronunciations of the training words.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37
-
-
38. A computer program product for recognizing words spoken by native speakers of multiple languages, the computer program product being operable to cause data processing apparatus to:
-
generate a set of estimated pronunciations, using a single pronunciation estimator, from text spellings of a set of acoustic training words, each pronunciation comprising a grouping of subword units, the set of acoustic training words comprising at least a first word and a second word, the first and second words having identical text spelling, the first word having a pronunciation based on utterances of native speakers of a first language, the second word having a pronunciation based on utterances of native speakers of a second language;
map sequences of sound associated with utterances of each of the acoustic training words against the estimated pronunciation associated with each of the acoustic training words; and
use the mapping of sequences of sound to estimated pronunciations to generate acoustic subword models for the subword units in the grouping of subwords, the acoustic subword model comprising a sound model and a subword unit.
-
-
39. A computer program product for multilingual speech recognition, the computer program product being operable to cause data processing apparatus to:
-
accept a recognition vocabulary that includes words from multiple languages;
determine a pronunciation of each of the words in the recognition vocabulary using a pronunciation estimator that is common to the multiple languages; and
configure a speech recognizer using the determined pronunciations of the words in the recognition vocabulary.
-
-
40. The computer program product of claim 39, the computer program product being further operable to cause data processing apparatus to:
-
accept a training vocabulary that comprises words from multiple languages;
determine a pronunciation of each of the words in the training vocabulary using the pronunciation estimator that is common to the multiple languages;
configure the speech recognizer using parameters estimated using the determined pronunciations of the words in the training vocabulary; and
recognize utterances using the configured speech recognizer.
-
-
41. An apparatus comprising:
-
means for accepting text spellings of training words in a plurality of sets of training words, each set corresponding to a different one of a plurality of languages;
means for receiving, for each of the sets of training words in the plurality, pronunciations for the training words in the set, the pronunciations being characteristic of native speakers of the language of the set, the pronunciations also being in terms of subword units at least some of which are common to two or more of the languages; and
means for training a single pronunciation estimator using data comprising the text spellings and the pronunciations of the training words.
Dependent claims: 42, 43, 44
-
-
45. An apparatus for recognizing words spoken by native speakers of multiple languages, the apparatus comprising:
-
a means for generating a set of estimated pronunciations, using a single pronunciation estimator, from text spellings of a set of acoustic training words, each pronunciation comprising a grouping of subword units, the set of acoustic training words comprising at least a first word and a second word, the first and second words having identical text spelling, the first word having a pronunciation based on utterances of native speakers of a first language, the second word having a pronunciation based on utterances of native speakers of a second language;
means for mapping sequences of sound associated with utterances of each of the acoustic training words against the estimated pronunciation associated with each of the acoustic training words; and
means for using the mapping of sequences of sound to estimated pronunciations to generate acoustic subword models for the subword units in the grouping of subwords, the acoustic subword model comprising a sound model and a subword unit.
-
-
46. An apparatus for multilingual speech recognition, the apparatus comprising:
-
means for accepting a recognition vocabulary that includes words from multiple languages;
means for determining a pronunciation of each of the words in the recognition vocabulary using a pronunciation estimator that is common to the multiple languages; and
means for configuring a speech recognizer using the determined pronunciations of the words in the recognition vocabulary.
-
Specification