Multilingual speech recognition system using text derived recognition models
Abstract
There is provided a novel approach for generating multilingual text-to-phoneme mappings for use in multilingual speech recognition systems. The multilingual mappings are based on the weighted output from a neural network text-to-phoneme model trained on data mixed from several languages. Used together with a branched grammar decoding scheme, the multilingual mappings are able to capture both inter- and intra-language pronunciation variations, which is ideal for multilingual speaker-independent recognition systems. A significant improvement in overall system performance is obtained for a multilingual speaker-independent name dialing task when multilingual text-to-phoneme mapping is applied instead of language-dependent mapping.
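As an illustration of how weighted text-to-phoneme outputs could feed a branched grammar, the following is a minimal Python sketch and not the patented implementation. The per-letter weights, the phoneme symbols, the `threshold` pruning rule, and the function names are all assumptions made for the example; the abstract only states that the mappings come from weighted network outputs and are decoded with a branched grammar.

```python
from itertools import product

# Hypothetical per-letter network outputs for the name "Ana".
# Each letter maps to candidate phonemes with posterior-like weights;
# "" stands for a null phoneme (the letter is not pronounced).
ttp_output = [
    {"a": 0.6, "A": 0.3, "e": 0.1},
    {"n": 0.95, "": 0.05},
    {"a": 0.5, "@": 0.45, "e": 0.05},
]

def branch_candidates(weights, threshold=0.2):
    """Keep phonemes whose weight reaches the threshold, best first."""
    kept = [(p, w) for p, w in weights.items() if w >= threshold]
    if not kept:
        # Always keep at least the single best candidate.
        kept = [max(weights.items(), key=lambda pw: pw[1])]
    return sorted(kept, key=lambda pw: -pw[1])

def expand_branched_grammar(ttp_output, threshold=0.2):
    """Enumerate every path through the branches with its product score."""
    branches = [branch_candidates(w, threshold) for w in ttp_output]
    variants = []
    for path in product(*branches):
        phonemes = tuple(p for p, _ in path if p)  # drop null phonemes
        score = 1.0
        for _, w in path:
            score *= w
        variants.append((phonemes, score))
    return sorted(variants, key=lambda v: -v[1])

variants = expand_branched_grammar(ttp_output)
for phonemes, score in variants:
    print(" ".join(phonemes), round(score, 3))
```

With these made-up weights, two letters keep two candidates each, so the grammar branches into four pronunciation variants that a recognizer could score in parallel.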
25 Claims
1. A method of speech recognition in order to identify a speech command as a match to a written text command comprising the steps:
- providing a text input from a text database;
- receiving an acoustic input;
- generating sequences of multilingual phoneme symbols based on said text input by means of a multilingual text-to-phoneme module;
- generating variations of pronunciations which are recognizable in response to said sequences of multilingual phoneme symbols determined by use of a branched grammar; and
- comparing said variations of pronunciations with the acoustic input in order to find a match.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A system for speech recognition comprising:
- a text database for providing a text input;
- transducer means for receiving an acoustic input;
- a multilingual text-to-phoneme module for outputting sequences of multilingual phoneme symbols based on said text input;
- a pronunciation lexicon module receiving said sequences of multilingual phoneme symbols from said multilingual text-to-phoneme module, and for generating variations of pronunciations which are recognizable in response thereto which are determined by a branched grammar; and
- a multilingual recognizer based on multilingual acoustic phoneme models for comparing said variations of pronunciations generated by the pronunciation lexicon module with the acoustic input in order to find a match.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16.
17. A communication terminal including a speech recognition unit comprising:
- a text database for providing a text input;
- transducer means for receiving an acoustic input;
- a multilingual text-to-phoneme module for outputting sequences of multilingual phoneme symbols based on said text input;
- a pronunciation lexicon module receiving said sequences of multilingual phoneme symbols from said multilingual text-to-phoneme module, and for generating variations of pronunciations in response thereto which are determined by a branched grammar; and
- a multilingual recognizer based on multilingual acoustic phoneme models for comparing said variations of pronunciations generated by the pronunciation lexicon module with the acoustic input in order to find a match.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25.
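The comparing step recited in the independent claims can be pictured with a toy sketch. The claims describe a recognizer based on multilingual acoustic phoneme models; the Python example below deliberately swaps that for something much simpler, assuming the acoustic input has already been decoded into a phoneme sequence and using Levenshtein edit distance as a stand-in score. The lexicon entries, phoneme symbols, and function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

# Hypothetical pronunciation lexicon: each text entry from the text
# database carries several pronunciation variants.
lexicon = {
    "Ana": [("a", "n", "a"), ("A", "n", "@")],
    "Anne": [("a", "n"), ("{", "n")],
    "Hans": [("h", "a", "n", "s"), ("h", "A", "n", "s")],
}

def best_match(observed, lexicon):
    """Return (text, distance, variant) for the closest variant."""
    best = None
    for text, variants in lexicon.items():
        for variant in variants:
            d = edit_distance(observed, variant)
            if best is None or d < best[1]:
                best = (text, d, variant)
    return best

# Stand-in for a decoded acoustic input (an assumption of this sketch).
observed = ("A", "n", "a")
match = best_match(observed, lexicon)
print(match)
```

Because every entry contributes several variants, a speaker whose pronunciation follows a different language's letter-to-sound rules can still land close to the intended entry, which is the role the claims assign to the variations generated from the branched grammar.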
Specification