Multilingual speech recognition system using text derived recognition models

US 7,043,431 B2
Filed: 08/31/2001
Issued: 05/09/2006
Est. Priority Date: 08/31/2001
Status: Active Grant

Abstract

A novel approach is provided for generating multilingual text-to-phoneme mappings for use in multilingual speech recognition systems. The multilingual mappings are based on the weighted output of a neural network text-to-phoneme model trained on data mixed from several languages. Used together with a branched grammar decoding scheme, the multilingual mappings capture both inter- and intra-language pronunciation variations, which makes them well suited to multilingual speaker-independent recognition systems. Overall system performance improves significantly on a multilingual speaker-independent name dialing task when multilingual rather than language-dependent text-to-phoneme mapping is applied.
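
The idea of turning weighted text-to-phoneme output into a branched grammar can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the per-letter posterior values, the SAMPA-like phoneme symbols, the `_` null-phoneme marker, the pruning `threshold`, and the function names `build_branched_grammar` and `enumerate_pronunciations`.

```python
from itertools import product

# Hypothetical per-letter phoneme posteriors for the name "Jan".
# "_" marks a null phoneme (the letter is silent). All values are invented.
POSTERIORS = [
    {"j": 0.55, "dZ": 0.40, "h": 0.05},  # letter "J"
    {"a": 0.70, "{": 0.25, "_": 0.05},   # letter "A"
    {"n": 0.97, "m": 0.03},              # letter "N"
]


def build_branched_grammar(posteriors, threshold=0.2):
    """For each letter, keep the phonemes whose posterior meets the threshold.

    The surviving weights are renormalised so each branch point sums to 1.
    """
    grammar = []
    for dist in posteriors:
        kept = {p: w for p, w in dist.items() if w >= threshold}
        if not kept:
            # Always keep at least the single most likely phoneme.
            best = max(dist, key=dist.get)
            kept = {best: dist[best]}
        total = sum(kept.values())
        grammar.append({p: w / total for p, w in kept.items()})
    return grammar


def enumerate_pronunciations(grammar):
    """Expand every path through the grammar into (phoneme tuple, weight)."""
    variants = []
    for path in product(*(branch.items() for branch in grammar)):
        phonemes = tuple(p for p, _ in path if p != "_")  # drop null phonemes
        weight = 1.0
        for _, w in path:
            weight *= w
        variants.append((phonemes, weight))
    # Most probable pronunciation first.
    return sorted(variants, key=lambda v: v[1], reverse=True)


grammar = build_branched_grammar(POSTERIORS)
variants = enumerate_pronunciations(grammar)
```

With these invented numbers the grammar keeps two branches for "J", two for "A", and one for "N", so four pronunciation variants are produced. A real recognizer would more likely decode against the grammar directly rather than enumerate every path; enumeration is used here only because it is easy to inspect.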

25 Claims

1. A method of speech recognition in order to identify a speech command as a match to a written text command comprising the steps:
- providing a text input from a text database;
- receiving an acoustic input;
- generating sequences of multilingual phoneme symbols based on said text input by means of a multilingual text-to-phoneme module;
- generating variations of pronunciations which are recognizable in response to said sequences of multilingual phoneme symbols determined by use of a branched grammar; and
- comparing said variations of pronunciations with the acoustic input in order to find a match.

2. A method according to claim 1 wherein the text input is processed letter by letter, and wherein a neural network provides an estimate of the posterior probabilities of the different phonemes for each letter.

3. A method according to claim 1 comprising deriving said text input from a database containing user entered text strings.

4. A method according to claim 1, wherein generating variations of pronunciations determined by use of a branched grammar comprises use of a weighted branched grammar in which the weightings are representative of probabilities of the phonemes of said sequences of multilingual phoneme symbols.

5. A method according to claim 1, wherein said sequences of multilingual phoneme symbols comprise a complete and non-redundant set of multilingual phoneme symbol sequences for languages supported by said multilingual text-to-phoneme module.

6. A method according to claim 1, wherein generating sequences of multilingual phoneme symbols comprises generating a weighted branched grammar in which the weightings are representative of probabilities of the phonemes of said sequences of multilingual phoneme symbols.

7. A method according to claim 1, wherein generating variations of pronunciations determined by use of a branched grammar comprises capturing intra-language and inter-language pronunciation variations of said text input in said branched grammar of said sequences of multilingual phoneme symbols.

8. A system for speech recognition comprising:
- a text database for providing a text input;
- transducer means for receiving an acoustic input;
- a multilingual text-to-phoneme module for outputting sequences of multilingual phoneme symbols based on said text input;
- a pronunciation lexicon module receiving said sequences of multilingual phoneme symbols from said multilingual text-to-phoneme module, and for generating variations of pronunciations which are recognizable in response thereto which are determined by a branched grammar; and
- a multilingual recognizer based on multilingual acoustic phoneme models for comparing said variations of pronunciations generated by the pronunciation lexicon module with the acoustic input in order to find a match.

9. A system according to claim 8, wherein the multilingual text-to-phoneme module processes said text input letter by letter, and comprises a neural network for giving an estimate of the posterior probabilities of the different phonemes for each letter.

10. A system according to claim 9 wherein the neural network is a standard fully connected feed-forward multi-layer perceptron neural network.

11. A system according to claim 8 wherein the text input is derived from a database containing user entered text strings.

12. A system according to claim 11 wherein the database containing user entered text strings is an electronic phonebook including phone numbers and associated name labels.

13. A system according to claim 8, wherein said pronunciation lexicon module uses a weighted branched grammar for determining variations of pronunciations in which the weightings are representative of probabilities of the phonemes of said sequences of multilingual phoneme symbols.

14. A system according to claim 8, wherein said multilingual text-to-phoneme module generates said sequences of multilingual phoneme symbols to correspond to a complete and non-redundant set of multilingual phoneme symbol sequences for languages supported by said multilingual text-to-phoneme module.

15. A system according to claim 8, wherein said multilingual text-to-phoneme module outputs the sequences of multilingual phoneme symbols in a weighted branched grammar in which the weightings are representative of probabilities of the phonemes of said sequences of multilingual phoneme symbols.

16. A system according to claim 8, wherein said pronunciation lexicon module captures intra-language and inter-language pronunciation variations of said text input in said branched grammar of said sequences of multilingual phoneme symbols.
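
The letter-by-letter neural network of claims 9 and 10 can be pictured with a small sketch of a fully connected feed-forward multi-layer perceptron that maps each letter (plus neighbouring letters) to a probability distribution over phonemes. This is only an illustration under stated assumptions: the alphabet, the toy phoneme inventory, the context width, the hidden-layer size, and the random untrained weights are all invented here, and the helper names are hypothetical.

```python
import math
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz#"   # "#" pads word boundaries (assumption)
PHONEMES = ["_", "a", "e", "i", "n", "s"]  # toy phoneme inventory (assumption)
CONTEXT = 1                                 # letters of context on each side


def encode_window(word, index):
    """One-hot encode the letter at `index` plus CONTEXT neighbours per side.

    Only letters present in ALPHABET are supported in this sketch.
    """
    padded = "#" * CONTEXT + word.lower() + "#" * CONTEXT
    window = padded[index:index + 2 * CONTEXT + 1]
    vec = []
    for ch in window:
        one_hot = [0.0] * len(ALPHABET)
        one_hot[ALPHABET.index(ch)] = 1.0
        vec.extend(one_hot)
    return vec


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax, so the outputs sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Random, untrained weights; a real model would be trained on lexica."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def letter_posteriors(word, params):
    """Run the network once per letter and return one distribution each."""
    (w1, b1), (w2, b2) = params
    result = []
    for i in range(len(word)):
        hidden = [math.tanh(v) for v in dense(encode_window(word, i), w1, b1)]
        probs = softmax(dense(hidden, w2, b2))
        result.append(dict(zip(PHONEMES, probs)))
    return result


rng = random.Random(0)
n_in = (2 * CONTEXT + 1) * len(ALPHABET)
params = (init_layer(rng, 8, n_in), init_layer(rng, len(PHONEMES), 8))
posteriors = letter_posteriors("anna", params)
```

Because the weights are random, the resulting probabilities carry no linguistic meaning; the sketch only shows the data flow from a letter window to a per-letter posterior distribution.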

17. A communication terminal including a speech recognition unit comprising:
- a text database for providing a text input;
- transducer means for receiving an acoustic input;
- a multilingual text-to-phoneme module for outputting sequences of multilingual phoneme symbols based on said text input;
- a pronunciation lexicon module receiving said sequences of multilingual phoneme symbols from said multilingual text-to-phoneme module, and for generating variations of pronunciations in response thereto which are determined by a branched grammar; and
- a multilingual recognizer based on multilingual acoustic phoneme models for comparing said variations of pronunciations generated by the pronunciation lexicon module with the acoustic input in order to find a match.

18. A communication terminal according to claim 17, wherein the multilingual text-to-phoneme module processes said text input letter by letter, and comprises a neural network for giving an estimate of the posterior probabilities of the different phonemes for each letter.

19. A communication terminal according to claim 18 wherein the neural network is a standard fully connected feed-forward multi-layer perceptron neural network.

20. A communication terminal according to claim 17 wherein the text input is derived from a database containing user entered text strings.

21. A communication terminal according to claim 20 wherein the database containing user entered text strings is an electronic phonebook including phone numbers and associated name labels.

22. A communication terminal according to claim 17, wherein said pronunciation lexicon module uses a weighted branched grammar for determining variations of pronunciations in which the weightings are representative of probabilities of the phonemes of said sequences of multilingual phoneme symbols.

23. A communication terminal according to claim 17, wherein said multilingual text-to-phoneme module generates said sequences of multilingual phoneme symbols to correspond to a complete and non-redundant set of multilingual phoneme symbol sequences for languages supported by said multilingual text-to-phoneme module.

24. A communication terminal according to claim 17, wherein said multilingual text-to-phoneme module outputs the sequences of multilingual phoneme symbols in a weighted branched grammar in which the weightings are representative of probabilities of the phonemes of said sequences of multilingual phoneme symbols.

25. A communication terminal according to claim 17, wherein said pronunciation lexicon module captures intra-language and inter-language pronunciation variations of said text input in said branched grammar of said sequences of multilingual phoneme symbols.
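
The phonebook-based name dialing described in claims 20 and 21 can be pictured with a toy lookup. Note the substitution: the claims compare pronunciation variants with the acoustic input using multilingual acoustic phoneme models, whereas this sketch stands in an already-decoded phoneme sequence and a plain Levenshtein edit distance. The phonebook entries, phone numbers, pronunciation variants, and the `dial_by_name` helper are all fictional and hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


# Fictional phonebook: name label -> phone number.
PHONEBOOK = {"Jan": "+45 5550 0101", "Anna": "+358 5550 0102"}

# Invented pronunciation variants per name label, as a pronunciation
# lexicon might hold after expanding a branched grammar.
VARIANTS = {
    "Jan": [("j", "a", "n"), ("dZ", "{", "n")],
    "Anna": [("a", "n", "a"), ("{", "n", "@")],
}


def dial_by_name(observed):
    """Return the (name, number) whose closest variant best fits `observed`."""
    best_name, best_cost = None, None
    for name, variants in VARIANTS.items():
        cost = min(edit_distance(observed, v) for v in variants)
        if best_cost is None or cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, PHONEBOOK[best_name]


match = dial_by_name(("dZ", "{", "n"))
```

Keeping several variants per name label is what lets the same entry be matched whether the speaker uses, for example, a Danish-style or an English-style pronunciation of "Jan".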


Current Assignee: Nokia Technologies Oy (Nokia Corporation)
Original Assignee: Nokia Corporation
Inventors: Pedersen, Morten With; Riis, Søren; Jensen, Kåre Jean
Primary Examiner(s): McFadden, Susan
Assistant Examiner(s): Albertalli, Brian Louis
Application Number: US09/942,609
Publication Number: US 20030050779A1
Time in Patent Office: 1,712 Days
Field of Search: 704/259, 704/231, 704/232, 704/254, 704/256, 704/243, 704/260
US Class Current: 704/259
CPC Class Codes:
- G10L 13/08: Text analysis or generation...
- G10L 15/144: Training of HMMs
- G10L 25/30: using neural networks
