Method and apparatus for training a multilingual speech model set

US 6,912,499 B1
Filed: 08/31/1999
Issued: 06/28/2005
Est. Priority Date: 08/31/1999
Status: Expired due to Fees

7 Assignments

0 Petitions

Abstract

The invention relates to a method and apparatus for training a multilingual speech model set. The multilingual speech model set generated is suitable for use by a speech recognition system for recognizing spoken utterances in at least two different languages. The invention allows a single speech recognition unit with a single speech model set to perform speech recognition on utterances from two or more languages. The method and apparatus make use of a group of acoustic sub-word units comprising a first subgroup of acoustic sub-word units associated to a first language and a second subgroup of acoustic sub-word units associated to a second language, where the first subgroup and the second subgroup share at least one common acoustic sub-word unit. The method and apparatus also make use of a plurality of letter to acoustic sub-word unit rules sets, each letter to acoustic sub-word unit rules set being associated to a different language. A set of untrained speech models is trained on the basis of a training set comprising speech tokens and their associated labels in combination with the group of acoustic sub-word units and the plurality of letter to acoustic sub-word unit rules sets. The invention also provides a computer readable storage medium comprising a program element for implementing the method for training a multilingual speech model set.
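
The data flow described in the abstract (a shared group of acoustic sub-word units plus one letter to acoustic sub-word unit rules set per language, applied to each training label) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the phoneme labels, the `LETTER_TO_PHONEME_RULES` table and the functions `transcribe` and `transcribe_all` are hypothetical, and each rule maps one letter to one phoneme, whereas practical letter-to-sound rules depend on the surrounding letters.

```python
# Hypothetical phoneme labels; a real system would use a full inventory per language.
ENGLISH_PHONEMES = {"k", "ae", "t", "s", "ih", "n"}
FRENCH_PHONEMES = {"k", "a", "t", "s", "i", "n"}

# The group of acoustic sub-word units is the union of both subgroups.
PHONEME_GROUP = ENGLISH_PHONEMES | FRENCH_PHONEMES
# Units common to both subgroups (the claims require at least one).
SHARED_PHONEMES = ENGLISH_PHONEMES & FRENCH_PHONEMES

# Hypothetical letter-to-phoneme rules sets, one per language.
# Each rule maps a single letter to a single phoneme, ignoring letter context.
LETTER_TO_PHONEME_RULES = {
    "english": {"c": "k", "a": "ae", "t": "t", "s": "s", "i": "ih", "n": "n"},
    "french": {"c": "k", "a": "a", "t": "t", "s": "s", "i": "i", "n": "n"},
}


def transcribe(label: str, language: str) -> list[str]:
    """Apply one language's rules set to an orthographic label.

    Letters with no rule are skipped in this toy version.
    """
    rules = LETTER_TO_PHONEME_RULES[language]
    phonemes = [rules[letter] for letter in label.lower() if letter in rules]
    unknown = [p for p in phonemes if p not in PHONEME_GROUP]
    if unknown:
        raise ValueError(f"rules produced units outside the group: {unknown}")
    return phonemes


def transcribe_all(label: str) -> dict[str, list[str]]:
    """Derive one transcription per language for the same label."""
    return {language: transcribe(label, language) for language in LETTER_TO_PHONEME_RULES}


print(transcribe_all("cat"))
```

Both transcriptions of "cat" draw on the same group of units, so in this toy setup the shared units `k` and `t` would be trained on speech from both languages.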

336 Citations

24 Claims

1. A method for generating a multilingual speech model set, said multilingual speech model set being suitable for use in a multilingual speech recognition system, said method comprising:
- a) providing a group of acoustic sub-word units comprising:
  
  a first subgroup of acoustic sub-word units associated to a first language with each acoustic sub-word unit associated in said first subgroup having an associated speech model;
  
  a second subgroup of acoustic sub-word units associated to a second language;
  
  said first subgroup and said second subgroup sharing at least one common acoustic sub-word unit;
  
  b) providing a training set comprising a plurality of entries, each entry having a speech token representative of a word and a label being an orthographic representation of the word;
  
  c) providing a set of untrained speech models, said set of untrained speech models having at least a first untrained speech model, further comprising, (i) providing said first untrained speech model by initializing at least one acoustic sub-word unit of said second subgroup with said associated speech model of at least one acoustic sub-word unit of said first subgroup that is acoustically similar to said at least one acoustic sub-word unit of said second subgroup;
  
  d) training the set of untrained speech models by utilizing said training set, a plurality of letter to acoustic sub-word unit rules sets and said group of acoustic sub-word units to derive the multilingual speech model set, each letter to acoustic sub-word unit rules set being associated to a different language.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
- - 2. A method as defined in claim 1, wherein the acoustic sub-word units comprise phonemes.
  - 3. A method as defined in claim 2, wherein training the set of untrained speech models comprises processing at least some of the entries in said training set on the basis of a certain letter to acoustic sub-word unit rules set to derive a group of transcriptions, the certain letter to acoustic sub-word unit rules set being associated to a certain language, the transcriptions in said group of transcriptions comprising a sequence of acoustic sub-word units of said group of acoustic sub-word units, the transcriptions in said group of transcriptions being associated to a certain language.
  - 4. A method as defined in claim 3, wherein training the set of untrained speech models further comprises:
    - associating the acoustic sub-word units in said group of acoustic sub-word units to respective speech models in said set of untrained speech models;
      
      processing the group of transcriptions on the basis of a speech token of the corresponding entry in said training set whereby training said set of untrained speech models to derive the multilingual speech model set.
  - 5. A method as defined in claim 2, wherein training the set of untrained speech models comprises processing the entries in said training set on the basis of a plurality of letter to acoustic sub-word unit rules sets to derive a plurality of transcriptions, each transcription in said plurality of transcriptions comprising a sequence of acoustic sub-word units of said group of acoustic sub-word units, a first transcription in said plurality of transcriptions being associated to the first language, a second transcription in said plurality of transcriptions being associated to the second language.
  - 6. A method as defined in claim 5, wherein said transcriptions in the plurality of transcriptions are transcriptions of a first type, said method further comprising:
    - providing a universal question set that is language independent;
      
      processing said plurality of transcriptions of the first type on the basis of said universal question set to derive a plurality of transcriptions of a second type, the plurality of transcriptions of a second type being characterized as comprising context dependent acoustic sub-word units.
  - 7. A method as defined in claim 1, further comprising initializing at least some acoustic sub-word units in said second subgroup of acoustic sub-word units with respective speech models in said first subgroup to generate untrained speech models at least in part on the basis of a nearest sub-word unit method.
  - 8. A method as defined in claim 1, further comprising:
    - computing transformation weights for use in determining said at least one acoustic sub-word unit of said first subgroup that is acoustically similar to said at least one acoustic sub-word unit of said second subgroup.
  - 9. A method as defined in claim 1, further comprising:
    - comparing said at least one acoustic sub-word unit of said second subgroup with each acoustic sub-word unit of said first subgroup and computing transformation weights for determining said at least one acoustic sub-word unit of said first subgroup that is acoustically similar to said at least one acoustic sub-word unit of said second subgroup.
  - 10. A computer readable storage medium comprising a data structure containing a multilingual speech model set generated by the method defined in claim 1.
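
Claims 1, 7, 8 and 9 describe initializing a second-language unit with the model of an acoustically similar first-language unit, chosen by a nearest sub-word unit method using transformation weights. A minimal sketch follows, assuming each unit is described by a small numeric descriptor (for instance, articulatory feature values) and that the transformation weights are fixed per-dimension factors in a weighted Euclidean distance. The claims do not specify how the weights are computed, and `GaussianModel`, `nearest_unit` and `initialize_second_language`, as well as the English/French unit labels, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GaussianModel:
    """Hypothetical stand-in for a speech model: one diagonal Gaussian."""
    mean: tuple[float, ...]
    variance: tuple[float, ...]


def weighted_distance(a, b, weights):
    """Euclidean distance with one (assumed) transformation weight per dimension."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def nearest_unit(target, source_descriptors, weights):
    """Return the first-language unit whose descriptor is closest to the target."""
    return min(source_descriptors,
               key=lambda unit: weighted_distance(target, source_descriptors[unit], weights))


def initialize_second_language(first_models, first_descriptors, second_descriptors, weights):
    """Build untrained models for the units of the second subgroup."""
    untrained = {}
    for unit, descriptor in second_descriptors.items():
        if unit in first_models:
            # Shared unit: reuse the first-language model directly.
            untrained[unit] = first_models[unit]
        else:
            # Language-specific unit: copy the nearest first-language model.
            source = nearest_unit(descriptor, first_descriptors, weights)
            untrained[unit] = first_models[source]
    return untrained


# Hypothetical 2-D descriptors for English (first) and French (second) units.
first_descriptors = {"ae": (0.2, 0.0), "ih": (0.8, 0.1), "k": (0.0, 1.0)}
first_models = {u: GaussianModel(mean=d, variance=(1.0, 1.0)) for u, d in first_descriptors.items()}
second_descriptors = {"a": (0.1, 0.1), "i": (0.9, 0.0), "k": (0.0, 1.0)}

models = initialize_second_language(first_models, first_descriptors,
                                    second_descriptors, weights=(1.0, 1.0))
```

With these descriptors the shared unit `k` keeps its first-language model, while the second-language units `a` and `i` start from the models of `ae` and `ih` respectively; training then refines them.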

11. An apparatus for generating a multilingual speech model set, said multilingual speech model set being suitable for use in a multilingual speech recognition system, said apparatus comprising:
- a) a first memory unit for storing acoustic data elements representative of a group of acoustic sub-word units comprising:
  
  a first subgroup of acoustic sub-word units associated to a first language with each acoustic sub-word unit having an associated acoustic data element;
  
  a second subgroup of acoustic sub-word units associated to a second language;
  
  said first subgroup and said second subgroup sharing at least one common acoustic sub-word unit;
  
  b) a second memory unit for storing a plurality of letter to acoustic sub-word unit rules sets, each letter to acoustic sub-word unit rules set being associated to a different language;
  
  c) a third memory unit suitable for storing a training set comprising a plurality of entries, each entry having a speech token representative of a word and a label being an orthographic representation of the word;
  
  d) a fourth memory unit for storing a set of untrained speech models, said set of untrained speech models comprising at least one untrained speech model, said one untrained speech model generated by initializing at least one acoustic sub-word unit of said second subgroup with said associated acoustic data element of at least one acoustic sub-word unit of said first subgroup that is acoustically similar to said at least one acoustic sub-word unit of said second subgroup;
  
  e) a processing unit coupled to:
  
  said first memory unit;
  
  said second memory unit;
  
  said third memory unit;
  
  said fourth memory unit;
  
  said processing unit being operative for training the set of untrained speech models by utilizing said training set, said plurality of letter to acoustic sub-word unit rules sets and said group of acoustic sub-word units to derive the multilingual speech model set.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20):
- - 12. An apparatus as defined in claim 11, wherein the acoustic sub-word units comprise phonemes.
  - 13. An apparatus as defined in claim 12, wherein said processing unit comprises an automatic transcription generator for processing at least some of the entries in said training set on the basis of a certain letter to acoustic sub-word unit rules set in said plurality of letter to acoustic sub-word unit rules sets to derive a group of transcriptions, the certain letter to acoustic sub-word unit rules set being associated to a certain language, the transcriptions in said group of transcriptions comprising a sequence of acoustic sub-word units of said group of acoustic sub-word units, the transcriptions in said group of transcriptions being associated to a certain language.
  - 14. An apparatus as defined in claim 12, wherein said automatic transcription generator is operative for processing the entries in said training set on the basis of said plurality of letter to acoustic sub-word unit rules sets to derive a plurality of transcriptions, each transcription in said plurality of transcriptions comprising a sequence of acoustic sub-word units of said group of acoustic sub-word units, a first transcription in said plurality of transcriptions being associated to the first language, a second transcription in said plurality of transcriptions being associated to the second language.
  - 15. An apparatus as defined in claim 14, wherein said transcriptions in the plurality of transcriptions are transcriptions of a first type, said apparatus further comprising:
    - a fifth memory unit for storing a universal question set, said universal question set being language independent;
      
      said processing unit being further operative for processing said plurality of transcriptions of the first type on the basis of said universal question set to derive a plurality of transcriptions of a second type, the plurality of transcriptions of a second type being characterized as comprising context dependent acoustic sub-word units.
  - 16. An apparatus as defined in claim 13, wherein said processing unit further comprises:
    - a phoneme mapping unit for associating the acoustic sub-word units in said group of acoustic sub-word units to respective speech models in said set of untrained speech models;
      
      a model training unit for processing the group of transcriptions on the basis of a label of the corresponding entry in said training set whereby training said set of untrained speech models to derive the multilingual speech model set.
  - 17. An apparatus as defined in claim 13, further comprising a phoneme mapping unit for initializing at least some acoustic sub-word units in said second subgroup of acoustic sub-word units with respective speech models in said first subgroup to generate said untrained speech models at least in part on the basis of a nearest sub-word unit method.
  - 18. An apparatus as defined in claim 13, said apparatus comprising:
    - a processor;
      
      a computer readable storage medium coupled to said processor, said computer readable storage medium comprising a program element for execution by said processor for implementing said processing unit.
  - 19. An apparatus as defined in claim 11, wherein said processing unit is further operative for computing transformation weights for use in determining said at least one acoustic sub-word unit of said first subgroup that is acoustically similar to said at least one acoustic sub-word unit of said second subgroup.
  - 20. An apparatus as defined in claim 11, wherein said processing unit is further operative for comparing said at least one acoustic sub-word unit of said second subgroup with each acoustic sub-word unit of said first subgroup and computing transformation weights for determining said at least one acoustic sub-word unit of said first subgroup that is acoustically similar to said at least one acoustic sub-word unit of said second subgroup.
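
Claims 6 and 15 convert context-independent transcriptions (the first type) into context-dependent units (the second type) with a language-independent question set. The sketch below illustrates the idea under assumptions: triphones written `left-center+right`, the phonetic classes `VOWELS` and `STOPS`, and the questions in `UNIVERSAL_QUESTIONS` are all hypothetical, and only a single yes/no split is shown, whereas decision-tree clustering would normally choose each question by a likelihood criterion and recurse.

```python
# Hypothetical language-independent phonetic classes covering units of both languages.
VOWELS = {"a", "ae", "i", "ih"}
STOPS = {"k", "t"}

# Each question inspects the left or right context of a triphone, using
# classes that do not depend on which language a transcription came from.
UNIVERSAL_QUESTIONS = {
    "left_is_vowel": lambda left, right: left in VOWELS,
    "right_is_vowel": lambda left, right: right in VOWELS,
    "right_is_stop": lambda left, right: right in STOPS,
}


def to_triphones(phonemes, boundary="sil"):
    """Turn a context-independent transcription into (left, center, right) units."""
    padded = [boundary, *phonemes, boundary]
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


def label(triphone):
    left, center, right = triphone
    return f"{left}-{center}+{right}"


def split(triphones, question):
    """Partition triphones by one yes/no question (one node of a decision tree)."""
    yes = [t for t in triphones if question(t[0], t[2])]
    no = [t for t in triphones if not question(t[0], t[2])]
    return yes, no


# Pool triphones with center unit "t" from transcriptions in two languages.
transcriptions = [["k", "ae", "t"], ["k", "a", "t"], ["t", "ih", "n", "s"]]
t_units = [tri for phonemes in transcriptions for tri in to_triphones(phonemes) if tri[1] == "t"]
yes, no = split(t_units, UNIVERSAL_QUESTIONS["left_is_vowel"])
print([label(t) for t in yes], [label(t) for t in no])
```

Because the questions refer only to phonetic classes, triphones of the same center unit from different languages can land in the same partition and so share a context-dependent model.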

21. A computer readable storage medium containing a program element suitable for use on a computer having a memory, said memory being suitable for storing:
- a) a group of acoustic sub-word units comprising:
  
  a first subgroup of acoustic sub-word units associated to a first language with each acoustic sub-word unit having an associated speech model;
  
  a second subgroup of acoustic sub-word units associated to a second language;
  
  said first subgroup and said second subgroup sharing at least one common acoustic sub-word unit;
  
  b) a plurality of letter to acoustic sub-word unit rules sets, each letter to acoustic sub-word unit rules set being associated to a different language;
  
  c) a training set comprising a plurality of entries, each entry having a speech token representative of a word and a label being an orthographic representation of the word;
  
  d) a set of untrained speech models, said set of untrained speech models comprising at least one untrained speech model, said one untrained speech model generated by initializing at least one acoustic sub-word unit of said second subgroup with said associated speech model of at least one acoustic sub-word unit of said first subgroup that is acoustically similar to said at least one acoustic sub-word unit of said second subgroup;
  
  said program element being operative for training the set of untrained speech models by utilizing said training set, said plurality of letter to acoustic sub-word unit rules sets and said group of acoustic sub-word units to derive a multilingual speech model set.
- Dependent claims (22):
- - 22. A computer readable storage medium as defined in claim 21, wherein said memory is further suitable for storing:
    - transformation weights for use in determining said at least one acoustic sub-word unit of said first subgroup that is acoustically similar to said at least one acoustic sub-word unit of said second subgroup.
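
The claims leave the training algorithm itself open. As a rough stand-in for HMM re-estimation, the following sketch performs one flat-start pass: each speech token's frames are split uniformly across the units of its transcription, and each unit's mean is re-estimated from the frames pooled over all training entries, whatever their language. The frame format, the functions `uniform_segments` and `reestimate_means`, and the restriction to means alone are assumptions for illustration.

```python
from collections import defaultdict


def uniform_segments(num_frames, num_units):
    """Split frame indices into num_units contiguous spans (non-empty if frames >= units)."""
    bounds = [i * num_frames // num_units for i in range(num_units + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def reestimate_means(entries, initial_means):
    """One flat-start pass: pool frames per unit across all entries and average them.

    entries: list of (frames, phonemes); each frame is a tuple of floats.
    initial_means: means of the untrained models; units with no data keep them.
    """
    totals = {}
    counts = defaultdict(int)
    for frames, phonemes in entries:
        if not phonemes or len(frames) < len(phonemes):
            continue  # cannot give every unit at least one frame
        segments = uniform_segments(len(frames), len(phonemes))
        for unit, (start, end) in zip(phonemes, segments):
            for frame in frames[start:end]:
                acc = totals.setdefault(unit, [0.0] * len(frame))
                for dim, value in enumerate(frame):
                    acc[dim] += value
                counts[unit] += 1
    updated = dict(initial_means)
    for unit, acc in totals.items():
        updated[unit] = tuple(value / counts[unit] for value in acc)
    return updated


# Two hypothetical entries (one per language) sharing the unit "k".
entries = [
    ([(1.0,), (1.0,), (3.0,), (3.0,)], ["k", "ae"]),
    ([(1.0,), (5.0,), (5.0,)], ["k", "a"]),
]
initial_means = {"k": (0.0,), "ae": (0.0,), "a": (0.0,), "ih": (2.0,)}
means = reestimate_means(entries, initial_means)
print(means)
```

Frames aligned to the shared unit `k` come from both entries, so its mean is estimated from pooled data, while `ih`, which has no frames here, keeps its initialized value.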

23. An apparatus for generating a multilingual speech model set, said multilingual speech model set being suitable for use in a multilingual speech recognition system, said apparatus comprising:
- a) means for storing:
  
  i) acoustic data elements representative of a group of acoustic sub-word units comprising:
  
  a first subgroup of acoustic sub-word units associated to a first language with each acoustic sub-word unit having an associated acoustic data element;
  
  a second subgroup of acoustic sub-word units associated to a second language;
  
  said first subgroup and said second subgroup sharing at least one common acoustic sub-word unit;
  
  ii) a plurality of letter to acoustic sub-word unit rules sets, each letter to acoustic sub-word unit rules set being associated to a different language;
  
  iii) a training set comprising a plurality of entries, each entry having a speech token representative of a word and a label being an orthographic representation of the word;
  
  iv) a set of untrained speech models, said set of untrained speech models comprising at least one untrained speech model, said one untrained speech model generated by initializing at least one acoustic sub-word unit of said second subgroup with said associated acoustic data element of at least one acoustic sub-word unit of said first subgroup that is acoustically similar to said at least one acoustic sub-word unit of said second subgroup;
  
  means for training the set of untrained speech models by utilizing said training set, said plurality of letter to acoustic sub-word unit rules sets and said group of acoustic sub-word units to derive the multilingual speech model set.
- Dependent claims (24):
- - 24. An apparatus as defined in claim 23, further comprising:
    - means for computing transformation weights for use in determining said at least one acoustic sub-word unit of said first subgroup that is acoustically similar to said at least one acoustic sub-word unit of said second subgroup.

Current Assignee
RPX Clearinghouse LLC (RPX Corporation)
Original Assignee
Nortel Networks Limited (Nortel Networks Corporation)
Inventors
Robillard, Serge, Sabourin, Michael G.
Primary Examiner(s)
Azad, Abul K.

Application Number
US09/386,282
Time in Patent Office
2,128 Days
Field of Search
704/243, 704/244, 704/251, 704/252, 704/255, 704/256, 704/257, 704/8
US Class Current
704/243
CPC Class Codes
G10L 15/063 Training
G10L 15/187 Phonemic context, e.g. pron...
