Method and apparatus for generating acoustic models for speaker independent speech recognition of foreign words uttered by non-native speakers

US 7,415,411 B2
Filed: 03/04/2004
Issued: 08/19/2008
Est. Priority Date: 03/04/2004
Status: Expired due to Fees

Abstract

Acoustic models for speech recognition are automatically generated utilizing trained acoustic models from a native language and a foreign language. A phoneme-to-phoneme mapping is utilized to enable the description of foreign language words with native language phonemes. The phoneme-to-phoneme mapping is used for training foreign language words, described by native language phonemes on foreign language speech material. A new phonetic lexicon is created containing foreign language words and native language words transcribed by native language phonemes. Robust native language acoustic models can be derived utilizing foreign language and native language training material. The mapping may be used for training a grapheme to phoneme transducer (i.e., foreign language to native language) to generate native language pronunciations for new foreign language words.
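
As a minimal sketch of the lexicon step described in the abstract, the following shows how a phoneme-to-phoneme mapping could be applied to describe foreign language words with native language phonemes and merge them into one lexicon. The phoneme symbols, the example words, and the function names `transcribe` and `build_lexicon` are illustrative assumptions, not taken from the patent.

```python
from typing import Dict, List

# Hypothetical P2P mapping from foreign language (FL) phoneme symbols to
# native language (NL) phoneme symbols. The symbols are invented for
# illustration only.
P2P: Dict[str, str] = {"w": "v", "ih": "i", "th": "s"}


def transcribe(fl_pronunciation: List[str], p2p: Dict[str, str]) -> List[str]:
    """Rewrite an FL phoneme sequence using NL phonemes via the mapping."""
    return [p2p[phoneme] for phoneme in fl_pronunciation]


def build_lexicon(
    fl_lexicon: Dict[str, List[str]],
    nl_lexicon: Dict[str, List[str]],
    p2p: Dict[str, str],
) -> Dict[str, List[str]]:
    """Merge NL words with FL words transcribed into NL phonemes.

    Assumption: if a spelling occurs in both lexicons, the FL entry
    overwrites the NL entry. A real system would need a policy for this.
    """
    merged = dict(nl_lexicon)
    for word, pronunciation in fl_lexicon.items():
        merged[word] = transcribe(pronunciation, p2p)
    return merged


fl_lexicon = {"with": ["w", "ih", "th"]}
nl_lexicon = {"wir": ["v", "i", "r"]}
new_lexicon = build_lexicon(fl_lexicon, nl_lexicon, P2P)
```

Here `new_lexicon` holds both words, each described only with symbols from the native language inventory.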

20 Claims

1. A method for automatically generating acoustic models for speech recognition, comprising the steps of:
- utilizing stored training material, which includes recorded utterances in a first language, in conjunction with trained acoustic models of the first language;
- deriving a time alignment between analysis frames, for the recorded utterances, and phonemes of the first language utilizing a phoneme-based training algorithm;
- deriving associations between the analysis frames and phonemes of a second language utilizing a second language phoneme recognizer on the recorded utterances in the first language;
- retrieving the associations between the analysis frames and individual phoneme symbols for both the first language and the second language, wherein such retrieved associations are stored;
- statistically analyzing the associations to determine a best phoneme to phoneme (P2P) mapping between the first and second language phonetic inventories;
- transcribing all words in the first language phonetic inventory, utilizing the best P2P mapping, utilizing phonemes from the second language phonetic inventory; and
- generating a new phonetic inventory utilizing the transcribed words of the first language phonetic inventory.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1, wherein the first language is a foreign language (FL) and the second language is a native language (NL).
  - 3. The method of claim 2, wherein utterances for the native language is provided utilizing native speakers speaking words in the native language for input into the phonetic reference inventory of the native language.
  - 4. The method of claim 2, wherein the utterances for the foreign language is provided utilizing foreign speakers speaking words in the foreign language for input into the phonetic inventory of the foreign language.
  - 5. The method of claim 1 wherein deriving a time alignment between analysis frames and phonemes of the first language, further comprises
    - retrieving a word or phrase from the first phonetic inventory;
    - separating each word into phonemes; and
    - associating each of the phonemes comprising the word or phrase with the analysis frames.
  - 6. The method of claim 1, wherein deriving associations between the analysis frames and phonemes of a second language, further comprises
    - retrieving a word or phrase from the second phonetic inventory;
    - separating each word into phonemes; and
    - associating each of the phonemes comprising the word or phrase with the analysis frames.
  - 7. The method of claim 1, further comprising providing speech training material comprising foreign speech having sounds, words and phrases wherein the speech is phonetically described.
  - 8. The method of claim 1, further comprising providing a phonetic inventory database wherein the phonetic inventory includes existing word and phrase segmentation in the form of phoneme sequences.
  - 9. The method of claim 8, wherein the existing word and phrase segmentation is utilized to derive a relationship between analysis frames and phonetic symbols.
  - 10. The method of claim 1, wherein the time alignment of analysis frames with phonemes of the first and second language is accomplished during a training iteration.
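
The statistical analysis step of claim 1 can be illustrated with a minimal sketch. The claims do not say which statistic is used, so the sketch assumes the simplest option: count how often each first language phoneme and each second language phoneme label the same analysis frame, then pick the most frequent partner. The function name `best_p2p_mapping` and the phoneme symbols are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def best_p2p_mapping(
    first_lang_frames: List[str], second_lang_frames: List[str]
) -> Dict[str, str]:
    """Map each first language phoneme to its most frequent frame partner.

    Both inputs hold one phoneme label per analysis frame of the same
    recorded utterances: the first from the time alignment, the second
    from the second language phoneme recognizer.
    Assumption: "best" means highest co-occurrence count.
    """
    if len(first_lang_frames) != len(second_lang_frames):
        raise ValueError("both label sequences must cover the same frames")

    # counts[a][b] = number of frames labeled a (first) and b (second).
    counts: Dict[str, Counter] = defaultdict(Counter)
    for first, second in zip(first_lang_frames, second_lang_frames):
        counts[first][second] += 1

    return {
        first: partners.most_common(1)[0][0]
        for first, partners in counts.items()
    }


# Six frames of one utterance, labeled in both inventories.
first_labels = ["th", "th", "th", "ih", "ih", "s"]
second_labels = ["s", "s", "f", "i", "i", "s"]
mapping = best_p2p_mapping(first_labels, second_labels)
```

In this toy input, "th" is paired with "s" on two frames and with "f" on one, so it maps to "s". With real data a normalized or smoothed statistic might be preferable; that choice is outside what the claims specify.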

11. An apparatus for automatically generating acoustic models for speech recognition, comprising:
- a phonetic inventory of a first language;
- a phonetic inventory of a second language;
- a controller for;
  - utilizing stored training material, which includes recorded utterances in a first language, in conjunction with trained acoustic models of the first language;
  - deriving a time alignment between recorded utterance analysis frames and phonemes of the first language utilizing a phoneme-based training algorithm;
  - deriving associations between the analysis frames and phonemes of a second language utilizing a second language phoneme recognizer on the recorded utterances in the first language;
  - retrieving the associations between the analysis frames and individual phonemes for both the first language and the second language, wherein such retrieved associations are stored;
  - statistically analyzing corresponding associations between the analysis frames to determine a best phoneme to phoneme (P2P) mapping between the first and second language phonetic inventories;
  - transcribing all words in the first language phonetic inventory, utilizing the best P2P mapping, with phonemes from the second language phonetic inventory; and
  - generating a new phonetic inventory utilizing the transcribed words of the first language phonetic inventory.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20):
  - 12. The apparatus of claim 11, wherein the first language is a foreign language (FL) and the second language is a native language (NL).
  - 13. The apparatus of claim 12, wherein speech input for the native and foreign language is provided, utilizing native speakers speaking words in the native language and foreign speakers speaking words in the foreign language respectively for input into the phonetic inventory of the native and foreign languages.
  - 14. The apparatus of claim 12, further comprising a grapheme (in a foreign language) to phoneme (in a native language) transducer for generating native language pronunciations for new foreign language words.
  - 15. The apparatus of claim 11 further comprising:
    - a training algorithm for associating phonetic symbols of foreign speech with phonemes retrieving a word or phrase from the first phonetic inventory;
    - means for separating each word into phonetic symbols; and
    - means for associating each of the phonetic symbols that comprise the word or phrase with related analysis frames.
  - 16. The apparatus of claim 11, wherein the controller further comprises
    - means for retrieving a word or phrase from the second phonetic inventory;
    - means for separating each word into phonetic symbols; and
    - means for associating each of the phonetic symbols comprising the word or phrase with analysis frames.
  - 17. The apparatus of claim 11, further comprising a database for providing training material comprising foreign speech having sounds, words and phrases wherein the material is phonetically described.
  - 18. The apparatus of claim 11, further comprising a phonetic inventory database of the second language wherein the phonetic inventory includes existing word segmentation in the form of phonemes of the second language.
  - 19. The apparatus of claim 18, wherein the existing word segmentation is utilized to derive a relationship between the analysis frames and the phonemes of the second language.
  - 20. The apparatus of claim 11 wherein the time alignment of analysis frames with the phonemes of the first and second languages is accomplished during a training iteration.
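
Several claims refer to associating phonemes with analysis frames. As a rough sketch of what such an association could look like in data, the following expands a phoneme-level time alignment into one label per frame. The segment format `(phoneme, start_ms, end_ms)`, the 10 ms frame shift, and the function name `frames_from_segments` are assumptions for illustration; the claims do not fix any of them.

```python
from typing import List, Tuple

# Hypothetical segment format: (phoneme symbol, start time, end time) in ms.
Segment = Tuple[str, int, int]


def frames_from_segments(
    segments: List[Segment], frame_shift_ms: int = 10
) -> List[str]:
    """Expand a phoneme-level alignment into one label per analysis frame.

    Assumption: frames are spaced frame_shift_ms apart and each segment
    covers a whole number of frames; any remainder is dropped.
    """
    labels: List[str] = []
    for phoneme, start_ms, end_ms in segments:
        n_frames = (end_ms - start_ms) // frame_shift_ms
        labels.extend([phoneme] * n_frames)
    return labels


alignment = [("th", 0, 30), ("ih", 30, 50), ("s", 50, 60)]
frame_labels = frames_from_segments(alignment)
```

Running the same expansion on the output of a second language phoneme recognizer would give a second label sequence over the same frames, which is the kind of paired association the claims describe as being stored and analyzed.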

Current Assignee: Optis Wireless Technology, LLC (Brevet Capital)
Original Assignee: Telefonaktiebolaget LM Ericsson
Inventors: Junkawitsch, Jochen; Reinhard, Klaus; Kießling, Andreas; Klisch, Rainer
Primary Examiner(s): ABEBE, DANIEL DEMELASH
Application Number: US10/793,072
Publication Number: US 20050197835A1
Time in Patent Office: 1,629 Days
Field of Search: 704/257, 704/243, 704/8, 704/10, 704/255, 704/276
US Class Current: 704/257
CPC Class Codes:
G10L 15/063   Training
G10L 15/187   Phonemic context, e.g. pron...
G10L 2015/025   Phonemes, fenemes or fenone...
