Automatic language independent triphone training using a phonetic table

US 20050075887A1
Filed: 10/07/2003
Published: 04/07/2005
Est. Priority Date: 10/07/2003
Status: Active Grant


1 Assignment


0 Petitions


Abstract

A method for training acoustic models for a new target language is provided using a phonetic table, which characterizes the phones used in one or more reference language(s) with respect to their articulatory properties; a phonetic table, which characterizes the phones used in the target language with respect to their articulatory properties; a set of trained monophones for the reference language(s); and a database of sentences in the target language and its phonetic transcription. With these inputs, the new method completely and automatically takes care of the monophone seeding, triphone clustering, and machine-intensive training steps involved in triphone acoustic training.
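The monophone seeding step (spelled out in claim 3: pick the articulatorily closest reference phone, then adapt its state topology) can be sketched in Python as follows. The patent does not say how similarity is scored or how the topology is changed, so everything here is an assumption: the phonetic-table layout (a flat set of articulatory attributes plus a state count per phone), the attribute-counting score, the tie-break toward an equal state count, the nearest-relative-position state mapping, and the names `MonophoneModel`, `best_reference_phone`, `resize_topology`, and `seed_monophone` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical phonetic-table layout: phone -> (articulatory attributes, number of HMM states).
PhoneticTable = dict[str, tuple[dict[str, str], int]]


@dataclass
class MonophoneModel:
    """Toy HMM: one mean vector per emitting state (variances and transitions omitted)."""
    phone: str
    state_means: list[list[float]]


def similarity(a: dict[str, str], b: dict[str, str]) -> int:
    """Number of articulatory attributes on which two phones agree (assumed scoring rule)."""
    return sum(1 for key in a.keys() & b.keys() if a[key] == b[key])


def best_reference_phone(target: str, target_table: PhoneticTable,
                         ref_table: PhoneticTable) -> str:
    """Most similar reference phone; ties favor a reference phone with the same state count."""
    feats, n_states = target_table[target]
    return max(ref_table, key=lambda p: (similarity(feats, ref_table[p][0]),
                                         ref_table[p][1] == n_states))


def resize_topology(model: MonophoneModel, n_states: int, name: str) -> MonophoneModel:
    """Build an n_states model by copying, for each new state, the reference state
    whose relative position is nearest to the new state's center."""
    m = len(model.state_means)
    picks = [((2 * i + 1) * m) // (2 * n_states) for i in range(n_states)]
    return MonophoneModel(name, [list(model.state_means[j]) for j in picks])


def seed_monophone(target: str, target_table: PhoneticTable, ref_table: PhoneticTable,
                   ref_models: dict[str, MonophoneModel]) -> MonophoneModel:
    """Seed one target-language monophone from its best reference match."""
    ref = best_reference_phone(target, target_table, ref_table)
    return resize_topology(ref_models[ref], target_table[target][1], target)


# Toy reference language with three 3-state bilabial phones.
ref_table = {
    "p": ({"voicing": "voiceless", "place": "bilabial", "manner": "plosive"}, 3),
    "b": ({"voicing": "voiced", "place": "bilabial", "manner": "plosive"}, 3),
    "m": ({"voicing": "voiced", "place": "bilabial", "manner": "nasal"}, 3),
}
ref_models = {
    "p": MonophoneModel("p", [[0.0], [0.5], [1.0]]),
    "b": MonophoneModel("b", [[1.0], [2.0], [3.0]]),
    "m": MonophoneModel("m", [[7.0], [8.0], [9.0]]),
}
# Toy target language with a 4-state aspirated phone absent from the reference set.
target_table = {
    "bh": ({"voicing": "voiced", "place": "bilabial", "manner": "plosive",
            "aspiration": "aspirated"}, 4),
}
print(seed_monophone("bh", target_table, ref_table, ref_models))
```

Here the target phone `bh` matches reference `b` on voicing, place, and manner, so the three-state `b` model is stretched to four states by duplicating its middle state, printing `MonophoneModel(phone='bh', state_means=[[1.0], [2.0], [2.0], [3.0]])`. A real seed would also carry over variances, mixture weights, and transition probabilities.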


10 Claims

1. A method for training acoustic models for a target language comprising the steps of:
   a) providing a phonetic table of a reference, which characterizes the phones used in one or more reference languages with respect to their articulatory properties;
   b) providing a phonetic table of a target language, which characterizes the phones used in the target language with respect to their articulatory properties;
   c) providing a set of trained monophones for each reference language;
   d) providing a database of utterances in the target language and phonetic transcription of the utterances in the database; and
   e) processing using table correspondence processing methods and phonetic model seeding.
   Dependent claims (2, 3, 4, 5):
   - 2. The method of claim 1 wherein said processing step includes creating seed monophone models for the target language using the phonetic table of one or more reference languages, the phonetic models of the reference languages, and the phonetic table of the target language; and automatically performing triphone clustering based on the phonetic table of the reference and target language.
   - 3. The method of claim 2 wherein said creating seed monophone models includes the steps of selecting for each phone in the target language, the phone in the reference language(s) that is the most similar in terms of articulatory characteristics defined in the phonetic table and based on the topology (number of states) of the target language phone and its best match in the reference language(s), the topology of the selected monophone model in the reference language(s) is modified to produce the seed monophone model of the target language phone.
   - 4. The method of claim 1 wherein triphone clustering includes the steps of generating a list of articulatory based questions, transforming each question of a universal question list into a language specific question which specifies exhaustively which phones conform to the articulatory characteristics asked by the question based on the phonetic table of the target language, and constructing, for all A−B+C triphones sharing the same center phone B using the target language specific question list, an acoustic decision tree that maximizes the likelihood of observing the data by selecting at each node in the tree the question that most increases likelihood.
   - 5. The method of claim 4 wherein different triphone clustering operations can be carried out using different clustering parameters (minimum population for each cluster and likelihood increase threshold) in order to further optimize the performance (in terms of recognition performance or model size) of the trained acoustic models.

6. A method of creating seed context-dependent phonetic models comprising the steps of:
   a) obtaining a first set of phonetic models representing a first set of phones;
   b) obtaining a transcription of a database in terms of the first set of phones;
   c) performing forced alignment of each utterance in a database using the first set of phonetic models and the transcription, resulting in the location of information in the database corresponding to each of the first phones in the database;
   d) generating the locations of each context-dependent phone in the database based on the locations of the first phones in the database and the context of the first phones specified by the transcription; and
   e) using the information in the database at all locations corresponding to a context-dependent phone constructing a context-dependent phonetic model for that context-dependent phone.
   Dependent claims (7, 8):
   - 7. The method of claim 6 where the phonetic models are monophones.
   - 8. The method of claim 6 where the context-dependent phonetic models are triphones.

9. A method for training acoustic models for a target language comprising the steps of:
   deriving and encoding providing phonetic tables of one or more reference languages and a phonetic table for a new target language, providing a speech database collected in the new language and a phonetic transcription of the database, processing using table correspondence to generate seed monophone phonetic models specific to the new target language, training the monophone phonetic models automatically using existing known training techniques, automatically generating accurate seed triphone models specific to the language subsequent to monophone model training, determining optimal clustering of the triphone phonetic model parameters utilizing the phonetic table information, and automatically training the triphone phonetic models.
   Dependent claim (10):
   - 10. The method of claim 9 including steps to optimize the resulting trained phonetic models to improve speech recognition performance within the training of the triphone acoustic models.
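Claims 4 and 5 describe clustering all A−B+C triphones that share a center phone B with a likelihood-driven decision tree whose questions are derived from the target phonetic table. A minimal Python sketch follows, under assumptions the patent does not state: each cluster is scored with a single maximum-likelihood Gaussian over pooled sufficient statistics (1-D here for brevity), the universal question list has only four entries, and `UNIVERSAL_QUESTIONS`, `TriphoneStats`, `language_specific_questions`, and `grow_tree` are hypothetical names. `min_count` and `min_gain` stand in for the minimum cluster population and the likelihood-increase threshold named in claim 5.

```python
import math
from dataclasses import dataclass

# Hypothetical universal questions: (name, context side, articulatory attribute, value).
UNIVERSAL_QUESTIONS = [
    ("L_Nasal", "left", "manner", "nasal"),
    ("R_Nasal", "right", "manner", "nasal"),
    ("L_Voiced", "left", "voicing", "voiced"),
    ("R_Voiced", "right", "voicing", "voiced"),
]


def language_specific_questions(table):
    """Expand each universal question into the explicit set of target phones it matches."""
    questions = []
    for name, side, attr, value in UNIVERSAL_QUESTIONS:
        phones = frozenset(p for p, feats in table.items() if feats.get(attr) == value)
        if phones and phones != frozenset(table):  # skip questions that cannot split
            questions.append((name, side, phones))
    return questions


@dataclass
class TriphoneStats:
    """Sufficient statistics of 1-D features for one A-B+C triphone (center B implicit)."""
    left: str
    right: str
    count: float     # number of frames
    total: float     # sum of observations
    sq_total: float  # sum of squared observations


def log_likelihood(stats):
    """Log-likelihood of the pooled frames under one maximum-likelihood 1-D Gaussian."""
    n = sum(s.count for s in stats)
    mean = sum(s.total for s in stats) / n
    var = max(sum(s.sq_total for s in stats) / n - mean ** 2, 1e-6)  # variance floor
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def grow_tree(stats, questions, min_count, min_gain):
    """Greedily split triphones sharing one center phone.

    Returns (question, yes_subtree, no_subtree) or, at a leaf, the sorted
    (left, right) contexts of the triphones tied into one cluster."""
    parent_ll = log_likelihood(stats)
    best = None
    for name, side, phones in questions:
        yes, no = [], []
        for s in stats:
            context = s.left if side == "left" else s.right
            (yes if context in phones else no).append(s)
        if not yes or not no:
            continue
        if sum(s.count for s in yes) < min_count or sum(s.count for s in no) < min_count:
            continue
        gain = log_likelihood(yes) + log_likelihood(no) - parent_ll
        if gain > min_gain and (best is None or gain > best[0]):
            best = (gain, name, yes, no)
    if best is None:
        return sorted((s.left, s.right) for s in stats)
    _, name, yes, no = best
    return (name, grow_tree(yes, questions, min_count, min_gain),
            grow_tree(no, questions, min_count, min_gain))


# Toy target table and statistics for triphones X-a+Y: nasal left contexts shift the data.
table = {
    "m": {"manner": "nasal", "voicing": "voiced"},
    "n": {"manner": "nasal", "voicing": "voiced"},
    "b": {"manner": "plosive", "voicing": "voiced"},
    "p": {"manner": "plosive", "voicing": "voiceless"},
    "t": {"manner": "plosive", "voicing": "voiceless"},
}
stats = [
    TriphoneStats("m", "t", 10, 50.0, 260.0),  # mean 5, variance 1
    TriphoneStats("n", "p", 10, 50.0, 260.0),  # mean 5, variance 1
    TriphoneStats("p", "t", 10, 0.0, 10.0),    # mean 0, variance 1
    TriphoneStats("t", "p", 10, 0.0, 10.0),    # mean 0, variance 1
]
print(grow_tree(stats, language_specific_questions(table), min_count=5, min_gain=1.0))
```

On the toy data the useful split is on a nasal left context (`L_Voiced` produces the same partition but is listed later, so it loses the tie), and the tree prints `('L_Nasal', [('m', 't'), ('n', 'p')], [('p', 't'), ('t', 'p')])`. Rerunning with a larger `min_count` or `min_gain` yields fewer clusters, which is the kind of parameter sweep claim 5 refers to. Toolkits that implement this idea usually grow one tree per HMM state over multivariate Gaussian statistics.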
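Claims 6 to 8 derive seed triphones from a monophone forced alignment. The Python sketch below starts after alignment: it assumes each utterance's alignment is already available as `(phone, start_frame, end_frame)` tuples in transcription order, pads utterance edges with a hypothetical `sil` context, and uses 1-D features; `triphone_segments` and `seed_triphones` are illustrative names, not the patent's. Each triphone inherits the frame span of its center phone, and its seed mean and variance come from all of its frames pooled across the database.

```python
from collections import defaultdict
from statistics import fmean, pvariance

EDGE = "sil"  # assumed context symbol used at utterance boundaries


def triphone_segments(alignment):
    """Turn a monophone forced alignment into A-B+C triphone segments.

    alignment: (phone, start_frame, end_frame) tuples in transcription order.
    Each triphone keeps the frame span of its center phone; the left and right
    contexts are the neighboring phones in the transcription."""
    segments = []
    for i, (phone, start, end) in enumerate(alignment):
        left = alignment[i - 1][0] if i > 0 else EDGE
        right = alignment[i + 1][0] if i + 1 < len(alignment) else EDGE
        segments.append((f"{left}-{phone}+{right}", start, end))
    return segments


def seed_triphones(utterances):
    """Pool every frame of each triphone across the database and estimate a seed model.

    utterances: (alignment, features) pairs, features holding one 1-D value per frame.
    Returns {triphone: (mean, variance, frame_count)}."""
    pooled = defaultdict(list)
    for alignment, features in utterances:
        for name, start, end in triphone_segments(alignment):
            pooled[name].extend(features[start:end])
    return {name: (fmean(x), pvariance(x), len(x)) for name, x in pooled.items() if x}


# Two toy utterances of the same word "k a t" with different durations.
utterances = [
    ([("k", 0, 2), ("a", 2, 5), ("t", 5, 7)], [0.0, 0.0, 4.0, 5.0, 6.0, 1.0, 1.0]),
    ([("k", 0, 1), ("a", 1, 3), ("t", 3, 4)], [0.0, 7.0, 8.0, 1.0]),
]
for name, model in sorted(seed_triphones(utterances).items()):
    print(name, model)
```

Both utterances contain `k-a+t`, so its seed pools the three frames from the first utterance with the two from the second, giving mean 6.0 and variance 2.0 over 5 frames. In practice the alignment would come from the trained monophones (claim 7) and the resulting seeds would be triphones (claim 8).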


Current Assignee: Texas Instruments, Inc.
Original Assignee: Texas Instruments, Inc.
Inventors: Netsch, Lorin P.; Bernard, Alexis P.

Granted Patent: US 7,289,958 B2
US Class (Current): 704/277
CPC Class Codes:
- G10L 15/063 Training
- G10L 15/187 Phonemic context, e.g. pron...
