System and method for training an acoustic model with reduced feature space variation

US 8,301,446 B2
Filed: 03/30/2009
Issued: 10/30/2012
Est. Priority Date: 03/30/2009
Status: Active Grant

Abstract

Feature space variation associated with specific text elements is reduced by training an acoustic model with a phoneme set, dictionary and transcription set configured to better distinguish the specific text elements and at least some specific phonemes associated therewith. The specific text elements can include the most frequently occurring text elements from a text data set, which can include text data beyond the transcriptions of a training data set. The specific text elements can be identified using a text element distribution table sorted by occurrence within the text data set. Specific phonemes can be limited to consonant phonemes to improve speed and accuracy.
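
The frequency-based selection described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the whitespace tokenization, the sample sentences, and the 0.5 coverage threshold are all assumptions made for the example.

```python
from collections import Counter


def build_distribution_table(texts):
    """Return (word, count) pairs sorted by descending occurrence."""
    counts = Counter(word for line in texts for word in line.lower().split())
    # Sort by count (descending), then alphabetically so ties are stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def select_specific_words(table, target_coverage):
    """Pick the fewest top-ranked words whose cumulative share of all
    occurrences reaches target_coverage (a fraction between 0 and 1)."""
    total = sum(count for _, count in table)
    selected, running = [], 0
    for word, count in table:
        if total == 0 or running / total >= target_coverage:
            break
        selected.append(word)
        running += count
    return selected


# Hypothetical text data set; it need not match the training transcriptions.
texts = [
    "climb and maintain flight level",
    "descend and maintain",
    "turn left heading",
    "maintain heading",
]
table = build_distribution_table(texts)
specific = select_specific_words(table, 0.5)
print(specific)
```

With this sample data, the three most frequent words together account for 7 of 13 occurrences, so they form the specific word set at a 0.5 threshold.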

26 Claims

1. A method of training an acoustic model, the method comprising:
- generating a specific text element set as a subset of a general text element set;
- generating a combined phoneme set, the combined phoneme set including ordinate phonemes occurring in the general text element set and renamed specific phonemes corresponding to at least some of the ordinate phonemes occurring in the specific text element set;
- generating a combined dictionary, the combined dictionary including text elements from the general text element set outside the specific text element set with phonetic spellings including the ordinate phonemes and renamed specific text elements from the specific text element set with phonetic spellings including the renamed specific phonemes;
- generating a combined transcription set, the combined transcription set including transcriptions with the text elements from the general text element set outside the specific text element set and the renamed specific text elements; and
- training the acoustic model using the combined phoneme set, the combined dictionary, the combined transcription set and an audio file set.
- Dependent claims (2 to 20):
  - 2. The method of claim 1, wherein the specific text element set is at least one of a specific word set or a specific phrase set.
  - 3. The method of claim 1, wherein the renamed specific phonemes include only phonemes corresponding to consonants.
  - 4. The method of claim 1, further comprising generating a text element distribution table, the distribution table including the general text element set sorted by occurrence within a text data set.
  - 5. The method of claim 4, wherein generating the specific text element set includes selecting specific text elements from the distribution table based on frequency of occurrence within the text data set.
  - 6. The method of claim 4, wherein selecting specific text elements from the distribution table based on frequency of occurrence includes selecting a minimum number of specific text elements sufficient to contribute a predetermined cumulative occurrence within the text data set.
  - 7. The method of claim 4, wherein the text data set includes text data from texts different than the transcriptions.
  - 8. The method of claim 1, wherein generating the combined phoneme set includes generating an intermediate specific phoneme set, the intermediate specific phoneme set including specific phonemes, the specific phonemes including only the ordinate phonemes needed for phonetic spellings of the specific text element set.
  - 9. The method of claim 8, wherein only the specific phonemes corresponding to consonants are retained in the intermediate specific phoneme set.
  - 10. The method of claim 8, wherein generating the combined phoneme set further includes generating a renamed specific phoneme set by renaming the specific phonemes with unique, new names.
  - 11. The method of claim 10, wherein generating the combined phoneme set further includes combining the renamed specific phoneme set with the ordinate phonemes.
  - 12. The method of claim 1, wherein generating the combined dictionary includes generating an intermediate specific dictionary by selecting, from a general dictionary, only specific text elements from the specific text element set and corresponding phonetic spellings.
  - 13. The method of claim 12, wherein generating the combined dictionary further includes generating a renamed specific dictionary by renaming the specific text elements with unique, new names and respelling the renamed specific text elements with the renamed specific phonemes.
  - 14. The method of claim 12, wherein generating the combined dictionary further includes generating a reduced dictionary by removing the specific text elements and the corresponding phonetic spellings from the general dictionary, and combining the reduced dictionary and the renamed specific dictionary.
  - 15. The method of claim 14, wherein generating the combined dictionary further includes selectively combining the reduced dictionary and the renamed specific dictionary with the intermediate specific dictionary.
  - 16. The method of claim 1, wherein generating the combined transcription set includes generating an intermediate specific transcription set including only those transcriptions from a general transcription set that include at least one specific text element from the specific text element set.
  - 17. The method of claim 16, wherein generating the combined transcription set further includes generating a renamed specific transcription set by replacing the specific text elements with the renamed specific text elements.
  - 18. The method of claim 17, wherein generating the combined transcription set further includes generating a reduced transcription set by removing the transcriptions of the intermediate specific transcription set from the general transcription set, and combining at least the reduced transcription set and the renamed specific transcription set.
  - 19. The method of claim 18, wherein generating the combined transcription set further includes selectively combining the reduced transcription set and the renamed specific transcription set with the intermediate specific transcription set.
  - 20. The method of claim 19, wherein selectively combining the reduced transcription set and the renamed specific transcription set with the intermediate specific transcription set further includes applying weighting coefficients to each transcription set.
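
The renaming steps in claims 1, 3 and 8 to 18 can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the `_s` suffix, the vowel inventory, the function name `build_combined`, and the sample dictionary entries are all hypothetical. Only consonants are renamed, following claims 3 and 9.

```python
# Hypothetical vowel inventory, used only to keep consonants (claims 3 and 9).
VOWELS = {"AA", "AE", "AH", "AO", "EH", "EY", "IH", "IY", "OW", "UW"}
SUFFIX = "_s"  # hypothetical convention for unique, new names


def build_combined(general_dict, transcriptions, specific_words):
    """Return (combined phoneme set, combined dictionary, combined transcriptions)."""
    specific = set(specific_words)
    # Ordinate phonemes: every phoneme used in the general dictionary.
    ordinate = {p for spelling in general_dict.values() for p in spelling}
    # Intermediate specific phoneme set, restricted to consonants.
    specific_consonants = {
        p for w in specific for p in general_dict[w] if p not in VOWELS
    }
    renamed = {p: p + SUFFIX for p in specific_consonants}
    combined_phonemes = sorted(ordinate | set(renamed.values()))

    combined_dict = {}
    for word, spelling in general_dict.items():
        if word in specific:
            # Renamed specific word, respelled with renamed phonemes.
            combined_dict[word + SUFFIX] = [renamed.get(p, p) for p in spelling]
        else:
            # Words outside the specific set keep their ordinate spelling.
            combined_dict[word] = list(spelling)

    # Replace specific words in every transcription; others pass through.
    combined_transcriptions = [
        [w + SUFFIX if w in specific else w for w in words]
        for words in transcriptions
    ]
    return combined_phonemes, combined_dict, combined_transcriptions


general_dict = {
    "maintain": ["M", "EY", "N", "T", "EY", "N"],
    "heading": ["HH", "EH", "D", "IH", "NG"],
    "left": ["L", "EH", "F", "T"],
}
transcriptions = [["maintain", "heading"], ["left", "heading"]]
phonemes, dictionary, combined = build_combined(
    general_dict, transcriptions, ["maintain"]
)
print(dictionary["maintain_s"])
```

In this sketch the phoneme `T` appears twice in the combined set: as `T` in "left" and as `T_s` in the renamed "maintain_s", so a trainer would model the two separately.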

21. An acoustic model training system comprising at least one processor and machine readable memory configured to execute:
- a combined phoneme set including ordinate phonemes and renamed specific phonemes, the renamed specific phonemes corresponding to at least some of the ordinate phonemes;
- a combined dictionary including renamed specific text elements with corresponding phonetic spellings using the renamed specific phonemes and unrenamed text elements with corresponding spellings using the ordinate phonemes;
- an audio file set;
- a combined transcription set corresponding to the audio file set and including transcriptions with the renamed specific text elements; and
- a training module configured to train the acoustic model based on the audio file set, the combined transcription set, the combined phoneme set and the combined dictionary.
- Dependent claim (22):
  - 22. The system of claim 21, wherein the combined transcription set further includes transcriptions without any renamed specific text elements.

23. A method of training an acoustic model, the method comprising:
- generating a specific word set based on frequency of occurrence within a text data set;
- generating a phoneme set including renamed specific phonemes used in phonetic spellings of the specific word set;
- generating a dictionary including renamed specific words of the specific word set with phonetic spellings including the renamed specific phonemes;
- generating a transcription set including transcriptions having the renamed specific words therein; and
- training the acoustic model based on the phoneme set, the dictionary, the transcription set and an audio file set;
- wherein generating the dictionary and the transcription set include selectively including unrenamed specific words with phonetic spellings without renamed specific phonemes.
- Dependent claims (24 to 26):
  - 24. The method of claim 23, wherein the text data set differs from a training data set for the acoustic model.
  - 25. The method of claim 23, wherein the renamed specific phonemes include only phonemes corresponding to consonants.
  - 26. The method of claim 23, wherein selectively including unrenamed specific words with phonetic spellings without renamed specific phonemes includes setting weighting coefficients to determine a relative contribution of the unrenamed specific words.
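
Claims 20 and 26 mention weighting coefficients without stating how they are applied. One possible reading, shown here purely as an assumption, treats each coefficient as a non-negative integer repetition count, so a set with weight 2 contributes twice as many training transcriptions and a set with weight 0 is left out. The function name `combine_weighted` and the sample data are hypothetical.

```python
def combine_weighted(sets_with_weights):
    """Concatenate transcription sets, repeating each set `weight` times.

    Assumption: a weighting coefficient is a non-negative integer
    repetition count. The claims do not specify this interpretation.
    """
    combined = []
    for transcriptions, weight in sets_with_weights:
        if not isinstance(weight, int) or weight < 0:
            raise ValueError("weight must be a non-negative integer")
        for _ in range(weight):
            # Copy each transcription so callers cannot alias the inputs.
            combined.extend(list(t) for t in transcriptions)
    return combined


reduced = [["left", "heading"]]            # no specific words
renamed = [["maintain_s", "heading"]]      # specific words renamed
unrenamed = [["maintain", "heading"]]      # intermediate, unrenamed copy

training_set = combine_weighted([(reduced, 1), (renamed, 2), (unrenamed, 1)])
print(len(training_set))
```

Setting the weight of the unrenamed set to 0 recovers the plain combination of the reduced and renamed sets described in claim 18.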

Current Assignee: Adacel Systems, Inc. (Adacel Technologies Ltd.)
Original Assignee: Adacel Systems, Inc. (Adacel Technologies Ltd.)
Inventors: Shu, Chang-Qing
Primary Examiner(s): He, Jialong
Application Number: US12/413,896
Publication Number: US 20100250240A1
Time in Patent Office: 1,310 Days
Field of Search: 704/251, 704/270, 704/275
US Class Current: 704/251
CPC Class Codes:
G10L 15/063   Training
G10L 15/187   Phonemic context, e.g. pron...
G10L 2015/025   Phonemes, fenemes or fenone...
