System and method for training an acoustic model with reduced feature space variation
Abstract
Feature space variation associated with specific text elements is reduced by training an acoustic model with a phoneme set, dictionary and transcription set configured to better distinguish the specific text elements and at least some specific phonemes associated therewith. The specific text elements can include the most frequently occurring text elements from a text data set, which can include text data beyond the transcriptions of a training data set. The specific text elements can be identified using a text element distribution table sorted by occurrence within the text data set. Specific phonemes can be limited to consonant phonemes to improve speed and accuracy.
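The selection step described in the abstract can be sketched as follows. This is a minimal illustration rather than the patented implementation: whitespace tokenization, lowercasing, the function names `text_element_distribution` and `select_specific_elements`, and the `top_n` cutoff are all assumptions made for the example.

```python
from collections import Counter


def text_element_distribution(text_data):
    """Build a text element distribution table sorted by occurrence.

    text_data: iterable of strings (may extend beyond the training transcriptions).
    Returns a list of (text_element, count) pairs, most frequent first.
    Tokenization by whitespace and lowercasing are assumptions of this sketch.
    """
    counts = Counter()
    for line in text_data:
        counts.update(line.lower().split())
    # Sort by descending count; break ties alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def select_specific_elements(table, top_n):
    """Take the top_n most frequent text elements as the specific set.

    top_n is a hypothetical cutoff; the abstract only says "most frequently
    occurring" and does not fix a number.
    """
    return [element for element, _count in table[:top_n]]


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog sat", "the end"]
    table = text_element_distribution(corpus)
    print(table)
    print(select_specific_elements(table, top_n=2))
```

Counting over a text data set larger than the training transcriptions only changes what is passed in as `text_data`; the table and the cutoff work the same way.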
26 Claims
1. A method of training an acoustic model, the method comprising:
generating a specific text element set as a subset of a general text element set;
generating a combined phoneme set, the combined phoneme set including ordinate phonemes occurring in the general text element set and renamed specific phonemes corresponding to at least some of the ordinate phonemes occurring in the specific text element set;
generating a combined dictionary, the combined dictionary including text elements from the general text element set outside the specific text element set with phonetic spellings including the ordinate phonemes and renamed specific text elements from the specific text element set with phonetic spellings including the renamed specific phonemes;
generating a combined transcription set, the combined transcription set including transcriptions with the text elements from the general text element set outside the specific text element set and the renamed specific text elements; and
training the acoustic model using the combined phoneme set, the combined dictionary, the combined transcription set and an audio file set.
Dependent claims: 2 to 20.
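The data preparation recited in claim 1 can be illustrated with a short sketch. It assumes a pronunciation dictionary held as a mapping from text element to phoneme list, and a hypothetical renaming convention (appending `_s`) that the claim does not specify. The optional `renameable` argument models "at least some of the ordinate phonemes", for example restricting renaming to consonants as the abstract suggests. The training step is omitted because it depends on the toolkit in use.

```python
def rename(symbol, suffix="_s"):
    """Hypothetical renaming convention: append a suffix to a word or phoneme."""
    return symbol + suffix


def build_combined_sets(general_dict, specific_words, transcriptions, renameable=None):
    """Build the combined phoneme set, dictionary and transcription set.

    general_dict:   text element -> list of ordinate phonemes
    specific_words: text elements to rename (intersected with general_dict)
    transcriptions: list of whitespace-separated transcription strings
    renameable:     optional set of ordinate phonemes eligible for renaming;
                    None means every phoneme of a specific word is renamed
    """
    specific = set(specific_words) & set(general_dict)
    combined_phonemes = set()
    combined_dict = {}

    for word, spelling in general_dict.items():
        # Ordinate phonemes occurring anywhere in the general set are kept.
        combined_phonemes.update(spelling)
        if word in specific:
            new_spelling = [
                rename(p) if renameable is None or p in renameable else p
                for p in spelling
            ]
            combined_dict[rename(word)] = new_spelling
            combined_phonemes.update(new_spelling)
        else:
            combined_dict[word] = list(spelling)

    # Replace each specific text element in the transcriptions by its renamed form.
    combined_transcriptions = [
        " ".join(rename(w) if w in specific else w for w in line.split())
        for line in transcriptions
    ]
    return combined_phonemes, combined_dict, combined_transcriptions


if __name__ == "__main__":
    lexicon = {"the": ["DH", "AH"], "cat": ["K", "AE", "T"]}
    phonemes, dictionary, transcripts = build_combined_sets(
        lexicon, {"the"}, ["the cat"], renameable={"DH"}
    )
    print(sorted(phonemes), dictionary, transcripts)
```

Because renamed words map only to their own spellings, a trainer that consumes these three outputs together with the audio files would model the renamed phonemes separately from their ordinate counterparts.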
21. An acoustic model training system comprising at least one processor and machine readable memory configured to execute:
a combined phoneme set including ordinate phonemes and renamed specific phonemes, the renamed specific phonemes corresponding to at least some of the ordinate phonemes;
a combined dictionary including renamed specific text elements with corresponding phonetic spellings using the renamed specific phonemes and unrenamed text elements with corresponding spellings using the ordinate phonemes;
an audio file set;
a combined transcription set corresponding to the audio file set and including transcriptions with the renamed specific text elements; and
a training module configured to train the acoustic model based on the audio file set, the combined transcription set, the combined phoneme set and the combined dictionary.
Dependent claims: 22.
23. A method of training an acoustic model, the method comprising:
generating a specific word set based on frequency of occurrence within a text data set;
generating a phoneme set including renamed specific phonemes used in phonetic spellings of the specific word set;
generating a dictionary including renamed specific words of the specific word set with phonetic spellings including the renamed specific phonemes;
generating a transcription set including transcriptions having the renamed specific words therein; and
training the acoustic model based on the phoneme set, the dictionary, the transcription set and an audio file set;
wherein generating the dictionary and the transcription set include selectively including unrenamed specific words with phonetic spellings without renamed specific phonemes.
Dependent claims: 24 to 26.
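One possible reading of the "selectively including unrenamed specific words" limitation of claim 23 is sketched below. The claim does not state how the selection is made, so the `should_rename` callback, the `_s` suffix and the function name `build_selective_sets` are hypothetical choices for illustration only. The dictionary keeps both the renamed and the unrenamed entry for each specific word, so either form can appear in a transcription.

```python
def build_selective_sets(lexicon, specific_words, transcriptions, should_rename, suffix="_s"):
    """Build phoneme set, dictionary and transcriptions with selective renaming.

    lexicon:        word -> list of phonemes
    specific_words: set of frequently occurring words eligible for renaming
    should_rename:  hypothetical callback (word, utterance_index) -> bool that
                    decides, per occurrence, whether the renamed form is used
    """
    specific = set(specific_words) & set(lexicon)
    dictionary = {}
    for word, spelling in lexicon.items():
        # Every word keeps its unrenamed entry with unrenamed phonemes.
        dictionary[word] = list(spelling)
        if word in specific:
            # Specific words additionally get a renamed entry.
            dictionary[word + suffix] = [p + suffix for p in spelling]

    phoneme_set = {p for spelling in dictionary.values() for p in spelling}

    new_transcriptions = []
    for index, line in enumerate(transcriptions):
        words = [
            w + suffix if w in specific and should_rename(w, index) else w
            for w in line.split()
        ]
        new_transcriptions.append(" ".join(words))
    return phoneme_set, dictionary, new_transcriptions


if __name__ == "__main__":
    lexicon = {"yes": ["Y", "EH", "S"], "maybe": ["M", "EY", "B", "IY"]}
    # Illustrative rule only: rename occurrences in even-numbered utterances.
    result = build_selective_sets(
        lexicon, {"yes"}, ["yes maybe", "yes"], lambda word, i: i % 2 == 0
    )
    print(result)
```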
Specification