Method and system for acoustic data selection for training the parameters of an acoustic model
First Claim
1. A computer-implemented method for training acoustic models in an automatic speech recognition system through a selection of acoustic data comprising:
training a first acoustic model in the automatic speech recognition system using a training-data corpus comprising a plurality of speech audio files and a respective plurality of transcriptions for the plurality of speech audio files;
performing a forced Viterbi alignment of the plurality of speech audio files using the trained first acoustic model in the automatic speech recognition system;
calculating a global frame likelihood score δ for the plurality of speech audio files, wherein the global frame likelihood score δ comprises an average of frame likelihoods over the training-data corpus;
creating a first subset of the training-data corpus comprising one or more audio files by selecting the one or more audio files from the plurality of speech audio files based on the global frame likelihood score δ;
performing a phoneme recognition of the plurality of speech audio files using the trained first acoustic model and the respective plurality of transcriptions in the automatic speech recognition system;
calculating a global phoneme recognition accuracy ν for the plurality of speech audio files;
creating a second subset of the training-data corpus comprising audio files retained from the one or more audio files of the first subset of the training-data corpus which meet at least one predetermined criterion indicating that an audio file has good audio quality; and
training a second acoustic model in the automatic speech recognition system using the second subset of the training-data corpus.
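The selection steps of the claim can be illustrated with a minimal sketch. It assumes that the forced alignment has already produced per-frame log-likelihoods for each file and that a per-file phoneme recognition accuracy is available. The claim states only that files are selected "based on" δ and on "at least one predetermined criterion"; the comparisons used here (per-file average at or above δ, per-file accuracy at or above ν, with ν taken as the mean of the per-file accuracies) are assumptions made for illustration, and all function and variable names are hypothetical.

```python
from statistics import fmean


def global_frame_likelihood(frame_scores):
    """Average frame log-likelihood over every frame in the corpus (delta)."""
    all_frames = [s for scores in frame_scores.values() for s in scores]
    return fmean(all_frames)


def select_training_subset(frame_scores, file_accuracy):
    """Two-stage selection sketch.

    frame_scores:  {file name: [per-frame log-likelihoods from forced alignment]}
    file_accuracy: {file name: phoneme recognition accuracy in [0, 1]}
    """
    delta = global_frame_likelihood(frame_scores)

    # Stage 1 (assumed rule): keep files whose own average frame
    # likelihood is at least the global score delta.
    first_subset = [
        name for name, scores in frame_scores.items()
        if scores and fmean(scores) >= delta
    ]

    # Global accuracy nu (assumed here to be the mean per-file accuracy).
    nu = fmean(file_accuracy.values())

    # Stage 2 (assumed criterion): retain first-subset files whose
    # phoneme accuracy is at least nu.
    second_subset = [name for name in first_subset if file_accuracy[name] >= nu]
    return delta, nu, first_subset, second_subset
```

The second subset returned by this sketch would then be passed to whatever routine trains the second acoustic model.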
Abstract
A system and method are presented for acoustic data selection of a particular quality for training the parameters of an acoustic model, such as a Hidden Markov Model and Gaussian Mixture Model, for example, in automatic speech recognition systems in the speech analytics field. A raw acoustic model may be trained using a given speech corpus and maximum likelihood criteria. A series of operations are performed, such as a forced Viterbi-alignment, calculations of likelihood scores, and phoneme recognition, for example, to form a subset corpus of training data. During the process, audio files of a quality that does not meet a criterion, such as poor quality audio files, may be automatically rejected from the corpus. The subset may then be used to train a new acoustic model.
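The abstract does not say how phoneme recognition accuracy is computed. A conventional measure, assumed here for illustration, is (N − S − D − I) / N, where N is the number of reference phonemes and S, D, and I are the substitutions, deletions, and insertions in a minimum-edit alignment of the recognized sequence against the reference. The sketch below computes it from a Levenshtein distance; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions
    needed to turn the reference sequence into the hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_accuracy(ref, hyp):
    """Accuracy (N - S - D - I) / N for one file; can be negative
    when the hypothesis contains many insertions."""
    if not ref:
        raise ValueError("reference phoneme sequence must not be empty")
    return (len(ref) - edit_distance(ref, hyp)) / len(ref)
```

Here `ref` would be the phoneme sequence derived from the transcription and `hyp` the recognizer output for the same file.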
26 Claims
1. A computer-implemented method for training acoustic models in an automatic speech recognition system through a selection of acoustic data, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A computer-implemented method for training acoustic models in an automatic speech recognition system comprising:
training a first acoustic model in the automatic speech recognition system using a speech corpus comprising a plurality of speech audio files and a respective plurality of transcriptions for the plurality of speech audio files by calculating a maximum likelihood criterion of the speech corpus and estimating parameters of a probability distribution of the first acoustic model that maximize the maximum likelihood criterion;
performing a forced Viterbi alignment of the plurality of speech audio files using the trained first acoustic model in the automatic speech recognition system;
calculating a global frame likelihood score δ for the plurality of speech audio files, wherein the global frame likelihood score δ comprises an average of frame likelihoods over the speech corpus;
creating a first subset of the speech corpus comprising one or more audio files by selecting the one or more audio files from the plurality of speech audio files based on the global frame likelihood score δ;
performing a phoneme recognition of the plurality of speech audio files using the trained first acoustic model and the respective plurality of transcriptions in the automatic speech recognition system;
calculating a global phoneme recognition accuracy ν for the plurality of speech audio files;
creating a second subset of the speech corpus comprising audio files retained from the one or more audio files of the first subset of the speech corpus which meet at least one predetermined criterion indicating that an audio file has good audio quality; and
training a second acoustic model in the automatic speech recognition system with said second subset of the speech corpus.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
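The maximum likelihood step of claim 15 can be illustrated in a deliberately reduced setting. The sketch below assumes a single Gaussian with diagonal covariance fitted to a list of feature frames, for which the maximum likelihood estimates are the sample mean and the (biased) sample variance. Training an actual Hidden Markov Model with Gaussian mixtures would typically rely on an iterative procedure such as expectation-maximization, which is not shown; the variance floor and all names are assumptions introduced for the example.

```python
import math


def fit_gaussian_ml(frames, var_floor=1e-6):
    """Maximum likelihood mean and diagonal variance of feature frames.

    frames: non-empty list of equal-length feature vectors.
    var_floor: assumed lower bound that keeps the variance positive.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [
        max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dim)
    ]
    return mean, var


def log_likelihood(frames, mean, var):
    """Total log-likelihood of the frames under a diagonal Gaussian."""
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total
```

Evaluating `log_likelihood` at the fitted parameters and at any other mean or variance shows the fitted values giving the higher score, which is the sense in which the parameters "maximize the maximum likelihood criterion" in this simplified case.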
Specification