METHOD AND SYSTEM FOR ACOUSTIC DATA SELECTION FOR TRAINING THE PARAMETERS OF AN ACOUSTIC MODEL
First Claim
1. A method for training models in speech recognition systems through the selection of acoustic data comprising the steps of:
a. training an acoustic model;
b. performing a forced Viterbi alignment;
c. calculating a total likelihood score;
d. performing a phoneme recognition;
e. retaining selected audio files; and
f. training a new acoustic model.
Abstract
A system and method are presented for selecting acoustic data of a particular quality for training the parameters of an acoustic model, such as a Hidden Markov Model and Gaussian Mixture Model, in automatic speech recognition systems in the speech analytics field. A raw acoustic model may be trained using a given speech corpus and maximum likelihood criteria. A series of operations, such as a forced Viterbi alignment, calculation of likelihood scores, and phoneme recognition, is then performed to form a subset corpus of training data. During the process, audio files whose quality does not meet a criterion, such as poor-quality audio files, may be automatically rejected from the corpus. The subset may then be used to train a new acoustic model.
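The selection loop described in the abstract can be sketched as follows. This is an illustrative outline under stated assumptions, not the claimed implementation: the callables `train`, `total_loglik`, and `recognize` are hypothetical stand-ins for a real training, forced-alignment, and phoneme-recognition toolkit, and the two thresholds, the per-frame normalization of the likelihood score, and the edit-distance accuracy measure are assumptions chosen for the example. The text above only states that files failing "a criterion" are rejected.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Utterance:
    path: str                     # audio file location
    ref_phonemes: Sequence[str]   # phoneme sequence from the transcript
    num_frames: int               # number of acoustic frames in the file


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed quality measure: 1 - distance / len(ref), floored at 0."""
    if not ref:
        return 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def select_and_retrain(
    corpus: Sequence[Utterance],
    train: Callable[[Sequence[Utterance]], Any],   # hypothetical trainer
    total_loglik: Callable[[Any, Utterance], float],  # score from forced Viterbi alignment
    recognize: Callable[[Any, Utterance], Sequence[str]],  # phoneme recognizer
    min_avg_loglik: float,   # assumed threshold on per-frame log-likelihood
    min_accuracy: float,     # assumed threshold on phoneme accuracy
) -> Tuple[Any, List[Utterance]]:
    raw_model = train(corpus)                      # train a raw acoustic model
    kept: List[Utterance] = []
    for utt in corpus:
        score = total_loglik(raw_model, utt)       # alignment + total likelihood score
        avg = score / max(utt.num_frames, 1)       # assumption: normalize by length
        acc = phoneme_accuracy(utt.ref_phonemes, recognize(raw_model, utt))
        if avg >= min_avg_loglik and acc >= min_accuracy:
            kept.append(utt)                       # retain files meeting the criteria
    new_model = train(kept)                        # train a new model on the subset
    return new_model, kept
```

Passing the toolkit operations in as callables keeps the sketch independent of any particular recognizer; only the retain-or-reject logic is shown concretely.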
32 Claims
1. A method for training models in speech recognition systems through the selection of acoustic data comprising the steps of:
a. training an acoustic model;
b. performing a forced Viterbi alignment;
c. calculating a total likelihood score;
d. performing a phoneme recognition;
e. retaining selected audio files; and
f. training a new acoustic model.
Dependent claims: 2 to 16.
17. A method for training an acoustic model in an automatic speech recognition system comprising the steps of:
a. training a set of raw data using a given speech corpus and the maximum likelihood criteria;
b. performing a forced Viterbi-alignment;
c. calculating a total likelihood score;
d. performing phoneme recognition on audio files in said corpus;
e. retaining selected audio files;
f. forming a subset corpus of training data; and
g. training a new acoustic model with said subset.
Dependent claims: 18 to 31.
32. A system for training models in speech recognition systems through the selection of acoustic data comprising:
a. means for training an acoustic model;
b. means for performing a forced Viterbi alignment;
c. means for calculating a total likelihood score;
d. means for performing a phoneme recognition;
e. means for retaining selected audio files; and
f. means for training a new acoustic model.
Specification