Language model adaptation for automatic speech recognition
Abstract
For speech recognition, notably the recognition of curtly spoken speech with a large vocabulary, language models which take into account the probabilities of occurrence of word sequences are used to enhance recognition reliability. These language models are determined from rather large quantities of text and hence represent an average formed over several texts. As a result, such a language model is not well adapted to the particularities of a specific text. In order to adapt an existing language model to a specific text while using only a small quantity of text, the invention proposes to determine confidence intervals from the counts of the word sequences occurring in the short text; this determination is possible using calculation methods known from statistics. Subsequently, for each predecessor word sequence, a scaling factor is determined which adapts the language model values for all words in such a manner that as many adapted language model values as possible are situated within the confidence intervals. If scaled language model values are situated outside the associated confidence intervals after the adaptation, the nearest boundaries of the confidence intervals are used as adapted language model values.
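The scaling and clipping described in the abstract can be illustrated with a minimal Python sketch for a single predecessor word sequence. It is not the patented implementation. It assumes that the language model values are plain conditional probabilities and that the confidence intervals are already given. The names `best_scale` and `adapt` are hypothetical. Words without an interval keep their basic value, following the wording of claim 1, and no renormalization is attempted because this excerpt does not describe one.

```python
def best_scale(base, intervals):
    """Pick a scale s so that as many s * base[w] as possible lie in their interval.

    base:      dict word -> basic language model value (assumed: a probability)
    intervals: dict word -> (lower, upper) confidence interval for that word
    """
    # Each word with an interval accepts every s in [lower / p, upper / p].
    events = []
    for word, (lo, hi) in intervals.items():
        p = base.get(word, 0.0)
        if p > 0.0:
            events.append((lo / p, 0))  # 0: accepted range opens
            events.append((hi / p, 1))  # 1: accepted range closes
    events.sort()  # at equal positions, openings sort before closings

    best_s, best_n, inside = 1.0, 0, 0
    for i, (s, kind) in enumerate(events):
        if kind == 0:
            inside += 1
            if inside > best_n:
                # The overlap is maximal up to the next event; take the midpoint.
                best_n = inside
                best_s = 0.5 * (s + events[i + 1][0])
        else:
            inside -= 1
    return best_s  # 1.0 (no scaling) when no interval is usable


def adapt(base, intervals):
    """Return (scale, adapted values) for one predecessor word sequence."""
    s = best_scale(base, intervals)
    adapted = {}
    for word, p in base.items():
        if word in intervals:
            lo, hi = intervals[word]
            # Scaled value, or the nearest boundary if it falls outside.
            adapted[word] = min(max(s * p, lo), hi)
        else:
            # No interval determined: keep the basic value (per claim 1).
            adapted[word] = p
    return s, adapted
```

Calling `adapt(base, intervals)` once per predecessor word sequence would give one scaling factor per sequence, as the abstract describes. Counting values inside their intervals is only one possible reading of the optimization criterion.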
2 Claims
1. A method of adapting a language model with language model values for automatic speech recognition, in which test values are derived from a speech signal and compared with reference values determining a given vocabulary, there being derived scores which are linked to language model values at word boundaries, the language model values being dependent on the probability of occurrence of a given word of the vocabulary in dependence on at least one predecessor word, said method including the following steps:
- determination of a basic language model with basic language model values on the basis of training speech signals,
- determination, utilizing statistical calculation methods, of confidence intervals, having an upper and a lower boundary for language model values, on the basis of a different speech signal which deviates from the training speech signals,
- determination of a scaling factor in such a manner that the basic language model values scaled thereby satisfy an optimization criterion for the position of the scaled language model values relative to the confidence intervals,
- use of scaled language model values which are situated in the confidence intervals and, in the case of scaled language model values situated beyond the boundaries of the confidence intervals, the nearest boundary as adapted language model values and, for confidence intervals not determined from the different speech signal, the basic language model values for the further recognition of the different speech signal.
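The claim leaves open which statistical calculation method yields the confidence intervals. As one possible illustration, the Python sketch below derives an interval for each word from its count after a given predecessor word sequence, using the Wilson score interval for a binomial proportion. The Wilson interval, the 95% level (`z=1.96`) and the function names are assumptions made for this example and are not taken from the patent text.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion k / n (assumed method, not from the claim).

    k: count of the word after the predecessor word sequence
    n: total count of that predecessor word sequence in the adaptation text
    z: standard normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p_hat = k / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


def intervals_for_history(counts, z=1.96):
    """Map each observed word to (lower, upper) for one predecessor word sequence.

    counts: dict word -> count observed after that sequence in the adaptation text.
    Words that were never observed get no interval here; under claim 1 they would
    keep their basic language model value.
    """
    n = sum(counts.values())
    return {word: wilson_interval(k, n, z) for word, k in counts.items()}
```

With only a few observations the intervals are wide, so most basic values can be kept with little or no correction. With more adaptation text they narrow and constrain the adapted values more tightly.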
Specification