System and method of recognizing an acoustic environment to adapt a set of based recognition models to the current acoustic environment for subsequent speech recognition
Abstract
A speech recognition system which effectively recognizes unknown speech from multiple acoustic environments includes a set of secondary models, each associated with one or more particular acoustic environments, integrated with a base set of recognition models. The speech recognition system is trained by making a set of secondary models in a first stage of training, and integrating the set of secondary models with a base set of recognition models in a second stage of training.
3 Claims
1. A signal processing method for recognizing unknown speech signals, comprising the following steps:
receiving an unknown speech signal representing unknown speech;
generating a sequence of feature vectors characterizing the unknown speech signal;
identifying an acoustic environment of the unknown speech based on the sequence of feature vectors and a set of classification models;
adjusting a base set of recognition models to reflect the identified acoustic environment by providing a model transformation projector corresponding to the identified acoustic environment, and applying a transformation based on the model transformation projector to the base set of recognition models;
recognizing the unknown speech signal based on the sequence of feature vectors and the set of adjusted recognition models; and
adapting the model transformation projector based on an adjustment made to the base set of recognition models.
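The steps of claim 1 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: each classification model and each recognition model is reduced to a single mean vector, scoring is a negative squared distance, the model transformation projector is a per-environment additive offset, and the projector is adapted by simple interpolation with a hypothetical `rate` parameter. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean_vector(frames: List[Vector]) -> Vector:
    """Average a sequence of feature vectors into a single vector."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def score(frames: List[Vector], center: Vector) -> float:
    """Toy score: negative summed squared distance (higher is better)."""
    return -sum((x - c) ** 2 for f in frames for x, c in zip(f, center))


def identify_environment(frames: List[Vector],
                         classifiers: Dict[str, Vector]) -> str:
    """Pick the environment whose classification model scores highest."""
    return max(classifiers, key=lambda env: score(frames, classifiers[env]))


def adjust_models(base: Dict[str, Vector],
                  projector: Vector) -> Dict[str, Vector]:
    """Apply the projector (assumed here to be an additive offset)."""
    return {label: [m + p for m, p in zip(mean, projector)]
            for label, mean in base.items()}


def recognize(frames: List[Vector],
              base: Dict[str, Vector],
              classifiers: Dict[str, Vector],
              projectors: Dict[str, Vector],
              rate: float = 0.1) -> Tuple[str, str]:
    """Identify the environment, adjust, recognize, then adapt the projector."""
    env = identify_environment(frames, classifiers)
    adjusted = adjust_models(base, projectors[env])
    label = max(adjusted, key=lambda name: score(frames, adjusted[name]))
    # Assumed adaptation rule: nudge the projector toward the offset that
    # would have mapped the recognized base model onto the observed frames.
    observed = mean_vector(frames)
    target = [o - b for o, b in zip(observed, base[label])]
    projectors[env] = [(1 - rate) * p + rate * t
                       for p, t in zip(projectors[env], target)]
    return env, label


if __name__ == "__main__":
    base = {"yes": [1.0, 0.0], "no": [0.0, 1.0]}
    classifiers = {"quiet": [0.5, 0.5], "car": [2.5, 2.5]}
    projectors = {"quiet": [0.0, 0.0], "car": [2.0, 2.0]}
    print(recognize([[3.1, 2.0], [2.9, 2.2]], base, classifiers, projectors))
```

The base models are never modified in this sketch; only the per-environment projector changes between utterances, which mirrors the claim's separation between the base set and the transformation applied to it.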
2. A method of training a speech recognition system, comprising the following steps:
(A) providing a base set of recognition models and model parameters associated therewith which are stored in a recognition database;
(B) splitting the base set of recognition models into N sets of current models, thereby defining N particular acoustic environments corresponding to the N sets of current models;
(C) storing the N sets of current models in a classification database;
(D) scoring one or more known training utterances against each of the N sets of current models;
(E) assigning each of the known training utterances to one of the N particular acoustic environments based on the highest score of the known training utterance for the N sets of current models;
(F) training each of the N sets of current models associated with the N particular acoustic environments using the known training utterances assigned to that particular acoustic environment, thereby making N sets of new models;
(G) storing the N sets of new models in the classification database in place of the N sets of current models; and
(H) for each particular acoustic environment,
    (i) discriminatively training the base set of recognition models using the known training utterances assigned to that particular acoustic environment to project the base set of recognition models to reflect that particular acoustic environment,
    (ii) saving a set of the differences between the state of the model parameters of the base set of recognition models before discriminative training and after discriminative training which corresponds to the distortion caused by the particular acoustic environment,
    (iii) clustering the differences arrived at by discriminative training, and
    (iv) saving the clustered set of differences as a model transformation projector which can be used for adjusting the base set of recognition models to reflect that particular acoustic environment.
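Steps (B) through (H) of claim 2 can be sketched in the same toy setting. The assumptions are again illustrative only and are not taken from the patent: models are mean vectors, the split in step (B) is a set of shifted copies using hypothetical `offsets`, plain mean re-estimation stands in for both the training of step (F) and the discriminative training of step (H)(i), and the clustering of step (H)(iii) is reduced to a single cluster (the average difference). All names are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Models = Dict[str, Vector]            # label -> mean vector (toy parameters)
Utterance = Tuple[str, List[Vector]]  # (known label, feature vectors)


def mean_vector(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def score(utt: Utterance, models: Models) -> float:
    """Toy score of a known utterance against the model for its label."""
    label, frames = utt
    return -sum((x - m) ** 2 for f in frames for x, m in zip(f, models[label]))


def split_models(base: Models, offsets: List[Vector]) -> List[Models]:
    """Step (B): N shifted copies of the base set (assumed split rule)."""
    return [{label: [m + o for m, o in zip(mean, off)]
             for label, mean in base.items()} for off in offsets]


def assign(utts: List[Utterance],
           model_sets: List[Models]) -> List[List[Utterance]]:
    """Steps (D)-(E): give each utterance to its highest-scoring set."""
    groups: List[List[Utterance]] = [[] for _ in model_sets]
    for utt in utts:
        best = max(range(len(model_sets)),
                   key=lambda i: score(utt, model_sets[i]))
        groups[best].append(utt)
    return groups


def retrain(models: Models, utts: List[Utterance]) -> Models:
    """Mean re-estimation; a stand-in for steps (F) and (H)(i)."""
    new = dict(models)
    for label in models:
        frames = [f for lab, fs in utts if lab == label for f in fs]
        if frames:
            new[label] = mean_vector(frames)
    return new


def train(base: Models, utts: List[Utterance],
          offsets: List[Vector]) -> Tuple[List[Models], List[Vector]]:
    current = split_models(base, offsets)                        # (B), (C)
    groups = assign(utts, current)                               # (D), (E)
    new_sets = [retrain(m, g) for m, g in zip(current, groups)]  # (F), (G)
    dim = len(next(iter(base.values())))
    projectors: List[Vector] = []
    for group in groups:                                         # (H)
        trained = retrain(base, group)                           # (H)(i)
        seen = {label for label, _ in group}
        diffs = [[a - b for a, b in zip(trained[lab], base[lab])]
                 for lab in base if lab in seen]                 # (H)(ii)
        # (H)(iii)-(iv): one cluster, saved as the environment's projector.
        projectors.append(mean_vector(diffs) if diffs else [0.0] * dim)
    return new_sets, projectors
```

A real system would replace `retrain` with an actual discriminative training procedure and would likely keep several clusters of differences per environment; the single averaged offset is used here only to keep the sketch short.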
Specification