State-dependent speaker clustering for speaker adaptation
Abstract
A system and method for adaptation of a speaker independent speech recognition system for use by a particular user. The system and method gather acoustic characterization data from a test speaker and compare the data with acoustic characterization data generated for a plurality of training speakers. A match score is computed between the test speaker's acoustic characterization for a particular acoustic subspace and each training speaker's acoustic characterization for the same acoustic subspace. The training speakers are ranked for the subspace according to their scores and a new acoustic model is generated for the test speaker based upon the test speaker's acoustic characterization data and the acoustic characterization data of the closest matching training speakers. The process is repeated for each acoustic subspace.
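The per-subspace scoring and ranking described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how a characterization is represented or how the match score is defined, so this sketch assumes each training speaker is summarized per subspace by a diagonal Gaussian (mean, variance) and uses the average log-likelihood of the test speaker's frames as the score. The function names, the subspace labels (`AA_1`, `IY_1`), and the speaker names are hypothetical.

```python
from math import log, pi


def diag_gaussian_loglik(frame, mean, var):
    """Log-likelihood of one feature vector under a diagonal Gaussian."""
    return -0.5 * sum(
        log(2 * pi * v) + (x - m) ** 2 / v
        for x, m, v in zip(frame, mean, var)
    )


def rank_training_speakers(test_frames, training_models):
    """Rank training speakers separately for each subspace, best match first.

    test_frames:     {subspace: [feature vector, ...]} from adaptation data
    training_models: {speaker: {subspace: (mean, var)}}
    Assumed score: average log-likelihood of the test frames under the
    training speaker's Gaussian for that subspace.
    """
    ranking = {}
    for subspace, frames in test_frames.items():
        if not frames:
            continue  # no adaptation data for this subspace
        scores = []
        for speaker, models in training_models.items():
            if subspace not in models:
                continue
            mean, var = models[subspace]
            total = sum(diag_gaussian_loglik(f, mean, var) for f in frames)
            scores.append((total / len(frames), speaker))
        scores.sort(reverse=True)
        ranking[subspace] = [speaker for _, speaker in scores]
    return ranking


# Hypothetical toy data: two training speakers, two subspaces.
training_models = {
    "spk_a": {"AA_1": ([0.0, 0.0], [1.0, 1.0]), "IY_1": ([5.0, 5.0], [1.0, 1.0])},
    "spk_b": {"AA_1": ([3.0, 3.0], [1.0, 1.0]), "IY_1": ([1.0, 1.0], [1.0, 1.0])},
}
test_frames = {"AA_1": [[0.1, -0.2], [0.3, 0.1]], "IY_1": [[1.2, 0.9]]}

ranking = rank_training_speakers(test_frames, training_models)
```

In this toy data the closest training speaker differs by subspace (`spk_a` for `AA_1`, `spk_b` for `IY_1`), which is the state-dependent behavior the title refers to: the selection is made per subspace rather than once per test speaker.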
23 Claims
1. A method for adapting the parameters of a speech recognition system during a training process, to better recognize speech of a particular test speaker comprising the steps of:
- calculating the acoustic characterization of a plurality of training speakers for all acoustic subspaces of an acoustic space, the acoustic characterizations being individually identifiable for each training speaker for each acoustic subspace;
- calculating the acoustic characterization of a test speaker from adaptation data provided by said test speaker for acoustic subspaces of the acoustic space;
- computing a match score between the test speaker's characterization for each acoustic subspace, and each training speaker's characterization for the same acoustic subspace;
- ranking each of the training speakers in the acoustic subspace based upon the score; and
- for each acoustic space, generating a re-estimated acoustic model for the particular acoustic subspace using individually identifiable data respectively derived from the one or more training speakers closest to the test speaker for that acoustic subspace, the re-estimated acoustic model for each acoustic subspace being used during a decoding process.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system comprising:
- means for adapting the parameters of a speech recognition system during a training process to better recognize speech of a particular test speaker;
- means for calculating the acoustic characterization of a plurality of training speakers for all acoustic subspaces of an acoustic space, the acoustic characterizations being individually identifiable for each training speaker for each acoustic subspace;
- means for calculating the acoustic characterization of a test speaker from adaptation data provided by said test speaker for acoustic subspaces of the acoustic space;
- means for computing a match score between the test speaker's characterization for each acoustic subspace, and each training speaker's characterization for the same acoustic subspace;
- means for ranking each of the training speakers in the acoustic subspace based upon the score; and
- means for each acoustic space, for generating a re-estimated acoustic model for the particular acoustic subspace using individually identifiable data respectively derived from the one or more training speakers closest to the test speaker for that acoustic subspace, the re-estimated acoustic model for each acoustic subspace being used during a decoding process.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A training system, comprising:
- a training speaker feature vector and training script store for storing a script and a plurality of test speakers utterances of the training script;
- a viterbi alignment processor for aligning each of the plurality of training speakers' feature vectors with the training script;
- a training speaker data model for modeling each of the training speakers' feature vectors in acoustic space, given the alignment with the training script;
- a test speaker feature vector and test script store for storing feature vectors representing a test speaker's utterances of the test script;
- a test speaker viterbi alignment processor for aligning the test speaker's feature vectors with the test script;
- a training speaker select processor for selecting for each of a plurality of acoustic subspaces in the acoustic space, one or more individually identifiable training speakers whose feature vectors most closely match the test speaker's feature vectors in that acoustic subspace;
- a training data transformer for mapping the selected training speaker's feature vectors to the test speakers' feature vectors;
- a Gaussian parameter re-estimation processor for estimating the first and second moment and posterior probability of the mapped feature vectors; and
- a decoder for decoding the estimated moments and probability.
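The last stages of claim 23 (mapping the selected training speakers' feature vectors toward the test speaker, then re-estimating Gaussian parameters from the mapped vectors) might look roughly like the Python sketch below. The claim does not specify the form of the mapping or the estimator, so two assumptions are made here: the mapping is a plain mean shift, and a relative frame count stands in for the "posterior probability". All function and variable names are hypothetical.

```python
def map_to_test_speaker(train_frames, train_mean, test_mean):
    """Shift training frames so their mean lines up with the test speaker's.

    Assumption: a simple per-dimension bias; the claim leaves the form
    of the mapping open.
    """
    shift = [t - s for t, s in zip(test_mean, train_mean)]
    return [[x + d for x, d in zip(frame, shift)] for frame in train_frames]


def reestimate_gaussian(frames, total_frames):
    """Re-estimate one subspace's Gaussian from the mapped frames.

    Returns (mean, variance, weight). The mean is the first moment, the
    variance is derived from the second moment as E[x^2] - E[x]^2, and the
    weight is this subspace's share of all frames (an assumed stand-in for
    the claim's "posterior probability").
    """
    n = len(frames)
    dim = len(frames[0])
    first = [sum(f[i] for f in frames) / n for i in range(dim)]
    second = [sum(f[i] ** 2 for f in frames) / n for i in range(dim)]
    var = [s - m * m for s, m in zip(second, first)]
    return first, var, n / total_frames


# Hypothetical toy data for a single subspace.
train_frames = [[1.0, 2.0], [3.0, 4.0]]
mapped = map_to_test_speaker(train_frames, train_mean=[2.0, 3.0], test_mean=[0.0, 0.0])
mean, var, weight = reestimate_gaussian(mapped, total_frames=4)
```

Because the selected speakers can differ from one subspace to the next, a routine like `reestimate_gaussian` would be called once per subspace on that subspace's own pool of mapped frames.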
Specification