Self-learning speaker adaptation based on spectral bias source decomposition, using very short calibration speech
First Claim
1. A speech recognition method comprising the steps of:
a. providing training speech that includes a passage of calibration speech for each training speaker;
b. representing the training speech in a spectral domain such that each training speech utterance is represented by a sequence of training speech spectra;
c. building a first set of Gaussian density phone models from the spectra of all calibration speech;
d. estimating a spectral bias indicative of speaker acoustic characteristics for each calibration speech using said first set of Gaussian density phone models;
e. normalizing the training speech spectra based on speaker acoustic characteristics using said spectral bias;
f. building a second set of Gaussian mixture density phone models having parameters of mean vectors, covariance matrices and mixture weights from said normalized training speech spectra;
g. taking a passage of calibration speech from each speaker;
h. representing the calibration speech in a spectral domain such that each calibration speech utterance is represented by a sequence of speech spectra;
i. estimating a spectral bias indicative of speaker acoustic characteristics for each calibration speech using said second set of Gaussian mixture density phone models built in step f;
j. normalizing the calibration speech spectra based on speaker acoustic characteristics using said spectral bias;
k. adapting the phone model parameters based on speaker phonologic characteristics using the normalized calibration speech, where context modulation vectors are estimated between Gaussian densities in each mixture, and the context modulation vectors are used to shift the spectra of the calibration speech;
l. providing test speech for speech recognition;
m. representing the test speech in a spectral domain such that the test speech is represented by a sequence of test speech spectra;
n. normalizing the test speech spectra based on speaker acoustic characteristics using said spectral bias;
o. using the normalized test speech spectra in conjunction with the adapted Gaussian mixture density phone models to recognize the test speech.
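Steps d, e, i, j and n all rely on estimating a per-speaker spectral bias against a set of Gaussian phone models and then removing it from the spectra. The following is a minimal sketch of one way such a bias could be estimated, assuming an additive bias in the spectral domain, diagonal covariances, and an EM-style update over component posteriors. The function names (`estimate_spectral_bias`, `normalize_spectra`) and the parameter layout are hypothetical, and this is not necessarily the estimator defined in the specification.

```python
import numpy as np


def log_gauss_diag(x, means, variances):
    """Log density of each frame under each diagonal Gaussian.

    x: (T, D) frames; means, variances: (K, D). Returns (T, K).
    """
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )


def estimate_spectral_bias(spectra, means, variances, weights, n_iter=10):
    """Estimate an additive bias h so that (spectra - h) fits the models.

    Assumption (not from the source): frames are modeled as x_t = s_t + h,
    where s_t follows a Gaussian mixture with the given parameters.
    """
    n_frames, dim = spectra.shape
    bias = np.zeros(dim)
    precisions = 1.0 / variances  # (K, D)
    for _ in range(n_iter):
        # E-step: posterior of each Gaussian for each bias-corrected frame.
        log_post = np.log(weights) + log_gauss_diag(spectra - bias, means, variances)
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)  # (T, K)
        # M-step: precision-weighted average of residuals (x_t - mu_k).
        denom = post.sum(axis=0) @ precisions  # (D,)
        resid = spectra[:, None, :] - means[None, :, :]  # (T, K, D)
        numer = np.einsum("tk,kd,tkd->d", post, precisions, resid)
        bias = numer / denom
    return bias


def normalize_spectra(spectra, bias):
    """Remove the estimated speaker bias from every frame."""
    return spectra - bias
```

In this sketch the same bias estimated from a speaker's calibration speech would be subtracted from that speaker's calibration and test spectra before adaptation and recognition.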
Abstract
A speaker adaptation technique based on separating the sources of speech spectra variation is developed to improve speaker-independent continuous speech recognition. The variation sources include speaker acoustic characteristics and the contextual dependency of allophones. Statistical methods are formulated to normalize speech spectra based on speaker acoustic characteristics and then adapt Gaussian mixture density phone models based on speaker phonologic characteristics. Adaptation experiments using short calibration speech (5 seconds per speaker) have shown substantial performance improvement over the baseline recognition system.
16 Claims
1. A speech recognition method comprising steps a through o, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
Specification