Self-learning speaker adaptation based on spectral bias source decomposition, using very short calibration speech

  • US 5,794,192 A
  • Filed: 09/12/1996
  • Issued: 08/11/1998
  • Est. Priority Date: 04/29/1993
  • Status: Expired due to Fees
First Claim

1. A speech recognition method comprising the steps of:

    a. providing training speech that includes a passage of calibration speech for each training speaker;

    b. representing the training speech in a spectral domain such that each training speech utterance is represented by a sequence of training speech spectra;

    c. building a first set of Gaussian density phone models from the spectra of all calibration speech;

    d. estimating a spectral bias indicative of speaker acoustic characteristics for each calibration speech using said first set of Gaussian density phone models;

    e. normalizing the training speech spectra based on speaker acoustic characteristics using said spectral bias;

    f. building a second set of Gaussian mixture density phone models having parameters of mean vectors, covariance matrices and mixture weights from said normalized training speech spectra;

    g. taking a passage of calibration speech from each speaker;

    h. representing the calibration speech in a spectral domain such that each calibration speech utterance is represented by a sequence of speech spectra;

    i. estimating a spectral bias indicative of speaker acoustic characteristics for each calibration speech using said second set of Gaussian mixture density phone models built in step f;

    j. normalizing the calibration speech spectra based on speaker acoustic characteristics using said spectral bias;

    k. adapting the phone model parameters based on speaker phonologic characteristics using the normalized calibration speech, where context modulation vectors are estimated between Gaussian densities in each mixture, and the context modulation vectors are used to shift the spectra of the calibration speech;

    l. providing test speech for speech recognition;

    m. representing the test speech in a spectral domain such that the test speech is represented by a sequence of test speech spectra;

    n. normalizing the test speech spectra based on speaker acoustic characteristics using said spectral bias;

    o. using the normalized test speech spectra in conjunction with the adapted Gaussian mixture density phone models to recognize the test speech.
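
The bias estimation and normalization recited in steps d, e, i, j and n can be illustrated with a minimal sketch. It is not the patented procedure: the claim does not spell out the estimator, so the sketch assumes an additive bias in a log-spectral or cepstral feature space, one diagonal-covariance Gaussian per phone, and a hard assignment of each frame to its nearest model. The names `estimate_bias`, `normalize` and `nearest_model` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# Hypothetical model type: (mean vector, diagonal variances) for one phone.
Gaussian = Tuple[Vector, Vector]


def nearest_model(x: Sequence[float], models: Sequence[Gaussian]) -> Gaussian:
    """Return the model with the smallest variance-weighted squared distance to x."""
    def distance(model: Gaussian) -> float:
        mean, var = model
        return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return min(models, key=distance)


def estimate_bias(spectra: Sequence[Sequence[float]],
                  models: Sequence[Gaussian],
                  iterations: int = 5) -> Vector:
    """Estimate one additive bias vector for a passage of calibration speech.

    Assumption: with frames hard-assigned to models, the maximum-likelihood
    bias under diagonal Gaussians is the inverse-variance-weighted mean of
    the residuals (frame minus assigned model mean). The assignment is
    recomputed on bias-corrected frames in each iteration.
    """
    dim = len(spectra[0])
    bias = [0.0] * dim
    for _ in range(iterations):
        num = [0.0] * dim
        den = [0.0] * dim
        for x in spectra:
            shifted = [xi - bi for xi, bi in zip(x, bias)]
            mean, var = nearest_model(shifted, models)
            for d in range(dim):
                num[d] += (x[d] - mean[d]) / var[d]
                den[d] += 1.0 / var[d]
        bias = [n / d for n, d in zip(num, den)]
    return bias


def normalize(spectra: Sequence[Sequence[float]], bias: Sequence[float]) -> List[Vector]:
    """Remove the estimated bias from every frame."""
    return [[xi - bi for xi, bi in zip(x, bias)] for x in spectra]


# Toy data: two phone models and four frames offset by a speaker bias of [1.0, -0.5].
models = [([0.0, 0.0], [1.0, 1.0]), ([10.0, 10.0], [1.0, 1.0])]
calibration = [[1.0, -0.5], [11.0, 9.5], [1.0, -0.5], [11.0, 9.5]]
bias = estimate_bias(calibration, models)
normalized = normalize(calibration, bias)
```

In the claim, the same kind of bias is estimated twice: once against the first set of models to normalize the training speech, and again against the second set of models for each speaker's calibration speech, after which it is also applied to that speaker's test speech.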
