
Self-learning speaker adaptation based on spectral variation source decomposition

  • US 5,664,059 A
  • Filed: 09/16/1996
  • Issued: 09/02/1997
  • Est. Priority Date: 04/29/1993
  • Status: Expired due to Fees
First Claim

1. A self-learning speaker adaptation method for automatic speech recognition comprising:

    providing training speech from a plurality of training speakers;

    transforming the training speech into a spectral domain such that each training speech utterance is represented by a sequence of training speech spectra;

    building a set of Gaussian density phone models from the spectra of all training speech;

    estimating a spectral bias indicative of speaker acoustic characteristics for each speech utterance using the said set of Gaussian density phone models;

    normalizing the training speech spectra based on speaker acoustic characteristics using the said spectral bias;

    building a plurality of Gaussian mixture density phone models having model parameters including covariance matrices and means vectors and mixture weights using the normalized training speech spectra for use in recognizing speech;

    transforming a first utterance of speech into a spectral domain;

    estimating a spectral bias indicative of speaker acoustic characteristics for the first utterance of speech using the said set of Gaussian density phone models;

    normalizing the first utterance of speech spectra using the said spectral bias;

    recognizing the normalized first utterance of speech spectra to produce a recognized word string;

    segmenting the first utterance of speech spectra using said recognized word string to produce segmented adaptation data;

    modifying the model parameters based on said segmented adaptation data to produce a set of adapted Gaussian mixture density phone models; and

    repeating said transforming, estimating, normalizing, recognizing, segmenting and modifying steps for subsequent utterances, using for each subsequent utterance the adapted Gaussian mixture density phone models produced from the previous utterance, whereby the Gaussian mixture density phone models are automatically adapted to that speaker in self-learning fashion.
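
The estimating, normalizing and modifying steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that the spectral bias is an additive offset in a log-spectral or cepstral feature space, that the "set of Gaussian density phone models" is a set of single Gaussians with diagonal covariance and equal priors, and that the model update is a simple count-weighted interpolation of the means. The function names `estimate_spectral_bias`, `normalize` and `adapt_means`, and the parameters `n_iter` and `tau`, are hypothetical and do not appear in the claim.

```python
import numpy as np


def estimate_spectral_bias(frames, means, variances, n_iter=10):
    """Iteratively estimate an additive bias b so that (frames - b) fits the models.

    frames:    (T, D) feature vectors of one utterance
    means:     (K, D) means of K single-Gaussian phone models (assumed form)
    variances: (K, D) diagonal variances of those models (assumed form)
    """
    bias = np.zeros(frames.shape[1])
    resid = frames[:, None, :] - means[None, :, :]           # (T, K, D)
    for _ in range(n_iter):
        # Posterior of each phone model for each bias-corrected frame.
        diff = resid - bias                                   # (T, K, D)
        log_lik = -0.5 * np.sum(
            diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
        )                                                     # (T, K)
        log_lik -= log_lik.max(axis=1, keepdims=True)         # numerical stability
        post = np.exp(log_lik)
        post /= post.sum(axis=1, keepdims=True)
        # Precision-weighted average of the residuals (frame minus model mean).
        weight = post[:, :, None] / variances[None, :, :]     # (T, K, D)
        bias = (weight * resid).sum(axis=(0, 1)) / weight.sum(axis=(0, 1))
    return bias


def normalize(frames, bias):
    """Remove the estimated speaker bias from every frame of the utterance."""
    return frames - bias


def adapt_means(means, frames, labels, tau=10.0):
    """Shift each model mean toward the frames that segmentation assigned to it.

    labels: (T,) integer model index per frame, e.g. derived from the
            recognized word string. tau controls how strongly the prior
            mean is trusted (assumed update rule, not taken from the claim).
    """
    adapted = np.array(means, dtype=float)
    for k in range(adapted.shape[0]):
        segment = frames[labels == k]
        if len(segment) > 0:
            adapted[k] = (tau * adapted[k] + segment.sum(axis=0)) / (tau + len(segment))
    return adapted
```

In the loop described by the claim, each new utterance would pass through `estimate_spectral_bias` and `normalize` before recognition, and the recognizer's segmentation would supply `labels` for `adapt_means`, whose output replaces the model means for the next utterance. The recognizer, the segmentation step, and the updating of covariances and mixture weights are outside this sketch.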
