Model Adaptation System and Method for Speaker Recognition

US 20080208581A1
Filed: 12/03/2004
Published: 08/28/2008
Est. Priority Date: 12/05/2003
Status: Abandoned Application


Abstract

A system and method for speaker modelling in speaker recognition, whereby prior speaker information is incorporated into the modelling process by extending the maximum a posteriori (MAP) algorithm to include prior Gaussian component correlation information. First, a background model (10) is estimated: pooled acoustic reference data (11) relating to a specific demographic of speakers (the population of interest) from a given total population is trained via the Expectation Maximization (EM) algorithm (12) to produce a background model (13). The background model (13) is then adapted using information from a plurality of reference speakers (21) in accordance with the MAP criterion (22). Using the MAP estimation technique, the reference speaker data and the prior information obtained from the background model parameters are combined to produce a library of adapted speaker models, namely Gaussian Mixture Models (23).
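
The adaptation step described in the abstract can be illustrated with a minimal sketch. It shows the conventional relevance-factor MAP adaptation of GMM component means, not the extended version with component correlation information that the application claims. The function names, the `relevance` value of 16, and the toy one-dimensional background model are all assumptions made for illustration.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def responsibilities(x, weights, means, variances):
    """Posterior probability of each background component given frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Adapt only the component means of the background model to the frames."""
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp
    sums = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        for k, p in enumerate(responsibilities(x, weights, means, variances)):
            counts[k] += p
            for d in range(dim):
                sums[k][d] += p * x[d]
    adapted = []
    for k in range(n_comp):
        # alpha near 1 trusts the data; alpha near 0 keeps the prior mean.
        alpha = counts[k] / (counts[k] + relevance)
        data_mean = [s / counts[k] for s in sums[k]] if counts[k] > 0 else means[k]
        adapted.append(
            [alpha * dm + (1.0 - alpha) * m for dm, m in zip(data_mean, means[k])]
        )
    return adapted


# Toy background model: two 1-D components at 0 and 10 (hypothetical values).
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0], [10.0]]
ubm_vars = [[1.0], [1.0]]
speaker_frames = [[1.0]] * 16
adapted_means = map_adapt_means(speaker_frames, ubm_weights, ubm_means, ubm_vars)
```

In this sketch each component mean is adapted independently of the others, so a component that receives almost no speaker data simply stays at its background value. The correlation information described in the abstract is what would allow such a component to move in step with the well-observed ones.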

31 Claims

1. A system for speaker modelling, said system comprising:
- a library of acoustic data relating to a plurality of background speakers, representative of a population of interest;
  
  a library of acoustic data relating to a plurality of reference speakers, representative of a population of interest;
  
  a database containing at least one training sequence, said training sequence relating to one or more target speakers;
  
  a memory for storing a background model and a speaker model for said one or more target speakers; and
  
  at least one processor coupled to said library, database and memory, wherein said at least one processor is configured to:
  
  estimate a background model based on a library of acoustic data from a plurality of background speakers;
  
  train a set of Gaussian mixture models (GMMs) from a library of acoustic data from a plurality of reference speakers and the background model;
  
  estimate a prior distribution of speaker model parameters using information from the trained set of GMMs and the background model, wherein correlation information is extracted from the trained set of GMMs;
  
  estimate a speaker model for said one or more target speaker(s), using a GMM structure based on the maximum a posteriori (MAP) criterion; and
  
  store said background model and said speaker model in said memory.
- Dependent Claims (2)
- - 2. The system of claim 1 wherein the MAP criterion for the speaker model is a function of the training sequence and the estimated prior distribution.
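
One possible reading of "correlation information is extracted from the trained set of GMMs" is sketched here: the component means of each reference speaker model are stacked into a single vector, and the covariance of those vectors across speakers is computed. Centring on the background model means and dividing by the number of speakers are assumptions of this sketch, and the function names are hypothetical.

```python
def supervector(means):
    """Concatenate the component means of one GMM into a flat list."""
    return [value for component in means for value in component]


def prior_mean_covariance(reference_means, background_means):
    """Covariance of stacked component means across reference speakers.

    Deviations are measured from the background model means (an assumption),
    so off-diagonal entries capture how components move together.
    """
    centre = supervector(background_means)
    size = len(centre)
    cov = [[0.0] * size for _ in range(size)]
    for speaker in reference_means:
        dev = [a - b for a, b in zip(supervector(speaker), centre)]
        for i in range(size):
            for j in range(size):
                cov[i][j] += dev[i] * dev[j]
    n_speakers = len(reference_means)
    return [[c / n_speakers for c in row] for row in cov]


# Two hypothetical reference speakers whose components shift together.
background_means = [[0.0], [10.0]]
reference_means = [
    [[1.0], [11.0]],
    [[-1.0], [9.0]],
]
prior_cov = prior_mean_covariance(reference_means, background_means)
```

For this toy data every entry of `prior_cov` is 1.0, which means the two component means are perfectly correlated across the reference speakers.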

3. A system for speaker modelling and verification, said system including:
- a library of acoustic data relating to a plurality of background speakers;
  
  a library of acoustic data relating to a plurality of reference speakers;
  
  a database containing training sequences, said training sequences relating to one or more target speakers;
  
  an input for obtaining a speech sample from a speaker;
  
  a memory for storing a background model and a speaker model for said one or more target speakers; and
  
  at least one processor, wherein said at least one processor is configured to:
  
  estimate a background model based on a library of acoustic data from a plurality of background speakers;
  
  train a set of Gaussian mixture models (GMMs) from a library of acoustic data from a plurality of reference speakers and the background model;
  
  estimate a prior distribution of speaker model parameters using information from the trained set of GMMs and the background model, wherein correlation information is extracted from the trained set of GMMs;
  
  estimate a speaker model for said one or more target speaker(s), using a GMM structure based on the maximum a posteriori (MAP) criterion, wherein the MAP criterion is a function of the training sequence and the estimated prior distribution;
  
  store said background model and said speaker model in said memory;
  
  obtain a speech sample from a speaker;
  
  evaluate a similarity measure between the speech sample and the target speaker model and between the speech sample and the background model;
  
  verify if the speaker is a target speaker by comparing the similarity measures between the speech sample and the target speaker model and between the speech sample and the background model; and
  
  grant access to the speaker if the speaker is verified as one of the target speakers.
- Dependent Claims (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
- - 4. The system of claim 3 wherein the background model directly describes elements of the prior distribution.
  - 5. The system of claim 3 wherein the background speakers and reference speakers are representative of a particular demographic selected from a population of interest including the following:
    - persons of selected ages, genders and cultural backgrounds.
  - 6. The system of claim 3 wherein the library of acoustic data used to train the set of GMMs is independent of the library used to estimate the background model.
  - 7. The system of claim 3 wherein the extracted correlation information is stored in a library.
  - 8. The system of claim 7 wherein the library of correlation information includes estimated covariance of mixture component means extracted from the trained set of GMMs.
  - 9. The system of claim 8 wherein a prior covariance matrix of the mixture component means is compiled based on the library of correlation information.
  - 10. The system of claim 9 wherein the estimate of the prior covariance of the mixture component means is determined by one or more of the following estimation methods:
    - maximum likelihood, Bayesian inference of the correlation information using the background model covariance statistics as prior information, or reducing the off-diagonal elements.
  - 11. The system of claim 7 wherein the estimation of prior distribution of speaker model parameters is based on said library of correlation information and the background model.
  - 12. The system of claim 3 wherein the estimation of the prior distribution further includes:
    - a) re-training the library of reference speaker models using the estimate of the prior distribution;
      
      b) re-estimating the prior distribution based on the retrained library of reference speaker models; and
      
      c) repeating steps (a) and (b) until a convergence criterion is met.
  - 13. The system of claim 3 wherein the evaluation of the similarity measure utilises an expected frame-based log-likelihood ratio technique.
  - 14. The system of claim 3 wherein the step of verification and identification further includes the use of post-processing techniques to mitigate speech channel effects selected from the following:
    - feature warping, feature mean and variance normalisation, relative spectral techniques (RASTA), modulation spectrum processing and Cepstral Mean Subtraction.
  - 15. The system of claim 3 wherein the speech sample from the speaker is provided to said input via a communications network.
  - 16. The system of claim 3 wherein the system further utilises full target and background model coupling.
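
The verification steps of claim 3 and the similarity measure of claim 13 can be sketched as follows. The sketch treats the "expected frame-based log-likelihood ratio" as the average, over frames, of the log-likelihood under the target model minus the log-likelihood under the background model. That interpretation, the zero `threshold`, and the toy single-component models are assumptions.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one frame under a diagonal-covariance GMM."""
    logs = []
    for w, mean, var in zip(weights, means, variances):
        log_density = sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, mean, var)
        )
        logs.append(math.log(w) + log_density)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(value - top) for value in logs))


def expected_frame_llr(frames, target, background):
    """Average per-frame log-likelihood ratio, target versus background."""
    total = 0.0
    for x in frames:
        total += gmm_log_likelihood(x, *target) - gmm_log_likelihood(x, *background)
    return total / len(frames)


def verify(frames, target, background, threshold=0.0):
    """Accept the claimed identity when the score exceeds the threshold."""
    return expected_frame_llr(frames, target, background) > threshold


# Each model is a (weights, means, variances) tuple with hypothetical values.
background_model = ([1.0], [[0.0]], [[1.0]])
target_model = ([1.0], [[2.0]], [[1.0]])
genuine_score = expected_frame_llr([[2.0], [2.0]], target_model, background_model)
impostor_score = expected_frame_llr([[0.0]], target_model, background_model)
```

Averaging over frames keeps the score comparable between short and long speech samples, which is why a single threshold can be applied to both.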

17. A method of speaker modelling, said method comprising the steps of:
- estimating a background model based on a library of acoustic data from a plurality of speakers;
  
  training a set of Gaussian mixture models (GMMs) from constraints provided by a library of acoustic data from a plurality of speakers and the background model;
  
  estimating a prior distribution of speaker model parameters using information from the trained set of GMMs and the background model, wherein correlation information is extracted from the trained set of GMMs;
  
  obtaining a training sequence from at least one target speaker;
  
  estimating a speaker model for each of the target speakers using a GMM structure based on the maximum a posteriori (MAP) criterion, wherein the MAP criterion is a function of the training sequence and the estimated prior distribution.
- Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31)
- - 19. The method of claim 17 wherein the background model directly describes elements of the prior distribution.
  - 20. The method of claim 17 wherein the speakers representative of a population of interest are selected from a particular demographic including one or more of the following:
    - persons of selected ages, genders and/or cultural backgrounds.
  - 21. The method of claim 17 wherein the library of acoustic data used to train the set of GMMs is independent of the acoustic data from said speakers representative of a population of interest used to estimate the background model.
  - 22. The method of claim 17 wherein the step of extracting the correlation information includes extracting the covariance of the mixture component means from the trained set of GMMs.
  - 23. The method of claim 22 further including the step of storing the extracted correlation information in a library.
  - 24. The method of claim 23 further including the step of estimating a prior covariance matrix of mixture component means based on the library of correlation information.
  - 25. The method of claim 24 wherein the prior covariance of the mixture component means is estimated by an estimation technique chosen from:
    - maximum likelihood, Bayesian inference of the correlation information using the background model covariance statistics as prior information, and reducing the off-diagonal elements.
  - 26. The method of claim 23 wherein the estimation of the prior distribution of speaker model parameters is based on said library of correlation information and the background model.
  - 27. The method of claim 17 wherein the step of estimating the prior distribution further includes the steps of:
    - a) re-training the library of acoustic data from a plurality of speakers using the estimate of the prior distribution;
      
      b) re-estimating the prior distribution based on the retrained library of acoustic data from the plurality of speakers; and
      
      c) repeating steps (a) and (b) until a convergence criterion is met.
  - 30. The method of claim 17 wherein the testing and training sequences are obtained via a communication network.
  - 31. The method of claim 17 wherein said target model and said background model are fully coupled.
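
Of the estimation techniques listed in claim 25, "reducing the off-diagonal elements" is the simplest to illustrate. The sketch below scales every off-diagonal entry of a prior covariance matrix by a constant factor. The uniform scaling rule and the name `shrink_off_diagonal` are assumptions, since the claim does not say how the reduction is carried out.

```python
def shrink_off_diagonal(cov, factor):
    """Scale off-diagonal covariance entries by factor; keep the diagonal.

    factor = 1.0 leaves the matrix unchanged; factor = 0.0 removes all
    cross-component correlation, leaving independent component priors.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must lie in [0, 1]")
    return [
        [value if i == j else factor * value for j, value in enumerate(row)]
        for i, row in enumerate(cov)
    ]


# Hypothetical 2x2 prior covariance of two component means.
full_cov = [[1.0, 1.0], [1.0, 1.0]]
shrunk_cov = shrink_off_diagonal(full_cov, 0.5)
diagonal_cov = shrink_off_diagonal(full_cov, 0.0)
```

A covariance estimated from a small reference library can be noisy or singular, as `full_cov` is here. Shrinking the off-diagonal terms is one way to make it better conditioned while retaining some of the correlation structure.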

18. A method of speaker recognition, said method comprising the steps of:
- estimating a background model based on a library of acoustic data from a plurality of background speakers;
  
  training a set of Gaussian mixture models (GMMs) from a library of acoustic data from a plurality of reference speakers and the background model;
  
  estimating a prior distribution of speaker model parameters using information from the trained set of GMMs and the background model, wherein correlation information is extracted from the trained set of GMMs;
  
  obtaining a training sequence from at least one target speaker;
  
  estimating a target speaker model for each of the target speakers using a GMM structure based on the maximum a posteriori (MAP) criterion, wherein the MAP criterion is a function of the training sequence and the estimated prior distribution;
  
  obtaining a speech sample from a speaker;
  
  evaluating a similarity measure between the speech sample and the target speaker model and between the speech sample and the background model; and
  
  identifying whether the speaker is one of said target speakers by comparing the similarity measures between the speech sample and said target speaker model and between the speech sample and the background model.
- Dependent Claims (28, 29)
- - 28. The method of claim 18 wherein the evaluation of the similarity measure utilises an expected frame-based log-likelihood ratio technique.
  - 29. The method of claim 18 wherein the step of verification and identification further includes the use of post-processing techniques to mitigate speech channel effects selected from the following:
    - feature warping, feature mean and variance normalisation, relative spectral techniques (RASTA), modulation spectrum processing and Cepstral Mean Subtraction.
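
Two of the channel-compensation techniques named in claim 29 can be sketched in a few lines: Cepstral Mean Subtraction removes the per-utterance mean of each feature dimension, and feature mean and variance normalisation also scales each dimension to unit variance. The function name, the `scale_variance` flag, and the example frames are hypothetical. Feature warping, RASTA and modulation spectrum processing are not shown.

```python
import math


def normalise_features(frames, scale_variance=True):
    """Per-utterance normalisation of a list of feature vectors.

    With scale_variance=False this is cepstral mean subtraction; with
    scale_variance=True it is mean and variance normalisation.
    """
    n_frames, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n_frames for d in range(dim)]
    if scale_variance:
        stds = [
            math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n_frames)
            for d in range(dim)
        ]
        # Guard against constant dimensions to avoid division by zero.
        stds = [s if s > 0.0 else 1.0 for s in stds]
    else:
        stds = [1.0] * dim
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


# Two hypothetical 2-D cepstral frames from one utterance.
utterance = [[2.0, 10.0], [6.0, 10.0]]
cms_frames = normalise_features(utterance, scale_variance=False)
mvn_frames = normalise_features(utterance, scale_variance=True)
```

A stationary channel adds a roughly constant offset to cepstral features, so subtracting the utterance mean removes much of that offset before the frames are scored.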

Current Assignee: Queensland University of Technology
Original Assignee: Queensland University of Technology
Inventors: Vogt, Robert; Pelecanos, Jason; Sridharan, Subramanian
Application Number: US10/581,227
Publication Number: US 20080208581A1
US Class Current: 704/250
CPC Class Codes: G10L 17/04 Training, enrolment or mode...
