Model Adaptation System and Method for Speaker Recognition
Abstract
A system and method of speaker modelling for speaker recognition, whereby prior speaker information is incorporated into the modelling process using the maximum a posteriori (MAP) algorithm, extended to include prior information on the correlation between Gaussian components. First, a background model is estimated (10): pooled acoustic reference data (11) relating to a specific demographic of speakers (the population of interest) drawn from a given total population are trained via the Expectation Maximization (EM) algorithm (12) to produce a background model (13). The background model (13) is then adapted using information from a plurality of reference speakers (21) in accordance with the MAP criterion (22). Using MAP estimation, the reference speaker data and the prior information obtained from the background model parameters are combined to produce a library of adapted speaker models, namely Gaussian Mixture Models (23).
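The two training stages in the abstract can be illustrated with a minimal Python sketch. It assumes one-dimensional features and synthetic data purely for brevity (real systems use multi-dimensional cepstral vectors), and every function name is hypothetical. `train_background_em` fits the background model by maximum-likelihood EM. `map_adapt_means` applies the conventional relevance-MAP mean update, in which each Gaussian component is adapted independently. The relevance factor of 16 is an assumed, commonly quoted value, and the correlation extension described in the abstract is deliberately not included.

```python
import math
import random

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def responsibilities(x, weights, means, variances):
    """Posterior probability of each component for frame x (log-sum-exp for stability)."""
    logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    probs = [math.exp(l - top) for l in logs]
    total = sum(probs)
    return [p / total for p in probs]

def sufficient_stats(frames, weights, means, variances):
    """Zeroth-, first- and second-order statistics accumulated per component."""
    k = len(means)
    n, f, s = [0.0] * k, [0.0] * k, [0.0] * k
    for x in frames:
        for c, g in enumerate(responsibilities(x, weights, means, variances)):
            n[c] += g
            f[c] += g * x
            s[c] += g * x * x
    return n, f, s

def train_background_em(frames, init_means, iterations=20, var_floor=1e-3):
    """Fit a 1-D background GMM to pooled frames by maximum-likelihood EM (assumed 1-D features)."""
    k = len(init_means)
    weights, means, variances = [1.0 / k] * k, list(init_means), [1.0] * k
    for _ in range(iterations):
        n, f, s = sufficient_stats(frames, weights, means, variances)
        for c in range(k):
            weights[c] = max(n[c] / len(frames), 1e-8)  # floor avoids log(0) next iteration
            if n[c] > 1e-8:  # leave an empty component unchanged
                means[c] = f[c] / n[c]
                variances[c] = max(s[c] / n[c] - means[c] ** 2, var_floor)
    return weights, means, variances

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of the means only; relevance=16 is an assumed default."""
    n, f, _ = sufficient_stats(frames, weights, means, variances)
    adapted = []
    for c in range(len(means)):
        alpha = n[c] / (n[c] + relevance)  # data-dependent adaptation coefficient
        data_mean = f[c] / n[c] if n[c] > 0 else means[c]
        adapted.append(alpha * data_mean + (1.0 - alpha) * means[c])
    return adapted

# Synthetic usage: two clusters of pooled background frames, one reference speaker.
random.seed(0)
background_frames = [random.gauss(-2.0, 0.5) for _ in range(500)] + \
                    [random.gauss(3.0, 0.5) for _ in range(500)]
ubm_w, ubm_m, ubm_v = train_background_em(background_frames, [-1.0, 1.0])
speaker_frames = [random.gauss(3.5, 0.5) for _ in range(50)]
speaker_means = map_adapt_means(speaker_frames, ubm_w, ubm_m, ubm_v)
```

Because `alpha` depends only on the amount of data falling on its own component, a component that sees no speaker frames keeps its background mean. Adding inter-component correlation to the prior is what can change that behaviour.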
31 Claims
1. A system for speaker modelling, said system comprising:
a library of acoustic data relating to a plurality of background speakers, representative of a population of interest;
a library of acoustic data relating to a plurality of reference speakers, representative of a population of interest;
a database containing at least one training sequence, said training sequence relating to one or more target speakers;
a memory for storing a background model and a speaker model for said one or more target speakers; and
at least one processor coupled to said library, database and memory, wherein said at least one processor is configured to:
estimate a background model based on a library of acoustic data from a plurality of background speakers;
train a set of Gaussian mixture models (GMMs) from a library of acoustic data from a plurality of reference speakers and the background model;
estimate a prior distribution of speaker model parameters using information from the trained set of GMMs and the background model, wherein correlation information is extracted from the trained set of GMMs;
estimate a speaker model for said one or more target speaker(s), using a GMM structure based on the maximum a posteriori (MAP) criterion; and
store said background model and said speaker model in said memory.
Dependent claims: 2
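Claim 1 does not say how the correlation information is extracted. One plausible reading is sketched below in Python under stated assumptions. Each reference speaker's adapted GMM means are treated as a supervector, a Gaussian prior is centred on the background-model means, and a full covariance is estimated from the reference speakers' deviations. The off-diagonal entries then encode how components move together across speakers. The one-dimensional features, the function names and the reference values are all hypothetical.

```python
import math

def estimate_correlated_prior(background_means, reference_means):
    """Hypothetical prior over mean supervectors: centred on the background-model means,
    with a full covariance estimated from the reference speakers' deviations."""
    dim = len(background_means)
    count = len(reference_means)
    deviations = [[spk[i] - background_means[i] for i in range(dim)] for spk in reference_means]
    cov = [[sum(d[i] * d[j] for d in deviations) / count for j in range(dim)]
           for i in range(dim)]
    return list(background_means), cov

def correlation_matrix(cov):
    """Normalise a covariance matrix to correlation coefficients (non-zero diagonal assumed)."""
    dim = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(dim)]
            for i in range(dim)]

# Hypothetical adapted means for four reference speakers (one value per component):
# components 0 and 1 always move together, component 2 moves independently of them.
ubm_means = [0.0, 5.0, 10.0]
reference = [[1.0, 6.0, 11.0], [-1.0, 4.0, 11.0], [2.0, 7.0, 9.0], [-2.0, 3.0, 9.0]]
prior_mean, prior_cov = estimate_correlated_prior(ubm_means, reference)
corr = correlation_matrix(prior_cov)
```

With fewer reference speakers than supervector dimensions the estimated covariance is singular. In practice it would need regularisation, for example a small constant added to the diagonal or a low-rank form.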
3. A system for speaker modelling and verification, said system including:
a library of acoustic data relating to a plurality of background speakers;
a library of acoustic data relating to a plurality of reference speakers;
a database containing training sequences, said training sequences relating to one or more target speakers;
an input for obtaining a speech sample from a speaker;
a memory for storing a background model and a speaker model for said one or more target speakers; and
at least one processor, wherein said at least one processor is configured to:
estimate a background model based on a library of acoustic data from a plurality of background speakers;
train a set of Gaussian mixture models (GMMs) from a library of acoustic data from a plurality of reference speakers and the background model;
estimate a prior distribution of speaker model parameters using information from the trained set of GMMs and the background model, wherein correlation information is extracted from the trained set of GMMs;
estimate a speaker model for said one or more target speaker(s), using a GMM structure based on the maximum a posteriori (MAP) criterion, wherein the MAP criterion is a function of the training sequence and the estimated prior distribution;
store said background model and said speaker model in said memory;
obtain a speech sample from a speaker;
evaluate a similarity measure between the speech sample and the target speaker model and between the speech sample and the background model;
verify if the speaker is a target speaker by comparing the similarity measures between the speech sample and the target speaker model and between the speech sample and the background model; and
grant access to the speaker if the speaker is verified as one of the target speakers.
Dependent claims: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
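The verification steps of claim 3 can be sketched as a log-likelihood ratio test, a common choice of similarity measure for GMM-based verification; the claim itself does not fix the measure. In this hypothetical Python sketch each model is a `(weights, means, variances)` tuple over one-dimensional features. The similarity measure is the average per-frame log-likelihood, and access is granted when the target-minus-background score exceeds an assumed threshold of 0.

```python
import math

def avg_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of a 1-D GMM (log-sum-exp over components)."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total / len(frames)

def verify_speaker(frames, target_model, background_model, threshold=0.0):
    """Accept the claimed identity when the log-likelihood ratio exceeds the (assumed) threshold."""
    score = (avg_log_likelihood(frames, *target_model)
             - avg_log_likelihood(frames, *background_model))
    return score > threshold, score

# Hypothetical models given as (weights, means, variances).
background = ([0.5, 0.5], [-2.0, 3.0], [1.0, 1.0])
target = ([0.5, 0.5], [-2.0, 3.5], [1.0, 1.0])
genuine_ok, genuine_score = verify_speaker([3.4, 3.5, 3.6], target, background)
impostor_ok, impostor_score = verify_speaker([2.4, 2.5, 2.6], target, background)
```

Subtracting the background score normalises for how typical the speech is in general, so the decision reflects how much better the target model explains the sample than the population of interest does.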
17. A method of speaker modelling, said method comprising the steps of:
estimating a background model based on a library of acoustic data from a plurality of speakers;
training a set of Gaussian mixture models (GMMs) from constraints provided by a library of acoustic data from a plurality of speakers and the background model;
estimating a prior distribution of speaker model parameters using information from the trained set of GMMs and the background model, wherein correlation information is extracted from the trained set of GMMs;
obtaining a training sequence from at least one target speaker;
estimating a speaker model for each of the target speakers using a GMM structure based on the maximum a posteriori (MAP) criterion, wherein the MAP criterion is a function of the training sequence and the estimated prior distribution.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31
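The MAP step of claim 17 can be illustrated under the following assumptions, which are not stated in the patent. Place a Gaussian prior N(m, B) over the mean supervector, with B holding the correlation information, and hold weights and variances fixed. Maximising the posterior after one EM step then gives (I + B D) mu = m + B f, where D = diag(N_c / var_c), f_c = F_c / var_c, and N_c and F_c are the zeroth- and first-order statistics of the training sequence. The Python sketch below solves that system. It is an illustrative derivation rather than the patent's stated update, and all names and numbers are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def map_means_with_prior(frames, weights, means, variances, prior_mean, prior_cov):
    """One MAP update of the mean supervector under an assumed prior N(prior_mean, prior_cov).

    Weights and variances stay fixed; component posteriors come from the given model.
    Solves (I + B D) mu = m + B f, which avoids inverting a possibly singular B.
    """
    k = len(means)
    n, f = [0.0] * k, [0.0] * k
    for x in frames:
        logs = [math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - mu) ** 2 / v)
                for w, mu, v in zip(weights, means, variances)]
        top = max(logs)
        probs = [math.exp(l - top) for l in logs]
        total = sum(probs)
        for c in range(k):
            n[c] += probs[c] / total          # zeroth-order statistic N_c
            f[c] += probs[c] / total * x      # first-order statistic F_c
    d = [n[c] / variances[c] for c in range(k)]
    fs = [f[c] / variances[c] for c in range(k)]
    lhs = [[(1.0 if i == j else 0.0) + prior_cov[i][j] * d[j] for j in range(k)]
           for i in range(k)]
    rhs = [prior_mean[i] + sum(prior_cov[i][j] * fs[j] for j in range(k))
           for i in range(k)]
    return solve_linear(lhs, rhs)

# Hypothetical background model and training frames that only reach component 1.
w, mu0, var = [0.5, 0.5], [-2.0, 3.0], [1.0, 1.0]
frames = [4.0] * 10
independent = map_means_with_prior(frames, w, mu0, var, mu0, [[1.0, 0.0], [0.0, 1.0]])
correlated = map_means_with_prior(frames, w, mu0, var, mu0, [[1.0, 1.0], [1.0, 1.0]])
```

In the example, `independent` leaves component 0 at the background mean of -2. By contrast, `correlated` shifts component 0 by the same amount (10/11) as component 1, even though no training frame falls on component 0.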
18. A method of speaker recognition, said method comprising the steps of:
estimating a background model based on a library of acoustic data from a plurality of background speakers;
training a set of Gaussian mixture models (GMMs) from a library of acoustic data from a plurality of reference speakers and the background model;
estimating a prior distribution of speaker model parameters using information from the trained set of GMMs and the background model, wherein correlation information is extracted from the trained set of GMMs;
obtaining a training sequence from at least one target speaker;
estimating a target speaker model for each of the target speakers using a GMM structure based on the maximum a posteriori (MAP) criterion, wherein the MAP criterion is a function of the training sequence and the estimated prior distribution;
obtaining a speech sample from a speaker;
evaluating a similarity measure between the speech sample and the target speaker model and between the speech sample and the background model; and
identifying whether the speaker is one of said target speakers by comparing the similarity measures between the speech sample and said target speaker model and between the speech sample and the background model.
Dependent claims: 28, 29
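Claim 18 identifies whether the speaker is one of the target speakers. One minimal open-set reading is sketched below in Python, with hypothetical names, models and threshold. It scores each target by its log-likelihood ratio against the background model and picks the highest-scoring target. It returns `None` when even that score does not exceed the threshold.

```python
import math

def avg_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of a 1-D GMM (log-sum-exp over components)."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total / len(frames)

def identify_speaker(frames, target_models, background_model, threshold=0.0):
    """Return the best-scoring target name, or None if no log-likelihood ratio beats the threshold."""
    background_score = avg_log_likelihood(frames, *background_model)
    scores = {name: avg_log_likelihood(frames, *model) - background_score
              for name, model in target_models.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > threshold else None), scores

# Hypothetical models given as (weights, means, variances).
background = ([0.5, 0.5], [-2.0, 3.0], [1.0, 1.0])
targets = {
    "speaker_a": ([0.5, 0.5], [-2.0, 3.5], [1.0, 1.0]),
    "speaker_b": ([0.5, 0.5], [-2.5, 3.0], [1.0, 1.0]),
}
who_a, _ = identify_speaker([3.4, 3.5, 3.6], targets, background, threshold=0.05)
who_b, _ = identify_speaker([-2.6, -2.5, -2.4], targets, background, threshold=0.05)
who_none, _ = identify_speaker([2.9, 3.0, 3.1], targets, background, threshold=0.05)
```

The background score is computed once and shared by all targets, so adding a target costs only one extra likelihood evaluation.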
Specification