Method and apparatus for multi-environment speaker verification
Abstract
A method for unsupervised environmental normalization for speaker verification using hierarchical clustering is disclosed. Training data (speech samples) are taken from T enrolled (registered) speakers over any one of M channels, e.g., different microphones, communication links, etc. For each speaker, a speaker model is generated, each containing a collection of distributions of audio feature data derived from the speech sample of that speaker. A hierarchical speaker model tree is created, e.g., by merging similar speaker models on a layer by layer basis. Each speaker is also grouped into a cohort of similar speakers. For each cohort, one or more complementary speaker models are generated by merging speaker models outside that cohort. When training data from a new speaker to be enrolled is received over a new channel, the speaker model tree as well as the complementary models are updated. Consequently, adaptation to data from new environments is possible by incorporating such data into the verification model whenever it is encountered.
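The layer-by-layer merging described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Node` class, the variance-normalised distance in `gauss_dist`, the symmetric nearest-component measure in `model_dist`, the greedy pairing in `build_tree`, and the choice to merge two models by pooling their distributions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A distribution is a diagonal Gaussian: (mean vector, variance vector).
Gaussian = Tuple[List[float], List[float]]


@dataclass(eq=False)  # identity comparison avoids recursing through parent links
class Node:
    name: str
    dists: List[Gaussian]
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None


def gauss_dist(a: Gaussian, b: Gaussian) -> float:
    # Assumed measure: squared mean difference scaled by the summed variances.
    return sum((ma - mb) ** 2 / (va + vb)
               for ma, mb, va, vb in zip(a[0], b[0], a[1], b[1]))


def model_dist(x: Node, y: Node) -> float:
    # Symmetric average of nearest-component distances between two models.
    d_xy = sum(min(gauss_dist(g, h) for h in y.dists) for g in x.dists) / len(x.dists)
    d_yx = sum(min(gauss_dist(h, g) for g in x.dists) for h in y.dists) / len(y.dists)
    return 0.5 * (d_xy + d_yx)


def merge(a: Node, b: Node) -> Node:
    # Assumed merge rule: the parent model pools the children's distributions.
    node = Node(f"({a.name}+{b.name})", a.dists + b.dists, [a, b])
    a.parent = b.parent = node
    return node


def build_tree(leaves: List[Node]) -> Node:
    # Bottom-up: pair each model with its nearest unpaired neighbour, layer by layer.
    layer = list(leaves)
    while len(layer) > 1:
        remaining, nxt = list(layer), []
        while len(remaining) > 1:
            a = remaining.pop(0)
            b = min(remaining, key=lambda m: model_dist(a, m))
            remaining.remove(b)
            nxt.append(merge(a, b))
        nxt.extend(remaining)  # an unpaired model is carried up unchanged
        layer = nxt
    return layer[0]


# Four hypothetical speakers with one 1-D Gaussian each.
leaves = [Node(f"s{i}", [([m], [1.0])]) for i, m in enumerate([0.0, 0.1, 5.0, 5.1])]
root = build_tree(leaves)
print(root.name)  # ((s0+s1)+(s2+s3))
```

Under these assumptions, the simplest way to enroll a new speaker is to rebuild the tree from the old leaves plus the new model, for example `build_tree(leaves + [new_leaf])`. An incremental update that touches only the affected branch would be cheaper but is not sketched here.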
19 Claims
1. A computer-implemented method, comprising:
obtaining training data from each of a plurality T of sources constituting an enrolled population, over a plurality M of channels;
developing models for each of said T sources based on said training data, each model containing a collection of distributions;
generating a hierarchical model tree based on said models of said T sources, wherein at least some merged models within layers of said hierarchical model tree are computed via partitioning or grouping with respect to channel properties; and
obtaining training data from a new source over a new channel for addition to said enrolled population, developing a new model based thereupon and updating said hierarchical model tree with said new model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
defining a plurality of cohorts for models in the lowest layer of the tree, with each cohort being of generally equal size and containing models which are similar to one another.
5. The method of claim 4, further comprising the step of:
generating, for a particular cohort, at least one complementary model representing a merger of speaker models outside said particular cohort.
6. The method of claim 5, further comprising the step of updating said complementary model when a new source and corresponding model is added to said enrolled population.
7. The method of claim 5, wherein said at least one complementary model is a cumulative complementary model which is a model formed by merging models on multiple levels of said tree that are outside said particular cohort.
8. The method of claim 5, wherein said at least one model comprises a plurality of merged models, each merged model being a sibling model of an ancestor of a model within said particular cohort.
9. The method of claim 1 wherein:
each said model contains a collection of distributions of feature data associated with the corresponding source; and
said step of generating a hierarchical model tree comprises merging similar models on a layer by layer basis.
10. The method of claim 9 wherein said feature data comprises image data.
11. The method of claim 1 wherein said hierarchical model tree is generated using a top down technique in which a merged model of all models of the T sources is sequentially partitioned on a layer by layer basis.
12. The method of claim 1 wherein each said distribution is a multi-dimensional Gaussian distribution.
13. A speaker verification method comprising the steps of:
obtaining training data from each of a plurality T of sources constituting an enrolled population, over a plurality M of channels;
developing speaker models for each of said T speakers based on said training data, each model containing a collection of audio feature distributions;
generating a hierarchical speaker model tree based on said models of said T speakers, wherein at least some merged models within layers of said hierarchical speaker model tree are computed via partitioning or grouping with respect to channel properties;
receiving a claimed identification (ID) of a claimant, said claimed ID representing a speaker corresponding to a particular one of said speaker models;
determining a cohort set containing said particular speaker model and similar speaker models thereto;
receiving data corresponding to a speech sample of said claimant and generating a test speaker model therefrom; and
comparing said test model to all speaker models of said cohort set and verifying said claimant if said particular speaker is the closest model of said cohort set to said test model.
Dependent claims: 14, 15, 16, 17
generating a single cumulative complementary model (CCM) by merging complementary speaker models outside said cohort set; and
rejecting said claimant if said test model is closer in distance to said CCM than to said particular model.
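The cohort comparison and the cumulative complementary model (CCM) test recited above can be sketched as follows. This is a hedged illustration only: the `Model` representation (component means with unit variances), the `dist` measure, the `cohort_size` parameter, and the choice to form the cohort from nearest neighbours rather than from a tree are all assumptions, and the CCM is approximated by pooling the components of every model outside the cohort.

```python
from typing import Dict, List, Sequence

# Assumed representation: a model is a list of component means (unit variances).
Model = List[Sequence[float]]


def dist(x: Model, y: Model) -> float:
    # Symmetric average of squared distances to the nearest component.
    def one_way(a: Model, b: Model) -> float:
        return sum(min(sum((p - q) ** 2 for p, q in zip(g, h)) for h in b)
                   for g in a) / len(a)
    return 0.5 * (one_way(x, y) + one_way(y, x))


def verify(claimed_id: str, test: Model, models: Dict[str, Model],
           cohort_size: int = 2) -> bool:
    target = models[claimed_id]
    # Cohort: the claimed model plus its nearest enrolled neighbours.
    others = sorted((k for k in models if k != claimed_id),
                    key=lambda k: dist(target, models[k]))
    cohort = [claimed_id] + others[:cohort_size - 1]
    outside = others[cohort_size - 1:]
    # The claimed model must be the closest cohort member to the test model.
    closest = min(cohort, key=lambda k: dist(test, models[k]))
    if closest != claimed_id:
        return False
    # CCM (assumed merge rule): pool all components from outside the cohort.
    ccm = [g for k in outside for g in models[k]]
    if ccm and dist(test, ccm) < dist(test, target):
        return False  # test model is closer to the CCM than to the claimed model
    return True


# Hypothetical enrolled population with one 2-D component per speaker.
models = {
    "alice": [[0.0, 0.0]],
    "bob": [[0.5, 0.0]],
    "carol": [[5.0, 5.0]],
    "dave": [[6.0, 5.0]],
}
print(verify("alice", [[0.1, 0.0]], models))   # True
print(verify("alice", [[0.45, 0.0]], models))  # False
print(verify("alice", [[0.0, 6.0]], models))   # False
```

The second call fails because another cohort member is closer to the test model. The third fails the CCM test: the claimed model is the closest cohort member, yet the test model is nearer to the pooled outside models.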
15. The method of claim 14, wherein said complementary speaker models include a background model derived from speech data of speakers outside said tree.
16. The method of claim 13, further comprising the steps of:
generating a plurality of complementary speaker models, each being a sibling speaker model of an ancestor of said particular speaker model; and
rejecting said claimant if said test model is closer in distance to any one of said complementary speaker models than to said particular speaker model.
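Selecting complementary models that are siblings of the claimed model's ancestors amounts to a walk up the tree. The sketch below assumes a hypothetical `Node` class with `parent` and `children` links and a helper `link`; it also assumes that the walk starts at the ancestor whose subtree forms the cohort, so that the collected siblings together cover every model outside that cohort.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None


def link(parent: Node, *children: Node) -> Node:
    # Hypothetical helper that wires up parent and child references.
    parent.children = list(children)
    for c in children:
        c.parent = parent
    return parent


def complementary_models(cohort_root: Node) -> List[Node]:
    # Collect the siblings of cohort_root and of each of its ancestors.
    out: List[Node] = []
    node = cohort_root
    while node.parent is not None:
        out.extend(s for s in node.parent.children if s is not node)
        node = node.parent
    return out


# Hypothetical three-layer binary tree over six speakers.
s = [Node(f"s{i}") for i in range(6)]
a1 = link(Node("A1"), s[0], s[1])
a2 = link(Node("A2"), s[2], s[3])
a = link(Node("A"), a1, a2)
b = link(Node("B"), s[4], s[5])
root = link(Node("root"), a, b)

# Cohort of s0 is the subtree under A1; its complements are A2 and B.
print([n.name for n in complementary_models(a1)])  # ['A2', 'B']
```

A test model would then be rejected if it lies closer to any of the returned merged models than to the claimed speaker's model.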
17. The method of claim 16, further comprising providing a background speaker model derived from speakers outside said tree, and rejecting said claimant if said test model is closer in distance to said background speaker model than to said particular speaker model.
18. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to provide method steps for performing pattern matching, said method comprising:
obtaining training data from each of a plurality T of sources constituting an enrolled population, over a plurality M of channels;
developing models for each of said T sources based on said training data, each model containing a collection of distributions;
generating a hierarchical model tree based on said models of said T sources, wherein at least some merged models within layers of said hierarchical model tree are computed via partitioning or grouping with respect to channel properties; and
obtaining training data from a new source over a new channel for addition to said enrolled population, developing a new model based thereupon and updating said hierarchical model tree with said new model.
Dependent claims: 19
Specification