Subword-based speaker verification using multiple-classifier fusion, with channel, fusion, model and threshold adaptation
Abstract
The voice print system of the present invention is a subword-based, text-dependent automatic speaker verification system that embodies the capability of user-selectable passwords with no constraints on the choice of vocabulary words or the language. An estimate of the enrollment channel and of the test channel is developed for inverse filtering of the enrollment or the test speech, respectively. Automatic blind speech segmentation allows speech to be segmented into subword units without any linguistic knowledge of the password. Subword modeling is performed using multiple classifiers. The system also takes advantage of such concepts as multiple classifier fusion and data resampling to boost performance. Key word/key phrase spotting is used to optimally locate the password phrase. Numerous adaptation techniques increase the flexibility of the base system, and include: channel adaptation, fusion adaptation, model adaptation and threshold adaptation.
25 Claims
1. An automatic speaker verification system comprising:
a receiver, the receiver obtaining enrollment speech over an enrollment channel;
a means, connected to the receiver, for developing an estimate of the enrollment channel;
a first storage device, connected to the receiver, for storing the enrollment channel estimate;
a means for extracting predetermined features of the enrollment speech;
a means, operably connected to the extracting means, for segmenting the predetermined features of the enrollment speech, wherein the features are segmented into a plurality of subwords using automatic blind speech segmentation; and
at least one classifier, connected to the segmenting means, wherein the classifier models the plurality of subwords and outputs one or more classifier scores.

2. The automatic speaker verification system of claim 1, further comprising:
an analog to digital converter, connected to the receiver, for providing the obtained enrollment speech in a digital format.
3. The automatic speaker verification system of claim 1, wherein at least one classifier is a neural tree network classifier.
4. The automatic speaker verification system of claim 1, wherein at least one classifier is a Gaussian mixture model classifier.
5. The automatic speaker verification system of claim 1, wherein the classifiers comprise:
at least one Gaussian mixture model classifier, the Gaussian mixture model classifier resulting in a first classifier score; and
at least one neural tree network classifier, the neural tree network classifier resulting in a second classifier score.
6. The automatic speaker verification system of claim 1, further comprising a means, connected to the classifier, for fusing the classifier scores, wherein the fusing means weights the scores from the classifier models with a fusion constant and combines the weighted scores resulting in a final score for the combined system.
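The score fusion of claim 6 can be sketched as a linear combination of the two classifiers' outputs. The per-subword averaging and the single fusion constant `alpha` below are assumptions for the sketch; the patent does not fix a particular combination rule here.

```python
import numpy as np

def fuse_scores(gmm_scores, ntn_scores, alpha=0.5):
    """Weight each classifier's subword scores with a fusion constant
    and combine them into one final score (an illustrative linear fusion)."""
    # Collapse each model's per-subword scores to a single score first
    gmm = float(np.mean(gmm_scores))
    ntn = float(np.mean(ntn_scores))
    # alpha plays the role of the claim's "fusion constant"
    return alpha * gmm + (1.0 - alpha) * ntn
```

The verification decision of claim 23 would then compare this final score against a threshold.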
7. The automatic speaker verification system of claim 1, further comprising a second storage device, connected to the classifier, for storing the one or more classifier scores.
8. An automatic speaker verification system comprising:
a receiver, the receiver obtaining enrollment speech over an enrollment channel;
a means, connected to the receiver, for developing an estimate of the enrollment channel wherein said estimating means comprises a means for creating a filter representing characteristics of the enrollment channel, by dissecting the speech into its individual frequency components, selecting those components whose bandwidths are larger than a preset threshold to be those contributed by the channel, and then recombining those components that are contributed by the channel to create a channel estimate;
a first storage device, connected to the receiver, for storing the enrollment channel estimate;
a means for extracting predetermined features of the enrollment speech;
a means, operably connected to the extracting means, for segmenting the predetermined features of the enrollment speech, wherein the features are segmented into a plurality of subwords; and
at least one classifier, connected to the segmenting means, wherein the classifier models the plurality of subwords and outputs one or more classifier scores.
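The channel-estimation step recited in claim 8 (dissect the speech into frequency components, keep the components whose bandwidths exceed a preset threshold as channel-contributed, and recombine them) can be sketched as follows. The claim does not name a decomposition method; this sketch uses LPC pole bandwidths as one plausible reading of "frequency components" with bandwidths, and the model order and 400 Hz threshold are arbitrary illustrative values.

```python
import numpy as np

def lpc(signal, order):
    """Autocorrelation-method LPC: solve the normal equations for A(z)."""
    r = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))  # A(z) = 1 - sum_k a_k z^-k

def estimate_channel(signal, fs, order=12, bw_threshold=400.0):
    """Split the LPC spectrum into its pole components, keep the
    wide-bandwidth poles as channel-contributed, and recombine them."""
    a = lpc(np.asarray(signal, dtype=float), order)
    poles = np.roots(a)
    # 3 dB bandwidth (Hz) of a resonance at pole radius |p|
    bandwidths = -fs / np.pi * np.log(np.abs(poles) + 1e-12)
    channel_poles = poles[bandwidths > bw_threshold]
    # Recombine into the denominator of an all-pole channel filter 1/Ac(z)
    return np.real(np.poly(channel_poles))
```

Narrow-bandwidth poles (sharp resonances) are left to the speaker's vocal tract; broad, slowly varying components are attributed to the channel.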
9. An automatic speaker verification method, comprising:
obtaining enrollment speech over an enrollment channel;
storing an estimate of the enrollment channel;
extracting predetermined features of the enrollment speech;
segmenting the enrollment speech, wherein the enrollment speech is segmented into a plurality of subwords using automatic blind speech segmentation; and
modeling the plurality of subwords using one or more classifier models resulting in an output of one or more classifier scores.

10. The automatic speaker verification method of claim 9, further comprising the steps of:
digitizing the obtained enrollment speech; and
preprocessing the digitized enrollment speech.
11. The automatic speaker verification method of claim 9, wherein the step of modeling comprises the step of scoring at least one neural tree network classifier.
12. The automatic speaker verification method of claim 9, wherein the step of modeling further comprises the steps of:
scoring at least one Gaussian mixture model classifier, the Gaussian mixture model classifier resulting in a first classifier score;
scoring at least one neural tree network classifier, the neural tree network classifier resulting in a second classifier score; and
fusing the first and second classifier scores.
13. The automatic speaker verification method of claim 9, further comprising the steps of:
weighting the scores from the classifier models with a fusion constant; and
combining the weighted scores resulting in a final score for the combined system.
14. The automatic speaker verification method of claim 9, wherein the step of storing an estimate of the enrollment channel comprises the step of creating a filter representing characteristics of the enrollment channel.
15. An automatic speaker verification method, comprising:
obtaining enrollment speech over an enrollment channel;
creating an estimate of the enrollment channel, wherein creating the estimate comprises the steps of dissecting the speech into its individual frequency components, selecting those individual frequency components whose bandwidths are larger than a preset threshold to be those components that are contributed by the channel, and then recombining those components that are contributed by the channel to create the enrollment channel estimate;
inverse filtering the enrollment speech to create inverse filtered enrollment speech;
receiving test speech over a testing channel;
inverse filtering the test speech to create inverse filtered test speech; and
determining whether the inverse filtered test speech comes from the same person as the inverse filtered enrollment speech.
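The inverse filtering steps of claim 15 remove the channel's contribution from both the enrollment and the test speech before comparison. A minimal sketch, assuming the channel estimate is stored as the denominator coefficients of an all-pole filter 1/Ac(z) (the claim does not fix a representation, so this layout is an assumption):

```python
import numpy as np

def inverse_filter(speech, a_channel):
    """Undo an assumed all-pole channel 1/Ac(z) by applying its FIR
    inverse Ac(z); a_channel = [1, c1, ..., cq] is a hypothetical layout."""
    speech = np.asarray(speech, dtype=float)
    # Convolving with Ac(z) cancels the 1/Ac(z) channel coloring;
    # truncate to the original length
    return np.convolve(speech, a_channel)[:len(speech)]
```

The same routine would be applied once with the enrollment channel estimate and once with the test channel estimate, so that both utterances are compared on a channel-neutral footing.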
18. An automatic speaker verification method, including the steps of:
obtaining two or more samples of enrollment speech;
processing each sample of enrollment speech to form corresponding utterances;
obtaining test speech;
identifying one or more key words/key phrases in the test speech, including the steps of:
selecting a reference utterance from one of the utterances;
warping the remaining samples of the enrollment speech to the reference utterance;
averaging one or more of the warped utterances to generate a reference template;
calculating a dynamic time warp distortion for the reference template and test speech; and
choosing a portion of the test utterance which has the least dynamic time warp distortion; and
comparing the identified key words/key phrases to the enrollment speech to determine whether the test speech and enrollment speech are from the same person.
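The key-phrase spotting of claim 18 can be sketched as a sliding dynamic-time-warp search. This assumes the reference template has already been formed by warping and averaging the enrollment utterances; the function names, the plain Euclidean frame cost, and the length bounds are illustrative, not from the patent.

```python
import numpy as np

def dtw_distance(a, b):
    """Length-normalized dynamic time warp distortion between
    two feature-frame sequences (rows are frames)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def spot_keyword(template, test, min_len, max_len):
    """Choose the portion of the test utterance with the least
    DTW distortion against the reference template."""
    best, best_span = np.inf, None
    for start in range(len(test)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(test):
                break
            d = dtw_distance(template, test[start:end])
            if d < best:
                best, best_span = d, (start, end)
    return best_span, best
```

The returned span is the hypothesized password location, which is then passed on for speaker comparison.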
23. An automatic speaker verification method, comprising the steps of:
obtaining test speech from a user over a test channel;
processing the test speech to remove the effects of the test channel; and
comparing the processed test speech with speech data from a known user, including the steps of:
extracting features of the test speech;
generating subwords based on the extracted features;
scoring the subwords using one or more model classifiers;
fusing the results of the model classifiers to obtain a final score; and
verifying the user if the final score is equal to or greater than a threshold value.

24. The automatic speaker verification method of claim 23, further comprising the step of training the model classifiers using antispeaker data from nonusers and one or more enrollment speech samples from the user.

25. The automatic speaker verification method of claim 23, further comprising the step of changing the model classifiers and threshold value, including the steps of:
determining that the user has been verified;
retraining the model classifiers, including the step of using test speech corresponding to the verified final score as an enrollment sample; and
calculating a new threshold value based on the retrained model classifiers.
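Claims 23 and 25 together describe a threshold decision and threshold adaptation after successful verifications. A minimal sketch of both steps; the moving-average update rule and the `rate` parameter are invented for illustration, since the patent does not specify how the new threshold is calculated:

```python
def verify(final_score, threshold):
    """Claim 23's decision rule: accept the user when the fused
    final score is equal to or greater than the threshold."""
    return final_score >= threshold

def adapt_threshold(old_threshold, verified_scores, rate=0.1):
    """One possible threshold adaptation: nudge the threshold toward
    the mean score of successfully verified attempts (illustrative)."""
    target = sum(verified_scores) / len(verified_scores)
    return (1.0 - rate) * old_threshold + rate * target
```

In the adaptive loop of claim 25, a verified test utterance would also be fed back as a fresh enrollment sample before the threshold is recomputed against the retrained models.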
Specification