Voice print system and method
Abstract
The voice print system of the present invention is a subword-based, text-dependent automatic speaker verification system that supports user-selectable passwords with no constraints on the choice of vocabulary words or the language. Automatic blind speech segmentation allows speech to be segmented into subword units without any linguistic knowledge of the password. Subword modeling is performed using multiple classifiers. The system also takes advantage of concepts such as multiple classifier fusion and data resampling to boost performance. Key word/key phrase spotting is used to optimally locate the password phrase. Numerous adaptation techniques increase the flexibility of the base system, including channel adaptation, fusion adaptation, model adaptation and threshold adaptation.
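The automatic blind speech segmentation described above can be illustrated with a short sketch. The example below splits a feature matrix at the points of greatest frame-to-frame spectral change, without any linguistic knowledge of the password; the distance criterion, the fixed subword count, and the function name are illustrative assumptions, not the patent's specific algorithm.

```python
import numpy as np

def blind_segment(features, num_subwords):
    """Split a (frames x dims) feature matrix into contiguous subword
    segments at the largest frame-to-frame spectral changes.
    Illustrative only -- the patent does not mandate this criterion."""
    # Euclidean distance between consecutive feature frames.
    deltas = np.linalg.norm(np.diff(features, axis=0), axis=1)
    # Keep the (num_subwords - 1) largest change points as boundaries.
    cuts = np.sort(np.argsort(deltas)[-(num_subwords - 1):] + 1)
    bounds = [0, *cuts.tolist(), len(features)]
    return [features[bounds[i]:bounds[i + 1]] for i in range(num_subwords)]
```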
150 Citations
36 Claims
1. An automatic speaker verification system comprising:
a receiver, the receiver obtaining enrollment speech over an enrollment channel;
a means, connected to the receiver, for developing an estimate of the enrollment channel;
a first storage device, connected to the receiver, for storing the enrollment channel estimate;
a means for extracting predetermined features of the enrollment speech;
a means, operably connected to the extracting means, for segmenting the predetermined features of the enrollment speech, wherein the features are segmented into a plurality of subwords;
at least one classifier, connected to the segmenting means, wherein the classifier models the plurality of subwords and outputs one or more classifier scores. (Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10)
11. An automatic speaker verification method, comprising the steps of:
obtaining enrollment speech over an enrollment channel;
storing an estimate of the enrollment channel;
extracting predetermined features of the enrollment speech;
segmenting the enrollment speech, wherein the enrollment speech is segmented into a plurality of subwords; and
modelling the plurality of subwords using one or more classifier models, resulting in an output of one or more classifier scores. (Dependent claims: 12, 13, 14, 15, 16, 17, 18, 20, 21)
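As one illustration of the modelling step above, the sketch below fits a single diagonal-Gaussian model per subword and scores test subwords by average log-likelihood. The Gaussian stands in for the patent's classifiers, which are not limited to this choice; all names here are assumptions.

```python
import numpy as np

def train_subword_models(subwords):
    """Fit one diagonal-Gaussian model (mean, variance) per subword
    segment; a stand-in for the patent's classifier models."""
    return [(s.mean(axis=0), s.var(axis=0) + 1e-6) for s in subwords]

def score_subwords(models, subwords):
    """Classifier score for each subword: the average per-frame
    log-likelihood under the corresponding model."""
    scores = []
    for (mu, var), s in zip(models, subwords):
        ll = -0.5 * (np.log(2 * np.pi * var) + (s - mu) ** 2 / var)
        scores.append(float(ll.sum(axis=1).mean()))
    return scores
```

Matched speech should outscore mismatched speech under its own subword models, which is what makes the scores usable for verification.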
19. An automatic speaker verification method, comprising the steps of:
obtaining enrollment speech over an enrollment channel;
storing an estimate of the enrollment channel, the estimate being a filter representing characteristics of the enrollment channel;
receiving test speech over a testing channel;
inverse filtering the test speech to create filtered test speech;
recalling the estimate of the enrollment channel; filtering the filtered test speech through the recalled estimate of the enrollment channel to create enrollment filtered test speech; and
determining whether the enrollment filtered test speech comes from the same person as the enrollment speech.
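The channel-mapping steps above can be sketched in the frequency domain: the test speech is inverse filtered by an estimate of the testing channel, then passed through the recalled enrollment channel estimate, so that the test and enrollment speech appear to share one channel. The real-valued magnitude responses below are illustrative stand-ins for the patent's filter estimates.

```python
import numpy as np

def map_to_enrollment_channel(test_speech, test_channel_mag, enroll_channel_mag):
    """Inverse filter the test speech by the testing channel's magnitude
    response, then filter it through the recalled enrollment channel
    estimate, yielding enrollment filtered test speech."""
    eps = 1e-8                               # guard against division by zero
    spec = np.fft.rfft(test_speech)
    spec = spec / (test_channel_mag + eps)   # undo the testing channel
    spec = spec * enroll_channel_mag         # apply the enrollment channel
    return np.fft.irfft(spec, n=len(test_speech))
```

When the two channel estimates coincide, the mapping leaves the speech unchanged, as expected.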
22. An automatic speaker verification method, comprising the steps of:
obtaining enrollment speech over an enrollment channel;
inverse filtering the enrollment speech to create inverse filtered enrollment speech;
receiving test speech over a testing channel;
inverse filtering the test speech to create inverse filtered test speech; and
determining whether the inverse filtered test speech comes from the same person as the inverse filtered enrollment speech. (Dependent claims: 23, 24, 26, 27, 28, 29)
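A common way to realize inverse filtering of both enrollment and test speech is cepstral mean subtraction: a stationary channel is approximately convolutional in the time domain and therefore additive in the cepstral domain, so subtracting the per-utterance cepstral mean removes it from either signal. This is a standard technique offered as a sketch, not necessarily the specific filter the claim contemplates.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-utterance mean from a (frames x coeffs) cepstral
    matrix. A stationary channel is additive in the cepstral domain, so
    removing the mean inverse-filters it implicitly."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

Applying this to speech observed through any constant channel yields the same result as applying it to the clean speech, which is the channel-invariance the claim relies on.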
25. An automatic speaker verification method, including the steps of:
obtaining two or more samples of enrollment speech;
processing each sample of enrollment speech to form corresponding utterances;
obtaining test speech;
identifying one or more key words/key phrases in the test speech, including the steps of:
selecting a reference utterance from one of the utterances;
warping the remaining samples of the enrollment speech to the reference utterance;
averaging one or more of the warped utterances to generate a reference template;
calculating a dynamic time warp distortion for the reference template and test speech; and
choosing a portion of the test utterance which has the least dynamic time warp distortion; and
comparing the identified key word/key phrases to the enrollment speech to determine whether the test speech and enrollment speech are from the same person.
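The key word/key phrase spotting steps above reduce to computing a dynamic time warp (DTW) distortion between a reference template and candidate portions of the test speech, then keeping the portion with the least distortion. The sketch below uses a fixed-length sliding window for simplicity (practical DTW spotting would allow variable-length matches); the function names are illustrative assumptions.

```python
import numpy as np

def dtw_distortion(a, b):
    """Dynamic time warp distortion between two (frames x dims) sequences."""
    n, m = len(a), len(b)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[n, m]

def spot_keyword(template, test):
    """Slide the reference template over the test speech; return the
    (start index, distortion) of the least-distortion portion."""
    w = len(template)
    best = (0, np.inf)
    for start in range(len(test) - w + 1):
        dist = dtw_distortion(template, test[start:start + w])
        if dist < best[1]:
            best = (start, dist)
    return best
```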
30. An automatic speaker verification method, wherein the results of prior verifications are stored, including the steps of:
obtaining test speech from a user seeking authorization or identification;
generating subwords of the test speech;
scoring the subwords against subwords of a known individual using a plurality of modeling classifiers;
storing the results of each model classifier as a classifier score;
fusing the classifier scores using a fusion constant and weighting function to generate a final score; and
comparing the final score to a threshold value to determine whether the test speech and enrollment speech are from the known individual. (Dependent claims: 31, 32, 33, 35, 36)
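The fusion and decision steps above can be sketched as a weighted linear combination: each classifier score is weighted, the fusion constant is added, and the final score is compared against the threshold. The linear form and fixed weight vector are assumptions; the patent's weighting function could take other forms.

```python
def fuse_scores(classifier_scores, weights, fusion_constant=0.0):
    """Linear fusion: weighted sum of the classifier scores plus a
    fusion constant, yielding the final score."""
    assert len(classifier_scores) == len(weights)
    return fusion_constant + sum(w * s for w, s in zip(weights, classifier_scores))

def verify(classifier_scores, weights, fusion_constant, threshold):
    """Accept the speaker if the fused final score meets the threshold."""
    return fuse_scores(classifier_scores, weights, fusion_constant) >= threshold
```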
34. An automatic speaker verification method, comprising the steps of:
obtaining test speech from a user over a test channel;
processing the test speech to remove the effects of the test channel; and
comparing the processed test speech with speech data from a known user, including the steps of:
extracting features of the test speech;
generating subwords based on the extracted features;
scoring the subwords using one or more model classifiers;
fusing the results of the model classifiers to obtain a final score; and
verifying the user if the final score is equal to or greater than a threshold value.
Specification