Subword-based speaker verification using multiple-classifier fusion, with channel, fusion, model and threshold adaptation
Abstract
The voice print system of the present invention is a subword-based, text-dependent automatic speaker verification system that embodies the capability of user-selectable passwords with no constraints on the choice of vocabulary words or the language. An estimate of the enrollment channel and of the test channel is developed for inverse filtering of the enrollment or the test speech, respectively. Automatic blind speech segmentation allows speech to be segmented into subword units without any linguistic knowledge of the password. Subword modeling is performed using multiple classifiers. The system also takes advantage of such concepts as multiple classifier fusion and data resampling to boost performance. Key word/key phrase spotting is used to optimally locate the password phrase. Numerous adaptation techniques increase the flexibility of the base system, and include: channel adaptation, fusion adaptation, model adaptation and threshold adaptation.
25 Claims
1. An automatic speaker verification system comprising:
a receiver, the receiver obtaining enrollment speech over an enrollment channel;
a means, connected to the receiver, for developing an estimate of the enrollment channel;
a first storage device, connected to the receiver, for storing the enrollment channel estimate;
a means for extracting predetermined features of the enrollment speech;
a means, operably connected to the extracting means, for segmenting the predetermined features of the enrollment speech, wherein the features are segmented into a plurality of subwords using automatic blind speech segmentation; and
at least one classifier, connected to the segmenting means, wherein the classifier models the plurality of subwords and outputs one or more classifier scores.

2. The automatic speaker verification system of claim 1, further comprising:
an analog to digital converter, connected to the receiver, for providing the obtained enrollment speech in a digital format.
3. The automatic speaker verification system of claim 1, wherein at least one classifier is a neural tree network classifier.
4. The automatic speaker verification system of claim 1, wherein at least one classifier is a Gaussian mixture model classifier.
5. The automatic speaker verification system of claim 1, wherein the classifiers comprise:
at least one Gaussian mixture model classifier, the Gaussian mixture model classifier resulting in a first classifier score; and
at least one neural tree network classifier, the neural tree network classifier resulting in a second classifier score.
6. The automatic speaker verification system of claim 1, further comprising a means, connected to the classifier, for fusing the classifier scores, wherein the fusing means weights the scores from the classifier models with a fusion constant and combines the weighted scores resulting in a final score for the combined system.
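The score fusion of claim 6 can be sketched as a linear combination of the two classifiers' outputs. The per-subword averaging and the single fusion constant `alpha` below are assumptions for the sketch; the patent does not fix a particular combination rule here.

```python
import numpy as np

def fuse_scores(gmm_scores, ntn_scores, alpha=0.5):
    """Weight each classifier's subword scores with a fusion constant
    and combine them into one final score (an illustrative linear fusion)."""
    # Collapse each model's per-subword scores to a single score first
    gmm = float(np.mean(gmm_scores))
    ntn = float(np.mean(ntn_scores))
    # alpha plays the role of the claim's "fusion constant"
    return alpha * gmm + (1.0 - alpha) * ntn
```

The verification decision of claim 23 would then compare this final score against a threshold.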
7. The automatic speaker verification system of claim 1, further comprising a second storage device, connected to the classifier, for storing the one or more classifier scores.
8. An automatic speaker verification system comprising:
a receiver, the receiver obtaining enrollment speech over an enrollment channel;
a means, connected to the receiver, for developing an estimate of the enrollment channel wherein said estimating means comprises a means for creating a filter representing characteristics of the enrollment channel, by dissecting the speech into its individual frequency components, selecting those components whose bandwidths are larger than a preset threshold to be those contributed by the channel, and then recombining those components that are contributed by the channel to create a channel estimate;
a first storage device, connected to the receiver, for storing the enrollment channel estimate;
a means for extracting predetermined features of the enrollment speech;
a means, operably connected to the extracting means, for segmenting the predetermined features of the enrollment speech, wherein the features are segmented into a plurality of subwords; and
at least one classifier, connected to the segmenting means, wherein the classifier models the plurality of subwords and outputs one or more classifier scores.
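The channel-estimation step recited in claim 8 (dissect the speech into frequency components, keep the components whose bandwidths exceed a preset threshold as channel-contributed, and recombine them) can be sketched as follows. The claim does not name a decomposition method; this sketch uses LPC pole bandwidths as one plausible reading of "frequency components" with bandwidths, and the model order and 400 Hz threshold are arbitrary illustrative values.

```python
import numpy as np

def lpc(signal, order):
    """Autocorrelation-method LPC: solve the normal equations for A(z)."""
    r = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))  # A(z) = 1 - sum_k a_k z^-k

def estimate_channel(signal, fs, order=12, bw_threshold=400.0):
    """Split the LPC spectrum into its pole components, keep the
    wide-bandwidth poles as channel-contributed, and recombine them."""
    a = lpc(np.asarray(signal, dtype=float), order)
    poles = np.roots(a)
    # 3 dB bandwidth (Hz) of a resonance at pole radius |p|
    bandwidths = -fs / np.pi * np.log(np.abs(poles) + 1e-12)
    channel_poles = poles[bandwidths > bw_threshold]
    # Recombine into the denominator of an all-pole channel filter 1/Ac(z)
    return np.real(np.poly(channel_poles))
```

Narrow-bandwidth poles (sharp resonances) are left to the speaker's vocal tract; broad, slowly varying components are attributed to the channel.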
9. An automatic speaker verification method, comprising:
obtaining enrollment speech over an enrollment channel;
storing an estimate of the enrollment channel;
extracting predetermined features of the enrollment speech;
segmenting the enrollment speech, wherein the enrollment speech is segmented into a plurality of subwords using automatic blind speech segmentation; and
modeling the plurality of subwords using one or more classifier models resulting in an output of one or more classifier scores.

10. The automatic speaker verification method of claim 9, further comprising the steps of:
digitizing the obtained enrollment speech; and
preprocessing the digitized enrollment speech.
11. The automatic speaker verification method of claim 9, wherein the step of modeling comprises the step of scoring at least one neural tree network classifier.
12. The automatic speaker verification method of claim 9, wherein the step of modeling further comprises the steps of:
scoring at least one Gaussian mixture model classifier, the Gaussian mixture model classifier resulting in a first classifier score;
scoring at least one neural tree network classifier, the neural tree network classifier resulting in a second classifier score; and
fusing the first and second classifier scores.
13. The automatic speaker verification method of claim 9, further comprising the steps of:
weighting the scores from the classifier models with a fusion constant; and
combining the weighted scores resulting in a final score for the combined system.
14. The automatic speaker verification method of claim 9, wherein the step of storing an estimate of the enrollment channel comprises the step of creating a filter representing characteristics of the enrollment channel.
15. An automatic speaker verification method, comprising:
obtaining enrollment speech over an enrollment channel;
creating an estimate of the enrollment channel, wherein creating the estimate comprises the steps of dissecting the speech into its individual frequency components, selecting those individual frequency components whose bandwidths are larger than a preset threshold to be those components that are contributed by the channel, and then recombining those components that are contributed by the channel to create the enrollment channel estimate;
inverse filtering the enrollment speech to create inverse filtered enrollment speech;
receiving test speech over a testing channel;
inverse filtering the test speech to create inverse filtered test speech; and
determining whether the inverse filtered test speech comes from the same person as the inverse filtered enrollment speech.
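The inverse filtering steps of claim 15 remove the channel's contribution from both the enrollment and the test speech before comparison. A minimal sketch, assuming the channel estimate is stored as the denominator coefficients of an all-pole filter 1/Ac(z) (the claim does not fix a representation, so this layout is an assumption):

```python
import numpy as np

def inverse_filter(speech, a_channel):
    """Undo an assumed all-pole channel 1/Ac(z) by applying its FIR
    inverse Ac(z); a_channel = [1, c1, ..., cq] is a hypothetical layout."""
    speech = np.asarray(speech, dtype=float)
    # Convolving with Ac(z) cancels the 1/Ac(z) channel coloring;
    # truncate to the original length
    return np.convolve(speech, a_channel)[:len(speech)]
```

The same routine would be applied once with the enrollment channel estimate and once with the test channel estimate, so that both utterances are compared on a channel-neutral footing.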
18. An automatic speaker verification method, including the steps of:
obtaining two or more samples of enrollment speech;
processing each sample of enrollment speech to form corresponding utterances;
obtaining test speech;
identifying one or more key words/key phrases in the test speech, including the steps of:
selecting a reference utterance from one of the utterances;
warping the remaining samples of the enrollment speech to the reference utterance;
averaging one or more of the warped utterances to generate a reference template;
calculating a dynamic time warp distortion for the reference template and test speech; and
choosing a portion of the test utterance which has the least dynamic time warp distortion; and
comparing the identified key words/key phrases to the enrollment speech to determine whether the test speech and enrollment speech are from the same person.
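The key-phrase spotting of claim 18 can be sketched as a sliding dynamic-time-warp search. This assumes the reference template has already been formed by warping and averaging the enrollment utterances; the function names, the plain Euclidean frame cost, and the length bounds are illustrative, not from the patent.

```python
import numpy as np

def dtw_distance(a, b):
    """Length-normalized dynamic time warp distortion between
    two feature-frame sequences (rows are frames)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def spot_keyword(template, test, min_len, max_len):
    """Choose the portion of the test utterance with the least
    DTW distortion against the reference template."""
    best, best_span = np.inf, None
    for start in range(len(test)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(test):
                break
            d = dtw_distance(template, test[start:end])
            if d < best:
                best, best_span = d, (start, end)
    return best_span, best
```

The returned span is the hypothesized password location, which is then passed on for speaker comparison.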
23. An automatic speaker verification method, comprising the steps of:
obtaining test speech from a user over a test channel;
processing the test speech to remove the effects of the test channel; and
comparing the processed test speech with speech data from a known user, including the steps of:
extracting features of the test speech;
generating subwords based on the extracted features;
scoring the subwords using one or more model classifiers;
fusing the results of the model classifiers to obtain a final score; and
verifying the user if the final score is equal to or greater than a threshold value.

24. The automatic speaker verification method of claim 23, further comprising the step of training the model classifiers using antispeaker data from nonusers and one or more enrollment speech samples from the user.

25. The automatic speaker verification method of claim 23, further comprising the step of changing the model classifiers and threshold value, including the steps of:
determining that the user has been verified;
retraining the model classifiers, including the step of using test speech corresponding to the verified final score as an enrollment sample; and
calculating a new threshold value based on the retrained model classifiers.
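Claims 23 and 25 together describe a threshold decision and threshold adaptation after successful verifications. A minimal sketch of both steps; the moving-average update rule and the `rate` parameter are invented for illustration, since the patent does not specify how the new threshold is calculated:

```python
def verify(final_score, threshold):
    """Claim 23's decision rule: accept the user when the fused
    final score is equal to or greater than the threshold."""
    return final_score >= threshold

def adapt_threshold(old_threshold, verified_scores, rate=0.1):
    """One possible threshold adaptation: nudge the threshold toward
    the mean score of successfully verified attempts (illustrative)."""
    target = sum(verified_scores) / len(verified_scores)
    return (1.0 - rate) * old_threshold + rate * target
```

In the adaptive loop of claim 25, a verified test utterance would also be fed back as a fresh enrollment sample before the threshold is recomputed against the retrained models.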
Specification