Voice print system and method
Abstract
The voice print system of the present invention is a subword-based, text-dependent automatic speaker verification system that supports user-selectable passwords with no constraints on the choice of vocabulary words or the language. Automatic blind speech segmentation allows speech to be segmented into subword units without any linguistic knowledge of the password. Subword modeling is performed using multiple classifiers. The system also takes advantage of concepts such as multiple classifier fusion and data resampling to boost performance. Key word/key phrase spotting is used to optimally locate the password phrase. Numerous adaptation techniques increase the flexibility of the base system, including channel adaptation, fusion adaptation, model adaptation and threshold adaptation.
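The automatic blind speech segmentation described above can be illustrated with a short sketch. The example below splits a feature matrix at the points of greatest frame-to-frame spectral change, without any linguistic knowledge of the password; the distance criterion, the fixed subword count, and the function name are illustrative assumptions, not the patent's specific algorithm.

```python
import numpy as np

def blind_segment(features, num_subwords):
    """Split a (frames x dims) feature matrix into contiguous subword
    segments at the largest frame-to-frame spectral changes.
    Illustrative only -- the patent does not mandate this criterion."""
    # Euclidean distance between consecutive feature frames.
    deltas = np.linalg.norm(np.diff(features, axis=0), axis=1)
    # Keep the (num_subwords - 1) largest change points as boundaries.
    cuts = np.sort(np.argsort(deltas)[-(num_subwords - 1):] + 1)
    bounds = [0, *cuts.tolist(), len(features)]
    return [features[bounds[i]:bounds[i + 1]] for i in range(num_subwords)]
```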
150 Citations
36 Claims
1. An automatic speaker verification system comprising:
a receiver, the receiver obtaining enrollment speech over an enrollment channel;
a means, connected to the receiver, for developing an estimate of the enrollment channel;
a first storage device, connected to the receiver, for storing the enrollment channel estimate;
a means for extracting predetermined features of the enrollment speech;
a means, operably connected to the extracting means, for segmenting the predetermined features of the enrollment speech, wherein the features are segmented into a plurality of subwords;
at least one classifier, connected to the segmenting means, wherein the classifier models the plurality of subwords and outputs one or more classifier scores. (Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10)
11. An automatic speaker verification method, comprising the steps of:
obtaining enrollment speech over an enrollment channel;
storing an estimate of the enrollment channel;
extracting predetermined features of the enrollment speech;
segmenting the enrollment speech, wherein the enrollment speech is segmented into a plurality of subwords; and
modelling the plurality of subwords using one or more classifier models, resulting in an output of one or more classifier scores. (Dependent claims: 12, 13, 14, 15, 16, 17, 18, 20, 21)
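As one illustration of the modelling step above, the sketch below fits a single diagonal-Gaussian model per subword and scores test subwords by average log-likelihood. The Gaussian stands in for the patent's classifiers, which are not limited to this choice; all names here are assumptions.

```python
import numpy as np

def train_subword_models(subwords):
    """Fit one diagonal-Gaussian model (mean, variance) per subword
    segment; a stand-in for the patent's classifier models."""
    return [(s.mean(axis=0), s.var(axis=0) + 1e-6) for s in subwords]

def score_subwords(models, subwords):
    """Classifier score for each subword: the average per-frame
    log-likelihood under the corresponding model."""
    scores = []
    for (mu, var), s in zip(models, subwords):
        ll = -0.5 * (np.log(2 * np.pi * var) + (s - mu) ** 2 / var)
        scores.append(float(ll.sum(axis=1).mean()))
    return scores
```

Matched speech should outscore mismatched speech under its own subword models, which is what makes the scores usable for verification.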
19. An automatic speaker verification method, comprising the steps of:
obtaining enrollment speech over an enrollment channel;
storing an estimate of the enrollment channel, the estimate being a filter representing characteristics of the enrollment channel;
receiving test speech over a testing channel;
inverse filtering the test speech to create filtered test speech;
recalling the estimate of the enrollment channel; filtering the filtered test speech through the recalled estimate of the enrollment channel to create enrollment filtered test speech; and
determining whether the enrollment filtered test speech comes from the same person as the enrollment speech.
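The channel-mapping steps above can be sketched in the frequency domain: the test speech is inverse filtered by an estimate of the testing channel, then passed through the recalled enrollment channel estimate, so that the test and enrollment speech appear to share one channel. The real-valued magnitude responses below are illustrative stand-ins for the patent's filter estimates.

```python
import numpy as np

def map_to_enrollment_channel(test_speech, test_channel_mag, enroll_channel_mag):
    """Inverse filter the test speech by the testing channel's magnitude
    response, then filter it through the recalled enrollment channel
    estimate, yielding enrollment filtered test speech."""
    eps = 1e-8                               # guard against division by zero
    spec = np.fft.rfft(test_speech)
    spec = spec / (test_channel_mag + eps)   # undo the testing channel
    spec = spec * enroll_channel_mag         # apply the enrollment channel
    return np.fft.irfft(spec, n=len(test_speech))
```

When the two channel estimates coincide, the mapping leaves the speech unchanged, as expected.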
22. An automatic speaker verification method, comprising the steps of:
obtaining enrollment speech over an enrollment channel;
inverse filtering the enrollment speech to create inverse filtered enrollment speech;
receiving test speech over a testing channel;
inverse filtering the test speech to create inverse filtered test speech; and
determining whether the inverse filtered test speech comes from the same person as the inverse filtered enrollment speech. (Dependent claims: 23, 24, 26, 27, 28, 29)
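A common way to realize inverse filtering of both enrollment and test speech is cepstral mean subtraction: a stationary channel is approximately convolutional in the time domain and therefore additive in the cepstral domain, so subtracting the per-utterance cepstral mean removes it from either signal. This is a standard technique offered as a sketch, not necessarily the specific filter the claim contemplates.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-utterance mean from a (frames x coeffs) cepstral
    matrix. A stationary channel is additive in the cepstral domain, so
    removing the mean inverse-filters it implicitly."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

Applying this to speech observed through any constant channel yields the same result as applying it to the clean speech, which is the channel-invariance the claim relies on.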
25. An automatic speaker verification method, including the steps of:
obtaining two or more samples of enrollment speech;
processing each sample of enrollment speech to form corresponding utterances;
obtaining test speech;
identifying one or more key words/key phrases in the test speech, including the steps of:
selecting a reference utterance from one of the utterances;
warping the remaining samples of the enrollment speech to the reference utterance;
averaging one or more of the warped utterances to generate a reference template;
calculating a dynamic time warp distortion for the reference template and test speech; and
choosing a portion of the test utterance which has the least dynamic time warp distortion; and
comparing the identified key word/key phrases to the enrollment speech to determine whether the test speech and enrollment speech are from the same person.
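The key word/key phrase spotting steps above reduce to computing a dynamic time warp (DTW) distortion between a reference template and candidate portions of the test speech, then keeping the portion with the least distortion. The sketch below uses a fixed-length sliding window for simplicity (practical DTW spotting would allow variable-length matches); the function names are illustrative assumptions.

```python
import numpy as np

def dtw_distortion(a, b):
    """Dynamic time warp distortion between two (frames x dims) sequences."""
    n, m = len(a), len(b)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[n, m]

def spot_keyword(template, test):
    """Slide the reference template over the test speech; return the
    (start index, distortion) of the least-distortion portion."""
    w = len(template)
    best = (0, np.inf)
    for start in range(len(test) - w + 1):
        dist = dtw_distortion(template, test[start:start + w])
        if dist < best[1]:
            best = (start, dist)
    return best
```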
30. An automatic speaker verification method, wherein the results of prior verifications are stored, including the steps of:
obtaining test speech from a user seeking authorization or identification;
generating subwords of the test speech;
scoring the subwords against subwords of a known individual using a plurality of modeling classifiers;
storing the results of each model classifier as a classifier score;
fusing the classifier scores using a fusion constant and weighting function to generate a final score; and
comparing the final score to a threshold value to determine whether the test speech and enrollment speech are from the known individual. (Dependent claims: 31, 32, 33, 35, 36)
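The fusion and decision steps above can be sketched as a weighted linear combination: each classifier score is weighted, the fusion constant is added, and the final score is compared against the threshold. The linear form and fixed weight vector are assumptions; the patent's weighting function could take other forms.

```python
def fuse_scores(classifier_scores, weights, fusion_constant=0.0):
    """Linear fusion: weighted sum of the classifier scores plus a
    fusion constant, yielding the final score."""
    assert len(classifier_scores) == len(weights)
    return fusion_constant + sum(w * s for w, s in zip(weights, classifier_scores))

def verify(classifier_scores, weights, fusion_constant, threshold):
    """Accept the speaker if the fused final score meets the threshold."""
    return fuse_scores(classifier_scores, weights, fusion_constant) >= threshold
```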
34. An automatic speaker verification method, comprising the steps of:
obtaining test speech from a user over a test channel;
processing the test speech to remove the effects of the test channel; and
comparing the processed test speech with speech data from a known user, including the steps of:
extracting features of the test speech;
generating subwords based on the extracted features;
scoring the subwords using one or more model classifiers;
fusing the results of the model classifiers to obtain a final score; and
verifying the user if the final score is equal to or greater than a threshold value.
Specification