Model adaptation of neural tree networks and other fused models for speaker verification

US 6,519,561 B1
Filed: 11/03/1998
Issued: 02/11/2003
Est. Priority Date: 11/03/1997
Status: Expired due to Term

7 Assignments

0 Petitions

Abstract

The model adaptation system of the present invention is a speaker verification system that embodies the capability to adapt models learned during the enrollment component to track aging of a user's voice. The system has the advantage of only requiring a single enrollment for the user. The model adaptation system and methods can be applied to several types of speaker recognition models including neural tree networks (NTN), Gaussian Mixture Models (GMMs), and dynamic time warping (DTW) or to multiple models (i.e., combinations of NTNs, GMMs and DTW). Moreover, the present invention can be applied to text-dependent or text-independent systems.

133 Citations

11 Claims

1. An adaptable speaker verification system with model adaptation, the system comprising:
- a receiver, the receiver obtaining a voice utterance;
  
  a means, connected to the receiver, for extracting predetermined features of the voice utterance;
  
  a means, operably connected to the extracting means, for segmenting the predetermined features of the voice utterance, wherein the features are segmented into a plurality of subwords; and
  
  at least one adaptable model, connected to the segmenting means, wherein the model models the plurality of subwords and outputs one or more scores, and the models are updated dynamically based on the received voice utterance to incorporate the changing characteristics of a user's voice and the adaptable models comprise at least one adaptable neural tree network model, the adaptable neural tree network model resulting in an NTN score.
2. The adaptable speaker verification system of claim 1, further comprising:

3. An adaptable speaker verification system with model adaptation, the system comprising:
- a receiver, the receiver obtaining a voice utterance;
  
  a means, connected to the receiver, for extracting predetermined features of the voice utterance;
  
  a means, operably connected to the extracting means, for segmenting the predetermined features of the voice utterance, wherein the features are segmented into a plurality of subwords;
  
  at least one adaptable model, connected to the segmenting means, wherein the model models the plurality of subwords and outputs one or more scores, and the models are updated dynamically based on the received voice utterance to incorporate the changing characteristics of a user's voice and wherein the adaptable models comprise:
  
  at least one adaptable Gaussian mixture model, the adaptable Gaussian mixture model resulting in a GMM score; and
  
  at least one adaptable neural tree network model, the adaptable neural tree network model resulting in an NTN score.
4. The adaptable speaker verification system of claim 3, further comprising:

5. An adaptable speaker verification method, including the steps of:
- obtaining enrollment speech from a known individual;
  
  receiving test speech from a user;
  
  extracting predetermined features of the test speech;
  
  warping the predetermined features using a dynamic time warping template, wherein the dynamic warping template is adapted based on the predetermined features of the test speech, resulting in the creation of warped feature data and a dynamic time warping score from the adapted dynamic warping template;
  
  generating subwords from the warped feature data;
  
  scoring the subwords using a plurality of adaptable models, wherein the adaptable models are adapted based on the subwords derived from the test speech and wherein the scoring comprises scoring at least one adaptable neural tree network model;
  
  combining the results of each classifier score and the dynamic time warping score to generate a final score; and
  
  comparing the final score to a threshold value to determine whether the test speech and enrollment speech are from the known individual.
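
The warping step recited above can be illustrated with a minimal sketch of classic dynamic time warping. The claim does not say how the template is adapted or how the score is normalized, so the running-average update in `adapt_template`, the path-length normalization, and every function and parameter name here are assumptions made for illustration only.

```python
import math

def dtw_align(template, test):
    """Align test frames to template frames with classic DTW.

    Returns (score, warped): score is the accumulated frame distance divided
    by the alignment path length (an assumed normalization), and warped holds
    one averaged test frame per template frame.
    """
    n, m = len(template), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(template[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end of both sequences to recover the alignment path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))

    # Average the test frames mapped to each template frame.
    dim = len(template[0])
    sums = [[0.0] * dim for _ in range(n)]
    counts = [0] * n
    for ti, tj in path:
        counts[ti] += 1
        for k in range(dim):
            sums[ti][k] += test[tj][k]
    warped = [[s / counts[ti] for s in sums[ti]] for ti in range(n)]
    return cost[n][m] / len(path), warped

def adapt_template(template, warped, count):
    """Fold one warped utterance into a template built from `count` utterances
    (hypothetical running-average rule, not taken from the claim)."""
    return [[(count * t + w) / (count + 1) for t, w in zip(tf, wf)]
            for tf, wf in zip(template, warped)]
```

In this sketch a lower score means a closer match to the template, so a real system would presumably convert it to the same orientation as the other classifier scores before combining them.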

6. An adaptable speaker verification method, wherein at least one neural tree network model is adapted based on an adaptation utterance, comprising the following steps:
- storing number of speaker observations, number of imposter observations and a total number of observations from previous enrollments or verifications;
  
  obtaining an adaptation utterance from a speaker;
  
  extracting predetermined features from the speaker adaptation utterance;
  
  segmenting the predetermined features into a plurality of subwords;
  
  applying the plurality of subwords to at least one neural tree network model;
  
  counting the number of updated speaker observations within each leaf of the neural tree network;
  
  storing the number of updated speaker observations in memory; and
  
  updating probabilities by dividing the number of updated speaker observations by a total number of observations at each leaf, thereby resulting in an adapted neural tree network model.
7. The adaptable speaker verification method of claim 6, further comprising the steps of:
8. The adaptable speaker verification method of claim 6, wherein the step of segmenting comprises generating subwords using automatic blind speech segmentation.
9. The adaptable speaker verification method of claim 6, further comprising the step of:
- warping the predetermined features from the speaker adaptation utterance using a dynamic time warping template, wherein the dynamic warping template is adapted based on the predetermined features of the test speech, resulting in the creation of warped feature data; and
  
  wherein the step of segmenting segments the warped feature data into a plurality of subwords.
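
The leaf-count update recited in claim 6 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `Leaf` class, the `route` callable that maps a feature frame to a leaf index, and the choice to treat the total as the sum of speaker and imposter observations are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    """Observation counts stored at one leaf of a neural tree network."""
    speaker_obs: int
    imposter_obs: int

    @property
    def total_obs(self):
        # Assumption: the total is the sum of speaker and imposter counts.
        return self.speaker_obs + self.imposter_obs

    @property
    def speaker_prob(self):
        # Probability = speaker observations / total observations at the leaf.
        return self.speaker_obs / self.total_obs if self.total_obs else 0.0

def adapt_leaves(leaves, route, frames, from_speaker=True):
    """Route each adaptation frame to a leaf, update the stored counts,
    and return the recomputed per-leaf speaker probabilities."""
    for frame in frames:
        leaf = leaves[route(frame)]
        if from_speaker:
            leaf.speaker_obs += 1
        else:
            leaf.imposter_obs += 1
    return [leaf.speaker_prob for leaf in leaves]

# Usage: a toy two-leaf tree split on the sign of the first feature.
leaves = [Leaf(speaker_obs=3, imposter_obs=1), Leaf(speaker_obs=1, imposter_obs=3)]
probs = adapt_leaves(leaves, lambda f: 0 if f[0] < 0 else 1,
                     [[-1.0], [-2.0], [0.5]])
```

Because only counts are stored, adaptation in this sketch needs no retraining of the tree structure and no access to the original enrollment audio.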

10. An adaptable speaker verification method, wherein at least one neural tree network model is adapted based on an adaptation utterance, comprising the following steps:
- storing number of speaker observations, number of imposter observations and a total number of observations from previous enrollments or verifications;
  
  obtaining an adaptation utterance from an imposter;
  
  extracting predetermined features from the imposter adaptation utterance;
  
  segmenting the predetermined features into a plurality of subwords;
  
  applying the plurality of subwords to at least one neural tree network model;
  
  counting the number of updated imposter observations within each leaf of the neural tree network;
  
  storing the number of updated imposter observations in memory; and
  
  updating probabilities by dividing the number of updated speaker observations by a total number of observations at each leaf, thereby resulting in an adapted neural tree model.
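
The effect of an imposter adaptation utterance on a single leaf can be written out as a short worked equation. The symbols are assumptions introduced here: $N_s$ and $N_i$ are the stored speaker and imposter observation counts at the leaf, $\Delta$ is the number of new imposter observations routed to it, and the total is taken to be the sum of the two counts.

```latex
p = \frac{N_s}{N_s + N_i}, \qquad
p' = \frac{N_s}{N_s + N_i + \Delta} \le p
% Example: N_s = 3, N_i = 1, \Delta = 2 gives p = 3/4 and p' = 3/6 = 1/2.
```

Under these assumptions the speaker count is unchanged while the total grows, so the leaf's speaker probability can only stay the same or decrease.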

11. An adaptable speaker verification method, including the steps of:
- obtaining enrollment speech from a known individual;
  
  receiving test speech from a user;
  
  extracting predetermined features of the test speech;
  
  warping the predetermined features using a dynamic time warping template, wherein the dynamic warping template is adapted based on the predetermined features of the test speech, resulting in the creation of warped feature data and a dynamic time warping score from the adapted dynamic warping template;
  
  generating subwords from the warped feature data;
  
  scoring the subwords using a plurality of adaptable models, wherein the adaptable models are adapted based on the subwords derived from the test speech and wherein the scoring comprises scoring at least one adaptable Gaussian mixture model, the adaptable Gaussian mixture model resulting in a GMM score; and
  
  scoring at least one adaptable neural tree network model, the adaptable neural tree network model resulting in an NTN score;
  
  combining the results of each classifier score and the dynamic time warping score to generate a final score; and
  
  comparing the final score to a threshold value to determine whether the test speech and enrollment speech are from the known individual.
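
The combining and comparing steps above can be sketched as a weighted average followed by a threshold test. The claim does not specify the combination rule, so the linear weighting, the weights, the threshold values, and the assumption that every score (including the DTW score) has already been mapped to a common scale where higher means more speaker-like are all illustrative choices.

```python
def fuse_scores(scores, weights=None):
    """Combine per-model scores into one final score.

    `scores` maps a model name (e.g. "ntn", "gmm", "dtw") to a score on a
    shared scale; `weights` is a hypothetical per-model weighting that
    defaults to equal weights.
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(weights[name] * scores[name] for name in scores) / total_weight

def verify(scores, threshold, weights=None):
    """Accept the claimed identity when the fused score meets the threshold."""
    return fuse_scores(scores, weights) >= threshold

# Usage with made-up scores for one test utterance.
example = {"ntn": 0.8, "gmm": 0.6, "dtw": 0.7}
final_score = fuse_scores(example)
accepted = verify(example, threshold=0.65)
```

A linear rule keeps the decision easy to inspect, since each model's contribution to the final score is just its weight times its score.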

Current Assignee: SpeechWorks International, Inc. (Microsoft Corporation)
Original Assignee: T-Netix, Inc. (Cognizant Technology Solutions Corp.)
Inventors: Farrell, Kevin; Mistretta, William
Primary Examiner(s): Šmits, Tālivaldis Ivars
Application Number: US09/185,871
Time in Patent Office: 1,561 Days
Field of Search: 704/232, 704/241, 704/249, 704/250
US Class Current: 704/232

CPC Class Codes

G10L 15/07   to the speaker

G10L 17/04   Training, enrolment or mode...

G10L 17/14   Use of phonemic categorisat...

G10L 17/18   Artificial neural networks;...

G10L 17/20   Pattern transformations or ...
