Method and apparatus for multi-environment speaker verification

US 6,253,179 B1
Filed: 01/29/1999
Issued: 06/26/2001
Est. Priority Date: 01/29/1999
Status: Expired due to Term

First Claim

1. A computer-implemented method, comprising:

obtaining training data from each of a plurality T of sources constituting an enrolled population, over a plurality M of channels;

developing models for each of said T sources based on said training data, each model containing a collection of distributions;

generating a hierarchical model tree based on said models of said T sources, wherein at least some merged models within layers of said hierarchical model tree are computed via partitioning or grouping with respect to channel properties; and

obtaining training data from a new source over a new channel for addition to said enrolled population, developing a new model based thereupon and updating said hierarchical model tree with said new model.

Abstract

A method for unsupervised environmental normalization for speaker verification using hierarchical clustering is disclosed. Training data (speech samples) are taken from T enrolled (registered) speakers over any one of M channels, e.g., different microphones, communication links, etc. For each speaker, a speaker model is generated, each containing a collection of distributions of audio feature data derived from the speech sample of that speaker. A hierarchical speaker model tree is created, e.g., by merging similar speaker models on a layer by layer basis. Each speaker is also grouped into a cohort of similar speakers. For each cohort, one or more complementary speaker models are generated by merging speaker models outside that cohort. When training data from a new speaker to be enrolled is received over a new channel, the speaker model tree as well as the complementary models are updated. Consequently, adaptation to data from new environments is possible by incorporating such data into the verification model whenever it is encountered.
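
The layer-by-layer merging described in the abstract can be sketched as follows. This is a minimal illustration, not the patented procedure: the class `Node`, the functions `model_distance`, `merge`, `build_layer`, and `build_tree`, the nearest-mean distance, and merging by concatenation are all assumptions introduced here. Each "collection of distributions" is reduced to a list of mean vectors.

```python
from dataclasses import dataclass, field
from math import dist  # Euclidean distance, Python 3.8+


@dataclass
class Node:
    name: str
    means: list  # mean vectors standing in for a collection of distributions
    children: list = field(default_factory=list)


def model_distance(a, b):
    """Symmetric average nearest-mean distance (an illustrative choice)."""
    d_ab = sum(min(dist(m, n) for n in b.means) for m in a.means) / len(a.means)
    d_ba = sum(min(dist(n, m) for m in a.means) for n in b.means) / len(b.means)
    return 0.5 * (d_ab + d_ba)


def merge(a, b):
    """Merged parent model: here simply the union of both collections."""
    return Node(f"({a.name}+{b.name})", a.means + b.means, [a, b])


def build_layer(nodes):
    """Greedily pair the closest remaining models; an odd one out is carried up."""
    remaining = list(nodes)
    parents = []
    while len(remaining) > 1:
        pairs = [
            (model_distance(remaining[i], remaining[j]), i, j)
            for i in range(len(remaining))
            for j in range(i + 1, len(remaining))
        ]
        _, i, j = min(pairs)
        b = remaining.pop(j)  # pop the larger index first so i stays valid
        a = remaining.pop(i)
        parents.append(merge(a, b))
    parents.extend(remaining)
    return parents


def build_tree(leaves):
    """Merge layer by layer until a single root remains (needs >= 1 leaf)."""
    layer = list(leaves)
    while len(layer) > 1:
        layer = build_layer(layer)
    return layer[0]
```

Under these assumptions, the simplest way to handle a newly enrolled speaker is to append the new leaf model and call `build_tree` again; an incremental update would avoid the rebuild but is not sketched here.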

71 Citations

19 Claims

1. A computer-implemented method, comprising:
- obtaining training data from each of a plurality T of sources constituting an enrolled population, over a plurality M of channels;
  
  developing models for each of said T sources based on said training data, each model containing a collection of distributions;
  
  generating a hierarchical model tree based on said models of said T sources, wherein at least some merged models within layers of said hierarchical model tree are computed via partitioning or grouping with respect to channel properties; and
  
  obtaining training data from a new source over a new channel for addition to said enrolled population, developing a new model based thereupon and updating said hierarchical model tree with said new model.
2. The method of claim 1 wherein said each of said plurality T of sources comprises a source of speech from a particular speaker, and said models comprise speaker models.
3. The method of claim 2, wherein said method is utilized for speaker verification.
4. The method of claim 1, further comprising the steps of:
5. The method of claim 4, further comprising the step of:
- generating, for a particular cohort, at least one complementary model representing a merger of speaker models outside said particular cohort.
6. The method of claim 5, further comprising the step of updating said complementary model when a new source and corresponding model is added to said enrolled population.
7. The method of claim 5, wherein said at least one complementary model is a cumulative complementary model which is a model formed by merging models on multiple levels of said tree that are outside said particular cohort.
8. The method of claim 5, wherein said at least one model comprises a plurality of merged models, each merged model being a sibling model of an ancestor of a model within said particular cohort.
9. The method of claim 1 wherein:
- each said model contains a collection of distributions of feature data associated with the corresponding source; and
  
  said step of generating a hierarchical model tree comprises merging similar models on a layer by layer basis.
10. The method of claim 9 wherein said feature data comprises image data.
11. The method of claim 1 wherein said hierarchical model tree is generated using a top down technique in which a merged model of all models of the T sources is sequentially partitioned on a layer by layer basis.
12. The method of claim 1 wherein each said distribution is a multi-dimensional Gaussian distribution.
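
Claims 5 to 8 describe complementary models built from parts of the tree outside a cohort. The sketch below is one hedged reading of that idea, assuming a cohort is the set of leaves under a single node (`cohort_root`). The names `Node`, `merged`, `sibling_complements`, and `cumulative_complement` are hypothetical, and models are again reduced to lists of mean vectors merged by concatenation.

```python
class Node:
    """Tree node holding a model (list of mean vectors) and a parent link."""

    def __init__(self, name, means, children=()):
        self.name = name
        self.means = list(means)
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def merged(name, *children):
    """Parent node whose model is the union of its children's models."""
    means = [m for c in children for m in c.means]
    return Node(name, means, children)


def sibling_complements(cohort_root):
    """Siblings of the cohort root and of each of its ancestors, nearest first."""
    out = []
    node = cohort_root
    while node.parent is not None:
        out.extend(s for s in node.parent.children if s is not node)
        node = node.parent
    return out


def cumulative_complement(cohort_root):
    """Single model merging everything in the tree outside the cohort.

    Built as a detached node so that parent links in the tree are untouched.
    """
    means = [m for s in sibling_complements(cohort_root) for m in s.means]
    return Node("CCM", means)
```

The list returned by `sibling_complements` corresponds to the plurality of merged models in claim 8, and `cumulative_complement` to the single cumulative model in claim 7, under the assumptions stated above.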

13. A speaker verification method comprising the steps of:
- obtaining training data from each of a plurality T of sources constituting an enrolled population, over a plurality M of channels;
  
  developing speaker models for each of said T speakers based on said training data, each model containing a collection of audio feature distributions;
  
  generating a hierarchical speaker model tree based on said models of said T speakers, wherein at least some merged models within layers of said hierarchical speaker model tree are computed via partitioning or grouping with respect to channel properties;
  
  receiving a claimed identification (ID) of a claimant, said claimed ID representing a speaker corresponding to a particular one of said speaker models;
  
  determining a cohort set containing said particular speaker model and similar speaker models thereto;
  
  receiving data corresponding to a speech sample of said claimant and generating a test speaker model therefrom; and
  
  comparing said test model to all speaker models of said cohort set and verifying said claimant if said particular speaker is the closest model of said cohort set to said test model.
14. The method of claim 13, further comprising the steps of:
15. The method of claim 14, wherein said complementary speaker models include a background model derived from speech data of speakers outside said tree.
16. The method of claim 13, further comprising the steps of:
- generating a plurality of complementary speaker models, each being a sibling speaker model of an ancestor of said particular speaker model; and
  
  rejecting said claimant if said test model is closer in distance to any one of said complementary speaker models than to said particular speaker model.
17. The method of claim 16, further comprising providing a background speaker model derived from speakers outside said tree, and rejecting said claimant if said test model is closer in distance to said background speaker model than to said particular speaker model.
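
The accept and reject conditions in claims 13, 16, and 17 can be combined into one decision function. The following is a sketch under stated assumptions, not the claimed implementation: `model_distance` and `verify` are hypothetical names, models are lists of mean vectors, the distance is an illustrative nearest-mean measure, and ties are resolved in favor of the claimed speaker.

```python
from math import dist  # Euclidean distance, Python 3.8+


def model_distance(a, b):
    """Symmetric average nearest-mean distance between two models."""
    d_ab = sum(min(dist(m, n) for n in b) for m in a) / len(a)
    d_ba = sum(min(dist(n, m) for m in a) for n in b) / len(b)
    return 0.5 * (d_ab + d_ba)


def verify(test, claimed_id, cohort, complements=(), background=None):
    """Return True if the claimant is accepted.

    cohort:      dict mapping speaker ID to model; must contain claimed_id
    complements: complementary models (e.g. siblings of ancestors)
    background:  optional model of speakers outside the tree
    """
    d_claim = model_distance(test, cohort[claimed_id])

    # Claim 13: the claimed model must be the closest model in the cohort.
    for speaker_id, model in cohort.items():
        if speaker_id != claimed_id and model_distance(test, model) < d_claim:
            return False

    # Claims 16 and 17: reject if any complementary or background model
    # is closer to the test model than the claimed model is.
    rivals = list(complements)
    if background is not None:
        rivals.append(background)
    return all(model_distance(test, r) >= d_claim for r in rivals)
```

The complementary and background checks guard against impostors who are not in the cohort at all: without them, a speaker outside the enrolled population would be accepted whenever the claimed model happened to be the nearest of the cohort models.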

18. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to provide method steps for performing pattern matching, said method comprising:
- obtaining training data from each of a plurality T of sources constituting an enrolled population, over a plurality M of channels;
  
  developing models for each of said T sources based on said training data, each model containing a collection of distributions;
  
  generating a hierarchical model tree based on said models of said T sources, wherein at least some merged models within layers of said hierarchical model tree are computed via partitioning or grouping with respect to channel properties; and
  
  obtaining training data from a new source over a new channel for addition to said enrolled population, developing a new model based thereupon and updating said hierarchical model tree with said new model.
19. The program storage device of claim 18, wherein said each of said plurality T of sources comprises a source of speech from a particular speaker, and said models comprise speaker models.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Beigi, Homayoon S.; Maes, Stephane H.; Sorensen, Jeffrey S.; Chaudhari, Upendra V.
Primary Examiner(s): Korzuch, William R.
Assistant Examiner(s): Lerner, Martin
Application Number: US09/240,346
Time in Patent Office: 879 Days
Field of Search: 704/243, 704/244, 704/245, 704/246, 704/250, 704/233, 704/273
US Class Current: 704/243
CPC Class Codes:
G10L 17/04 Training, enrolment or mode...
G10L 17/20 Pattern transformations or ...
