Speaker recognition using a hierarchical speaker model tree

US 6,684,186 B2
Filed: 01/26/1999
Issued: 01/27/2004
Est. Priority Date: 01/26/1999
Status: Expired due to Term

Abstract

In an illustrative embodiment, a speaker model is generated for each of a number of speakers from which speech samples have been obtained. Each speaker model contains a collection of distributions of audio feature data derived from the speech sample of the associated speaker. A hierarchical speaker model tree is created by merging similar speaker models on a layer by layer basis. Each time two or more speaker models are merged, a corresponding parent speaker model is created in the next higher layer of the tree. The tree is useful in applications such as speaker verification and speaker identification.
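The layer-by-layer merging described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each speaker model is reduced to a short list of one-dimensional Gaussians given as `(mean, variance)` pairs, every model holds the same number of distributions, models are merged two at a time, and an unpaired model is carried up to the next layer. The names `Node`, `dist_distance`, `model_distance`, `merge`, and `build_tree` are hypothetical, and the distance measure is a simple stand-in chosen only for readability.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(eq=False)  # identity-based equality, so list.remove() drops the exact node
class Node:
    name: str
    dists: list  # [(mean, variance), ...]; 1-D Gaussians for brevity
    children: list = field(default_factory=list)


def dist_distance(a, b):
    """Stand-in distance between two Gaussians (an assumption for this sketch)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def model_distance(d1, d2):
    """Stand-in symmetric distance between two collections of distributions."""
    d12 = sum(min(dist_distance(a, b) for b in d2) for a in d1) / len(d1)
    d21 = sum(min(dist_distance(b, a) for a in d1) for b in d2) / len(d2)
    return 0.5 * (d12 + d21)


def merge(m1, m2):
    """Pair each distribution of m1 with its closest unused one in m2,
    then average the parameters of each pair to form the parent model."""
    remaining = list(m2.dists)
    merged = []
    for a in m1.dists:
        b = min(remaining, key=lambda d: dist_distance(a, d))
        remaining.remove(b)
        merged.append(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2))
    return Node(f"({m1.name}+{m2.name})", merged, [m1, m2])


def build_tree(leaves):
    """Merge the closest pairs in each layer until a single root remains."""
    layer = list(leaves)
    while len(layer) > 1:
        pool, next_layer = list(layer), []
        while len(pool) >= 2:
            a, b = min(
                combinations(pool, 2),
                key=lambda p: model_distance(p[0].dists, p[1].dists),
            )
            pool.remove(a)
            pool.remove(b)
            next_layer.append(merge(a, b))
        next_layer.extend(pool)  # a leftover model moves up unmerged
        layer = next_layer
    return layer[0]


speakers = [
    Node("A", [(0, 1), (10, 1)]),
    Node("B", [(0.5, 1), (10.5, 1)]),
    Node("C", [(50, 1), (60, 1)]),
    Node("D", [(51, 1), (61, 1)]),
    Node("E", [(100, 1), (110, 1)]),
]
root = build_tree(speakers)
print([child.name for child in root.children])
```

With five speakers, the odd model out in the lowest layer is promoted rather than forced into a merge, so the tree stays binary wherever a merge occurs.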

21 Claims

1. A method for speaker identification, comprising the steps of:
- generating, for each of a plurality of speakers, a speaker model containing a collection of distributions of audio feature data associated with that speaker;
  
  merging similar speaker models on a layer by layer basis so as to generate a hierarchical speaker model tree, wherein a lowest layer of the hierarchical speaker model tree comprises each generated speaker model; and
  
  performing speaker identification of an unknown speaker using the hierarchical speaker model tree to determine if the unknown speaker is one of the plurality of speakers, wherein the step of performing speaker identification comprises:
  
  i) receiving a new speech sample from the unknown speaker and generating a test speaker model therefrom;
  
  ii) comparing said test model with merged speaker models within a higher layer, i+j, of said tree to determine which of the merged speaker models is closest to said test model;
  
  iii) comparing said test model with children of said closest merged speaker model in a next lower layer, i+j−1, to determine which child is closest to said test model;
  
  repeating step (iii) on a layer by layer basis, including the lowest layer of the tree, whereby said unknown speaker is identified as the speaker corresponding to the closest speaker model in said lowest layer.
2. The method of claim 1 wherein said step of merging similar speaker models includes measuring distances between a first speaker model and all other speaker models within the same layer to determine which of the other speaker models is closest to the first speaker model, then merging the closest speaker model with the first speaker model to create a corresponding parent speaker model in a next higher layer of the tree.
3. The method of claim 2, wherein distance between said first speaker model and a second speaker model within the same layer is measured by:
4. The method of claim 1 wherein each said distribution is a multi-dimensional Gaussian distribution.
5. The method of claim 1 wherein said step of merging similar models comprises:
- merging a first speaker model with a second speaker model which is close in distance to said first speaker model to form a parent speaker model, by establishing distribution pairs between the first and second speaker models and forming a merged distribution from each distribution pair, whereby the parent speaker model contains a plurality of merged distributions.
6. The method of claim 5 wherein each merged distribution is formed by averaging statistical parameters of the distributions of the respective distribution pair.
7. The method of claim 1, wherein said step of merging similar speaker models comprises determining sets of n speaker models in each layer having the closest distances to one another, and merging each set of n speaker models to form a corresponding parent speaker model on a layer by layer basis.
8. The method of claim 7, wherein n equals two, such that said tree has a binary structure.
9. The method of claim 7, further comprising the step of adding a leftover speaker model of a lower layer to a next higher layer.
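Steps (i) through (iii) of claim 1 amount to a greedy top-down search of the tree. The sketch below illustrates that search on a small hand-built tree. It is an illustration under assumptions, not the patented implementation: models are lists of one-dimensional `(mean, variance)` Gaussians, the parent models were written out by hand as parameter averages, and `Node`, `model_distance`, and `identify` are hypothetical names. The distance is a stand-in, since the measure of claim 3 is not reproduced in this text.

```python
class Node:
    def __init__(self, name, dists, children=()):
        self.name = name
        self.dists = dists  # [(mean, variance), ...]
        self.children = list(children)


def model_distance(d1, d2):
    """Stand-in symmetric distance between two collections of Gaussians."""
    def gap(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    d12 = sum(min(gap(a, b) for b in d2) for a in d1) / len(d1)
    d21 = sum(min(gap(b, a) for a in d1) for b in d2) / len(d2)
    return 0.5 * (d12 + d21)


def identify(top_layer, test_dists):
    """Pick the closest merged model in a higher layer, then descend
    to the closest child in each lower layer until a leaf is reached."""
    node = min(top_layer, key=lambda n: model_distance(n.dists, test_dists))
    while node.children:
        node = min(
            node.children, key=lambda n: model_distance(n.dists, test_dists)
        )
    return node.name


# Lowest layer: one model per enrolled speaker.
a = Node("A", [(0, 1), (10, 1)])
b = Node("B", [(1, 1), (11, 1)])
c = Node("C", [(50, 1), (60, 1)])
d = Node("D", [(51, 1), (61, 1)])
# Next higher layer: merged parent models (parameter averages of the children).
ab = Node("AB", [(0.5, 1), (10.5, 1)], [a, b])
cd = Node("CD", [(50.5, 1), (60.5, 1)], [c, d])

# Test model generated from the unknown speaker's new speech sample.
test_model = [(50.8, 1), (60.9, 1)]
print(identify([ab, cd], test_model))
```

Because only the children of the winning node are compared at each layer, the number of comparisons grows with the depth of the tree rather than with the number of enrolled speakers. The search is greedy, so a wrong choice in a higher layer cannot be corrected lower down.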

10. A method for speaker verification, comprising the steps of:
- generating, for each of a plurality of registered speakers in a system, a speaker model containing a collection of distributions of audio feature data associated with that speaker;
  
  merging similar speaker models on a layer by layer basis so as to generate a hierarchical speaker model tree, wherein a lowest layer of the hierarchical speaker model tree are leaf members comprising the speaker models of said registered speakers; and
  
  performing speaker verification using the hierarchical speaker model tree to verify that a person is a registered speaker in the system, wherein the step of performing speaker verification comprises:
  
  receiving a claimed identification from the person corresponding to a particular one of said speaker models of the lowest layer of the hierarchical speaker model tree;
  
  determining a cohort set of similar speaker models associated with said particular speaker model using the hierarchical speaker model tree, wherein the step of determining a cohort set comprises the steps of matching the claimed identification with a leaf member of the hierarchical speaker model tree, traversing up the hierarchical speaker model tree from said leaf member to a parent node in a desired layer, and then traversing down the hierarchical speaker model tree from the parent node to all leaf members connected to the parent node, wherein all leaf members connected to the parent node comprise the cohort set;
  
  receiving a new speech sample from the person and generating a test speaker model therefrom;
  
  verifying that the person is a registered speaker if said particular speaker model is the closest model of said cohort set to said test model.
11. The method of claim 10, further comprising:
12. The method of claim 11, wherein said complementary speaker models include a background model derived from speech data of speakers outside said tree.
13. The method of claim 10, further comprising:
- generating a plurality of complementary speaker models, each being a sibling speaker model of an ancestor of said particular speaker model; and
  
  rejecting said claimant speaker if said test model is closer in distance to any one of said complementary speaker models than to said particular speaker model.
14. The method of claim 13, further comprising providing a background speaker model derived from speakers outside said tree, and rejecting said claimant speaker if said test model is closer in distance to said background speaker model than to said particular speaker model.
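The cohort lookup of claim 10 and the complementary models of claim 13 are both tree traversals, which the sketch below illustrates. It rests on several assumptions: each model is a single one-dimensional `(mean, variance)` Gaussian for brevity, parent models were written by hand as averages, the distance is a stand-in, and the names `Node`, `cohort_set`, `complementary_models`, and `verify` are hypothetical. Combining both checks in one `verify` function is a choice made for this example only.

```python
class Node:
    def __init__(self, name, dists, children=()):
        self.name = name
        self.dists = dists  # [(mean, variance), ...]
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def model_distance(d1, d2):
    """Stand-in symmetric distance between two collections of Gaussians."""
    def gap(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    d12 = sum(min(gap(a, b) for b in d2) for a in d1) / len(d1)
    d21 = sum(min(gap(b, a) for a in d1) for b in d2) / len(d2)
    return 0.5 * (d12 + d21)


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def cohort_set(leaf, levels_up):
    """Walk up from the claimed leaf, then collect every leaf under that node."""
    node = leaf
    for _ in range(levels_up):
        if node.parent is None:
            break
        node = node.parent
    return leaves(node)


def complementary_models(leaf):
    """Collect the siblings of each ancestor of the claimed leaf."""
    found = []
    node = leaf.parent
    while node is not None and node.parent is not None:
        found.extend(s for s in node.parent.children if s is not node)
        node = node.parent
    return found


def verify(claimed, test_dists, levels_up):
    """Accept only if the claimed model is the closest in its cohort and
    no complementary model is closer to the test model."""
    d_claimed = model_distance(claimed.dists, test_dists)
    cohort = cohort_set(claimed, levels_up)
    closest = min(cohort, key=lambda n: model_distance(n.dists, test_dists))
    if closest is not claimed:
        return False
    return all(
        model_distance(m.dists, test_dists) >= d_claimed
        for m in complementary_models(claimed)
    )


a, b = Node("A", [(0, 1)]), Node("B", [(2, 1)])
c, d = Node("C", [(10, 1)]), Node("D", [(12, 1)])
e, f = Node("E", [(40, 1)]), Node("F", [(42, 1)])
ab, cd, ef = Node("AB", [(1, 1)], [a, b]), Node("CD", [(11, 1)], [c, d]), Node("EF", [(41, 1)], [e, f])
abcd = Node("ABCD", [(6, 1)], [ab, cd])
root = Node("root", [(23.5, 1)], [abcd, ef])

print([n.name for n in cohort_set(a, 2)])
print([n.name for n in complementary_models(a)])
print(verify(a, [(0.3, 1)], 2), verify(a, [(1.8, 1)], 2))
```

The `levels_up` parameter plays the role of the "desired layer": a larger value produces a larger cohort and therefore a stricter test against more competing speakers.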

15. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to provide method steps for performing speaker verification, said method steps comprising:
- generating, for each of a plurality of registered speakers in a system, a speaker model containing a collection of distributions of audio feature data associated with that speaker;
  
  merging similar speaker models on a layer by layer basis so as to generate a hierarchical speaker model tree, wherein a lowest layer of the hierarchical speaker model tree are leaf members comprising the speaker models of said registered speakers; and
  
  performing speaker verification using the hierarchical speaker model tree to verify that a person is a registered speaker in the system, wherein the step of performing speaker verification comprises:
  
  receiving a claimed identification from the person corresponding to a particular one of said speaker models of the lowest layer of the hierarchical speaker model tree;
  
  determining a cohort set of similar speaker models associated with said particular speaker model using the hierarchical speaker model tree, wherein the step of determining a cohort set comprises the steps of matching the claimed identification with a leaf member of the hierarchical speaker model tree, traversing up the hierarchical speaker model tree from said leaf member to a parent node in a desired layer, and traversing down the hierarchical speaker model tree from the parent node to all leaf members connected to the parent node, wherein all leaf members connected to the parent node comprise the cohort set;
  
  receiving a new speech sample from the person and generating a test speaker model therefrom;
  
  verifying that the person is a registered speaker if said particular speaker model is the closest model of said cohort set to said test model.
16. The program storage device of claim 15, wherein said method steps further comprise:
17. The program storage device of claim 15, wherein said method steps further comprise:
- generating a plurality of complementary speaker models, each being a sibling speaker model of an ancestor of said particular speaker model; and
  
  rejecting said claimant speaker if said test model is closer in distance to any one of said complementary speaker models than to said particular speaker model.
18. The program storage device of claim 15, wherein said step of merging similar models comprises:
- merging a first speaker model with a second speaker model which is close in distance to said first speaker model to form a parent speaker model, by establishing distribution pairs between the first and second speaker models and forming a merged distribution from each distribution pair, whereby the parent speaker model contains a plurality of merged distributions.

19. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine, to perform method steps for performing speaker identification, the method steps comprising:
- generating a speaker model for each of a plurality of speakers, wherein each speaker model comprises a collection of distributions of audio feature data associated with a speaker;
  
  merging similar speaker models on a layer by layer basis so as to generate a hierarchical speaker model tree, wherein a lowest layer of the hierarchical speaker model tree comprises each generated speaker model; and
  
  performing speaker identification of an unknown speaker using the hierarchical speaker model tree to determine if the unknown speaker is one of the plurality of speakers, wherein the step of performing speaker identification comprises:
  
  i) receiving a new speech sample from the unknown speaker and generating a test speaker model therefrom;
  
  ii) comparing said test model with merged speaker models within a higher layer, i+j, of said tree to determine which of the merged speaker models is closest to said test model;
  
  iii) comparing said test model with children of said closest merged speaker model in a next lower layer, i+j−1, to determine which child is closest to said test model;
  
  repeating step (iii) on a layer by layer basis, including the lowest layer of the tree, whereby said unknown speaker is identified as the speaker corresponding to the closest speaker model in said lowest layer.
20. The program storage device of claim 19, wherein the instructions for merging similar speaker models comprise instructions for measuring distances between a first speaker model and all other speaker models within the same layer to determine which of the other speaker models is closest to the first speaker model, then merging the closest speaker model with the first speaker model to create a corresponding parent speaker model in a next higher layer of the tree.
21. The program storage device of claim 19, wherein the instructions for merging similar models comprise instructions for merging a first speaker model with a second speaker model which is close in distance to said first speaker model to form a parent speaker model, by establishing distribution pairs between the first and second speaker models and forming a merged distribution from each distribution pair, whereby the parent speaker model contains a plurality of merged distributions.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Sorensen, Jeffrey S.; Beigi, Homayoon S. M.; Maes, Stephane H.
Primary Examiner(s): Knepper, David D.
Application Number: US09/237,059
Publication Number: US 20030014250A1
Time in Patent Office: 1,827 Days
Field of Search: 704/246-250, 704/255-257, 704/243-245
US Class Current: 704/246
CPC Class Codes: G10L 17/02 Preprocessing operations, e...
