Speaker adaptation system and method based on class-specific pre-clustering training speakers

US 6,073,096 A
Filed: 02/04/1998
Issued: 06/06/2000
Est. Priority Date: 02/04/1998
Status: Expired due to Fees

Abstract

A method of speech recognition, in accordance with the present invention, includes the steps of grouping acoustics to form classes based on acoustic features, clustering training speakers by the classes to provide class-specific cluster systems, selecting from the cluster systems a subset of cluster systems closest to adaptation data from a test speaker, transforming the subset of cluster systems to bring the subset of cluster systems closer to the test speaker based on the adaptation data to form adapted cluster systems, and combining the adapted cluster systems to create a speaker adapted system for decoding speech from the test speaker. Systems and methods for building speech recognition systems, as well as for adapting speaker systems for class-specific speaker clusters, are included.
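
The transforming and combining steps summarized in the abstract can be illustrated with a minimal sketch. It assumes that each selected cluster system is reduced to a single Gaussian mean, that the transform is a maximum a posteriori (MAP) mean update with a prior weight `tau`, and that the combination is a weighted average. The names `map_adapt_mean`, `combine_means`, and `tau`, and the weighted-average rule itself, are illustrative assumptions and are not taken from the patent.

```python
import numpy as np

def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP update of a Gaussian mean: blend the prior mean with the data.

    tau is a hypothetical prior weight; larger values keep the result
    closer to the cluster's original mean.
    """
    n = len(frames)
    return (tau * prior_mean + frames.sum(axis=0)) / (tau + n)

def combine_means(adapted_means, weights):
    """Weighted average of adapted means (weights are normalized here)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.sum(w[:, None] * np.stack(adapted_means), axis=0)

# Toy data: two selected cluster means and ten identical adaptation frames.
cluster_means = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
frames = np.full((10, 2), 2.0)

adapted = [map_adapt_mean(m, frames, tau=10.0) for m in cluster_means]
speaker_mean = combine_means(adapted, weights=[1.0, 1.0])
```

With equal weights and `tau` equal to the number of frames, each adapted mean moves halfway toward the adaptation data. A real system would apply the update per Gaussian of each Hidden Markov Model state rather than to one mean per cluster.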

134 Citations

42 Claims

1. A method of speech recognition comprising the steps of:
- grouping acoustics to form classes based on acoustic features;
- clustering training speakers by the classes to provide class-specific cluster systems;
- selecting from the cluster systems, a subset of cluster systems closest to adaptation data from a speaker;
- transforming the subset of cluster systems to bring the subset of cluster systems closer to the speaker based on the adaptation data to form adapted cluster systems; and
- combining the adapted cluster systems to create a speaker adapted system for decoding speech from the speaker.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of speech recognition as recited in claim 1, wherein the step of transforming is performed by employing a maximum a posteriori adaptation.
  - 3. The method of speech recognition as recited in claim 1, wherein the step of transforming is performed by employing maximum likelihood linear regression.
  - 4. The method of speech recognition as recited in claim 1, wherein the step of selecting further comprises:
    - calculating a likelihood between the speaker and each cluster system;
    - ranking each cluster system according to likelihood; and
    - selecting a predetermined number of cluster systems closest to the speaker adaptation data to form the subset of cluster systems.
  - 5. The method of speech recognition as recited in claim 1, further comprising the step of building class-specific cluster systems based on speaker dependent systems and partitions of acoustic space, each partition of acoustic space being characterized by a different set of acoustic features.
  - 6. The method of speech recognition as recited in claim 1, wherein the step of transforming further comprises transforming the subset of cluster systems to adapt the subset of cluster systems with a speaker independent system.
  - 7. The method of speech recognition as recited in claim 1, wherein each cluster system includes a Hidden Markov Model system.
  - 8. The method of speech recognition as recited in claim 1, wherein the step of transforming is performed by employing a Bayesian adaptation.
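
The selection procedure recited in claim 4 (calculate a likelihood, rank, keep a predetermined number) can be sketched as follows. As a simplifying assumption, each cluster system is modeled here as one diagonal-covariance Gaussian instead of a Hidden Markov Model system, and the score is the average per-frame log-likelihood. The function names and the toy cluster names are hypothetical.

```python
import numpy as np

def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    diff = frames - mean
    per_frame = -0.5 * np.sum(np.log(2.0 * np.pi * var) + diff**2 / var, axis=1)
    return float(per_frame.mean())

def select_closest_clusters(frames, clusters, n_select):
    """Score every cluster system, rank by likelihood, keep the top n_select."""
    scores = {
        name: avg_log_likelihood(frames, mean, var)
        for name, (mean, var) in clusters.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_select], scores

# Toy cluster systems: name -> (mean, variance), all values made up.
clusters = {
    "cluster_a": (np.array([0.0, 0.0]), np.array([1.0, 1.0])),
    "cluster_b": (np.array([3.0, 3.0]), np.array([1.0, 1.0])),
    "cluster_c": (np.array([-4.0, 4.0]), np.array([1.0, 1.0])),
}

# Synthetic adaptation data centered near cluster_b.
rng = np.random.default_rng(0)
adaptation_frames = rng.normal(loc=[2.5, 2.5], scale=1.0, size=(200, 2))

selected, scores = select_closest_clusters(adaptation_frames, clusters, n_select=2)
```

Here `n_select` plays the role of the "predetermined number" in the claim. Averaging over frames keeps scores comparable across adaptation sets of different lengths, which is a design choice of this sketch and not something the claim specifies.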

9. A method of building class-specific cluster systems comprising the steps of:
- providing a speaker dependent system for each of a plurality of training speakers;
- partitioning an acoustic space according to classes, each class being characterized by a set of acoustic features;
- grouping the speaker dependent systems with the acoustic spaces according to classes to build acoustic spaces with common features from all the speaker dependent systems; and
- clustering the grouped acoustic spaces with common features to form cluster systems based on acoustic characteristics of the speakers, the acoustic characteristics including class-specific characteristics.
- Dependent claims (10, 11, 12, 13, 14, 15, 16, 17, 18):
  - 10. The method of building class-specific cluster systems as recited in claim 9 further comprises the step of combining the cluster systems to form a cluster system for the full acoustic space.
  - 11. The method of building class-specific cluster systems as recited in claim 9, wherein the step of clustering includes the step of clustering by a bottom-up method.
  - 12. The method of building class-specific cluster systems as recited in claim 11, wherein the step of clustering by the bottom-up method includes measuring a distance between acoustic features between speakers by a Gaussian likelihood.
  - 13. The method of building class-specific cluster systems as recited in claim 9, wherein the step of clustering includes the step of clustering by a top-down method.
  - 14. The method of building class-specific cluster systems as recited in claim 13, wherein the step of clustering by the top-down method includes measuring a distance between acoustic features between speakers by a Euclidean distance.
  - 15. The method of building class-specific cluster systems as recited in claim 9, wherein the step of clustering includes the step of clustering the grouped acoustic spaces to form cluster systems based on a common accent.
  - 16. The method of building class-specific cluster systems as recited in claim 9, wherein the step of clustering includes the step of clustering the grouped acoustic spaces to form cluster systems based on a common gender.
  - 17. The method of building class-specific cluster systems as recited in claim 9, wherein the step of providing a speaker dependent system for each of a plurality of training speakers further comprises the steps of:
    - decoding training data of a speaker based on a speaker-independent system; and
    - building the speaker dependent system by storing a set of labeled acoustic vectors for the speaker.
  - 18. The method of building class-specific cluster systems as recited in claim 9, wherein the step of partitioning an acoustic space further includes the steps of:
    - gathering expectation-maximization counts of a same context from a speaker independent system;
    - generating a speaker independent context-independent system from the counts; and
    - clustering the context-independent system to generate partitions of a whole acoustic space into the acoustic spaces.
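
The bottom-up clustering with a Gaussian-likelihood distance named in claims 11 and 12 can be sketched for a single acoustic class. This is one common reading, not necessarily the patent's: each speaker is summarized by a frame count, mean, and diagonal variance, and the distance between two groups is the log-likelihood lost when both are modeled by one pooled Gaussian. All function names and the toy speaker statistics are hypothetical.

```python
import numpy as np

def merge(a, b):
    """Pool two (count, mean, var) diagonal-Gaussian summaries into one."""
    na, ma, va = a
    nb, mb, vb = b
    n = na + nb
    mean = (na * ma + nb * mb) / n
    second_moment = (na * (va + ma**2) + nb * (vb + mb**2)) / n
    return n, mean, second_moment - mean**2

def log_likelihood(stats):
    """Log-likelihood of the data under its own maximum-likelihood Gaussian."""
    n, _, var = stats
    return -0.5 * n * float(np.sum(np.log(2.0 * np.pi * np.e * var)))

def merge_cost(a, b):
    """Likelihood lost by replacing two Gaussians with their pooled Gaussian."""
    return log_likelihood(a) + log_likelihood(b) - log_likelihood(merge(a, b))

def bottom_up_cluster(speakers, n_clusters):
    """Agglomerative clustering: repeatedly merge the cheapest pair of groups."""
    groups = [([name], stats) for name, stats in speakers.items()]
    while len(groups) > n_clusters:
        pairs = [
            (merge_cost(groups[i][1], groups[j][1]), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        ]
        _, i, j = min(pairs)
        merged = (groups[i][0] + groups[j][0], merge(groups[i][1], groups[j][1]))
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return [sorted(names) for names, _ in groups]

# Toy per-speaker statistics for one acoustic class: (count, mean, variance).
speakers = {
    "spk1": (100, np.array([0.0, 0.0]), np.array([1.0, 1.0])),
    "spk2": (100, np.array([0.2, -0.1]), np.array([1.0, 1.0])),
    "spk3": (100, np.array([5.0, 5.0]), np.array([1.0, 1.0])),
    "spk4": (100, np.array([5.3, 4.8]), np.array([1.0, 1.0])),
}
speaker_clusters = bottom_up_cluster(speakers, n_clusters=2)
```

Running the same routine once per acoustic class would give class-specific clusterings, so a speaker can fall into different clusters for different classes.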

19. A method of speech recognition comprising the steps of:
- providing a speaker dependent system for each of a plurality of training speakers;
- providing an acoustic space for each of the training speakers, each acoustic space being characterized by a set of acoustic features;
- grouping the speaker dependent systems with the acoustic spaces to build acoustic spaces with common features from all the speaker dependent systems;
- clustering the grouped acoustic spaces to form cluster systems based on a common acoustic characteristic;
- selecting from a group of cluster systems, a subset of cluster systems closest to adaptation data from a speaker;
- transforming the subset of cluster systems to bring the subset of cluster systems closer to the speaker based on the adaptation data to form adapted cluster systems; and
- combining the adapted cluster systems to create a speaker adapted system for decoding speech from the speaker.
- Dependent claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29):
  - 20. The method of speech recognition as recited in claim 19, wherein the step of selecting further comprises:
    - calculating a likelihood for each cluster system;
    - ranking each cluster system according to likelihood; and
    - selecting a predetermined number of cluster systems closest to the speaker adaptation data to form the subset of cluster systems.
  - 21. The method of speech recognition as recited in claim 19, wherein the group of systems includes class-specific clusters and further comprises the step of building class-specific cluster systems based on speaker dependent systems and partitions of acoustic space, each partition of acoustic space being characterized by a different set of acoustic features.
  - 22. The method of speech recognition as recited in claim 19, wherein each cluster system of the group of cluster systems includes a Hidden Markov Model system.
  - 23. The method of speech recognition as recited in claim 19, further comprises the step of combining the cluster systems to form a speaker cluster system of all the acoustic spaces.
  - 24. The method of speech recognition as recited in claim 19, wherein the step of clustering includes the step of clustering by a bottom-up method.
  - 25. The method of speech recognition as recited in claim 24, wherein the step of clustering by the bottom-up method includes measuring a distance between acoustic features between speakers by a Gaussian likelihood.
  - 26. The method of speech recognition as recited in claim 19, wherein the step of clustering includes the step of clustering by a top-down method.
  - 27. The method of speech recognition as recited in claim 26, wherein the step of clustering by the top-down method includes measuring a distance between acoustic features between speakers by a Euclidean distance.
  - 28. The method of speech recognition as recited in claim 19, wherein the step of providing a speaker dependent system for each of a plurality of training speakers further comprises the steps of:
    - decoding training data of a speaker based on a speaker-independent system; and
    - building the speaker dependent system by storing a set of labeled acoustic vectors for the speaker.
  - 29. The method of speech recognition as recited in claim 19, wherein the step of providing an acoustic space for each of the training speakers further includes the steps of:
    - gathering expectation maximization counts of a same context from a speaker independent system;
    - generating a speaker independent context-independent system from the counts; and
    - clustering the context-independent system to generate partition of a whole acoustic space into the acoustic spaces.

30. A system for speech recognition comprising:
- means for grouping acoustics to form classes based on acoustic features;
- means for clustering training speakers by the classes to provide class-specific cluster systems;
- means for selecting from the cluster systems, a subset of cluster systems closest to adaptation data from a speaker;
- means for transforming the subset of cluster systems to bring the subset of cluster systems closer to the speaker based on the adaptation data to form adapted cluster systems; and
- means for combining the adapted cluster systems to create a speaker adapted system for decoding speech from the speaker.
- Dependent claims (31, 32, 33, 34, 35):
  - 31. The system for speech recognition as recited in claim 30, wherein the means for transforming includes means for employing a maximum a posteriori adaptation.
  - 32. The system for speech recognition as recited in claim 30, wherein the means for transforming includes means for employing maximum likelihood linear regression.
  - 33. The system for speech recognition as recited in claim 30, wherein the means for selecting further comprises:
    - means for calculating a likelihood between the speaker and each cluster system;
    - means for ranking each cluster system according to likelihood; and
    - means for selecting a predetermined number of cluster systems closest to the speaker adaptation data to form the subset of cluster systems.
  - 34. The system for speech recognition as recited in claim 30, wherein the group of systems includes class-specific clusters and further comprising means for building class-specific cluster systems based on speaker dependent systems and partitions of acoustic space, each partition of acoustic space being characterized by a different set of acoustic features.
  - 35. The system for speech recognition as recited in claim 30, wherein each cluster system includes a Hidden Markov Model system.

36. A system for speech recognition comprising:
- a speaker dependent system for each of a plurality of training speakers;
- an acoustic space for each of the training speakers, each acoustic space being characterized by a set of acoustic features;
- means for grouping the speaker dependent systems with the acoustic spaces to build acoustic spaces with common features from all the speaker dependent systems;
- means for clustering the grouped acoustic spaces to form cluster systems based on a common acoustic characteristic;
- means for selecting from a group of cluster systems, a subset of cluster systems closest to adaptation data from a speaker;
- means for transforming the subset of cluster systems to bring the subset of cluster systems closer to the speaker based on the adaptation data to form adapted cluster systems; and
- means for combining the adapted cluster systems to create a speaker adapted system for decoding speech from the speaker.
- Dependent claims (37, 38, 39, 40):
  - 37. The system for speech recognition as recited in claim 36, wherein the means for selecting further comprises:
    - means for calculating a likelihood for each cluster system;
    - means for ranking each cluster system according to likelihood; and
    - means for selecting a predetermined number of cluster systems closest to the speaker adaptation data to form the subset of cluster systems.
  - 38. The system for speech recognition as recited in claim 36, wherein the group of cluster systems includes class-specific clusters based on speaker dependent systems and partitions of acoustic space, each partition of acoustic space being characterized by a different set of acoustic features.
  - 39. The system for speech recognition as recited in claim 36, wherein each cluster system of the group of cluster systems includes a Hidden Markov Model.
  - 40. The system for speech recognition as recited in claim 36, further comprises a speaker cluster system formed by combining all of the acoustic spaces.

41. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for speech recognition, the method steps comprising:
- grouping acoustics to form classes based on acoustic features;
- clustering training speakers by the classes to provide class-specific cluster systems;
- selecting from the cluster systems, a subset of cluster systems closest to adaptation data from a speaker;
- transforming the subset of cluster systems to bring the subset of cluster systems closer to the speaker based on the adaptation data to form adapted cluster systems; and
- combining the adapted cluster systems to create a speaker adapted system for decoding speech from the speaker.

42. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for building class-specific cluster systems for speech recognition systems, the method steps comprising:
- providing a speaker dependent system for each of a plurality of training speakers;
- partitioning an acoustic space according to classes, each class being characterized by a set of acoustic features;
- grouping the speaker dependent systems with the acoustic spaces according to classes to build acoustic spaces with common features from all the speaker dependent systems; and
- clustering the grouped acoustic spaces with common features to form cluster systems based on acoustic characteristics of the speakers, the acoustic characteristics including class-specific characteristics.

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Gao, Yuqing; Picheny, Michael Alan; Padmanabhan, Mukund
Primary Examiner(s)
Hudspeth, David R.
Assistant Examiner(s)
Lerner, Martin

Application Number

US09/018,350
Time in Patent Office

853 Days
Field of Search

704/231, 704/238, 704/245, 704/250, 704/255, 704/256
US Class Current

704/245
CPC Class Codes

G10L 15/07 Adaptation to the speaker
