Multi-stage speaker adaptation

US 8,571,859 B1
Filed: 10/17/2012
Issued: 10/29/2013
Est. Priority Date: 05/31/2012
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A first gender-specific speaker adaptation technique may be selected based on characteristics of a first set of feature vectors that correspond to a first unit of input speech. The first set of feature vectors may be configured for use in automatic speech recognition (ASR) of the first unit of input speech. A second set of feature vectors, which correspond to a second unit of input speech, may be modified based on the first gender-specific speaker adaptation technique. The modified second set of feature vectors may be configured for use in ASR of the second unit of input speech. A first speaker-dependent speaker adaptation technique may be selected based on characteristics of the second set of feature vectors. A third set of feature vectors, which correspond to a third unit of speech, may be modified based on the first speaker-dependent speaker adaptation technique.
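
The abstract describes a staged loop. The technique selected from one unit of speech is applied to the next unit, and the gender chosen first narrows the pool of speaker-dependent candidates (the "particular speaker of the particular gender" in claim 1). The Python sketch below illustrates only that control flow, under assumptions that are not the patent's method: a "speech model" is reduced to one mean vector, "fit" is negative mean squared distance, each technique's transform is an arbitrary function, and `Technique`, `fit` and `run_stages` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Frame = List[float]


@dataclass(frozen=True)
class Technique:
    """Hypothetical adaptation technique: a toy speech model plus a feature transform."""
    name: str
    gender: str                     # gender the technique is associated with
    speaker: Optional[str]          # None for a gender-specific technique
    model_mean: Sequence[float]     # toy stand-in for a speech model
    transform: Callable[[Frame], Frame]


def fit(frames: Sequence[Frame], tech: Technique) -> float:
    """Toy fit score: negative mean squared distance to the model mean (higher is better)."""
    total = 0.0
    for frame in frames:
        total += sum((x - m) ** 2 for x, m in zip(frame, tech.model_mean))
    return -total / len(frames)


def run_stages(units: Sequence[Sequence[Frame]],
               gender_techs: Sequence[Technique],
               speaker_techs: Sequence[Technique]) -> List[Tuple[Optional[str], List[Frame]]]:
    """For each unit, return (technique applied, feature vectors handed to ASR)."""
    out: List[Tuple[Optional[str], List[Frame]]] = []
    current: Optional[Technique] = None
    for frames in units:
        # Adapt this unit with the technique chosen from the *previous* unit.
        if current is None:
            adapted = [list(f) for f in frames]
        else:
            adapted = [current.transform(list(f)) for f in frames]
        out.append((current.name if current else None, adapted))
        # Selection scores the unadapted vectors here; the claims leave that open.
        if current is None:
            # Stage 1: choose among gender-specific techniques.
            current = max(gender_techs, key=lambda t: fit(frames, t))
        elif current.speaker is None:
            # Stage 2: consider only speakers of the already-selected gender.
            pool = [t for t in speaker_techs if t.gender == current.gender]
            if pool:
                current = max(pool, key=lambda t: fit(frames, t))
        # Later stages (switching speakers or genders) are not modeled here.
    return out


female = Technique("female", "female", None, [2, 2], lambda f: [x - 1.0 for x in f])
male = Technique("male", "male", None, [-2, -2], lambda f: f)
alice = Technique("alice", "female", "alice", [3, 2], lambda f: [x * 0.5 for x in f])
carol = Technique("carol", "female", "carol", [1, 2], lambda f: f)
bob = Technique("bob", "male", "bob", [-3, -2], lambda f: f)

result = run_stages([[[2, 2]], [[3, 2]], [[3, 3]]], [female, male], [alice, carol, bob])
narrowed = run_stages([[[2, 2]], [[-3, -2]], [[0, 0]]], [female, male], [alice, carol, bob])
```

In `narrowed`, the second unit matches Bob's toy model exactly, yet Carol is chosen, because stage 2 considers only speakers of the gender picked in stage 1.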

18 Claims

1. A method comprising:
- selecting, by a computing device, a first gender-specific speaker adaptation technique based on characteristics of a first set of feature vectors, wherein the first set of feature vectors correspond to a first unit of input speech, and wherein the first set of feature vectors are configured for use in automatic speech recognition (ASR) of the first unit of input speech, wherein the first gender-specific speaker adaptation technique is associated with a particular gender;
  
  modifying a second set of feature vectors based on the first gender-specific speaker adaptation technique, wherein the second set of feature vectors correspond to a second unit of input speech, and wherein the modified second set of feature vectors are configured for use in ASR of the second unit of input speech;
  
  based on characteristics of the second set of feature vectors and the first gender-specific speaker adaptation technique being associated with a particular gender, selecting a first speaker-dependent speaker adaptation technique that is associated with a particular speaker of the particular gender; and
  
  modifying a third set of feature vectors based on the first speaker-dependent speaker adaptation technique, wherein the third set of feature vectors correspond to a third unit of input speech, and wherein the modified third set of feature vectors are configured for use in ASR of the third unit of input speech.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1 further comprising:
    - determining that (i) the third unit of input speech was originated proximate to a particular location, and (ii) the particular speaker is also associated with an environment-specific, speaker-dependent speaker adaptation technique, wherein the environment-specific, speaker-dependent speaker adaptation technique is associated with the particular location;
      
      selecting the environment-specific, speaker-dependent speaker adaptation technique; and
      
      modifying a fourth set of feature vectors based on the environment-specific, speaker-dependent speaker adaptation technique, wherein the fourth set of feature vectors correspond to a fourth unit of input speech, and wherein the modified fourth set of feature vectors are configured for use in ASR of the fourth unit of input speech.
  - 3. The method of claim 1 further comprising:
    - selecting a second speaker-dependent speaker adaptation technique based on characteristics of the third set of feature vectors, wherein a second particular speaker is associated with the second speaker-dependent speaker adaptation technique; and
      
      modifying a fourth set of feature vectors based on the second speaker-dependent speaker adaptation technique, wherein the fourth set of feature vectors correspond to a fourth unit of input speech, and wherein the modified fourth set of feature vectors are configured for use in ASR of the fourth unit of input speech.
  - 4. The method of claim 1, wherein selecting the first speaker-dependent speaker adaptation technique based on characteristics of the second set of feature vectors comprises determining that the characteristics of the second set of feature vectors fit a speaker-dependent speech model associated with the first speaker-dependent speaker adaptation technique better than the characteristics of the second set of feature vectors fit one or more additional speaker-dependent speech models.
  - 5. The method of claim 1, wherein the first gender-specific speaker adaptation technique is associated with a speech model of a first gender, wherein a second gender-specific speaker adaptation technique is associated with a speech model of a second gender, and wherein selecting the first gender-specific speaker adaptation technique based on the characteristics of the first set of feature vectors comprises:
    - determining that the characteristics of the first set of feature vectors fit the speech model of the first gender better than the speech model of the second gender.
  - 6. The method of claim 5, further comprising:
    - determining that (i) characteristics of the third set of feature vectors fit the speech model of the second gender better than the speech model of the first gender, and (ii) the characteristics of the third set of feature vectors fit the speech model of the second gender better than a speech model of the first speaker-dependent speaker adaptation technique;
      
      selecting the second gender-specific speaker adaptation technique; and
      
      modifying a fourth set of feature vectors based on the second gender-specific speaker adaptation technique, wherein the fourth set of feature vectors correspond to a fourth unit of input speech, and wherein the modified fourth set of feature vectors are configured for use in ASR of the fourth unit of input speech.
  - 7. The method of claim 1, wherein modifying the second set of feature vectors based on the first gender-specific speaker adaptation technique comprises applying a first gender-specific transform to feature vectors in the second set of feature vectors, wherein the first gender-specific transform is associated with the first gender-specific speaker adaptation technique, wherein modifying the third set of feature vectors based on the first speaker-dependent speaker adaptation technique comprises applying a first speaker-dependent transform to feature vectors in the third set of feature vectors, wherein the first speaker-dependent transform is associated with the first speaker-dependent speaker adaptation technique.
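
Claims 4, 5 and 7 say that a technique is selected because the feature vectors fit its speech model better than competing models, and that adaptation consists of applying a transform associated with the technique. The claims specify neither the model nor the transform. The Python sketch below assumes, purely for illustration, that each speech model is a single diagonal-covariance Gaussian scored by average per-frame log-likelihood, and that each transform is affine, x' = Ax + b (the form used by feature-space MLLR). The helper names and the example numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]
# Assumed model form: (mean vector, diagonal variance vector).
Gaussian = Tuple[Sequence[float], Sequence[float]]


def avg_log_likelihood(frames: Sequence[Frame], model: Gaussian) -> float:
    """Average per-frame log-likelihood of `frames` under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for x in frames:
        for xi, mi, vi in zip(x, mean, var):
            total += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total / len(frames)


def best_fit(frames: Sequence[Frame], models: Dict[str, Gaussian]) -> str:
    """Return the label of the model the frames fit best (claims 4 and 5)."""
    return max(models, key=lambda label: avg_log_likelihood(frames, models[label]))


def apply_affine(frames: Sequence[Frame],
                 A: Sequence[Sequence[float]],
                 b: Sequence[float]) -> List[List[float]]:
    """Apply x' = A x + b to every feature vector (one reading of claim 7)."""
    return [
        [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
         for row, b_i in zip(A, b)]
        for x in frames
    ]


# Pick a gender model from unit 1, then transform unit 2 with that gender's
# (hypothetical) transform.
gender_models = {
    "female": ([2.0, 2.0], [1.0, 1.0]),
    "male": ([-2.0, -2.0], [1.0, 1.0]),
}
gender_transforms = {
    "female": ([[2.0, 0.0], [0.0, 1.0]], [0.5, -1.0]),
    "male": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
}
unit1 = [[1.8, 2.1], [2.2, 1.9]]
unit2 = [[1.0, 2.0]]
gender = best_fit(unit1, gender_models)
A, b = gender_transforms[gender]
adapted_unit2 = apply_affine(unit2, A, b)
```

A log-likelihood comparison is one common way to decide which model "fits better"; a real system could equally use posteriors or a trained classifier.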

8. A method comprising:
- obtaining, at a computing device, a first set of feature vectors, wherein the first set of feature vectors correspond to a first unit of input speech;
  
  comparing characteristics of the first set of feature vectors to a first gender-specific speech model and a second gender-specific speech model;
  
  determining that the characteristics of the first set of feature vectors fit the first gender-specific speech model better than the second gender-specific model;
  
  obtaining a second set of feature vectors, wherein the second set of feature vectors correspond to a second unit of input speech;
  
  modifying the second set of feature vectors based on a first gender-specific speaker adaptation technique associated with the first gender-specific speech model;
  
  after modifying the second set of feature vectors, comparing characteristics of the second set of feature vectors to the first gender-specific speech model, the second gender-specific speech model, and a speaker-dependent speech model;
  
  determining that the characteristics of the second set of feature vectors fit the speaker-dependent speech model better than the first and second gender-specific models;
  
  obtaining a third set of feature vectors, wherein the third set of feature vectors correspond to a third unit of input speech; and
  
  modifying the third set of feature vectors based on a speaker-dependent speaker adaptation technique associated with the speaker-dependent speech model.
- Dependent claims (9, 10, 11)
  - 9. The method of claim 8, further comprising:
    - after modifying the third set of feature vectors, comparing the characteristics of the third set of feature vectors to the first gender-specific speech model, the second gender-specific speech model, the speaker-dependent speech model, and at least one environment-specific, speaker-dependent speech model, wherein the speaker-dependent speech model and the environment-specific, speaker-dependent speech model are both associated with a particular speaker;
      
      determining that the characteristics of the third set of feature vectors fit the environment-specific, speaker-dependent speech model better than the speaker-dependent speech model and both of the first and second gender-specific models;
      
      obtaining a fourth set of feature vectors, wherein the fourth set of feature vectors correspond to a fourth unit of input speech; and
      
      modifying the fourth set of feature vectors based on an environment-specific, speaker-dependent speaker adaptation technique associated with the environment-specific, speaker-dependent speech model.
  - 10. The method of claim 8, wherein modifying the second set of feature vectors based on the first gender-specific speaker adaptation technique comprises applying a first gender-specific transform to feature vectors in the second set of feature vectors, wherein the first gender-specific transform is associated with the first gender-specific speaker adaptation technique, wherein modifying the third set of feature vectors based on the speaker-dependent speaker adaptation technique comprises applying a speaker-dependent transform to feature vectors in the third set of feature vectors, wherein the speaker-dependent transform is associated with the speaker-dependent speaker adaptation technique.
  - 11. The method of claim 8, wherein the first gender-specific speaker adaptation technique is associated with a particular gender, and wherein the speaker-dependent speaker adaptation technique is associated with a speaker of the particular gender, the method further comprising:
    - selecting the speaker-dependent speaker adaptation technique based on the first gender-specific speaker adaptation technique being associated with a particular gender.
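
Claims 8 and 9 keep the less specific models in the comparison at every step: the gender-specific models are compared alongside the speaker-dependent model, and later alongside an environment-specific, speaker-dependent model, so adaptation escalates only when the more specific model actually fits better. The Python sketch below illustrates that under hypothetical assumptions: models form a tree (gender, then speaker, then speaker plus environment), the candidate set is every gender model plus the current model's ancestors and children, and fit is a toy negative squared distance to a mean vector. `Model`, `candidates` and `track` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

Frame = Sequence[float]


@dataclass(frozen=True)
class Model:
    name: str
    mean: Sequence[float]          # toy stand-in for a speech model
    parent: Optional[str] = None   # speaker -> gender, environment -> speaker


def score(frames: Sequence[Frame], model: Model) -> float:
    """Toy fit: negative mean squared distance to the model mean (higher is better)."""
    total = sum(sum((x - m) ** 2 for x, m in zip(f, model.mean)) for f in frames)
    return -total / len(frames)


def candidates(models: Dict[str, Model], current: Optional[str]) -> List[Model]:
    """All gender (root) models, plus the current model's ancestors and children."""
    pool: Dict[str, Model] = {n: m for n, m in models.items() if m.parent is None}
    node = current
    while node is not None:            # walk up: current, its parent, ...
        pool[node] = models[node]
        node = models[node].parent
    for n, m in models.items():        # one level down from the current model
        if current is not None and m.parent == current:
            pool[n] = m
    return list(pool.values())


def track(units: Sequence[Sequence[Frame]], models: Dict[str, Model]) -> List[Optional[str]]:
    """Name of the model whose technique is applied to each unit (None = unadapted)."""
    current: Optional[str] = None
    applied: List[Optional[str]] = []
    for frames in units:
        applied.append(current)
        # Transforming `frames` with `current`'s technique and decoding are omitted.
        current = max(candidates(models, current), key=lambda m: score(frames, m)).name
    return applied


models = {
    "female": Model("female", [2, 2]),
    "male": Model("male", [-2, -2]),
    "alice": Model("alice", [3, 2], parent="female"),
    "alice@car": Model("alice@car", [3, 4], parent="alice"),
}
history = track([[[2, 2]], [[3, 2]], [[3, 4]], [[-2, -2]], [[0, 0]]], models)
```

In this toy run the fourth unit fits the male model better than anything in Alice's branch, so the sketch falls back to a gender-specific technique, loosely mirroring the switch described in claim 6.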

12. An article of manufacture including a non-transitory computer-readable storage medium, having stored thereon program instructions that, upon execution by a computing device, cause the computing device to perform operations comprising:
- selecting a first gender-specific speaker adaptation technique based on characteristics of a first set of feature vectors, wherein the first set of feature vectors correspond to a first unit of input speech, and wherein the first set of feature vectors are configured for use in automatic speech recognition (ASR) of the first unit of input speech, wherein the first gender-specific speaker adaptation technique is associated with a particular gender;
  
  modifying a second set of feature vectors based on the first gender-specific speaker adaptation technique, wherein the second set of feature vectors correspond to a second unit of input speech, and wherein the modified second set of feature vectors are configured for use in ASR of the second unit of input speech;
  
  based on characteristics of the second set of feature vectors and the first gender-specific speaker adaptation technique being associated with a particular gender, selecting a first speaker-dependent speaker adaptation technique that is associated with a particular speaker of the particular gender; and
  
  modifying a third set of feature vectors based on the first speaker-dependent speaker adaptation technique, wherein the third set of feature vectors correspond to a third unit of input speech, and wherein the modified third set of feature vectors are configured for use in ASR of the third unit of input speech.
- Dependent claims (13, 14, 15, 16, 17, 18)
  - 13. The article of manufacture of claim 12, the operations further comprising:
    - determining that (i) the third unit of input speech was originated proximate to a particular location, and (ii) the particular speaker is also associated with an environment-specific, speaker-dependent speaker adaptation technique, wherein the environment-specific, speaker-dependent speaker adaptation technique is associated with the particular location;
      
      selecting the environment-specific, speaker-dependent speaker adaptation technique; and
      
      modifying a fourth set of feature vectors based on the environment-specific, speaker-dependent speaker adaptation technique, wherein the fourth set of feature vectors correspond to a fourth unit of input speech, and wherein the modified fourth set of feature vectors are configured for use in ASR of the fourth unit of input speech.
  - 14. The article of manufacture of claim 12, the operations further comprising:
    - selecting a second speaker-dependent speaker adaptation technique based on characteristics of the third set of feature vectors, wherein a second particular speaker is associated with the second speaker-dependent speaker adaptation technique; and
      
      modifying a fourth set of feature vectors based on the second speaker-dependent speaker adaptation technique, wherein the fourth set of feature vectors correspond to a fourth unit of input speech, and wherein the modified fourth set of feature vectors are configured for use in ASR of the fourth unit of input speech.
  - 15. The article of manufacture of claim 12, wherein selecting the first speaker-dependent speaker adaptation technique based on characteristics of the second set of feature vectors comprises determining that the characteristics of the second set of feature vectors fit a speaker-dependent speech model associated with the first speaker-dependent speaker adaptation technique better than the characteristics of the second set of feature vectors fit one or more additional speaker-dependent speech models.
  - 16. The article of manufacture of claim 12, wherein the first gender-specific speaker adaptation technique is associated with a speech model of a first gender, wherein a second gender-specific speaker adaptation technique is associated with a speech model of a second gender, and wherein selecting the first gender-specific speaker adaptation technique based on the characteristics of the first set of feature vectors comprises:
    - determining that the characteristics of the first set of feature vectors fit the speech model of the first gender better than the speech model of the second gender.
  - 17. The article of manufacture of claim 16, wherein the operations further comprise:
    - determining that (i) characteristics of the third set of feature vectors fit the speech model of the second gender better than the speech model of the first gender, and (ii) the characteristics of the third set of feature vectors fit the speech model of the second gender better than a speech model of the first speaker-dependent speaker adaptation technique;
      
      selecting the second gender-specific speaker adaptation technique; and
      
      modifying a fourth set of feature vectors based on the second gender-specific speaker adaptation technique, wherein the fourth set of feature vectors correspond to a fourth unit of input speech, and wherein the modified fourth set of feature vectors are configured for use in ASR of the fourth unit of input speech.
  - 18. The article of manufacture of claim 12, wherein modifying the second set of feature vectors based on the first gender-specific speaker adaptation technique comprises applying a first gender-specific transform to feature vectors in the second set of feature vectors, wherein the first gender-specific transform is associated with the first gender-specific speaker adaptation technique, wherein modifying the third set of feature vectors based on the first speaker-dependent speaker adaptation technique comprises applying a first speaker-dependent transform to feature vectors in the third set of feature vectors, wherein the first speaker-dependent transform is associated with the first speaker-dependent speaker adaptation technique.
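
Claims 2 and 13 add a stage keyed on where the speech originated: if the third unit came from near a location for which the identified speaker has an environment-specific technique, that technique is used next. The claims do not say how location or proximity is determined. The Python sketch below assumes, hypothetically, that locations are latitude/longitude pairs, that "proximate" means within a fixed radius (100 m here, an arbitrary choice), and that distance is the great-circle (haversine) distance. `EnvTechnique` and `select_env_technique` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


@dataclass(frozen=True)
class EnvTechnique:
    name: str
    speaker: str
    location: LatLon  # hypothetical: where this environment profile applies


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points, in metres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def select_env_technique(speaker: str, origin: LatLon,
                         registry: Sequence[EnvTechnique],
                         radius_m: float = 100.0) -> Optional[EnvTechnique]:
    """Return the nearest environment-specific technique of `speaker` within
    `radius_m` of `origin`, or None so the caller keeps its current technique."""
    nearby = [t for t in registry
              if t.speaker == speaker and haversine_m(origin, t.location) <= radius_m]
    return min(nearby, key=lambda t: haversine_m(origin, t.location), default=None)


registry = [
    EnvTechnique("alice@office", "alice", (0.0, 0.0)),
    EnvTechnique("alice@home", "alice", (1.0, 1.0)),
]
near_office = select_env_technique("alice", (0.0005, 0.0), registry)  # about 56 m away
far_away = select_env_technique("alice", (0.01, 0.0), registry)       # about 1.1 km away
other_speaker = select_env_technique("bob", (0.0005, 0.0), registry)
```

Returning `None` rather than raising lets the caller simply keep the speaker-dependent technique already in use when no environment profile applies.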

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Aleksic, Petar; Lei, Xin
Primary Examiner(s)
Opsasnick, Michael N

Application Number

US13/653,792
Time in Patent Office

377 Days
Field of Search

704/231, 704/255
US Class Current

704/231
CPC Class Codes

G10L 15/065   Adaptation

G10L 15/07   to the speaker

G10L 17/00   Speaker identification or v...
