Speaker adaptation of vocabulary for speech recognition

US 8,731,928 B2
Filed: 03/15/2013
Issued: 05/20/2014
Est. Priority Date: 12/16/2002
Status: Expired due to Fees

Abstract

A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
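
The size reduction described in the abstract can be illustrated with a minimal sketch. The words, style labels ("EE", "EYE") and ARPAbet-like phone strings below are invented for illustration and do not come from the patent, and the function names are hypothetical. The sketch assumes a pronunciation style has already been attributed to the speaker.

```python
# Hypothetical speaker-independent vocabulary: each word maps a pronunciation
# style label to one alternate pronunciation. All entries are invented.
SI_VOCAB = {
    "either":  {"EE": ["IY", "DH", "ER"], "EYE": ["AY", "DH", "ER"]},
    "neither": {"EE": ["N", "IY", "DH", "ER"], "EYE": ["N", "AY", "DH", "ER"]},
}


def adapt_vocabulary(si_vocab, observed_style):
    """Keep, for each word, only the alternate matching the speaker's style.

    Words with no alternate for the observed style keep all their alternates,
    since there is no evidence for pruning them.
    """
    adapted = {}
    for word, alternates in si_vocab.items():
        if observed_style in alternates:
            adapted[word] = {observed_style: alternates[observed_style]}
        else:
            adapted[word] = dict(alternates)
    return adapted


def count_pronunciations(vocab):
    """Total number of pronunciation entries the recognizer would search."""
    return sum(len(alternates) for alternates in vocab.values())


speaker_vocab = adapt_vocabulary(SI_VOCAB, "EYE")
print(count_pronunciations(SI_VOCAB), count_pronunciations(speaker_vocab))
```

With fewer pronunciation entries per word, the recognizer has fewer candidates to score, which is the source of the accuracy and speed benefit the abstract mentions.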

35 Citations

17 Claims

1. A method for constructing at least one speaker-specific recognition vocabulary from a speaker-independent recognition vocabulary that comprises a first group of words, wherein each word in the first group of words contains a first portion associated with plural alternate pronunciations in the speaker-independent recognition vocabulary for the respective word, the method comprising:
- recognizing, by at least one processor, a first keyword in speech input spoken by a first speaker, wherein the first keyword contains the first portion;
  
  identifying, by the at least one processor, a first spoken pronunciation for the first portion based, at least in part, on how the first speaker pronounced the first keyword in the speech input;
  
  constructing a first speaker-specific recognition vocabulary by including, for each of the words in the first group of words, a first recognition pronunciation of the respective word selected from the plural alternate pronunciations based on the identified first spoken pronunciation;
  
  recognizing, by the at least one processor, a second keyword in the speech input spoken by the first speaker, wherein the second keyword contains the first portion;
  
  identifying, by the at least one processor, a second spoken pronunciation for the first portion based, at least in part, on how the first speaker pronounced the second keyword in the speech input; and
  
  constructing the first speaker-specific recognition vocabulary by including, for each of the words in the first group of words, a second recognition pronunciation selected from the plural alternate pronunciations based on the identified second spoken pronunciation.
- - 2. The method of claim 1, wherein the first keyword is identified as a representative of the first group of words prior to recognizing the first keyword in the speech input.
  - 3. The method of claim 1, comprising selecting, as the first recognition pronunciation, one of the plural alternate pronunciations based on comparing the first spoken pronunciation to a corresponding portion of each of the plural alternate pronunciations.
  - 4. The method of claim 3, further comprising generating adaptation rules based upon the selected first recognition pronunciation, wherein the adaptation rules facilitate constructing the first speaker-specific recognition vocabulary.
  - 5. The method of claim 1, wherein the speech input is received from the first speaker reading an enrollment script containing at least the first keyword provided to the first speaker by a speech recognition system.
  - 6. The method of claim 1, wherein the speech input is received from the first speaker utilizing a speech recognition system to recognize the speech input.
  - 17. The apparatus of claim 1, wherein the speech input is received from the first speaker utilizing a speech recognition system to recognize the speech input.
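
The flow recited in claim 1 (two keywords sharing a portion, each contributing a recognition pronunciation for every word in the group) can be sketched as follows. This is one possible reading for illustration only, not the patentee's implementation. The class names, the example words and the phone strings are all assumptions, and the recognition step is replaced by ready-made (word, spoken portion pronunciation) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class WordGroup:
    """Hypothetical group of words that share one portion."""
    keywords: set     # words chosen to represent the group
    alternates: dict  # word -> {portion pronunciation -> full pronunciation}


@dataclass
class SpeakerVocabulary:
    """Hypothetical speaker-specific vocabulary: word -> list of pronunciations."""
    entries: dict = field(default_factory=dict)

    def include(self, group, portion_pron):
        # For every word in the group, add the alternate whose portion
        # matches what the speaker actually said in the keyword.
        for word, by_portion in group.alternates.items():
            chosen = by_portion.get(portion_pron)
            if chosen is None:
                continue
            prons = self.entries.setdefault(word, [])
            if chosen not in prons:
                prons.append(chosen)


def adapt(group, observations):
    """observations: (recognized word, spoken pronunciation of the portion)."""
    vocab = SpeakerVocabulary()
    for word, portion_pron in observations:
        if word in group.keywords:
            vocab.include(group, portion_pron)
    return vocab


# Invented example: both words share the portion spelled "ato".
group = WordGroup(
    keywords={"tomato", "potato"},
    alternates={
        "tomato": {"EY T OW": "T AH M EY T OW", "AA T OW": "T AH M AA T OW"},
        "potato": {"EY T OW": "P AH T EY T OW", "AA T OW": "P AH T AA T OW"},
    },
)
vocab = adapt(group, [("tomato", "EY T OW"), ("potato", "EY T OW")])
print(vocab.entries)
```

In this sketch a second keyword spoken with a different pronunciation of the portion simply adds a second entry per word, which is one way to read the second "constructing" step of the claim.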

7. At least one non-transitory computer readable medium comprising instructions that, when executed by at least one processor, perform a method for constructing at least one speaker-specific recognition vocabulary from a speaker-independent recognition vocabulary that comprises a first group of words, wherein each word in the first group of words contains a first portion associated with plural alternate pronunciations in the speaker-independent recognition vocabulary for the respective word, the method comprising:
- recognizing a first keyword in speech input spoken by a first speaker, wherein the first keyword contains the first portion;
  
  identifying a first spoken pronunciation for the first portion based, at least in part, on how the first speaker pronounced the first keyword in the speech input;
  
  constructing a first speaker-specific recognition vocabulary by including, for each of the words in the first group of words, a first recognition pronunciation of the respective word selected from the plural alternate pronunciations based on the identified first spoken pronunciation;
  
  recognizing, by the at least one processor, a second keyword in the speech input spoken by the first speaker, wherein the second keyword contains the first portion;
  
  identifying, by the at least one processor, a second spoken pronunciation for the first portion based at least in part, on how the first speaker pronounced the second keyword in the speech input; and
  
  constructing the first speaker-specific recognition vocabulary by including, for each of the words in the first group of words, a second recognition pronunciation selected from the plural alternate pronunciations based on the identified second spoken pronunciation.
- - 8. The at least one non-transitory computer readable medium of claim 7, comprising selecting, as the first recognition pronunciation, one of the plural alternate pronunciations based on comparing the first spoken pronunciation to a corresponding portion of each of the plural alternate pronunciations.
  - 9. The at least one non-transitory computer readable medium of claim 8, wherein comparing includes at least one of dynamic time warping, implementing a Viterbi algorithm, and implementing hidden Markov models.
  - 10. The at least one non-transitory computer readable medium of claim 8, further comprising generating adaptation rules based upon the selected first recognition pronunciation, wherein the adaptation rules facilitate constructing the first speaker-specific recognition vocabulary.
  - 11. The at least one non-transitory computer readable medium of claim 7, wherein the speech input is received from the first speaker reading an enrollment script containing at least the first keyword provided to the first speaker by a speech recognition system.
  - 12. The at least one non-transitory computer readable medium of claim 7, wherein the speech input is received from the first speaker utilizing a speech recognition system to recognize the speech input.
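
Claims 8 and 9 describe selecting a pronunciation by comparing the spoken portion with each alternate, naming dynamic time warping as one option. The following is a minimal textbook sketch of that comparison, assuming each pronunciation is represented by a one-dimensional feature sequence. The feature values, template names and function names are invented, and a real system would compare multi-dimensional acoustic feature vectors.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def select_pronunciation(spoken, templates):
    """Return the name of the alternate whose template is closest to `spoken`."""
    return min(templates, key=lambda name: dtw_distance(spoken, templates[name]))


# Invented one-dimensional "feature" templates for two alternates.
templates = {"alt_a": [1.0, 2.0, 3.0], "alt_b": [3.0, 2.0, 1.0]}
print(select_pronunciation([1.0, 1.1, 2.0, 3.0], templates))
```

Dynamic time warping tolerates differences in speaking rate because one frame of the spoken input may align with several frames of a template, and vice versa.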

13. An apparatus configured to construct at least one speaker-specific recognition vocabulary from a speaker-independent recognition vocabulary that comprises a first group of words, wherein each word in the first group of words contains a first portion associated with plural alternate pronunciations in the speaker-independent recognition vocabulary for the respective word, the apparatus comprising:
- at least one processor configured to;
  
  recognize a first keyword in speech input spoken by a first speaker, wherein the first keyword contains the first portion;
  
  identify a first spoken pronunciation for the first portion based, at least in part, on how the first speaker pronounced the first keyword in the speech input; and
  
  construct a first speaker-specific recognition vocabulary by including, for each of the words in the first group of words, a first recognition pronunciation of the respective word selected from the plural alternate pronunciations based on the identified first spoken pronunciation;
  
  recognize a second keyword in the speech input spoken by the first speaker, wherein the second keyword contains the first portion;
  
  identify a second spoken pronunciation for the first portion based, at least in part, on how the first speaker pronounced the second keyword in the speech input; and
  
  construct the first speaker-specific recognition vocabulary by including, for each of the words in the first group of words, a second recognition pronunciation selected from the plural alternate pronunciations based on the identified second spoken pronunciation.
- - 14. The apparatus of claim 13, wherein the at least one processor is configured to select, as the first recognition pronunciation, one of the plural alternate pronunciations based on comparing the first spoken pronunciation to a corresponding portion of each of the plural alternate pronunciations.
  - 15. The apparatus of claim 14, wherein the at least one processor is configured to implement at least one of dynamic time warping, implementing a Viterbi algorithm, and implementing hidden Markov models.
  - 16. The apparatus of claim 13, wherein the speech input is received from the first speaker reading an enrollment script containing at least the first keyword provided to the first speaker by a speech recognition system.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Rajput, Nitendra; Verma, Ashish
Primary Examiner(s): Shah, Paras D
Application Number: US13/834,324
Publication Number: US 20130204621A1
Time in Patent Office: 431 Days
Field of Search: 704/231, 704/243, 704/254, 704/251, 704/247, 704/236
US Class Current: 704/254
CPC Class Codes: G10L 15/07 to the speaker; G10L 17/02 Preprocessing operations, e...
