Speaker adaptation of vocabulary for speech recognition
Abstract
A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
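As a rough illustration of the size reduction described in the abstract, the sketch below prunes a toy speaker-independent lexicon down to the variants that match one identified pronunciation style. The lexicon layout, the words, the phone symbols, the style labels, and the function names `prune_vocabulary` and `count_entries` are all assumptions made for this example; none of them is taken from the patent.

```python
# Hypothetical toy lexicon: word -> {style label: phone string}.
# Words, style labels, and phone symbols are illustrative assumptions.
SPEAKER_INDEPENDENT = {
    "either": {"EE": "IY DH ER", "AI": "AY DH ER"},
    "neither": {"EE": "N IY DH ER", "AI": "N AY DH ER"},
    "cat": {"default": "K AE T"},
}


def prune_vocabulary(vocab, style):
    """Keep only the variant matching `style` for each word.

    Words that have no variant for `style` keep all of their variants.
    """
    pruned = {}
    for word, variants in vocab.items():
        if style in variants:
            pruned[word] = {style: variants[style]}
        else:
            pruned[word] = dict(variants)
    return pruned


def count_entries(vocab):
    """Total number of pronunciation entries across all words."""
    return sum(len(variants) for variants in vocab.values())


speaker_vocab = prune_vocabulary(SPEAKER_INDEPENDENT, "AI")
print(count_entries(SPEAKER_INDEPENDENT), "->", count_entries(speaker_vocab))  # 5 -> 3
```

In this toy case the lexicon shrinks from five pronunciation entries to three, which leaves the recognizer fewer candidates to score for each word.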
35 Citations
17 Claims
1. A method for constructing at least one speaker-specific recognition vocabulary from a speaker-independent recognition vocabulary that comprises a first group of words, wherein each word in the first group of words contains a first portion associated with plural alternate pronunciations in the speaker-independent recognition vocabulary for the respective word, the method comprising:
recognizing, by at least one processor, a first keyword in speech input spoken by a first speaker, wherein the first keyword contains the first portion;
identifying, by the at least one processor, a first spoken pronunciation for the first portion based, at least in part, on how the first speaker pronounced the first keyword in the speech input;
constructing a first speaker-specific recognition vocabulary by including, for each of the words in the first group of words, a first recognition pronunciation of the respective word selected from the plural alternate pronunciations based on the identified first spoken pronunciation;
recognizing, by the at least one processor, a second keyword in the speech input spoken by the first speaker, wherein the second keyword contains the first portion;
identifying, by the at least one processor, a second spoken pronunciation for the first portion based, at least in part, on how the first speaker pronounced the second keyword in the speech input; and
constructing the first speaker-specific recognition vocabulary by including, for each of the words in the first group of words, a second recognition pronunciation selected from the plural alternate pronunciations based on the identified second spoken pronunciation.
Dependent claims: 2, 3, 4, 5, 6, 17
7. At least one non-transitory computer readable medium comprising instructions that, when executed by at least one processor, perform a method for constructing at least one speaker-specific recognition vocabulary from a speaker-independent recognition vocabulary that comprises a first group of words, wherein each word in the first group of words contains a first portion associated with plural alternate pronunciations in the speaker-independent recognition vocabulary for the respective word, the method comprising:
recognizing a first keyword in speech input spoken by a first speaker, wherein the first keyword contains the first portion;
identifying a first spoken pronunciation for the first portion based, at least in part, on how the first speaker pronounced the first keyword in the speech input;
constructing a first speaker-specific recognition vocabulary by including, for each of the words in the first group of words, a first recognition pronunciation of the respective word selected from the plural alternate pronunciations based on the identified first spoken pronunciation;
recognizing, by the at least one processor, a second keyword in the speech input spoken by the first speaker, wherein the second keyword contains the first portion;
identifying, by the at least one processor, a second spoken pronunciation for the first portion based, at least in part, on how the first speaker pronounced the second keyword in the speech input; and
constructing the first speaker-specific recognition vocabulary by including, for each of the words in the first group of words, a second recognition pronunciation selected from the plural alternate pronunciations based on the identified second spoken pronunciation.
Dependent claims: 8, 9, 10, 11, 12
13. An apparatus configured to construct at least one speaker-specific recognition vocabulary from a speaker-independent recognition vocabulary that comprises a first group of words, wherein each word in the first group of words contains a first portion associated with plural alternate pronunciations in the speaker-independent recognition vocabulary for the respective word, the apparatus comprising:
at least one processor configured to:
recognize a first keyword in speech input spoken by a first speaker, wherein the first keyword contains the first portion;
identify a first spoken pronunciation for the first portion based, at least in part, on how the first speaker pronounced the first keyword in the speech input; and
construct a first speaker-specific recognition vocabulary by including, for each of the words in the first group of words, a first recognition pronunciation of the respective word selected from the plural alternate pronunciations based on the identified first spoken pronunciation;
recognize a second keyword in the speech input spoken by the first speaker, wherein the second keyword contains the first portion;
identify a second spoken pronunciation for the first portion based, at least in part, on how the first speaker pronounced the second keyword in the speech input; and
construct the first speaker-specific recognition vocabulary by including, for each of the words in the first group of words, a second recognition pronunciation selected from the plural alternate pronunciations based on the identified second spoken pronunciation.
Dependent claims: 14, 15, 16
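The recognize, identify, and construct steps recited in the independent claims can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the word group, the phone symbols, and the helper names `identify_portion` and `build_speaker_vocab` are hypothetical, and keyword recognition is stubbed out by passing in (keyword, observed phone string) pairs directly.

```python
# Hypothetical first group of words sharing one portion (here the first vowel).
# Each word maps the portion's alternate phone to the full pronunciation.
# All words and phone symbols are illustrative assumptions.
GROUP = {
    "either": {"IY": "IY DH ER", "AY": "AY DH ER"},
    "neither": {"IY": "N IY DH ER", "AY": "N AY DH ER"},
}


def identify_portion(keyword, observed_phones):
    """Return which alternate of the shared portion the speaker used, or None."""
    for portion_phone, pronunciation in GROUP[keyword].items():
        if pronunciation == observed_phones:
            return portion_phone
    return None


def build_speaker_vocab(recognized):
    """Build a speaker-specific vocabulary from recognized keywords.

    `recognized` is a list of (keyword, observed phone string) pairs, standing
    in for the output of a real recognizer (an assumption of this sketch).
    """
    vocab = {word: [] for word in GROUP}
    for keyword, observed in recognized:
        portion = identify_portion(keyword, observed)
        if portion is None:
            continue  # the keyword did not match any known alternate
        # Include, for every word in the group, the alternate that matches
        # the spoken pronunciation identified from this keyword.
        for word, alternates in GROUP.items():
            pronunciation = alternates[portion]
            if pronunciation not in vocab[word]:
                vocab[word].append(pronunciation)
    return vocab


# The speaker pronounces the shared portion differently in the two keywords,
# so both selected alternates end up in the speaker-specific vocabulary.
vocab = build_speaker_vocab([("either", "AY DH ER"), ("neither", "N IY DH ER")])
print(vocab)
```

When both keywords are spoken with the same alternate, each word keeps a single pronunciation; when they differ, as in the usage above, each word keeps the two alternates that were actually observed.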
Specification