Semi-supervised speaker adaptation
Abstract
To prevent adaptation to misrecognized words in unsupervised or on-line automatic speech recognition systems, confidence measures are used, or the user's reaction is interpreted, to decide whether a recognized phoneme, several phonemes, a word, several words, or a whole utterance should be used to adapt the speaker-independent model set to a speaker-adapted model set and, if adaptation is performed, how strongly the recognized utterance or part of it should influence that adaptation. Furthermore, a verification of the speaker adaptation performance is proposed to ensure that the recognition rate never decreases (significantly), but only increases or stays at the same level.
132 Citations
30 Claims
-
1. Method to perform an unsupervised and/or on-line adaptation of an automatic speech recognition system, comprising:
-
receiving an utterance or parts thereof;
determining a recognition result of said received utterance or parts thereof;
determining a grade of the reliability of the recognition result;
determining a grade of adaptation of the system with the help of the received utterance or parts thereof based on the grade of the reliability of the recognition result; and
adapting the automatic speech recognition system according to the grade of adaptation;
wherein the grade of adaptation indicates a weight to be applied in adapting the system such that adapting the automatic speech recognition system includes applying an amount of adaptation according to the grade of adaptation, wherein said received utterance or a part of said received utterance is used for adaptation when the grade of the reliability of the recognition is above a threshold and is discarded when it is below said threshold, and wherein said threshold is dynamically changeable.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 27, 28
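The gating described in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the class name `AdaptationGate`, the starting threshold, the step size, the linear mapping from reliability to weight, and the policy for moving the threshold are all assumptions made for illustration. The claim only requires that units above the threshold are used, units below it are discarded, the weight depends on the reliability, and the threshold can change.

```python
from dataclasses import dataclass


@dataclass
class AdaptationGate:
    """Decides whether, and how strongly, to adapt on one recognized unit."""

    threshold: float = 0.6  # hypothetical starting value
    step: float = 0.05      # hypothetical amount by which the threshold moves

    def grade_of_adaptation(self, reliability: float) -> float:
        """Return a weight in [0, 1]; 0.0 means the unit is discarded."""
        if reliability <= self.threshold:
            return 0.0
        # Assumed mapping: scale linearly from the threshold up to 1.0.
        return (reliability - self.threshold) / (1.0 - self.threshold)

    def update_threshold(self, performance_dropped: bool) -> None:
        """Assumed policy: tighten after a drop, relax slowly otherwise."""
        delta = self.step if performance_dropped else -self.step / 2
        # Clamp so the threshold never reaches 0 or 1.
        self.threshold = min(0.95, max(0.05, self.threshold + delta))
```

A caller would compute the reliability grade for each recognized word or utterance, pass it to `grade_of_adaptation`, and skip adaptation whenever the returned weight is zero.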
performing adaptation of the system with the help of a received utterance or parts thereof by repeatedly adapting a set of parameters; and
storing at least one set of earlier parameters to replace the currently used parameters in case the recognition performance of the system drops;
wherein the recognition performance of the system is judged on the basis of the method defined in claim 1.
-
-
14. Method according to claim 1, characterized in that the adaptation of the system is performed using the adaptation of Hidden Markov Models.
-
15. Method according to claim 14, further comprising adapting a speaker independent Hidden Markov Model towards the performance of a speaker dependent Hidden Markov Model by adjusting parameters of the speaker independent Hidden Markov Model.
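Claims 14 and 15 do not state which Hidden Markov Model parameters are adjusted or by what rule. As one possible illustration, the sketch below assumes that the adjusted parameter is the mean vector of a Gaussian output density and that the update is a simple weighted interpolation toward the mean of the speaker's observed feature vectors. The function name `adapt_mean` and the interpolation rule are assumptions, not taken from the claims.

```python
def adapt_mean(si_mean, observed_mean, weight):
    """Shift a speaker-independent Gaussian mean toward the speaker's data.

    weight is the grade of adaptation in [0, 1]: 0.0 leaves the model
    unchanged, 1.0 replaces the mean with the observed one.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(si_mean) != len(observed_mean):
        raise ValueError("mean vectors must have the same dimension")
    # Element-wise interpolation between the old mean and the observed mean.
    return [(1.0 - weight) * m + weight * x
            for m, x in zip(si_mean, observed_mean)]
```

With a small weight, a single unreliable utterance moves the model only slightly, which is the effect the grade of adaptation is meant to achieve.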
-
27. Method according to claim 2, characterized in that said reactions are determined by interpretation of secondary information of utterances or parts of utterances received after said received utterance or parts of said received utterance.
-
28. Method according to claim 27, characterized in that said secondary information of utterances or parts of utterances received after said received utterance or parts of said received utterance is intonation and/or prosody of said utterances or parts of utterances received after said received utterance or parts of said received utterance.
-
16. Method to perform an unsupervised or on-line adaptation of an automatic speech recognition system, comprising:
-
performing adaptation of the system with the help of a received utterance or parts thereof by repeatedly adapting a set of parameters; and
storing at least one set of earlier parameters to replace the currently used parameters in case the recognition performance of the system drops;
wherein adapting the automatic speech recognition system includes determining a recognition result of a received utterance or parts thereof, determining a grade of the reliability of the recognition result, determining a grade of adaptation of the system with the help of the received utterance or parts thereof based on the grade of the reliability of the recognition result, and applying an amount of adaptation according to the grade of adaptation, wherein said received utterance or a part of said received utterance is used for adaptation when the grade of the reliability of the recognition is above a threshold and is discarded when it is below said threshold, and wherein said threshold is dynamically changeable.
Dependent claims: 17, 18
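The storing and restoring of earlier parameter sets in claim 16 can be sketched as follows. This is a hedged illustration only: the class name `RollbackAdapter`, the `max_history` cap, and the boolean `performance_dropped` flag are assumptions. How a drop in recognition performance is detected is left to the caller here.

```python
import copy
from collections import deque


class RollbackAdapter:
    """Keeps earlier parameter sets so a harmful adaptation can be undone."""

    def __init__(self, params, max_history=3):
        self.params = params
        # Hypothetical cap: the oldest stored set is dropped automatically.
        self._history = deque(maxlen=max_history)

    def adapt(self, update_fn):
        """Store a copy of the current parameters, then apply one update."""
        self._history.append(copy.deepcopy(self.params))
        self.params = update_fn(self.params)

    def check(self, performance_dropped):
        """Restore the most recently stored set if performance dropped."""
        if performance_dropped and self._history:
            self.params = self._history.pop()
            return True
        return False
```

Calling `check` after each adaptation step restores the previous parameter set whenever the caller reports a drop, so the system does not keep a set that performed worse than its predecessor.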
-
-
19. Speech recognition system with unsupervised and/or on-line adaptation, comprising:
-
a microphone (1) to receive spoken words of a user and to output an analog signal;
an A/D conversion stage (2) connected to said microphone (1) to convert said analog signal into a digital signal;
a feature extraction module (3) connected to said A/D conversion stage (2) to extract feature vectors of said received words of the user from said digital signal;
a recognition module (4) connected to said feature extraction module (3) to recognize said received words of the user on basis of said feature vectors and a set of speaker independent and/or speaker adapted models;
an adaptation module (7) receiving the recognition result from said recognition module (4) to generate and/or adapt said speaker adapted model set;
wherein a decision unit (11) that is connected to said recognition module (4) and that supplies a signal to said adaptation module (7) indicating whether to use a certain received word for generation and/or adaptation of the speaker adapted model set or not;
wherein said signal supplied to said adaptation module (7) from said decision unit (11) indicates a weight to be applied as the strength of adaptation of the speaker adapted model set by said adaptation module (7) on the basis of said certain received words;
wherein said decision unit (11) determines a grade of reliability for said certain received word and said certain received word is used for adaptation when the grade of the reliability of the recognition is above a threshold and is discarded when it is below said threshold; and
wherein said threshold is dynamically changeable.
Dependent claims: 20, 21, 22, 23, 24
-
-
25. Method to perform an unsupervised and/or on-line adaptation of an automatic speech recognition system, comprising:
-
receiving an utterance or parts thereof;
determining a recognition result of said received utterance or parts thereof;
determining a grade of the reliability of the recognition result;
determining a grade of adaptation of the system with the help of the received utterance or parts thereof based on the grade of the reliability of the recognition result; and
adapting the automatic speech recognition system according to the grade of adaptation;
wherein the grade of adaptation indicates a weight to be applied in adapting the system such that adapting the automatic speech recognition system includes applying an amount of adaptation according to the grade of adaptation, wherein the grade of the reliability of the recognition result of said received utterance or a part of said received utterance is measured on the basis of reactions of the speaker of said utterance, and wherein said reactions are determined via a visual computation system based on a picture or video sequence taken from the user and/or the user's face.
-
-
26. Method to perform an unsupervised and/or on-line adaptation of an automatic speech recognition system, comprising:
-
receiving an utterance or parts thereof;
determining a recognition result of said received utterance or parts thereof;
determining a grade of the reliability of the recognition result;
determining a grade of adaptation of the system with the help of the received utterance or parts thereof based on the grade of the reliability of the recognition result; and
adapting the automatic speech recognition system according to the grade of adaptation;
wherein the grade of adaptation indicates a weight to be applied in adapting the system such that adapting the automatic speech recognition system includes applying an amount of adaptation according to the grade of adaptation, wherein the grade of the reliability of the recognition result of said received utterance or a part of said received utterance is measured on the basis of reactions of the speaker of said utterance, and wherein the grade of the reliability of the recognition result of said received utterance or a part of said received utterance is measured on the basis of confidence measures, and said confidence measures depend on the emotional state of the person speaking said utterance.
-
-
29. Speech recognition system with unsupervised and/or on-line adaptation, comprising:
-
a microphone to receive spoken words of a user and to output an analog signal;
an A/D conversion stage connected to said microphone to convert said analog signal into a digital signal;
a feature extraction module connected to said A/D conversion stage to extract feature vectors of said received words of the user from said digital signal;
a recognition module connected to said feature extraction module to recognize said received words of the user on basis of said feature vectors and a set of speaker independent and/or speaker adapted models;
an adaptation module receiving the recognition result from said recognition module to generate and/or adapt said speaker adapted model set;
wherein a decision unit that is connected to said recognition module and that supplies a signal to said adaptation module indicating whether to use a certain received word for generation and/or adaptation of the speaker adapted model set or not, wherein said signal supplied to said adaptation module from said decision unit indicates a weight to be applied as the strength of adaptation of the speaker adapted model set by said adaptation module on the basis of said certain received words, and wherein said signal supplied to said adaptation module from said decision unit is created on basis of a first control signal generated by a prosody extraction module connected in-between said recognition module and said decision unit.
-
-
30. Speech recognition system with unsupervised and/or on-line adaptation, comprising:
-
a microphone to receive spoken words of a user and to output an analog signal;
an A/D conversion stage connected to said microphone to convert said analog signal into a digital signal;
a feature extraction module connected to said A/D conversion stage to extract feature vectors of said received words of the user from said digital signal;
a recognition module connected to said feature extraction module to recognize said received words of the user on basis of said feature vectors and a set of speaker independent and/or speaker adapted models;
an adaptation module receiving the recognition result from said recognition module to generate and/or adapt said speaker adapted model set;
wherein a decision unit that is connected to said recognition module and that supplies a signal to said adaptation module indicating whether to use a certain received word for generation and/or adaptation of the speaker adapted model set or not, wherein said signal supplied to said adaptation module from said decision unit indicates a weight to be applied as the strength of adaptation of the speaker adapted model set by said adaptation module on the basis of said certain received words, and wherein said signal supplied to said adaptation module from said decision unit is created on basis of a fifth control signal generated by a vision module connected to said decision unit.
-
Specification