Method and apparatus for fast on-line automatic speaker/environment adaptation for speech/speaker recognition in the presence of changing environments
Abstract
A system, method, and computer program product for fast on-line automatic speaker/environment adaptation, suitable for speech/speaker recognition, are presented. The system comprises a computer system including a processor, a memory coupled with the processor, an input coupled with the processor for receiving acoustic signals, and an output coupled with the processor for outputting recognized words or sounds. The system includes a model-adaptation system and a recognition system, configured to accurately and efficiently recognize, on-line, distorted sounds or words spoken with different accents in the presence of randomly changing environmental conditions. The model-adaptation system quickly adapts standard acoustic training models, available on audio recognition systems, by incorporating distortion parameters representative of the changing environmental conditions or the speaker's accent. Because it adapts models that are already available to the new environment, the system does not need separate adaptation training data.
123 Claims
1. A method for fast on-line automatic speaker/environment adaptation suitable for speech/speaker recognition in the presence of changing environmental conditions, the method comprising acts of:
performing front-end processing on an acoustic input signal, wherein the front-end processing generates MEL frequency cepstral features representative of the acoustic input signal;
performing recognition and adaptation by:
providing the MEL frequency cepstral features to a speech recognizer, wherein the speech recognizer utilizes the MEL frequency cepstral features and a current list of acoustic training models to determine at least one best hypothesis;
receiving, from the speech recognizer, at least one best hypothesis, associated acoustic training models, and associated probabilities;
computing a pre-adaptation acoustic score by recognizing an utterance using the associated acoustic training models;
choosing acoustic training models from the associated acoustic training models;
performing adaptation on the chosen associated acoustic training models;
computing a post-adaptation acoustic score by recognizing the utterance using the adapted acoustic training models;
comparing the pre-adaptation acoustic score with the post-adaptation acoustic score to check for improvement;
modifying the current list of acoustic training models to include the adapted acoustic training models, if the acoustic score improved after performing adaptation; and
performing recognition and adaptation iteratively until the acoustic score ceases to improve;
choosing the best hypothesis as recognized words once the acoustic score ceases to improve; and
outputting the recognized words.
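The recognition-and-adaptation acts recited above can be sketched as a short loop. This is only an illustrative sketch, not the patented implementation: the one-dimensional `GaussianModel`, the toy `recognize` function, the mean-shift `adapt` step, and the `tol` and `max_iters` parameters are all assumptions introduced here, and replacing a list entry with its adapted version is one possible reading of "modifying the current list ... to include the adapted acoustic training models".

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GaussianModel:
    """Hypothetical stand-in for an acoustic training model (1-D Gaussian)."""
    label: str
    mean: float
    var: float = 1.0


def acoustic_score(features, model):
    """Total log-likelihood of the feature values under one model."""
    return sum(
        -0.5 * (math.log(2 * math.pi * model.var) + (x - model.mean) ** 2 / model.var)
        for x in features
    )


def recognize(features, models):
    """Toy recognizer: return (best hypothesis, its model, its acoustic score)."""
    best = max(models, key=lambda m: acoustic_score(features, m))
    return best.label, best, acoustic_score(features, best)


def adapt(features, model, step=0.5):
    """Assumed adaptation: shift the model mean part-way toward the observed mean."""
    bias = sum(features) / len(features) - model.mean
    return replace(model, mean=model.mean + step * bias)


def recognize_with_adaptation(features, models, tol=1e-6, max_iters=50):
    """Iterate recognition and adaptation until the score ceases to improve."""
    models = list(models)  # work on a copy of the current model list
    hypothesis, chosen, pre_score = recognize(features, models)
    for _ in range(max_iters):
        adapted = adapt(features, chosen)
        # Trial list in which the chosen model is swapped for its adapted version.
        trial = [adapted if m is chosen else m for m in models]
        new_hyp, new_chosen, post_score = recognize(features, trial)
        if post_score - pre_score <= tol:
            break  # no improvement: keep the current list and hypothesis
        models, hypothesis, chosen, pre_score = trial, new_hyp, new_chosen, post_score
    return hypothesis, models, pre_score


# Usage: the utterance sits near 3.0, so the "no" model (mean 5.0) is adapted.
features = [2.9, 3.1, 3.0, 3.2, 2.8]
models = [GaussianModel("yes", 0.0), GaussianModel("no", 5.0)]
words, adapted_models, score = recognize_with_adaptation(features, models)
print(words, score)
```

In this toy setting each pass moves the chosen model halfway toward the observed data, so the score gain shrinks on every iteration and the loop stops once the gain falls below `tol`.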
42. A system for fast on-line automatic speaker/environment adaptation suitable for speech/speaker recognition in the presence of changing environmental conditions, the system comprising:
a computer system including a processor, a memory coupled with the processor, an input coupled with the processor for receiving an acoustic input signal, the computer system further comprising means, residing in its processor and memory, for:
performing front-end processing on the acoustic input signal, wherein the front-end processing generates MEL frequency cepstral features representative of the acoustic input signal;
performing recognition and adaptation by:
providing the MEL frequency cepstral features to a speech recognizer, wherein the speech recognizer utilizes the MEL frequency cepstral features and a current list of acoustic training models to determine at least one best hypothesis;
receiving, from the speech recognizer, at least one best hypothesis, associated acoustic training models, and associated probabilities;
computing a pre-adaptation acoustic score by recognizing an utterance using the associated acoustic training models;
choosing acoustic training models from the associated acoustic training models;
performing adaptation on the chosen associated acoustic training models;
computing a post-adaptation acoustic score by recognizing the utterance using the adapted acoustic training models;
comparing the pre-adaptation acoustic score with the post-adaptation acoustic score to check for improvement;
modifying the current list of acoustic training models to include the adapted acoustic training models, if the acoustic score improved after performing adaptation; and
performing recognition and adaptation iteratively until the acoustic score ceases to improve;
choosing the best hypothesis as recognized words once the acoustic score ceases to improve; and
outputting the recognized words.
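The front-end processing recited in the independent claims generates MEL frequency cepstral features from the acoustic input signal. The claims do not specify how, so the following is a minimal sketch of a common textbook pipeline for a single frame (Hamming window, power spectrum, triangular mel filterbank, logarithm, DCT-II). The function names, the 20-filter and 13-coefficient defaults, the 8 kHz sample rate, and the naive DFT are assumptions chosen for a short standard-library example.

```python
import math


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for a short demo frame)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters with centers evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    points = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * hz / sample_rate)) for hz in points]
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):   # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=13):
    """Mel frequency cepstral coefficients for one frame of samples."""
    n = len(frame)
    windowed = [
        x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))  # Hamming window
        for t, x in enumerate(frame)
    ]
    spec = power_spectrum(windowed)
    bank = mel_filterbank(num_filters, n, sample_rate)
    energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
    log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    # DCT-II decorrelates the log filterbank energies into cepstral coefficients.
    return [
        sum(le * math.cos(math.pi * c * (j + 0.5) / num_filters)
            for j, le in enumerate(log_e))
        for c in range(num_coeffs)
    ]


# Usage: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
feats = mfcc(frame, sample_rate)
print(len(feats))
```

A practical front end would slide overlapping frames across the whole signal and use an FFT rather than the quadratic-time DFT shown here.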
83. A computer program product for fast on-line automatic speaker/environment adaptation suitable for speech/speaker recognition in the presence of changing environmental conditions, the computer program product comprising means, stored on a computer readable medium, for:
receiving an acoustic input signal;
performing front-end processing on the acoustic input signal, wherein the front-end processing generates MEL frequency cepstral features representative of the acoustic input signal;
performing recognition and adaptation by:
providing the MEL frequency cepstral features to a speech recognizer, wherein the speech recognizer utilizes the MEL frequency cepstral features and a current list of acoustic training models to determine at least one best hypothesis;
receiving, from the speech recognizer, at least one best hypothesis, associated acoustic training models, and associated probabilities;
computing a pre-adaptation acoustic score by recognizing the utterance using the associated acoustic training models;
choosing acoustic training models from the associated acoustic training models;
performing adaptation on the chosen associated acoustic training models;
computing a post-adaptation acoustic score by recognizing the utterance using the adapted acoustic training models;
comparing the pre-adaptation acoustic score with the post-adaptation acoustic score to check for improvement;
modifying the current list of acoustic training models to include the adapted acoustic training models, if the acoustic score improved after performing adaptation; and
performing recognition and adaptation iteratively until the acoustic score ceases to improve;
choosing the best hypothesis as recognized words once the acoustic score ceases to improve; and
outputting the recognized words.
Specification