Automatic retraining of a speech recognizer while using reliable transcripts
Abstract
Automatic retraining of a speech recognizer during its normal operation, in conjunction with an electronic device responsive to the speech recognizer, is addressed. In this retraining, stored trained models are retrained on the basis of recognized user utterances. Feature vectors, model state transitions, and tentative recognition results are stored upon processing and evaluation of speech samples of the user utterances. A reliable transcript is determined for later adaptation of a speech model, in dependence upon the user's successive behavior when interacting with the speech recognizer and the electronic device. For example, in a name dialing process, such behavior can be manual or voice re-dialing of the same number, dialing of a different phone number, immediately aborting an established communication, or breaking it off after a short period of time. In dependence upon such behavior, a transcript is selected in correspondence with the user's first utterance, with the user's second utterance, or with both; or the tentative recognition result (or results) are determined to be uncertain and deemed unsuitable for updating a model; or updating is performed with appropriate weighting to take into account the level of uncertainty. Upon determination of a reliable transcript, a model adaptation is performed.
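The abstract leaves open exactly how each user behavior maps to a transcript decision. A minimal sketch follows, assuming hypothetical event names (`Behavior`), a hypothetical helper `decide_transcript`, and an assumed weight of 0.5 for a call broken off after a short period; none of these names or values come from the patent. It returns either a transcript with an adaptation weight, or no transcript when the result is too uncertain to use.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Behavior(Enum):
    # Hypothetical events reported by a name-dialing device after voice dialing
    CALL_KEPT = auto()            # conversation continued normally
    REDIALED_SAME = auto()        # manual or voice re-dial of the same number
    ENDED_SHORTLY = auto()        # call broken off after a short period
    ABORTED_IMMEDIATELY = auto()  # established call aborted at once
    DIALED_OTHER = auto()         # user then dialed a different entry


@dataclass
class TranscriptDecision:
    transcript: Optional[str]  # None means: do not adapt any model
    weight: float              # adaptation weight in [0.0, 1.0]


def decide_transcript(first_result: str, behavior: Behavior,
                      second_result: Optional[str] = None,
                      short_call_weight: float = 0.5) -> TranscriptDecision:
    """Assumed mapping from behavior to a transcript for the FIRST utterance."""
    if behavior in (Behavior.CALL_KEPT, Behavior.REDIALED_SAME):
        # The user accepted the dialed name: the tentative result is reliable
        return TranscriptDecision(first_result, 1.0)
    if behavior is Behavior.ENDED_SHORTLY:
        # Plausibly correct but uncertain: adapt with a reduced weight
        return TranscriptDecision(first_result, short_call_weight)
    if behavior is Behavior.DIALED_OTHER and second_result is not None:
        # Assume the entry finally dialed is what the first utterance meant
        return TranscriptDecision(second_result, 1.0)
    # Immediate abort with no follow-up: deemed unsuitable for updating
    return TranscriptDecision(None, 0.0)
```

For example, `decide_transcript("alice", Behavior.DIALED_OTHER, "bob")` selects `"bob"` as the transcript for the first utterance, while an immediate abort with no follow-up yields no adaptation at all.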
24 Claims
1. A method for automatic retraining of a speech recognizer during its normal operation, in which speech recognizer a plurality of trained models is stored, the method comprising:
a) extracting a first feature vector sequence from a sampled input stream of a first user utterance,
b) statistically matching the first feature vector sequence with the stored models so as to obtain likelihood scores for each model, while storing model state transitions,
c) identifying the model with the highest likelihood score as a first tentative recognition result,
d) storing the first feature vector sequence and the first tentative recognition result,
e) informing the user, upon acceptance of the first tentative recognition result, about the first tentative recognition result,
f) determining from the user's behavior, when the user successively operates the speech recognizer, whether the first tentative recognition result was correct, and
g) retraining a model corresponding to the first tentative recognition result, using the stored first feature vector sequence, if the first tentative recognition result was correct.

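Steps b) to g) of claim 1 can be sketched as below. This is a hedged illustration, not the patented implementation: it assumes feature vectors are already produced by some front end (step a), replaces each trained model with a hypothetical single-state, unit-variance Gaussian template (so no state transitions are stored here), and uses an assumed interpolation rate `rate` for retraining. `TemplateRecognizer` and `report_behavior` are illustrative names. The point it shows is that the tentative result is held back and used for retraining only after the user's behavior confirms it.

```python
from dataclasses import dataclass, field
from typing import Optional

Vector = list[float]


def log_likelihood(feats: list[Vector], mean: Vector) -> float:
    """Sum of unit-variance Gaussian log densities, constant term dropped."""
    return sum(-0.5 * sum((x - m) ** 2 for x, m in zip(f, mean)) for f in feats)


@dataclass
class TemplateRecognizer:
    # Hypothetical single-state models: name -> mean feature vector
    models: dict[str, Vector]
    rate: float = 0.2  # assumed adaptation rate
    _pending: Optional[tuple[list[Vector], str]] = field(
        default=None, init=False, repr=False)

    def recognize(self, feats: list[Vector]) -> str:
        # b) score every stored model against the utterance
        scores = {name: log_likelihood(feats, mean)
                  for name, mean in self.models.items()}
        # c) the model with the highest likelihood is the tentative result
        best = max(scores, key=scores.get)
        # d) keep features and tentative result until the behavior is known
        self._pending = (feats, best)
        return best  # e) the caller announces this result to the user

    def report_behavior(self, result_was_correct: bool) -> None:
        # f) the application infers correctness from what the user did next
        if self._pending is None:
            return
        feats, name = self._pending
        self._pending = None
        if not result_was_correct:
            return  # an unconfirmed result is never used for retraining
        # g) move the confirmed model's mean toward the utterance's mean frame
        avg = [sum(col) / len(feats) for col in zip(*feats)]
        self.models[name] = [(1 - self.rate) * m + self.rate * a
                             for m, a in zip(self.models[name], avg)]
```

Usage: after `rec = TemplateRecognizer({"alice": [0.0, 0.0], "bob": [5.0, 5.0]})`, calling `rec.recognize([[1.0, 1.0], [1.0, 1.0]])` returns `"alice"`; only a subsequent `rec.report_behavior(True)` nudges the `"alice"` template toward that utterance.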
8. A method for automatic retraining of a speech recognizer during its normal operation, in which speech recognizer a plurality of trained models is stored, the method comprising:
a) extracting a first feature vector sequence from a sampled input stream of a first user utterance,
b) statistically matching the first feature vector sequence with the stored models so as to obtain likelihood scores for each model, while storing model state transitions,
c) identifying the model with the highest likelihood score as a first tentative recognition result,
d) storing the first feature vector sequence and the first tentative recognition result,
e) informing the user, upon acceptance of the first tentative recognition result, about the first tentative recognition result, and
f) upon rejection of the first tentative recognition result, prompting the user to input a second utterance, of the same transcript as the first utterance, of which second utterance a second tentative recognition result is different from the first tentative recognition result, whereby the second tentative recognition result is determined in correspondence with steps a) to d), and retraining the model corresponding to the second tentative recognition result, rather than retraining the model corresponding to the first tentative recognition result, retraining being performed on the basis of the stored first feature vector sequence and back tracking information obtained from the stored model state transitions.

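Claim 8 differs from claim 1 in that the corrected model (the one matching the second utterance) is retrained with the first utterance's features, aligned to its states by backtracking through the state transitions stored during matching. A minimal sketch, assuming left-to-right models with one unit-variance Gaussian mean per state and a hypothetical adaptation rate; `viterbi`, `backtrack`, and `handle_rejection` are illustrative names, not taken from the patent.

```python
import math

Vector = list[float]
NEG_INF = -math.inf


def log_gauss(x: Vector, mean: Vector) -> float:
    # Unit-variance diagonal Gaussian, constant term dropped (an assumption)
    return -0.5 * sum((a - b) ** 2 for a, b in zip(x, mean))


def viterbi(means: list[Vector], feats: list[Vector]) -> tuple[float, list[list[int]]]:
    """Left-to-right model (self-loop or advance by one state). Returns the
    final-state score and per-frame backpointers, i.e. the stored transitions."""
    n = len(means)
    score = [log_gauss(feats[0], means[0])] + [NEG_INF] * (n - 1)
    back = [[0] * n]  # frame 0 has no predecessor
    for x in feats[1:]:
        new, ptr = [NEG_INF] * n, [0] * n
        for s in range(n):
            # Best predecessor: stay in s, or come from s - 1
            prev = s if s == 0 or score[s] >= score[s - 1] else s - 1
            new[s] = score[prev] + log_gauss(x, means[s])
            ptr[s] = prev
        score = new
        back.append(ptr)
    return score[-1], back


def backtrack(back: list[list[int]]) -> list[int]:
    """Recover the frame-to-state alignment that ends in the last state."""
    s = len(back[0]) - 1
    path = [s]
    for ptr in reversed(back[1:]):
        s = ptr[s]
        path.append(s)
    return path[::-1]


def retrain_with_alignment(means: list[Vector], feats: list[Vector],
                           path: list[int], rate: float = 0.2) -> None:
    # Move each state's mean toward the average of the frames aligned to it
    for s in range(len(means)):
        frames = [f for f, p in zip(feats, path) if p == s]
        if frames:
            avg = [sum(c) / len(frames) for c in zip(*frames)]
            means[s] = [(1 - rate) * m + rate * a for m, a in zip(means[s], avg)]


def handle_rejection(models: dict[str, list[Vector]], first_feats: list[Vector],
                     stored_back: dict[str, list[list[int]]],
                     second_result: str) -> None:
    """The first result was rejected and the second utterance was recognized
    as `second_result`: adapt that model with the FIRST utterance's features."""
    path = backtrack(stored_back[second_result])
    retrain_with_alignment(models[second_result], first_feats, path)
```

Because backpointers from step b) are kept for every model, not only the winner, the first utterance's alignment against the corrected model is available without rescoring. If the utterance has fewer frames than the model has states, the final-state score is `-inf` and the alignment is meaningless, so a real system would skip retraining in that case.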
10. A voice operated system comprising a speech recognizer to be retrained during its normal operation, in which speech recognizer a plurality of trained models is stored, and an electronic device responsive to a recognized utterance of the user, the speech recognizer comprising:
a) means for extracting a first feature vector sequence from a sampled input stream of a first user utterance,
b) means for statistically matching the first feature vector sequence with the stored models so as to obtain likelihood scores for each model, while storing model state transitions,
c) means for identifying the model with the highest likelihood score as a first tentative recognition result,
d) means for storing the first feature vector sequence and the first tentative recognition result,
e) means for informing the user, upon acceptance of the first tentative recognition result, about the first tentative recognition result,
f) means for determining from the user's behavior, when the user successively operates the speech recognizer, whether the first tentative recognition result was correct, and
g) means for retraining a model corresponding to the first tentative recognition result, using the stored first feature vector sequence, retraining being performed if it was determined that the first tentative recognition result was correct.

21. A voice operated system comprising a speech recognizer to be retrained during its normal operation, in which speech recognizer a plurality of trained models is stored, and an electronic device responsive to a recognized utterance of a user of the system, the speech recognizer comprising:
a) means for extracting a first feature vector sequence from a sampled input stream of a first user utterance,
b) means for statistically matching the first feature vector sequence with the stored models so as to obtain likelihood scores for each model, while storing model state transitions,
c) means for identifying the model with the highest likelihood score as a first tentative recognition result,
d) means for storing the first feature vector sequence and the first tentative recognition result,
e) means for informing the user, upon acceptance of the first tentative recognition result, about the first tentative recognition result, and
f) upon rejection of the first tentative recognition result, means for prompting the user to input a second utterance, of the same transcript as the first utterance, of which second utterance a second tentative recognition result is different from the first tentative recognition result, whereby the second tentative recognition result is determined in correspondence with steps a) to d), and means for retraining the model corresponding to the second tentative recognition result, rather than retraining the model corresponding to the first tentative recognition result, retraining being performed on the basis of the stored first feature vector sequence and back tracking information obtained from the stored model state transitions.

Specification