Automatic retraining of a speech recognizer while using reliable transcripts

US 6,374,221 B1
Filed: 06/22/1999
Issued: 04/16/2002
Est. Priority Date: 06/22/1999
Status: Expired due to Term

8 Assignments

0 Petitions

Abstract

Automatic retraining of a speech recognizer during its normal operation in conjunction with an electronic device responsive to the speech recognizer is addressed. In this retraining, stored trained models are retrained on the basis of recognized user utterances. Feature vectors, model state transitions, and tentative recognition results are stored upon processing and evaluation of speech samples of the user utterances. A reliable transcript is determined for later adaptation of a speech model, depending on the user's subsequent behavior when interacting with the speech recognizer and the electronic device. For example, in a name dialing process, such behavior can be manual or voice re-dialing of the same number, dialing of a different phone number, immediately aborting an established communication, or breaking it off after a short period of time. Depending on such behavior, a transcript is selected in correspondence to the user's first utterance, to the user's second utterance, or to both; or the tentative recognition result (or results) are determined to be uncertain and deemed unsuitable for updating a model; or updating is performed with appropriate weighting to take into account the level of uncertainty. Once a reliable transcript has been determined, a model adaptation is performed.
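
The behavior-based choice of a reliable transcript in the name dialing example can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the patent's rule set: `Outcome`, `MIN_CALL_S`, and `select_transcripts` are hypothetical names, the 10-second threshold standing in for "a short period of time" is invented, and the policy covers only a few of the behaviors listed above.

```python
from dataclasses import dataclass
from typing import Optional

MIN_CALL_S = 10.0  # hypothetical threshold for "a short period of time", in seconds

@dataclass
class Outcome:
    """Hypothetical summary of what the user did after the first recognition."""
    first_call_s: float                  # how long the first call stayed up
    second_result: Optional[str] = None  # recognition result of a voice re-dial, if any
    second_call_s: float = 0.0           # how long the re-dialed call stayed up

def select_transcripts(first_result: str, outcome: Outcome) -> dict:
    """Map stored utterances ("first", "second") to reliable transcripts.

    An empty dict means the results are treated as uncertain and no model is updated.
    """
    if outcome.first_call_s >= MIN_CALL_S:
        # The user kept the call up: accept the first tentative result.
        return {"first": first_result}
    if outcome.second_result is not None and outcome.second_call_s >= MIN_CALL_S:
        # First call broken off, voice re-dial kept up: assume the same name was
        # spoken twice, so the second result labels both stored utterances.
        return {"first": outcome.second_result, "second": outcome.second_result}
    # Aborted with no successful voice re-dial: treat as uncertain.
    return {}
```

The "same name spoken twice" assumption in the second branch is the weakest step; claims 7 and 16 add a user confirmation before relying on it.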

219 Citations

24 Claims

1. A method for automatic retraining of a speech recognizer during its normal operation, in which speech recognizer a plurality of trained models is stored, the method comprising:
- a) extracting a first feature vector sequence from a sampled input stream of a first user utterance,
- b) statistically matching the first feature vector sequence with the stored models so as to obtain likelihood scores for each model, while storing model state transitions,
- c) identifying the model with the highest likelihood score as a first tentative recognition result,
- d) storing the first feature vector sequence and the first tentative recognition result,
- e) informing the user, upon acceptance of the first tentative recognition result, about the first tentative recognition result,
- f) determining from the user's behavior, when the user successively operates the speech recognizer, whether the first tentative recognition result was correct, and
- g) retraining a model corresponding to the first recognition result, using the stored first feature vector sequence, if the first tentative recognition result was correct.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, further comprising, if it was determined that the first tentative recognition result was not correct, retraining of another model corresponding to a second recognition result if the user inputs a second utterance of which a second tentative recognition result differs from the first tentative recognition result, whereby the second tentative recognition result is determined in correspondence with steps a) to d) of claim 1, rather than retraining the model corresponding to the second tentative recognition result.
  - 3. The method of claim 2, further comprising, upon retraining of said retraining of another model, retraining of the model corresponding to the second tentative recognition result, retraining being performed on the basis of a second stored feature vector sequence, which is obtained from the second utterance, and back tracking information obtained from the stored model state transitions.
  - 4. The method of claim 2, for use with the speech recognizer coupled to an electronic device responsive to a recognized user utterance of the speech recognizer, further comprising refraining from updating a model if a response from the electronic device to the recognized user utterance is interrupted by the user within a predetermined period of time.
  - 5. The method of claim 1, for use with the speech recognizer coupled to an electronic device responsive to a recognized user utterance of the speech recognizer, further comprising refraining from updating a model if a response from the electronic device to the recognized user utterance is interrupted by the user within a predetermined period of time.
  - 6. The method of claim 1, further comprising weighted retraining of the model, a first weighting factor being applied to stored model parameters and a second weighting factor being applied to an estimate of the model parameters obtained from the first or second tentative recognition result.
  - 7. The method of claim 6, further comprising requesting the user to confirm whether the first user utterance is the same as the second user utterance, and refraining from retraining if the user indicates that the second user utterance differs from the first user utterance.
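
Steps b) and c) of claim 1, together with the stored state transitions that claims 3 and 8 later backtrack through, can be illustrated with a standard Viterbi pass; the claim text itself does not name the decoding algorithm. The sketch below assumes left-to-right HMMs with one diagonal-covariance Gaussian per state, a path that starts in the first state and ends in the last, and a hypothetical model layout (`means`, `vars`, `log_trans`). The backpointer table `psi` plays the role of the stored model state transitions.

```python
import numpy as np

def left_to_right(n_states, p_stay=0.6):
    """Log transition matrix of a simple left-to-right HMM (assumed topology)."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i], A[i, i + 1] = p_stay, 1.0 - p_stay
    A[-1, -1] = 1.0
    with np.errstate(divide="ignore"):  # log(0) -> -inf for forbidden transitions
        return np.log(A)

def log_gauss(x, means, variances):
    """Diagonal-Gaussian log-density of one frame x (D,) under every state (S, D) -> (S,)."""
    return -0.5 * np.sum(np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1)

def viterbi(features, model):
    """Best-path log-likelihood of features (T, D) plus the backpointer table psi (T, S)."""
    means, variances, log_trans = model["means"], model["vars"], model["log_trans"]
    T, S = len(features), len(means)
    psi = np.zeros((T, S), dtype=int)  # psi[t, j]: best predecessor of state j at frame t
    delta = np.full(S, -np.inf)
    delta[0] = log_gauss(features[0], means, variances)[0]  # path must start in state 0
    for t in range(1, T):
        cand = delta[:, None] + log_trans  # cand[i, j]: score of moving from i to j
        psi[t] = np.argmax(cand, axis=0)
        delta = cand[psi[t], np.arange(S)] + log_gauss(features[t], means, variances)
    return delta[-1], psi  # path must end in the last state

def recognize(features, models):
    """Score every stored model, keep each model's transitions, return the best name."""
    scores, transitions = {}, {}
    for name, model in models.items():
        scores[name], transitions[name] = viterbi(features, model)
    best = max(scores, key=scores.get)
    return best, scores, transitions
```

In this sketch, the name returned by `recognize` would be the first tentative recognition result stored in step d), alongside the feature array and the per-model `transitions`.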

8. A method for automatic retraining of a speech recognizer during its normal operation, in which speech recognizer a plurality of trained models is stored, the method comprising:
- a) extracting a first feature vector sequence from a sampled input stream of a first user utterance,
- b) statistically matching the first feature vector sequence with the stored models so as to obtain likelihood scores for each model, while storing model state transitions,
- c) identifying the model with the highest likelihood score as a first tentative recognition result,
- d) storing the first feature vector sequence and the first tentative recognition result,
- e) informing the user, upon acceptance of the first tentative recognition result, about the first tentative recognition result, and
- f) upon rejection of the first tentative recognition result, prompting the user to input a second utterance, of the same transcript as the first utterance, of which second utterance a second tentative recognition result is different from the first tentative recognition result, whereby the second tentative recognition result is determined in correspondence with steps a) to d), and retraining the model corresponding to the second tentative recognition result, rather than retraining the model corresponding to the first tentative recognition result, retraining being performed on the basis of the stored first feature vector sequence and back tracking information obtained from the stored model state transitions.
- Dependent Claims (9)
  - 9. The method of claim 8, for use with the speech recognizer coupled to an electronic device responsive to a recognized user utterance of the speech recognizer, further comprising refraining from updating a model if a response from the electronic device to the recognized user utterance is interrupted by the user within a predetermined period of time.
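
Claim 8 retrains the model of the second tentative result using the stored first feature vector sequence and backtracking information. A minimal sketch, assuming a Viterbi-style backpointer table `psi` (`psi[t, j]` is the best predecessor of state `j` at frame `t`) kept for that model during the first utterance, and assuming that only per-state means are re-estimated; `backtrack` and `estimate_means` are hypothetical helpers, and mean-only re-estimation is a simplification of a full model update.

```python
import numpy as np

def backtrack(psi, final_state):
    """Recover the best state sequence from stored backpointers psi (T, S)."""
    T = psi.shape[0]
    path = np.empty(T, dtype=int)
    path[-1] = final_state
    for t in range(T - 1, 0, -1):
        path[t - 1] = psi[t, path[t]]  # follow the stored transition backwards
    return path

def estimate_means(features, path, old_means):
    """Mean of the stored frames aligned to each state.

    States that received no aligned frames keep their old mean.
    """
    new_means = old_means.copy()
    for s in range(len(old_means)):
        frames = features[path == s]  # rows of the (T, D) feature array in state s
        if len(frames):
            new_means[s] = frames.mean(axis=0)
    return new_means
```

For a left-to-right model the final state is the last one, so `backtrack(psi, n_states - 1)` gives the alignment of the first utterance to the model chosen by the second utterance.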

10. A voice operated system comprising a speech recognizer to be retrained during its normal operation, in which speech recognizer a plurality of trained models is stored, and an electronic device responsive to a recognized utterance of the user, the speech recognizer comprising:
- a) means for extracting a first feature vector sequence from a sampled input stream of a first user utterance,
- b) means for statistically matching the first feature vector sequence with the stored models so as to obtain likelihood scores for each model, while storing model state transitions,
- c) means for identifying the model with the highest likelihood score as a first tentative recognition result,
- d) means for storing the first feature vector sequence and the first tentative recognition result,
- e) means for informing the user, upon acceptance of the first tentative recognition result, about the first tentative recognition result,
- f) means for determining from the user's behavior, when the user successively operates the speech recognizer, whether the first tentative recognition result was correct, and
- g) means for retraining a model corresponding to the first tentative recognition result, using the stored first feature vector sequence, retraining being performed if it was determined that the first recognition result was correct.
- Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
  - 11. The system of claim 10, further comprising, if it was determined that the first tentative recognition result was not correct, means for retraining of another model corresponding to a second recognition result if the user inputs a second utterance of which a second tentative recognition result differs from the first tentative recognition result, whereby the second tentative recognition result is determined in correspondence with steps a) to d) of claim 10, rather than retraining the model corresponding to the second tentative recognition result.
  - 12. The system of claim 11, further comprising, upon retraining of said retraining of another model, means for retraining the model corresponding to the second tentative recognition result, retraining being performed on the basis of a second stored feature vector sequence, which is obtained from the second utterance, and back tracking information obtained from the stored model state transitions.
  - 13. The system of claim 11, wherein the speech recognizer further comprises means for refraining from updating a model if a response from the electronic device to the recognized user utterance is interrupted by the user within a predetermined period of time.
  - 14. The system of claim 10, wherein the speech recognizer further comprises means for refraining from updating a model if a response from the electronic device to the recognized user utterance is interrupted by the user within a predetermined period of time.
  - 15. The system of claim 10, wherein the speech recognizer further comprises means for weighted retraining of the model, a first weighting factor being applied to stored model parameters and a second weighting factor being applied to an estimate of the model parameters obtained from the first or second tentative recognition result.
  - 16. The system of claim 15, further comprising means for requesting the user to confirm whether the first user utterance is the same as the second user utterance, the speech recognizer comprising means for refraining from retraining if the user indicates that the second user utterance differs from the first user utterance.
  - 17. The system of claim 10, wherein said system is comprised of an arrangement comprising both the speech recognizer and the electronic device responsive to the speech recognizer.
  - 18. The system of claim 17, wherein the arrangement is a communication device.
  - 19. The system of claim 10, wherein the arrangement is a communication device.
  - 20. The system of claim 10, wherein the system is further comprised of a telephone network and a plurality of telephone apparatuses, the speech recognizer being comprised in the network and the electronic device being a telephone apparatus, whereby the user is an operator of the electronic device capable of communicating by speech with the speech recognizer.
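
Claims 6 and 15 describe weighted retraining: one weighting factor on the stored model parameters and another on the new estimate. The weights are not specified here, so the sketch below assumes the new-estimate weight `w2` grows linearly with a confidence value in [0, 1] up to a hypothetical cap `max_weight`, with `w1 = 1 - w2`. This is one way to "take into account the level of uncertainty", as the abstract puts it, not the patent's formula.

```python
import numpy as np

def weighted_update(stored, estimate, confidence, max_weight=0.3):
    """Blend stored model parameters with an estimate from one utterance.

    w2 (weight on the new estimate) scales with the assumed confidence in the
    transcript; w1 = 1 - w2 is applied to the stored parameters.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    w2 = max_weight * confidence
    w1 = 1.0 - w2
    return w1 * np.asarray(stored, dtype=float) + w2 * np.asarray(estimate, dtype=float)
```

With `confidence=0.0` the stored parameters are returned unchanged, which matches the case where a result is deemed too uncertain to update a model.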

21. A voice operated system comprising a speech recognizer to be retrained during its normal operation and an electronic device responsive to a recognized user utterance of a user of the system, in which speech recognizer a plurality of trained models is stored, and an electronic device responsive to a recognized utterance of the user, the speech recognizer comprising:
- a) means for extracting a first feature vector sequence from a sampled input stream of a first user utterance,
- b) means for statistically matching the first feature vector sequence with the stored models so as to obtain likelihood scores for each model, while storing model state transitions,
- c) means for identifying the model with the highest likelihood score as a first tentative recognition result,
- d) means for storing the first feature vector sequence and the first tentative recognition result,
- e) means for informing the user, upon acceptance of the first tentative recognition result, about the first tentative recognition result, and
- f) upon rejection of the first tentative recognition result, means for prompting the user to input a second utterance, of the same transcript as the first utterance, of which second utterance a second tentative recognition result is different from the first tentative recognition result, whereby the second tentative recognition result is determined in correspondence with steps a) to d), and means for retraining the model corresponding to the second tentative recognition result, rather than retraining the model corresponding to the first tentative recognition result, retraining being performed on the basis of the stored first feature vector sequence and back tracking information obtained from the stored model state transitions.
- Dependent Claims (22, 23, 24)
  - 22. The system of claim 21, further comprising means for refraining from updating a model if a response from the electronic device to the recognized user utterance is interrupted by the user within a predetermined period of time.
  - 23. The system of claim 21, wherein said system is comprised of an arrangement comprising both the speech recognizer and the electronic device responsive to the speech recognizer.
  - 24. The system of claim 21, wherein said system is comprised of a telephone network and a plurality of telephone apparatuses, the speech recognizer being comprised in the network and the electronic device being a telephone apparatus, whereby the user is an operator of the electronic device capable of communicating with the speech recognizer.
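
Claims 4, 5, 9, 13, 14, and 22 share one rule: refrain from updating a model if the user interrupts the device's response within a predetermined period. The period is not specified, so `INTERRUPT_WINDOW_S = 5.0` is a hypothetical value, and the `PendingUpdate` record (the stored feature vectors and tentative result held until the outcome is known) is an assumed structure for illustration.

```python
from dataclasses import dataclass
from typing import Optional

INTERRUPT_WINDOW_S = 5.0  # hypothetical "predetermined period of time", in seconds

@dataclass
class PendingUpdate:
    """Hypothetical record kept between recognition and retraining."""
    features: list         # stored feature vector sequence
    tentative: str         # stored tentative recognition result
    response_start: float  # time the device began acting on the result

def resolve(pending: PendingUpdate, interrupted_at: Optional[float],
            window_s: float = INTERRUPT_WINDOW_S) -> Optional[PendingUpdate]:
    """Return the update to apply, or None to refrain from updating the model."""
    if interrupted_at is not None and interrupted_at - pending.response_start <= window_s:
        return None  # user cut the response off quickly: the result is suspect
    return pending
```

Passing this check only means the rule does not block the update; whether the result is actually correct still depends on the user's further behavior.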

Current Assignee
WSOU Investments, LLC (WSOU Holdings, LLC)
Original Assignee
Lucent Technologies, Inc. (Nokia Corporation)
Inventors
Haimi-Cohen, Raziel
Primary Examiner(s)
Dorvil, Richemond

Application Number
US09/337,229
Time in Patent Office
1,029 Days
Field of Search
704/200, 704/251, 704/231, 704/240, 704/242, 704/256, 704/243, 704/247, 704/270.1, 704/270, 704/275, 704/250, 704/255, 704/239, 704/244, 704/252
US Class Current
704/256.1
CPC Class Codes
G10L 15/063   Training
G10L 2015/0631   Creating reference template...
G10L 2015/0635   updating or merging of old ...
