Task-independent utterance verification with subword-based minimum verification error training
Abstract
An automated speech recognition system comprises a preprocessor, a speech recognizer, and a task-independent utterance verifier. The task-independent utterance verifier employs a first subword acoustic Hidden Markov Model (HMM) for determining a first likelihood that a speech segment contains a sound corresponding to a speech recognition hypothesis, and a second anti-subword acoustic HMM for determining a second likelihood that a speech segment contains a sound other than one corresponding to the speech recognition hypothesis. In operation, the utterance verifier employs the subword and anti-subword models to produce the first and second likelihoods for each recognized subword in the input speech. The utterance verifier determines a subword verification score as the log of the ratio of the first and second likelihoods. To verify larger speech units, the utterance verifier combines the subword verification scores to produce a word/phrase/sentence verification score, and compares that score to a predetermined threshold. The first and second verification-specific HMMs are discriminatively trained using a subword-based minimum verification error training technique.
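The scoring described in the abstract can be written compactly. The notation below is assumed for illustration only and does not appear in the source: O_j is the speech segment aligned to the j-th recognized subword, λ_j and λ̄_j are its subword and anti-subword models, N is the number of subwords in the hypothesis, and τ is the predetermined threshold. The abstract does not say how subword scores are combined, so the combining function f is left generic, and the direction of the threshold test is likewise an assumption.

```latex
\[
V_j = \log \frac{L(O_j \mid \lambda_j)}{L(O_j \mid \bar{\lambda}_j)}
    = \log L(O_j \mid \lambda_j) - \log L(O_j \mid \bar{\lambda}_j),
\qquad
V = f(V_1, \dots, V_N),
\qquad
\text{accept if } V \ge \tau .
\]
```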
24 Claims
-
1. An automated speech recognition system comprising:
-
a speech information preprocessor for receiving a speech signal and responsively producing at least one speech feature signal descriptive of said speech signal;
a speech recognition component responsive to said speech feature signal to produce a speech recognition hypothesis indicating which member of a predefined group of sound units most probably corresponds to speech content of said speech signal; and
an utterance verification component responsive to said speech recognition hypothesis and said speech feature signal for producing an acceptance signal when said speech content of said speech signal includes said speech recognition hypothesis;
said utterance verification component having a first subword acoustic model distinct from any acoustic model used to produce said speech recognition hypothesis for determining a first likelihood that a speech segment contains a sound corresponding to said speech recognition hypothesis;
said utterance verification component having a second subword acoustic model for determining a second likelihood that a speech segment contains a sound other than one corresponding to said speech recognition hypothesis;
wherein said first and second subword acoustic models have been trained through obtaining correct recognitions by force-segmenting sample sentences using a correct lexical transcription, obtaining incorrect recognitions by force-segmenting sample sentences using an incorrect lexical transcription, and for each subword in a sample sentence adjusting parameters of each of said first and second subword acoustic models according to whether recognition of such subword was correct.
-
-
8. An utterance verification system for use in speech recognition comprising:
-
means for receiving at least one speech feature signal descriptive of an acquired speech signal;
means for receiving a speech recognition hypothesis indicating which member of a predefined group of sound units most probably corresponds to speech content of said speech signal;
a first subword acoustic model distinct from any acoustic model used to produce said speech recognition hypothesis for determining a first likelihood that a speech segment contains a sound corresponding to said speech recognition hypothesis;
a second subword acoustic model for determining a second likelihood that a speech segment contains a sound other than one corresponding to said speech recognition hypothesis; and
means responsive to said speech feature signal, said speech recognition hypothesis, and said first and second subword acoustic models for producing an acceptance signal when said speech content of said speech signal includes said speech recognition hypothesis;
wherein said first and second subword acoustic models have been trained through obtaining correct recognitions by force-segmenting sample sentences using a correct lexical transcription, obtaining incorrect recognitions by force-segmenting sample sentences using an incorrect lexical transcription, and for each subword in a sample sentence adjusting parameters of each of said first and second subword acoustic models according to whether recognition of such subword was correct.
a subword verification score generator responsive to said first and second likelihoods to determine for a subword-sized speech segment a subword verification score indicating whether said speech segment includes a speech content equivalent to a subword-sized speech recognition hypothesis corresponding to said speech segment.
-
-
13. The system of claim 8 further comprising:
a combiner responsive to a plurality of subword verification scores for producing a larger-speech-unit verification score indicative of the extent to which speech content of speech segments corresponding to said plurality of subword verification scores is equivalent to a larger-speech-unit-sized speech recognition hypothesis corresponding to said speech segments.
-
14. The system of claim 13 further comprising:
a threshold component responsive to said larger-speech-unit verification score and a predetermined threshold to produce an acceptance signal indicating, when said threshold is satisfied, that said speech content of said speech segments corresponding to said plurality of subword verification scores is equivalent to said larger-speech-unit-sized speech recognition hypothesis corresponding to said speech segments.
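The combiner of claim 13 and the threshold component of claim 14 might be sketched as follows. This is a minimal illustration, not the claimed implementation: the names `combine_scores` and `ThresholdComponent` are hypothetical, the arithmetic mean is an assumed combining function (the claims do not fix one), and "satisfied" is assumed to mean the score meets or exceeds the threshold.

```python
from dataclasses import dataclass
from typing import Sequence


def combine_scores(subword_scores: Sequence[float]) -> float:
    """Combine subword verification scores into a larger-speech-unit score.

    Assumption: a plain arithmetic mean; the claims leave the function open.
    """
    if not subword_scores:
        raise ValueError("at least one subword verification score is required")
    return sum(subword_scores) / len(subword_scores)


@dataclass(frozen=True)
class ThresholdComponent:
    """Hypothetical threshold component holding a predetermined threshold."""

    threshold: float

    def accept(self, unit_score: float) -> bool:
        # Assumption: the threshold is satisfied when the score is >= threshold.
        return unit_score >= self.threshold
```

A caller would pass the per-subword scores of one word, phrase, or sentence to `combine_scores` and hand the result to `ThresholdComponent.accept` to obtain the acceptance signal.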
-
15. A method for task-independent speech recognition comprising the steps of:
-
receiving a speech signal;
processing said speech signal into feature signals descriptive of speech content of said speech signal;
processing said feature signals to produce a speech recognition hypothesis;
producing for each subword contained in said speech recognition hypothesis a subword verification score, including processing each speech segment corresponding to each such subword using a first subword acoustic Hidden Markov Model, which is distinct from any acoustic model used in producing said speech recognition hypothesis, to produce a first subword verification likelihood and a second anti-subword acoustic Hidden Markov Model to produce a second subword verification likelihood;
combining subword verification scores corresponding to all subwords contained in said speech recognition hypothesis to form a larger-speech-unit verification score;
comparing said larger-speech-unit verification score to a predefined threshold, and if said threshold is satisfied, producing an accept signal indicating that the speech recognition hypothesis is contained in said speech signal;
wherein said first subword and said second anti-subword acoustic models have been trained through obtaining correct recognitions by force-segmenting sample sentences using a correct lexical transcription, obtaining incorrect recognitions by force-segmenting sample sentences using an incorrect lexical transcription, and for each subword in a sample sentence adjusting parameters of each of said first and second subword acoustic models according to whether recognition of such subword was correct.
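The verification steps of claim 15 can be illustrated with a toy sketch. Everything here is an assumption made for illustration: a one-dimensional Gaussian per-frame scorer stands in for each Hidden Markov Model, the segmentation of the hypothesis into (subword, frames) pairs is taken as given, subword scores are combined by averaging, and the names `gaussian_log_likelihood` and `verify_hypothesis` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[float]
LogLikelihood = Callable[[Sequence[Frame]], float]


def gaussian_log_likelihood(mean: float, var: float = 1.0) -> LogLikelihood:
    """Toy stand-in for an acoustic HMM: frames are i.i.d. 1-D Gaussians."""
    def score(segment: Sequence[Frame]) -> float:
        return sum(
            -0.5 * (math.log(2 * math.pi * var) + (frame[0] - mean) ** 2 / var)
            for frame in segment
        )
    return score


def verify_hypothesis(
    segments: List[Tuple[str, Sequence[Frame]]],
    subword_models: Dict[str, LogLikelihood],
    anti_models: Dict[str, LogLikelihood],
    threshold: float,
) -> Tuple[bool, float]:
    """Score each hypothesized subword, combine the scores, apply the threshold."""
    if not segments:
        raise ValueError("hypothesis contains no subword segments")
    scores = []
    for subword, frames in segments:
        first = subword_models[subword](frames)   # log of first likelihood
        second = anti_models[subword](frames)     # log of second likelihood
        scores.append(first - second)             # log of the likelihood ratio
    unit_score = sum(scores) / len(scores)        # assumed combiner: mean
    return unit_score >= threshold, unit_score
```

The subword score is computed as a difference of log-likelihoods, which equals the log of the likelihood ratio while avoiding underflow on long segments.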
-
-
19. A method of training a task-independent utterance verification system comprising the steps of:
-
preparing a verification model set having a verification-specific subword model and a verification-specific anti-subword model for each subword in a recognizer subword set;
using said verification model set, determining for a selected subword a subword verification score as the log of the ratio of the likelihood produced from said verification-specific subword model to the likelihood produced from said verification-specific anti-subword model;
determining parameters of said verification model set by discriminatively training said verification-specific subword model and said verification-specific anti-subword model;
wherein said parameter determining step further comprises the steps of:
obtaining a training set comprising many sample utterances including subwords in the recognizer subword set;
obtaining correct recognitions by force-segmenting sample sentences using a correct lexical transcription;
obtaining incorrect recognitions by force-segmenting a sample sentence using a random lexical transcription;
for each subword in a sample sentence, determining whether the recognition of said subword is correct; and
adjusting parameters of said verification model set accordingly.
20. The method of claim 19 wherein said parameter adjusting step further comprises the step of:
if the recognition of said subword is correct, adjusting parameters of said verification model set to maximize a ratio of a likelihood produced by said verification-specific subword model to a likelihood produced by said verification-specific anti-subword model for such subword.
-
-
21. The method of claim 19 wherein said parameter adjusting step further comprises the step of:
if the recognition of said subword is incorrect, adjusting parameters of said verification model set to maximize a ratio of a likelihood produced by said verification-specific anti-subword model to a likelihood produced by said verification-specific subword model for such subword.
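The parameter adjustment recited in claims 19 to 21 can be illustrated with a toy gradient step. This is only a sketch under strong assumptions, not the patented update rule: each model is reduced to the mean of a unit-variance 1-D Gaussian, each training token is a single scalar observation `x`, the correct/incorrect label is assumed to come from force-segmenting with a correct or an incorrect transcription, and a sigmoid of the signed log ratio is used as an assumed smooth verification-error loss. The names `VerificationPair` and `training_step` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class VerificationPair:
    """Toy subword / anti-subword model pair (unit-variance Gaussian means)."""

    subword_mean: float
    anti_mean: float

    def log_ratio(self, x: float) -> float:
        # log N(x; subword_mean, 1) - log N(x; anti_mean, 1)
        return -0.5 * (x - self.subword_mean) ** 2 + 0.5 * (x - self.anti_mean) ** 2


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def training_step(pair: VerificationPair, x: float, correct: bool, lr: float = 0.1) -> float:
    """One gradient-descent step on the assumed loss sigmoid(-y * log_ratio).

    y = +1 for a correct recognition (push the subword/anti-subword ratio up),
    y = -1 for an incorrect one (push the anti-subword/subword ratio up).
    Returns the loss value before the update.
    """
    y = 1.0 if correct else -1.0
    d = pair.log_ratio(x)
    loss = sigmoid(-y * d)
    dloss_dd = -y * loss * (1.0 - loss)              # chain rule through the sigmoid
    grad_subword = dloss_dd * (x - pair.subword_mean)
    grad_anti = dloss_dd * -(x - pair.anti_mean)
    pair.subword_mean -= lr * grad_subword
    pair.anti_mean -= lr * grad_anti
    return loss
```

For a small learning rate, a step on a correctly recognized token raises the log ratio for that token and a step on an incorrectly recognized token lowers it, mirroring the two cases in claims 20 and 21.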
-
22. An automated speech recognition system comprising:
-
a speech information preprocessor for receiving a speech signal and responsively producing at least one speech feature signal descriptive of said speech signal;
a speech recognition component responsive to said speech feature signal to produce a speech recognition hypothesis indicating which member of a predefined group of sound units most probably corresponds to speech content of said speech signal; and
an utterance verification component responsive to said speech recognition hypothesis and said speech feature signal for producing an acceptance signal when said speech content of said speech signal includes said speech recognition hypothesis;
said utterance verification component having a first subword acoustic model distinct from any acoustic model used to produce said speech recognition hypothesis for determining a first likelihood that a speech segment contains a sound corresponding to said speech recognition hypothesis;
said utterance verification component having a second subword acoustic model for determining a second likelihood that a speech segment contains a sound other than one corresponding to said speech recognition hypothesis;
and further comprising means for training said first and second subword acoustic models, said means for training having:
means for obtaining correct recognitions by force-segmenting sample sentences using a correct lexical transcription;
means for obtaining incorrect recognitions by force-segmenting sample sentences using an incorrect lexical transcription; and
means for determining for each subword in a sample sentence whether recognition of such subword was correct and adjusting parameters of each of said first and second subword acoustic models accordingly.
-
-
23. An utterance verification system for use in speech recognition comprising:
-
means for receiving at least one speech feature signal descriptive of an acquired speech signal;
means for receiving a speech recognition hypothesis indicating which member of a predefined group of sound units most probably corresponds to speech content of said speech signal;
a first subword acoustic model distinct from any acoustic model used to produce said speech recognition hypothesis for determining a first likelihood that a speech segment contains a sound corresponding to said speech recognition hypothesis;
a second subword acoustic model for determining a second likelihood that a speech segment contains a sound other than one corresponding to said speech recognition hypothesis;
means responsive to said speech feature signal, said speech recognition hypothesis, and said first and second subword acoustic models for producing an acceptance signal when said speech content of said speech signal includes said speech recognition hypothesis; and
means for training said first and second subword acoustic models, including:
means for obtaining correct recognitions by force-segmenting sample sentences using a correct lexical transcription;
means for obtaining incorrect recognitions by force-segmenting sample sentences using an incorrect lexical transcription; and
means for determining for each subword in a sample sentence whether recognition of such subword was correct and adjusting parameters of each of said first and second subword acoustic models accordingly.
-
-
24. A method of training a task-independent utterance verification system comprising the steps of:
-
preparing a verification model set having a verification-specific subword model and a verification-specific anti-subword model for each subword in a recognizer subword set;
determining parameters of said verification model set by training said verification-specific subword model and said verification-specific anti-subword model;
wherein said parameter determining step further comprises the steps of:
obtaining a training set comprising many sample utterances including subwords in the recognizer subword set;
obtaining correct recognitions by force-segmenting sample sentences using a correct lexical transcription;
obtaining incorrect recognitions by force-segmenting a sample sentence using an incorrect lexical transcription;
for each subword in a sample sentence, determining whether the recognition of said subword is correct; and
adjusting parameters of said verification model set accordingly.
-
Specification