Task-independent utterance verification with subword-based minimum verification error training

US 6,292,778 B1
Filed: 10/30/1998
Issued: 09/18/2001
Est. Priority Date: 10/30/1998
Status: Expired due to Term

Abstract

An automated speech recognition system comprises a preprocessor, a speech recognizer, and a task-independent utterance verifier. The task-independent utterance verifier employs a first subword acoustic Hidden Markov Model for determining a first likelihood that a speech segment contains a sound corresponding to a speech recognition hypothesis, and a second anti-subword acoustic Hidden Markov Model for determining a second likelihood that a speech segment contains a sound other than one corresponding to the speech recognition hypothesis. In operation, the utterance verifier employs the subword and anti-subword models to produce the first and second likelihoods for each recognized subword in the input speech. The utterance verifier determines a subword verification score as the log of the ratio of the first and second likelihoods. In order to verify larger speech units, the utterance verifier combines the subword verification scores to produce a word/phrase/sentence verification score, and compares that score to a predetermined threshold. The first and second verification-specific HMMs are discriminatively trained using a subword-based minimum verification error training technique.
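
The scoring rule described in the abstract can be written out as a short worked formula. The notation is an illustrative assumption, not taken from the patent text shown here: O_j is the speech segment for the j-th recognized subword, λ_j and λ̄_j are the verification-specific subword and anti-subword models, f is the combining function (left unspecified in the abstract), τ is the predetermined threshold, and "satisfied" is assumed to mean "greater than or equal to".

```latex
V_j = \log \frac{L(O_j \mid \lambda_j)}{L(O_j \mid \bar{\lambda}_j)}, \qquad
V = f(V_1, \dots, V_N), \qquad
\text{accept if } V \ge \tau .
```

A positive V_j means the subword model explains the segment better than the anti-subword model; a negative value means the opposite.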

24 Claims

1. An automated speech recognition system comprising:
- a speech information preprocessor for receiving a speech signal and responsively producing at least one speech feature signal descriptive of said speech signal;
  
  a speech recognition component responsive to said speech feature signal to produce a speech recognition hypothesis indicating which member of a predefined group of sound units most probably corresponds to speech content of said speech signal; and
  
  an utterance verification component responsive to said speech recognition hypothesis and said speech feature signal for producing an acceptance signal when said speech content of said speech signal includes said speech recognition hypothesis;
  
  said utterance verification component having a first subword acoustic model distinct from any acoustic model used to produce said speech recognition hypothesis for determining a first likelihood that a speech segment contains a sound corresponding to said speech recognition hypothesis;
  
  said utterance verification component having a second subword acoustic model for determining a second likelihood that a speech segment contains a sound other than one corresponding to said speech recognition hypothesis;
  
  wherein said first and second subword acoustic models have been trained through obtaining correct recognitions by force-segmenting sample sentences using a correct lexical transcription, obtaining incorrect recognitions by force-segmenting sample sentences using an incorrect lexical transcription, and for each subword in a sample sentence adjusting parameters of each of said first and second subword acoustic models according to whether recognition of such subword was correct.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The system of claim 1 wherein at least one of said first and second subword acoustic models is prepared using minimum verification error training.
  - 3. The system of claim 1 wherein said first and second subword acoustic models are prepared using discriminative training.
  - 4. The system of claim 1 wherein said first and second subword acoustic models are Hidden Markov Models prepared using discriminative subword-based minimum verification error training.
  - 5. The system of claim 1 wherein said utterance verification component further comprises a subword verification score generator responsive to said first and second likelihoods to determine for a subword-sized speech segment a subword verification score indicating whether speech content of said speech segment includes a speech content equivalent to a speech recognition hypothesis produced by said speech recognition component and corresponding to said speech segment.
  - 6. The system of claim 1 wherein said utterance verification component further comprises a combiner responsive to a plurality of subword verification scores for producing a larger-speech-unit verification score indicative of the extent to which speech content of speech segments corresponding to said plurality of subword verification scores is equivalent to a speech larger-speech-unit-sized recognition hypothesis produced by said speech recognition component and corresponding to said speech segments.
  - 7. The system of claim 6 wherein said utterance verification component further comprises a threshold component responsive to said larger-speech-unit verification score and a predetermined threshold to produce an acceptance signal indicating when said threshold is satisfied that said speech content of said speech segments corresponding to said plurality of subword verification scores is equivalent to said larger-speech-unit-sized speech recognition hypothesis produced by said speech recognition component and corresponding to said speech segments.

8. An utterance verification system for use in speech recognition comprising:
- means for receiving at least one speech feature signal descriptive of an acquired speech signal;
  
  means for receiving a speech recognition hypothesis indicating which member of a predefined group of sound units most probably corresponds to speech content of said speech signal;
  
  a first subword acoustic model distinct from any acoustic model used to produce said speech recognition hypothesis for determining a first likelihood that a speech segment contains a sound corresponding to said speech recognition hypothesis;
  
  a second subword acoustic model for determining a second likelihood that a speech segment contains a sound other than one corresponding to said speech recognition hypothesis; and
  
  means responsive to said speech feature signal, said speech recognition hypothesis, and said first and second subword acoustic models for producing an acceptance signal when said speech content of said speech signal includes said speech recognition hypothesis;
  
  wherein said first and second subword acoustic models have been trained through obtaining correct recognitions by force-segmenting sample sentences using a correct lexical transcription, obtaining incorrect recognitions by force-segmenting sample sentences using an incorrect lexical transcription, and for each subword in a sample sentence adjusting parameters of each of said first and second subword acoustic models according to whether recognition of such subword was correct.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The system of claim 8 wherein at least one of said first and second subword acoustic models is prepared using minimum verification error training.
  - 10. The system of claim 8 wherein said first and second subword acoustic models are prepared using discriminative training.
  - 11. The system of claim 8 wherein said first and second subword acoustic models are Hidden Markov Models prepared using discriminative subword-based minimum verification error training.
  - 12. The system of claim 8 further comprising:
  - 13. The system of claim 8 further comprising:
    - a combiner responsive to a plurality of subword verification scores for producing a larger-speech-unit verification score indicative of the extent to which speech content of speech segments corresponding to said plurality of subword verification scores is equivalent to a larger-speech-unit-sized speech recognition hypothesis corresponding to said speech segments.
  - 14. The system of claim 13 further comprising:
    - a threshold component responsive to said larger-speech-unit verification score and a predetermined threshold to produce an acceptance signal indicating when said threshold is satisfied said speech content of said speech segments corresponding to said plurality of subword verification scores is equivalent to said larger-speech-unit-sized speech recognition hypothesis corresponding to said speech segments.

15. A method for task-independent speech recognition comprising the steps of:
- receiving a speech signal;
  
  processing said speech signal into feature signals descriptive of speech content of said speech signal;
  
  processing said feature signals to produce a speech recognition hypothesis;
  
  producing for each subword contained in said speech recognition hypothesis a subword verification score, including processing each speech segment corresponding to each such subword using a first subword acoustic Hidden Markov Model, which is distinct from any acoustic model used in producing said speech recognition hypothesis, to produce a first subword verification likelihood and a second anti-subword acoustic Hidden Markov Model to produce a second subword verification likelihood;
  
  combining subword verification scores corresponding to all subwords contained in said speech recognition hypothesis to form a larger-speech-unit verification score;
  
  comparing said larger-speech-unit verification score to a predefined threshold, and if said threshold is satisfied, producing an accept signal indicating that the speech recognition hypothesis is contained in said speech signal;
  
  wherein said first subword and said second anti-subword acoustic models have been trained through obtaining correct recognitions by force-segmenting sample sentences using a correct lexical transcription, obtaining incorrect recognitions by force-segmenting sample sentences using an incorrect lexical transcription, and for each subword in a sample sentence adjusting parameters of each of said first and second subword acoustic models according to whether recognition of such subword was correct.
- Dependent claims (16, 17, 18):
  - 16. The method of claim 15 further comprising the step of delivering as an output said speech recognition hypothesis.
  - 17. The method of claim 15 further comprising the step of producing, if said threshold is not satisfied, a reject signal indicating that the speech recognition hypothesis is not contained in said speech signal.
  - 18. The method of claim 15 wherein said step of producing said subword verification score further comprises the step of determining said subword verification score as a log likelihood ratio of said first subword verification likelihood to said second subword verification likelihood.
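
The verification steps of claims 15, 17, and 18 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the per-segment log-likelihoods are assumed to come from HMM scoring done elsewhere, the arithmetic mean stands in for the combining step (the claims do not fix a particular combiner), and the threshold is treated as satisfied when the combined score is greater than or equal to it.

```python
from statistics import fmean


def subword_llr(loglik_subword, loglik_anti):
    """Subword verification score as a log likelihood ratio.

    Inputs are log-likelihoods, so log(L1 / L2) becomes a difference.
    """
    return loglik_subword - loglik_anti


def verify_hypothesis(segment_logliks, threshold=0.0):
    """Combine per-subword scores and compare against a threshold.

    segment_logliks: list of (subword log-likelihood, anti-subword
    log-likelihood) pairs, one per subword in the hypothesis.
    The mean combiner and the default threshold are assumptions.
    """
    scores = [subword_llr(s, a) for s, a in segment_logliks]
    combined = fmean(scores)
    signal = "accept" if combined >= threshold else "reject"
    return signal, combined


if __name__ == "__main__":
    # Two subword segments, each favouring the subword model.
    print(verify_hypothesis([(-10.0, -12.0), (-8.0, -9.0)]))
    # One segment that the anti-subword model explains better.
    print(verify_hypothesis([(-12.0, -10.0)]))
```

Working with log-likelihoods avoids numerical underflow on long segments, which is why the ratio is expressed as a difference here.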

19. A method of training a task-independent utterance verification system comprising the steps of:
- preparing a verification model set having a verification-specific subword model and a verification-specific anti-subword model for each subword in a recognizer subword set;
  
  using said verification model set, determining for a selected subword as the log of the ratio of the likelihood produced from said verification-specific subword model to the likelihood produced from said verification-specific anti-subword model;
  
  determining parameters of said verification model set by discriminatively training said verification-specific subword model and said verification-specific anti-subword model;
  
  wherein said parameter determining step further comprises the steps of;
  
  obtaining a training set comprising many sample utterances including subwords in the recognizer subword set;
  
  obtaining correct recognitions by force-segmenting sample sentences using a correct lexical transcription;
  
  obtaining incorrect recognitions by force-segmenting a sample sentence using a random lexical transcription;
  
  for each subword in a sample sentence, determining whether the recognition of said subword is correct; and
  
  adjusting parameters of said verification model set accordingly.
- Dependent claims (20, 21):
  - 20. The method of claim 19 wherein said parameter adjusting step further comprising the step of:
  - 21. The method of claim 19 wherein said parameter adjusting step further comprising the step of:
    - if the recognition of said subword is incorrect, adjusting parameters of said verification model set to maximize a ratio of a likelihood produced by said verification-specific anti-subword model to a likelihood produced by said verification-specific subword model for such subword.
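
The parameter adjustment of claims 19 and 21 can be illustrated with a toy gradient step. This is a rough sketch under heavy assumptions and not the patent's training procedure: each HMM is replaced by a one-dimensional, unit-variance Gaussian with a single mean parameter, the loss is a sigmoid-smoothed verification error, and the update for correctly recognized subwords is assumed to mirror the incorrect case described in claim 21. All names (`llr`, `mve_step`, `lr`) are hypothetical.

```python
import math


def llr(x, mu_sub, mu_anti):
    """Log likelihood ratio of two unit-variance 1-D Gaussians.

    The Gaussians are stand-ins for the subword and anti-subword HMMs.
    """
    return -0.5 * (x - mu_sub) ** 2 + 0.5 * (x - mu_anti) ** 2


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mve_step(x, is_correct, mu_sub, mu_anti, lr=0.1):
    """One gradient-descent step on a smoothed verification-error loss.

    Correct recognition: push the log ratio subword/anti-subword up.
    Incorrect recognition: push it down, i.e. raise anti-subword/subword.
    """
    sign = 1.0 if is_correct else -1.0
    score = sign * llr(x, mu_sub, mu_anti)
    s = sigmoid(-score)                # smoothed error, in (0, 1)
    dloss_dscore = -s * (1.0 - s)      # derivative of sigmoid(-score)
    grad_sub = dloss_dscore * sign * (x - mu_sub)
    grad_anti = dloss_dscore * sign * -(x - mu_anti)
    return mu_sub - lr * grad_sub, mu_anti - lr * grad_anti


if __name__ == "__main__":
    mu_sub, mu_anti = 0.0, 0.0
    # (feature value, was the forced-segmentation label correct?)
    samples = [(1.0, True), (-1.0, False), (0.8, True), (-1.2, False)]
    for _ in range(50):
        for x, ok in samples:
            mu_sub, mu_anti = mve_step(x, ok, mu_sub, mu_anti)
    print(mu_sub, mu_anti)
```

In this toy setting the subword mean drifts toward the features labeled correct and the anti-subword mean toward those labeled incorrect, so the log ratio separates the two groups.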

22. An automated speech recognition system comprising:
- a speech information preprocessor for receiving a speech signal and responsively producing at least one speech feature signal descriptive of said speech signal;
  
  a speech recognition component responsive to said speech feature signal to produce a speech recognition hypothesis indicating which member of a predefined group of sound units most probably corresponds to speech content of said speech signal; and
  
  an utterance verification component responsive to said speech recognition hypothesis and said speech feature signal for producing an acceptance signal when said speech content of said speech signal includes said speech recognition hypothesis;
  
  said utterance verification component having a first subword acoustic model distinct from any acoustic model used to produce said speech recognition hypothesis for determining a first likelihood that a speech segment contains a sound corresponding to said speech recognition hypothesis;
  
  said utterance verification component having a second subword acoustic model for determining a second likelihood that a speech segment contains a sound other than one corresponding to said speech recognition hypothesis;
  
  and further comprising means for training said first and second subword acoustic models, said means for training having;
  
  means for obtaining correct recognitions by force-segmenting sample sentences using a correct lexical transcription;
  
  means for obtaining incorrect recognitions by force-segmenting sample sentences using an incorrect lexical transcription; and
  
  means for determining for each subword in a sample sentence whether recognition of such subword was correct and adjusting parameters of each of said first and second subword acoustic models accordingly.

23. An utterance verification system for use in speech recognition comprising:
- means for receiving at least one speech feature signal descriptive of an acquired speech signal;
  
  means for receiving a speech recognition hypothesis indicating which member of a predefined group of sound units most probably corresponds to speech content of said speech signal;
  
  a first subword acoustic model distinct from any acoustic model used to produce said speech recognition hypothesis for determining a first likelihood that a speech segment contains a sound corresponding to said speech recognition hypothesis;
  
  a second subword acoustic model for determining a second likelihood that a speech segment contains a sound other than one corresponding to said speech recognition hypothesis;
  
  means responsive to said speech feature signal, said speech recognition hypothesis, and said first and second subword acoustic models for producing an acceptance signal when said speech content of said speech signal includes said speech recognition hypothesis; and
  
  means for training said first and second subword acoustic models, including;
  
  means for obtaining correct recognitions by force-segmenting sample sentences using a correct lexical transcription;
  
  means for obtaining incorrect recognitions by force-segmenting sample sentences using an incorrect lexical transcription; and
  
  means for determining for each subword in a sample sentence whether recognition of such subword was correct and adjusting parameters of each of said first and second subword acoustic models accordingly.

24. A method of training a task-independent utterance verification system comprising the steps of:
- preparing a verification model set having a verification-specific subword model and a verification-specific anti-subword model for each subword in a recognizer subword set;
  
  determining parameters of said verification model set by training said verification-specific subword model and said verification-specific anti-subword model;
  
  wherein said parameter determining step further comprises the steps of;
  
  obtaining a training set comprising many sample utterances including subwords in the recognizer subword set;
  
  obtaining correct recognitions by force-segmenting sample sentences using a correct lexical transcription;
  
  obtaining incorrect recognitions by force-segmenting a sample sentence using an incorrect lexical transcription;
  
  for each subword in a sample sentence, determining whether the recognition of said subword is correct; and
  
  adjusting parameters of said verification model set accordingly.

Current Assignee: WSOU Investments, LLC (WSOU Holdings, LLC)
Original Assignee: Lucent Technologies, Inc. (Nokia Corporation)
Inventors: Sukkar, Rafid Antoon
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Armstrong, Angela A
Application Number: US09/183,720
Time in Patent Office: 1,054 Days
Field of Search: 704/240, 704/254, 704/255, 704/256, 704/249, 704/250
US Class Current: 704/256.4
CPC Class Codes:
G10L 15/08 Speech classification or se...
G10L 15/144 Training of HMMs
