Method and apparatus for discriminative utterance verification using multiple confidence measures

US 6,125,345 A
Filed: 09/19/1997
Issued: 09/26/2000
Est. Priority Date: 09/19/1997
Status: Expired due to Term

4 Assignments

0 Petitions

Abstract

A multiple confidence measures subsystem of an automated speech recognition system allows otherwise independent confidence measures to be integrated and used for both training and testing on a consistent basis. Speech to be recognized is input to a speech recognizer and a recognition verifier of the multiple confidence measures subsystem. The speech recognizer generates one or more confidence measures, preferably including a misclassification error (MCE) distance. The recognized speech output by the speech recognizer is input to the recognition verifier, which outputs one or more confidence measures, preferably including a misverification error (MVE) distance. The confidence measures output by the speech recognizer and the recognition verifier are normalized and then input to an integrator. The integrator integrates the various confidence measures both during the training phase for the hidden Markov models implemented in the speech recognizer and the recognition verifier and during testing on the input speech. The integrator is preferably implemented using a multi-layer perceptron (MLP). The output of the integrator, rather than that of the recognition verifier, determines whether the recognized utterance hypothesis generated by the speech recognizer should be accepted or rejected.
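
The abstract names an MCE distance as the recognizer's confidence measure but does not define it. As a minimal sketch, assuming the widely used misclassification measure of Juang and Katagiri (not necessarily the exact form used in this patent), the distance compares the score g_k of the recognized hypothesis with a smoothed average of the competing scores: d_k = -g_k + (1/eta) log[(1/(N-1)) sum over j != k of exp(eta g_j)]. The function name `mce_distance` and the smoothing parameter `eta` are hypothetical.

```python
import math
from typing import Sequence


def mce_distance(scores: Sequence[float], k: int, eta: float = 1.0) -> float:
    """Misclassification distance of hypothesis k against its competitors.

    d_k = -g_k + (1/eta) * log( mean_{j != k} exp(eta * g_j) )

    scores: discriminant scores g_j (e.g. log-likelihoods), one per candidate.
    k: index of the recognized hypothesis.
    eta: positive smoothing constant; a large eta approaches the best competitor.
    """
    competitors = [g for j, g in enumerate(scores) if j != k]
    if not competitors:
        raise ValueError("need at least one competing hypothesis")
    # Log-sum-exp with the maximum factored out for numerical stability.
    top = max(eta * g for g in competitors)
    lse = top + math.log(sum(math.exp(eta * g - top) for g in competitors))
    anti = (lse - math.log(len(competitors))) / eta
    return -scores[k] + anti
```

For scores `[-10.0, -12.0, -14.0]` with `k=0` and `eta=1.0`, the distance is about -2.57: a negative value means the recognized hypothesis outscores its competitors. As `eta` grows, the smoothed term approaches the best competing score, so the distance approaches the plain margin of -2.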

92 Citations

23 Claims

1. An automated speech recognition system comprising:
- a plurality of confidence measurement generating devices that generate a plurality of confidence measurements, at least one of the plurality of confidence measurements being a first type of confidence measurement and at least one of the plurality of confidence measurements being a second type of confidence measurement, where the first and second types of confidence measurements are different types of confidence measurements, the first and second types of confidence measurements corresponding to separate knowledge sources, wherein:
  
  the automated speech recognition system inputs a signal representing an utterance to be recognized, the signal comprising at least one portion, and each confidence measurement generating device inputs that signal and outputs at least one confidence measurement of at least one of at least the first and second types of confidence measurements for each portion of that signal;
  
  a normalizing device that inputs the plurality of confidence measurements comprising at least the first and second types of confidence measurements and outputs a plurality of normalized confidence measurements of at least the first and second types for each portion of the utterance; and
  
  an integrator that inputs, for each portion of the utterance, the plurality of normalized confidence measurements of at least the first and second types and outputs, based on the plurality of normalized confidence measurements of at least the first and second types for that portion of the utterance, a signal indicating whether that portion of the utterance has been correctly recognized.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The automated speech recognition system of claim 1, wherein the integrator is a multi-layer neural network.
  - 3. The automated speech recognition system of claim 2, wherein the multi-layer neural network is a multi-layer perceptron.
  - 4. The automated speech recognition system of claim 3, wherein the multi-layer perceptron comprises:
    - a first layer having a number n of nodes equal to a number n of the plurality of normalized confidence measurements, each first layer node inputting the n normalized confidence measurements;
      
      at least one hidden layer, each hidden layer having at least one node, each node of a first hidden layer connected to the n first layer nodes, the nodes of each next hidden layer connected to the nodes of the preceding hidden layer; and
      
      an output layer having a single node connected to each node of a last hidden layer.
  - 5. The automated speech recognition system of claim 4, wherein the at least one hidden layer of the multi-layer perceptron comprises a single layer acting as both the first hidden layer and the last hidden layer.
  - 6. The automated speech recognition system of claim 3, wherein the multi-layer perceptron comprises:
    - a first layer having a number n of nodes equal to a number n of the plurality of normalized confidence measurements, each first layer node inputting the n normalized confidence measurements;
      
      a second layer having a number m of nodes, each second layer node connected to the n first layer nodes; and
      
      a third layer having a single node connected to each of the m second layer nodes.
  - 7. The automated speech recognition system of claim 1, wherein the normalizer normalizes each of the plurality of confidence measurements by a confidence measurement statistic to generate the normalized confidence measurements.
  - 8. The automated speech recognition system of claim 1, wherein the plurality of confidence measurement generating devices comprises:
    - a speech recognizer, wherein the signal input by the speech recognizer comprises a spectral coefficient signal including a plurality of spectral coefficients determined from the portion of the utterance, the speech recognizer outputting, for the portion of the utterance, a recognition hypothesis and at least a minimum characterization error confidence measurement for the hypothesis; and
      
      a recognition verifier, wherein the signal input by the recognition verifier comprises at least the recognition hypothesis output by the speech recognizer, the recognition verifier outputting, for the portion of the speech, at least a minimum verification error confidence measurement for the recognition hypothesis.
  - 9. The automated speech recognition system of claim 1, wherein each confidence measurement generating device contains knowledge about the recognition task performed by the automated speech recognition system.
  - 10. The automated speech recognition system of claim 1, wherein:
    - each confidence measurement corresponds to a quality of a recognition function performed by the corresponding confidence measurement generating device on the portion of the utterance, each recognition function having a set of parameters; and
      
      during a training phase, the signal output by the integrator is input to the plurality of confidence measurement generating devices, each confidence measurement generating device modifying the set of parameters of its recognition function based on the signal.
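
Claims 4 to 6 fix only the topology of the MLP integrator, not its activations or weights. The sketch below builds the three-layer form of claim 6 (a first layer of n nodes that each see all n normalized measures, a second layer of m nodes, and a single output node) in plain Python. The sigmoid activations, the random weight initialization, the class name `ConfidenceMLP`, and the 0.5 acceptance threshold are all assumptions; training is not shown, so the accept/reject decision of an untrained network is arbitrary.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class ConfidenceMLP:
    """Hypothetical n -> n -> m -> 1 perceptron following the layer sizes of claim 6."""

    def __init__(self, n: int, m: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.n = n
        self.layers: List[Layer] = [
            (init(n, n), [0.0] * n),  # first layer: n nodes, each sees all n inputs
            (init(m, n), [0.0] * m),  # second layer: m nodes fed by the n first-layer nodes
            (init(1, m), [0.0]),      # third layer: single output node
        ]

    def forward(self, x: Sequence[float]) -> float:
        if len(x) != self.n:
            raise ValueError(f"expected {self.n} normalized measures, got {len(x)}")
        h = list(x)
        for weights, biases in self.layers:
            h = [sigmoid(sum(w * v for w, v in zip(row, h)) + b)
                 for row, b in zip(weights, biases)]
        return h[0]  # value in (0, 1)

    def accept(self, x: Sequence[float], threshold: float = 0.5) -> bool:
        return self.forward(x) >= threshold
```

This network is also the single-hidden-layer case of claims 4 and 5; supporting several hidden layers would only mean adding entries to `layers`.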

11. A method for automatically recognizing speech, comprising:
- inputting a signal based on an utterance to be recognized, the signal having a plurality of portions;
  
  generating a proposed recognition for each portion;
  
  generating, for each portion of the signal, a plurality of confidence measurements from the proposed recognition for that portion of the signal, comprising:
  
  generating at least one confidence measurement of a first type, and generating at least one confidence measurement of a second type, wherein at least the first and second types of confidence measurements are different types of confidence measurements that correspond to separate knowledge sources and the at least first and second types of confidence measurements are generated in parallel relative to the signal;
  
  normalizing, for each portion of the signal, the plurality of confidence measurements, comprising at least the first and second types of confidence measurements, for that portion of the signal;
  
  integrating, for each portion of the signal, the plurality of normalized confidence measurements of at least the first and second types to generate an integrated confidence measurement for that portion of the signal; and
  
  determining, for each portion of the signal, if the proposed recognition for that portion of the signal is acceptable based on the integrated confidence measurements of at least the first and second types.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20, 21):
  - 12. The method of claim 11, wherein generating the proposed recognition for each portion comprises generating a recognition hypothesis for each portion.
  - 13. The method of claim 12, wherein generating, for each portion of the signal, the first type of confidence measurements comprises generating at least a minimum characterization error type of confidence measurement based on the recognition hypothesis.
  - 14. The method of claim 13, wherein generating, for each portion of the signal, the second type of confidence measurements comprises:
    - generating at least one alternative recognition hypothesis for that portion of the signal; and
      
      generating at least a minimum verification error type of confidence measurement based on the recognition hypothesis and the at least one alternative recognition hypothesis.
  - 15. The method of claim 11, wherein normalizing the plurality of confidence measurements for each portion comprises computing at least one confidence measurement statistic to generate the normalized confidence measurements.
  - 16. The method of claim 15, further comprising dynamically determining the range of the plurality of confidence measurements.
  - 17. The method of claim 11, wherein integrating the plurality of normalized confidence measurements comprises inputting the plurality of normalized confidence measurements into a multi-layer perceptron.
  - 18. The method of claim 11, wherein integrating the plurality of normalized confidence measurements comprises inputting the plurality of normalized confidence measurements into a multi-layer neural network.
  - 19. The method of claim 11, wherein integrating the plurality of normalized confidence measurements comprises:
    - inputting the plurality of normalized confidence measurements into a first layer of a multi-layer perceptron, the first layer having a number n of nodes equal to a number n of the plurality of normalized confidence measurements, each first layer node inputting the n normalized confidence measurements and generating a first layer output signal;
      
      inputting the n first layer output signals into a first one of a plurality of hidden layers of the multi-layer perceptron, each hidden layer having an arbitrary number of nodes, each next hidden layer connected to a preceding hidden layer, a last hidden layer outputting m hidden layer output signals; and
      
      inputting the m hidden layer output signals into an output layer having a single node, the single node inputting the m hidden layer nodes and outputting an acceptance signal indicating if the proposed recognition is acceptable.
  - 20. The method of claim 11, wherein integrating the plurality of normalized confidence measurements comprises:
    - inputting the plurality of normalized confidence measurements into a first layer of a multi-layer perceptron, the first layer having a number n of nodes equal to a number n of the plurality of normalized confidence measurements, each first layer node inputting the n normalized confidence measurements and generating a first layer output signal;
      
      inputting the n first layer output signals into a second layer of a multi-layer perceptron, the second layer having a number m of nodes, each second layer node inputting the n first layer nodes and outputting a second layer output signal; and
      
      inputting the m second layer output signals into a third layer having a single node, the single node inputting the m second layer signals and outputting an acceptance signal indicating if the proposed recognition is acceptable.
  - 21. The method of claim 11, wherein generating the plurality of confidence measurements comprises generating the plurality of confidence measurements from the proposed recognition based on a plurality of sets of parameters, the method further comprising, during a training phase, modifying for each portion, at least one of the plurality of sets of parameters based on a correctness of the proposed recognition.
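
Claims 7 and 15 normalize by "a confidence measurement statistic" without naming one, and claim 16 adds dynamically determining the range of the measurements. The sketch assumes two common choices: a per-measure z-score using the mean and standard deviation estimated on training data, and min-max scaling against a range that is updated as measurements arrive. Both class names are hypothetical. The motivation assumed here is that measures such as the MCE and MVE distances live on different numeric scales before they reach the integrator.

```python
import math
from typing import List, Sequence


class ZScoreNormalizer:
    """Per-measure z-score using statistics estimated on training data."""

    def fit(self, rows: Sequence[Sequence[float]]) -> "ZScoreNormalizer":
        cols = list(zip(*rows))  # one column per confidence-measure type
        self.mean = [sum(c) / len(c) for c in cols]
        # Population standard deviation; fall back to 1.0 for a constant column.
        self.std = [math.sqrt(sum((v - mu) ** 2 for v in c) / len(c)) or 1.0
                    for c, mu in zip(cols, self.mean)]
        return self

    def transform(self, row: Sequence[float]) -> List[float]:
        return [(v - mu) / sd for v, mu, sd in zip(row, self.mean, self.std)]


class RunningRangeNormalizer:
    """Min-max scaling against a range tracked as measurements arrive.

    Values inside the observed range map into [0, 1].
    """

    def __init__(self, n: int) -> None:
        self.lo = [math.inf] * n
        self.hi = [-math.inf] * n

    def update(self, row: Sequence[float]) -> None:
        self.lo = [min(a, v) for a, v in zip(self.lo, row)]
        self.hi = [max(b, v) for b, v in zip(self.hi, row)]

    def transform(self, row: Sequence[float]) -> List[float]:
        # A degenerate (empty or zero-width) range maps to 0.0.
        return [(v - a) / (b - a) if b > a else 0.0
                for v, a, b in zip(row, self.lo, self.hi)]
```

Each row holds one value per confidence-measure type for a single portion of the utterance, for example a hypothetical `[mce, mve]` pair per word.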

22. An automated speech recognition system comprising:
- a plurality of confidence measurement generating devices, each confidence measurement generating device inputting a signal based on a portion of an utterance to be recognized and outputting at least one confidence measurement for the portion of the utterance, each confidence measurement corresponding to a quality of a recognition function performed by the corresponding confidence measurement generating device on the portion of the utterance, each recognition function having a set of parameters, the plurality of confidence measurement generating devices comprising:
  
  a speech recognizer, wherein the signal input by the speech recognizer comprises a spectral coefficient signal including a plurality of spectral coefficients determined from the portion of the utterance, the speech recognizer outputting, for the portion of the utterance, a recognition hypothesis and at least a minimum characterization error confidence measurement for the hypothesis, and
  
  a recognition verifier, wherein the signal input by the recognition verifier comprises at least the recognition hypothesis output by the speech recognizer, the recognition verifier outputting, for the portion of the speech, at least a minimum verification error confidence measurement for the recognition hypothesis;
  
  a normalizing device that inputs the plurality of confidence measurements from both the speech recognizer and the recognition verifier and outputs a plurality of normalized confidence measurements; and
  
  an integrator that inputs the plurality of normalized confidence measurements of both the speech recognizer and the recognition verifier and outputs a signal indicating whether the portion of the utterance has been correctly recognized, wherein, during a training phase, the signal output by the integrator is input to the plurality of confidence measurement generating devices, each confidence measurement generating device modifying the set of parameters of its recognition function based on the signal.

23. A method for automatically recognizing speech, comprising:
- inputting a signal based on an utterance to be recognized, the signal having a plurality of portions;
  
  generating a proposed recognition for each portion, comprising generating a recognition hypothesis for each portion;
  
  generating, for each portion of the signal, a plurality of distinct confidence measurements from the proposed recognition based on a plurality of sets of parameters;
  
  normalizing the plurality of distinct confidence measurements for each portion;
  
  integrating the plurality of normalized distinct confidence measurements to determine, for each portion, if the proposed recognition is acceptable; and
  
  during a training phase, modifying, for each portion, at least one of the plurality of sets of parameters based on a correctness of the proposed recognition;
  
  wherein generating, for each portion of the signal, the plurality of distinct confidence measurements comprises:
  
  generating at least a minimum characterization error confidence measurement based on the recognition hypothesis, generating at least one alternative recognition hypothesis for the recognition hypothesis, and generating at least a minimum verification error confidence measurement based on the recognition hypothesis and the at least one alternative recognition hypothesis.
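
Claim 23 derives an MVE measure from the recognition hypothesis and at least one alternative hypothesis, and adjusts parameters during training according to whether the recognition was correct. The following is a minimal sketch under assumptions the claims do not state: the verification distance is the negated margin between the hypothesis score and the best alternative score, with its sign flipped for incorrect hypotheses so that a positive value always signals a verification error (a false rejection or a false acceptance); a sigmoid smooths that error into a differentiable loss; and a single hypothetical additive bias on the hypothesis score stands in for the model parameters that real training would update. All function names are hypothetical.

```python
import math
from typing import Sequence


def mve_distance(target: float, alternatives: Sequence[float], correct: bool) -> float:
    """Signed verification distance; a positive value means a verification error.

    target: verification score (e.g. a log-likelihood) of the recognition hypothesis.
    alternatives: scores of the alternative hypotheses.
    correct: training label saying whether the hypothesis is actually correct.
    """
    if not alternatives:
        raise ValueError("need at least one alternative hypothesis")
    margin = target - max(alternatives)  # > 0: hypothesis beats its best alternative
    # Correct hypothesis: the error is a false rejection, so a small margin is bad.
    # Incorrect hypothesis: the error is a false acceptance, so a large margin is bad.
    return -margin if correct else margin


def smoothed_error(d: float, gamma: float = 1.0) -> float:
    """Sigmoid of gamma * d: near 0 for d << 0, near 1 for d >> 0."""
    z = gamma * d
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative z
    return e / (1.0 + e)


def smoothed_error_grad(d: float, gamma: float = 1.0) -> float:
    """Derivative of smoothed_error with respect to d."""
    s = smoothed_error(d, gamma)
    return gamma * s * (1.0 - s)


def sgd_step_on_bias(bias: float, target: float, alternatives: Sequence[float],
                     correct: bool, lr: float = 0.1, gamma: float = 1.0) -> float:
    """One gradient-descent step on a hypothetical additive bias on the target score."""
    d = mve_distance(target + bias, alternatives, correct)
    dd_dbias = -1.0 if correct else 1.0  # from the two branches of mve_distance
    return bias - lr * smoothed_error_grad(d, gamma) * dd_dbias
```

In a full system the gradient would flow into the verifier's model parameters rather than a single bias; the sketch only shows the direction of the update: a correct hypothesis has its score pushed up relative to the alternatives, and an incorrect one has its score pushed down.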

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Modi, Piyush C.; Rahim, Mazin G.
Primary Examiner(s): Zele, Krista
Assistant Examiner(s): Sax, Robert L
Application Number: US08/934,056
Time in Patent Office: 1,103 days
Field of Search: 704/240, 704/236, 704/232
US Class Current: 704/240
CPC Class Codes: G10L 15/10 using distance or distortio...
