Rejection method for speech recognition

US 5,097,509 A
Filed: 03/28/1990
Issued: 03/17/1992
Est. Priority Date: 03/28/1990
Status: Expired due to Term

Abstract

A speech recognizer for recognizing unknown utterances in isolated-word, small-vocabulary speech has improved rejection of out-of-vocabulary utterances. Both a usual spectral representation including a dynamic component and an equalized representation are used to match unknown utterances to templates for in-vocabulary words. In a preferred embodiment, the representations are mel-based cepstral, with the dynamic components being signed vector differences between pairs of primary cepstra. The equalized representation is the signed difference of each cepstral coefficient less an average value of the coefficients. Factors are generated from the ordered lists of templates to determine the probability of the top choice being a correct acceptance, with different methods being applied when the usual and equalized representations yield different matches. For additional enhancement, the rejection method may use templates corresponding to non-vocabulary utterances, or decoys. If the top choice corresponds to a decoy, the input is rejected.
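
The two derived representations named in the abstract can be illustrated with a minimal Python sketch. It assumes that the average is taken per coefficient over all frames of the utterance and that the dynamic component is the difference between frames a fixed number of steps apart; neither detail is stated in the abstract. The names `equalize`, `dynamic_parameters`, and `span` are hypothetical.

```python
from typing import List

Frame = List[float]  # one frame of cepstral coefficients, e.g. C1..C7


def equalize(frames: List[Frame]) -> List[Frame]:
    """Subtract each coefficient's utterance-wide average from every frame."""
    count = len(frames)
    dims = len(frames[0])
    # Assumption: one average per coefficient, taken over the whole utterance.
    means = [sum(frame[k] for frame in frames) / count for k in range(dims)]
    return [[frame[k] - means[k] for k in range(dims)] for frame in frames]


def dynamic_parameters(frames: List[Frame], span: int = 1) -> List[Frame]:
    """Signed difference between the frames `span` steps ahead and behind.

    Indices are clamped at the utterance edges (an assumption of this sketch).
    """
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        ahead = frames[min(t + span, last)]
        behind = frames[max(t - span, 0)]
        deltas.append([a - b for a, b in zip(ahead, behind)])
    return deltas
```

Because the equalized frames sum to zero per coefficient, a constant offset added to every frame of an utterance leaves the equalized representation unchanged.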

19 Claims

1. A method for speech recognition comprising the steps of:
- representing an unknown utterance as a first sequence of parameter frames, each parameter frame including a set of primary and secondary parameters and an equalized second sequence of parameter frames derived from the first sequence of parameter frames;
  
  comparing each of the primary and secondary parameters in the sequence of parameter frames of the representation of the unknown utterance to each of a plurality of reference representations expressed in the same kind of parameters, to determine how closely each reference representation resembles the representation of the unknown utterance;
  
  ranking the reference representations in order from best to worst choice in dependence upon their relative closeness to the representation of the unknown utterance, for each of the first and second sequences of parameters;
  
  computing a probability that the best choice is a correct match for the unknown utterance; and
  
  rejecting the best choice as a match for the unknown utterance if the probability is below a predetermined value.
  - 2. A method as claimed in claim 1 wherein the second sequence of parameters is derived from the first sequence of parameters by the steps of computing an average value of the first sequence of parameters and taking a signed difference of each of the parameters of the first sequence of parameters less the average value.
  - 3. A method as claimed in claim 1 wherein the step of computing a probability includes the step of comparing the top choice for the first sequence of parameters to the top choice for the second sequence of parameters to determine whether there is agreement as to the top choice.
  - 4. A method as claimed in claim 3 wherein the step of comparing determines that there is agreement and wherein the step of computing a probability includes the steps of generating a set of factors based upon the ranking of the reference representations for the first and second sequences of parameters and using the set of factors to calculate the probability that the top choice is a correct match.
  - 5. A method as claimed in claim 3 wherein the step of comparing determines that there is disagreement and wherein the step of computing a probability includes the steps of generating two sets of factors based upon the ranking of the reference representations for the first and second sequences of parameters and using the two sets of factors to calculate respective probabilities that the top choice for each of the first and second sequence of parameters is a correct match.
  - 6. A method as claimed in claim 1 further comprising the steps of providing nonvocabulary reference representations among the reference representations and rejecting the top choice if it is a nonvocabulary reference representation.
  - 7. A method as claimed in claim 1 wherein the step of representing includes the steps of dividing the unknown utterance into time frames, filtering the time frames to provide a plurality of channels spanning a predetermined range of frequencies, computing cepstral coefficients to provide the set of primary parameters, C₁, . . . , C₇, detecting endpoints for the unknown utterance, computing a set of secondary parameters, ΔC₁, . . . , ΔC₇, by determining signed differences between adjacent primary parameters, the sets of primary and secondary parameters forming the first sequence of parameter frames, and deriving the second sequence of parameters from the first sequence of parameters by computing an average value of the first sequence of parameters and taking a signed difference of each of the parameters of the first sequence of parameters less the average value.
  - 8. A method as claimed in claim 1 wherein the step of comparing includes the step of computing dynamic time warping distances for reference representations for both first and second sequences of parameter frames.
  - 9. A method as claimed in claim 8 wherein the best choice has the smallest relative dynamic time warping distance.
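
The comparing and ranking steps of claims 1, 8, and 9, together with the decoy rejection of claim 6, can be sketched as follows. This is an illustrative sketch only: it uses a textbook dynamic time warping recurrence with Euclidean frame distances and divides the accumulated cost by the sum of the two sequence lengths, all of which are assumptions rather than details taken from the claims. The function names and the template labels in the usage note are hypothetical.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Frame = List[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two parameter frames (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: List[Frame], seq_b: List[Frame]) -> float:
    """Length-normalized dynamic time warping distance between two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def rank_templates(utterance: List[Frame],
                   templates: Dict[str, List[Frame]]) -> List[Tuple[str, float]]:
    """Return (label, distance) pairs ordered from best to worst choice."""
    scored = [(label, dtw_distance(utterance, frames))
              for label, frames in templates.items()]
    return sorted(scored, key=lambda pair: pair[1])


def top_choice_or_reject(ranking: List[Tuple[str, float]],
                         decoys: Set[str]) -> Optional[str]:
    """Return the best label, or None when the best choice is a decoy."""
    best_label = ranking[0][0]
    return None if best_label in decoys else best_label
```

In use, `rank_templates` would be called once with the first sequence of parameter frames and once with the equalized sequence, each against templates expressed in the same kind of parameters, giving the two ordered lists from which the factors are generated.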

10. Apparatus for speech recognition, comprising:
- means for representing an unknown utterance as a first sequence of parameter frames, each parameter frame including a set of primary and secondary parameters and an equalized second sequence of parameter frames derived from the first sequence of parameter frames;
  
  means for comparing each of the primary and secondary parameters in the sequence of parameter frames of the representation of the unknown utterance to each of a plurality of reference representations expressed in the same kind of parameters, to determine how closely each reference representation resembles the representation of the unknown utterance;
  
  means for ranking the reference representations in order from best to worst choice in dependence upon their relative closeness to the representation of the unknown utterance, for each of the first and second sequences of parameters;
  
  means for computing a probability that the best choice is a correct match for the unknown utterance; and
  
  means for rejecting the best choice as a match for the unknown utterance if the probability is below a predetermined value.
  - 11. Apparatus as claimed in claim 10 wherein the means for representing includes hamming window means for dividing the unknown utterance into time frames, filter bank means for filtering the time frames to provide a plurality of channels spanning a predetermined range of frequencies, coefficient generating means computing cepstral coefficients to provide the set of primary parameters, C₁, . . . , C₇, endpoint detecting means for detecting endpoints for the unknown utterance, and dynamic coefficient means for computing a set of secondary parameters, ΔC₁, . . . , ΔC₇, by determining signed differences between adjacent primary parameters, the sets of primary and secondary parameters forming the first sequence of parameter frames, and deriving the second sequence of parameters from the first sequence of parameters by computing an average value of the first sequence of parameters and taking a signed difference of each of the parameters of the first sequence of parameters less the average value.
  - 12. Apparatus as claimed in claim 10 wherein the means for comparing includes means for computing dynamic time warping distances for reference representations for both first and second sequences of parameter frames.
  - 13. Apparatus as claimed in claim 10 wherein the means for computing a probability includes factor generator means and probability computation means.
  - 14. Apparatus as claimed in claim 13 wherein in a case of agreement between the top choice for the first sequence of parameters to the top choice for the second sequence of parameters, the factors used to compute the probability that the top choice is correct are of the set {d_n, d_e, r_n, r_e, v_n, v_e, 1}.
  - 15. Apparatus as claimed in claim 14 wherein the probability is computed in accordance with:
    - $$P_t = [P_{D_n}(d_n)]^{W_{D_n}} \cdot [P_{D_e}(d_e)]^{W_{D_e}} \cdot [P_{R_n}(r_n)]^{W_{R_n}} \cdot [P_{R_e}(r_e)]^{W_{R_e}} \cdot [P_{V_n}(v_n)]^{W_{V_n}} \cdot [P_{V_e}(v_e)]^{W_{V_e}} \cdot [P_L(1)]^{W_L}$$
  - 16. Apparatus as claimed in claim 13 wherein in a case of disagreement between the top choice for the first sequence of parameters to the top choice for the second sequence of parameters, the factors used to compute the probability that the top choice for the first sequence of parameters is correct are of the set {d_n, d"_e, r_n, r"_e, v_n, v"_e, 1}.
  - 17. Apparatus as claimed in claim 16 wherein the probability is computed in accordance with:
    - $$P_n = [P_{D_n}(d_n)]^{W_{D_n}} \cdot [P_{D_e}(d''_e)]^{W_{D_e}} \cdot [P_{R_n}(r_n)]^{W_{R_n}} \cdot [P_{R_e}(r''_e)]^{W_{R_e}} \cdot [P_{V_n}(v_n)]^{W_{V_n}} \cdot [P_{V_e}(v''_e)]^{W_{V_e}} \cdot [P_L(1)]^{W_L}$$
  - 18. Apparatus as claimed in claim 13 wherein in a case of disagreement between the top choice for the first sequence of parameters to the top choice for the second sequence of parameters, the factors used to compute the probability that the top choice for the second sequence of parameters is correct are of the set {d"_n, d_e, r"_n, r_e, v"_n, v_e, 1}.
  - 19. Apparatus as claimed in claim 18 wherein the probability is computed in accordance with:
    - $$P_e = [P_{D_n}(d''_n)]^{W_{D_n}} \cdot [P_{D_e}(d_e)]^{W_{D_e}} \cdot [P_{R_n}(r''_n)]^{W_{R_n}} \cdot [P_{R_e}(r_e)]^{W_{R_e}} \cdot [P_{V_n}(v''_n)]^{W_{V_n}} \cdot [P_{V_e}(v_e)]^{W_{V_e}} \cdot [P_L(1)]^{W_L}$$
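
The probabilities in claims 15, 17, and 19 share one form: a product of per-factor probabilities, each raised to its own weight. A minimal Python sketch of that form follows, evaluated in the log domain to avoid underflow when many small probabilities are multiplied. It assumes the per-factor probabilities have already been looked up; the claims do not say how they are obtained. The function names, dictionary keys, and numeric values are hypothetical.

```python
import math
from typing import Dict


def combined_probability(factor_probs: Dict[str, float],
                         weights: Dict[str, float]) -> float:
    """Weighted product of per-factor probabilities: prod(p[name] ** w[name])."""
    # Simplification of this sketch: any zero probability yields zero overall.
    if any(p <= 0.0 for p in factor_probs.values()):
        return 0.0
    log_p = sum(weights[name] * math.log(p) for name, p in factor_probs.items())
    return math.exp(log_p)


def accept(probability: float, threshold: float) -> bool:
    """Keep the best choice only if its probability reaches the threshold."""
    return probability >= threshold
```

For example, `combined_probability({"D_n": 0.9, "D_e": 0.8}, {"D_n": 1.0, "D_e": 2.0})` evaluates 0.9 · 0.8². In the disagreement case the same function would be evaluated twice, once with the factor set of claim 16 and once with that of claim 18.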

Current Assignee: Nortel Networks Limited (Nortel Networks Corporation)
Original Assignee: Northern Telecom Limited (Nortel Networks Corporation)
Inventors: Lennig, Matthew
Primary Examiner(s): KEMENY, EMANUEL
Application Number: US07/501,993
Time in Patent Office: 720 Days
Field of Search: 381/42-45
US Class Current: 704/240
CPC Class Codes: G10L 15/08 Speech classification or se...