Technique for selective use of Gaussian kernels and mixture component weights of tied-mixture hidden Markov models for speech recognition

US 6,009,390 A
Filed: 09/11/1997
Issued: 12/28/1999
Est. Priority Date: 09/11/1997
Status: Expired due to Term

7 Assignments

0 Petitions

Abstract

In a speech recognition system, tied-mixture hidden Markov models (HMMs) are used to match, in the maximum-likelihood sense, the phonemes of spoken words given their acoustic input. As is well known, such speech recognition requires the computation of state observation likelihoods (SOLs). Because tied-mixture HMMs are used, each SOL computation involves a substantial number of Gaussian kernels and mixture component weights. In accordance with the invention, the number of Gaussian kernels used is reduced, which lowers the computational complexity and makes memory access to the kernels more efficient. For example, only the non-zero mixture component weights and their associated Gaussian kernels are considered in the SOL computation. In accordance with one aspect of the invention, only a subset of the Gaussian kernels with significant values is considered in the SOL computation, regardless of the values of the associated mixture component weights. In accordance with another aspect of the invention, at least some of the mixture component weights are quantized to reduce the memory space needed to store them. This further reduces the computational complexity and improves memory access efficiency.
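
For orientation, a tied-mixture SOL is conventionally written as a weighted sum over one pool of K Gaussian kernels shared by all states. The notation below (b_j for the SOL of state j, c_jk for its mixture weights, S_t for the kernel subset at frame t, and c̃_jk for quantized weights) is assumed for illustration and is not taken from the patent. The first equality drops zero-weight terms exactly; the approximation keeps only the N < K largest kernels for frame t and uses quantized weights.

$$
b_j(\mathbf{o}_t) = \sum_{k=1}^{K} c_{jk}\,\mathcal{N}(\mathbf{o}_t;\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k)
= \sum_{k:\,c_{jk}\neq 0} c_{jk}\,\mathcal{N}(\mathbf{o}_t;\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k)
\approx \sum_{k\in\mathcal{S}_t} \tilde{c}_{jk}\,\mathcal{N}(\mathbf{o}_t;\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k)
$$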

32 Claims

1. A speech recognizer comprising:
- a processor responsive to a representation of speech for deriving at least one state observation likelihood measure, each state observation likelihood measure being a function of at least a plurality of probability kernels and a plurality of weights associated therewith, one or more of the weights whose values are different from a selected constant value being set to the selected constant value in deriving the state observation likelihood measure; and
- an output for generating signals representative of recognized speech based on the at least one state observation likelihood measure.

2. The speech recognizer of claim 1 wherein the selected constant value is zero.
3. The speech recognizer of claim 1 wherein the values of the one or more of the weights are non-zero, the selected constant value being an average of the values of the one or more of the weights.
4. The speech recognizer of claim 1 wherein selected ones of the plurality of weights, other than the one or more of the weights, are set to at least one other selected constant value in deriving the state observation likelihood measure.
5. The speech recognizer of claim 4 wherein the selected constant value and the at least one other selected constant value are selected in accordance with an LBG algorithm.
6. The speech recognizer of claim 4 wherein the one or more of the weights are closer, in terms of a distance measure, to the selected constant value than to each of the at least one other selected constant value.
7. The speech recognizer of claim 6 wherein said distance measure is an absolute value distance measure.
8. The speech recognizer of claim 6 wherein said distance measure is asymmetric.
9. The speech recognizer of claim 1 wherein said probability kernels are Gaussian kernels in accordance with tied-mixture hidden Markov models (HMMs).
10. The speech recognizer of claim 1 wherein said weights are mixture component weights in accordance with tied-mixture HMMs.
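
Claims 1 to 10 can be illustrated with a small scalar quantizer for the mixture weights. This is a minimal sketch under stated assumptions, not the patent's procedure: the helper names (`nearest`, `lbg_scalar`, `quantize_weights`), the `floor` threshold that sends small weights to the constant zero (as in claim 2), and the two-level codebook are all hypothetical. Each LBG codeword ends up as the average of the weights assigned to it (compare claim 3), and assignment uses the absolute-value distance of claim 7; for scalars this selects the same nearest codeword as squared error would.

```python
def nearest(w, codebook):
    """Codeword closest to w under the absolute-value distance |w - c|."""
    return min(codebook, key=lambda c: abs(w - c))


def lbg_scalar(values, levels, eps=1e-3, max_iters=100):
    """Scalar LBG: start from the mean, split every codeword, then run Lloyd
    iterations. Assumes `levels` is a power of two (each split doubles the codebook)."""
    codebook = [sum(values) / len(values)]
    while len(codebook) < levels:
        # Split each codeword into two slightly perturbed copies.
        codebook = [c + d for c in codebook for d in (-eps, eps)]
        for _ in range(max_iters):
            # Assign each value to its nearest codeword.
            cells = [[] for _ in codebook]
            for v in values:
                i = min(range(len(codebook)), key=lambda j: abs(v - codebook[j]))
                cells[i].append(v)
            # Each codeword becomes the average of its cell (empty cells keep the old value).
            updated = [sum(cell) / len(cell) if cell else c
                       for cell, c in zip(cells, codebook)]
            if updated == codebook:
                break
            codebook = updated
    return sorted(codebook)


def quantize_weights(weights, levels, floor):
    """Hypothetical scheme: weights below `floor` become the constant 0; the rest
    are replaced by the nearest of `levels` LBG codewords."""
    kept = [w for w in weights if w >= floor]
    codebook = lbg_scalar(kept, levels)
    quantized = [0.0 if w < floor else nearest(w, codebook) for w in weights]
    return quantized, codebook


weights = [0.0, 0.004, 0.10, 0.12, 0.30, 0.34]   # hypothetical weights of one state
q, cb = quantize_weights(weights, levels=2, floor=0.01)
print(cb)
print(q)
```

Here `cb` comes out at roughly [0.11, 0.32], so the six weights collapse to three distinct stored values (0, 0.11 and 0.32). The sketch does not renormalize the quantized weights; the claims shown do not say whether that is required.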

11. A speech recognizer comprising:
- a processor responsive to a representation of speech for deriving at least one state observation likelihood measure, each state observation likelihood measure being a function of at least a plurality of probability kernels and a plurality of weights, each probability kernel being associated with a respective one of the weights, only the probability kernels associated with those weights whose values are non-zero being identified by respective indexes;
- a repository for providing each non-zero weight and an index identifying the probability kernel associated with the non-zero weight for deriving the at least one state observation likelihood measure; and
- an output for generating signals representative of recognized speech based on the at least one state observation likelihood measure.

12. The speech recognizer of claim 11 wherein said probability kernels are Gaussian kernels in accordance with tied-mixture HMMs.
13. The speech recognizer of claim 11 wherein said weights are mixture component weights in accordance with tied-mixture HMMs.
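
The repository of claims 11 and 27 amounts to a sparse, per-state list of (weight, kernel index) pairs. The sketch below assumes diagonal-covariance Gaussian kernels and a dense weight table as its input; `diag_gaussian`, `build_repository` and `sol` are hypothetical names, and the toy model is invented. Only the kernels referenced by stored indexes are evaluated, and the result equals the dense sum because every skipped term has zero weight.

```python
import math


def diag_gaussian(x, mean, var):
    """Density of a diagonal-covariance Gaussian kernel evaluated at vector x."""
    log_p = 0.0
    for xi, m, v in zip(x, mean, var):
        log_p -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
    return math.exp(log_p)


def build_repository(weight_table):
    """For each state, keep only (weight, kernel_index) pairs whose weight is non-zero."""
    return [[(w, k) for k, w in enumerate(row) if w != 0.0] for row in weight_table]


def sol(state, x, repository, means, variances):
    """SOL of one state, evaluating only the kernels its repository entries index."""
    return sum(w * diag_gaussian(x, means[k], variances[k]) for w, k in repository[state])


# Hypothetical toy model: 4 shared 1-D kernels, 2 states.
means = [[0.0], [1.0], [2.0], [3.0]]
variances = [[1.0], [1.0], [1.0], [1.0]]
weight_table = [
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
repository = build_repository(weight_table)
x = [1.0]
dense = sum(w * diag_gaussian(x, means[k], variances[k]) for k, w in enumerate(weight_table[0]))
print(repository)   # [[(0.5, 0), (0.5, 2)], [(1.0, 1)]]
print(sol(0, x, repository, means, variances), dense)
```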

14. Apparatus for recognizing speech based on at least one state observation likelihood measure, each state observation likelihood measure being a function of at least a plurality of probability kernels and a plurality of weights associated therewith, comprising:
- a processor for deriving the at least one state observation likelihood measure based on a subset of the plurality of probability kernels, each probability kernel in the subset being a function of a representation of the speech to be recognized, the number of probability kernels in the subset being predetermined, the predetermined number being smaller than the number of the plurality of probability kernels, the probability kernels in the subset each having a larger value than any of the probability kernels outside the subset; and
- an output for generating signals representative of recognized speech based on the at least one state observation likelihood measure.

15. The apparatus of claim 14 wherein said probability kernels are Gaussian kernels in accordance with tied-mixture HMMs.
16. The apparatus of claim 14 wherein said weights are mixture component weights in accordance with tied-mixture HMMs.
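
Claims 14 and 30 use a predetermined number of kernels, namely those with the largest values for the current input. Because all states share one kernel pool in a tied-mixture model, one natural reading is that every kernel is evaluated once per frame, the N largest are kept, and each state's SOL is summed over that subset only. The sketch assumes 1-D Gaussian kernels and uses hypothetical names (`gaussian_1d`, `top_n_kernels`, `sol_top_n`); it is an illustration, not the patented implementation.

```python
import heapq
import math


def gaussian_1d(x, mean, var):
    """Value of a 1-D Gaussian kernel at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def top_n_kernels(x, means, variances, n):
    """Evaluate every shared kernel once and keep the n largest as {index: value}."""
    values = [gaussian_1d(x, m, v) for m, v in zip(means, variances)]
    best = heapq.nlargest(n, range(len(values)), key=values.__getitem__)
    return {k: values[k] for k in best}


def sol_top_n(weights, subset):
    """Approximate SOL: kernels outside the subset contribute nothing, whatever their weight."""
    return sum(weights[k] * value for k, value in subset.items())


means = [0.0, 1.0, 2.0, 3.0, 4.0]
variances = [1.0] * 5
weights = [0.2] * 5            # hypothetical weights of one state
x = 1.2
subset = top_n_kernels(x, means, variances, n=2)
exact = sum(w * gaussian_1d(x, m, v) for w, m, v in zip(weights, means, variances))
approx = sol_top_n(weights, subset)
print(sorted(subset))          # [1, 2]
print(approx, exact)
```

Since all terms are non-negative, the truncated sum always underestimates the full SOL; the gap is small when the discarded kernels are far from the observation.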

17. A method for recognizing speech comprising:
- deriving at least one state observation likelihood measure in response to a representation of said speech, each state observation likelihood measure being a function of at least a plurality of probability kernels and a plurality of weights associated therewith, one or more of the weights whose values are different from a selected constant value being set to the selected constant value in deriving the state observation likelihood measure; and
- generating signals representative of recognized speech based on the at least one state observation likelihood measure.

18. The method of claim 17 wherein the selected constant value is zero.
19. The method of claim 17 wherein the values of the one or more of the weights are non-zero, the selected constant value being an average of the values of the one or more of the weights.
20. The method of claim 17 wherein selected ones of the plurality of weights, other than the one or more of the weights, are set to at least one other selected constant value in deriving the state observation likelihood measure.
21. The method of claim 20 wherein the selected constant value and the at least one other selected constant value are selected in accordance with an LBG algorithm.
22. The method of claim 20 wherein the one or more of the weights are closer, in terms of a distance measure, to the selected constant value than to each of the at least one other selected constant value.
23. The method of claim 22 wherein said distance measure is an absolute value distance measure.
24. The method of claim 22 wherein said distance measure is asymmetric.
25. The method of claim 17 wherein said probability kernels are Gaussian kernels in accordance with tied-mixture hidden Markov models (HMMs).
26. The method of claim 17 wherein said weights are mixture component weights in accordance with tied-mixture HMMs.
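
Claims 22 to 24 (like claims 6 to 8) assign each weight to the closest constant under a distance measure, which may be asymmetric. The claims do not define a particular asymmetric measure, so the one below is purely hypothetical: moving a weight down to a smaller codeword costs `down_penalty` times as much as moving it up by the same amount. The codebook values and helper names are likewise invented for illustration.

```python
def absolute_distance(w, c):
    """Symmetric distance |w - c| (cf. claim 23)."""
    return abs(w - c)


def asymmetric_distance(w, c, down_penalty=5.0):
    """Hypothetical asymmetric distance: rounding down costs more than rounding up."""
    return down_penalty * (w - c) if c < w else c - w


def assign(w, codebook, distance):
    """Return the codeword closest to w under the given distance."""
    return min(codebook, key=lambda c: distance(w, c))


codebook = [0.0, 0.1, 0.3]     # hypothetical quantization levels
for w in (0.02, 0.14):
    print(w, assign(w, codebook, absolute_distance), assign(w, codebook, asymmetric_distance))
# 0.02 0.0 0.1
# 0.14 0.1 0.3
```

With this particular choice, a small weight such as 0.02 is rounded up to 0.1 instead of being zeroed out, which the absolute-value distance would do.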

27. A method for recognizing speech comprising:
- deriving at least one state observation likelihood measure in response to a representation of said speech, each state observation likelihood measure being a function of at least a plurality of probability kernels and a plurality of weights, each probability kernel being associated with a respective one of the weights, only the probability kernels associated with those weights whose values are non-zero being identified by respective indexes;
- providing each non-zero weight and an index identifying the probability kernel associated with the non-zero weight for deriving the at least one state observation likelihood measure; and
- generating signals representative of recognized speech based on the at least one state observation likelihood measure.

28. The method of claim 27 wherein said probability kernels are Gaussian kernels in accordance with tied-mixture HMMs.
29. The method of claim 27 wherein said weights are mixture component weights in accordance with tied-mixture HMMs.
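
Recognizers commonly accumulate scores as log-likelihoods, and the sparse (weight, index) representation of claims 27 to 29 fits that well: because only non-zero weights are stored, `math.log(weight)` is always defined. The sketch below assumes the kernel log-likelihoods for the current frame are already computed and applies the standard log-sum-exp trick; the function name and the numbers are hypothetical.

```python
import math


def log_sol(entries, kernel_loglik):
    """log(sum of weight * exp(kernel_loglik[index])) over stored (weight, index) pairs.

    Uses log-sum-exp so that very negative kernel log-likelihoods do not underflow.
    """
    terms = [math.log(w) + kernel_loglik[k] for w, k in entries]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


entries = [(0.5, 0), (0.5, 1)]               # non-zero weights only
kernel_loglik = [-1000.0, -1001.0, -5000.0]  # hypothetical per-frame kernel log-likelihoods
# Direct evaluation would fail: math.exp(-1000.0) underflows to 0.0 in double precision.
print(log_sol(entries, kernel_loglik))
```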

30. A method for recognizing speech based on at least one state observation likelihood measure, each state observation likelihood measure being a function of at least a plurality of probability kernels and a plurality of weights associated therewith, comprising:
- deriving the at least one state observation likelihood measure based on a subset of the plurality of probability kernels, each probability kernel in the subset being a function of a representation of the speech to be recognized, the number of probability kernels in the subset being predetermined, the predetermined number being smaller than the number of the plurality of probability kernels, the probability kernels in the subset each having a larger value than any of the probability kernels outside the subset; and
- an output for generating signals representative of recognized speech based on the at least one state observation likelihood measure.

31. The method of claim 30 wherein said probability kernels are Gaussian kernels in accordance with tied-mixture HMMs.
32. The method of claim 30 wherein said weights are mixture component weights in accordance with tied-mixture HMMs.
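
The kernel subset of claims 30 to 32 can also be combined with the non-zero-weight repository of claims 27 to 29; the combination shown here is an assumption for illustration, not something the claims spell out. In this sketch the shortlist is computed once per frame from precomputed kernel values and shared by all states, and each state stores its non-zero weights as a hypothetical `{kernel_index: weight}` dictionary, so the work per state is bounded by the shortlist size `n`.

```python
import heapq


def shortlist(kernel_values, n):
    """Indices of the n largest kernel values for the current frame (computed once per frame)."""
    return heapq.nlargest(n, range(len(kernel_values)), key=kernel_values.__getitem__)


def sol_all_states(state_weights, kernel_values, n):
    """Approximate SOL of every state, using only shortlisted kernels that carry a
    non-zero weight in that state. Work per state is O(n), independent of the kernel count."""
    keep = shortlist(kernel_values, n)
    return [sum((weights[k] * kernel_values[k] for k in keep if k in weights), 0.0)
            for weights in state_weights]


kernel_values = [0.05, 0.40, 0.30, 0.01]            # hypothetical kernel values for one frame
state_weights = [{0: 0.5, 1: 0.5}, {2: 1.0}, {3: 1.0}]
scores = sol_all_states(state_weights, kernel_values, n=2)
print(scores)
```

The third state scores zero because its only non-zero weight lies outside the shortlist; a practical system would likely need a likelihood floor for such cases, which this sketch does not include.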

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Lucent Technologies, Inc. (Nokia Corporation)
Inventors: Haimi-Cohen, Raziel; Gupta, Sunil K.; Soong, Frank K.
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Storm, Donald L.
Application Number: US08/927,883
Time in Patent Office: 838 days
Field of Search: 704/256, 704/255, 704/240, 704/243, 704/245
US Class Current: 704/240
CPC Class Codes: G10L 15/144 Training of HMMs
