Technique for selective use of Gaussian kernels and mixture component weights of tied-mixture hidden Markov models for speech recognition
Abstract
In a speech recognition system, tied-mixture hidden Markov models (HMMs) are used to match, in the maximum likelihood sense, the phonemes of spoken words given the acoustic input thereof. In a well known manner, such speech recognition requires computation of state observation likelihoods (SOLs). Because of the use of HMMs, each SOL computation involves a substantial number of Gaussian kernels and mixture component weights. In accordance with the invention, the number of Gaussian kernels is cut down to reduce the computational complexity and increase the efficiency of memory access to the kernels. For example, only the non-zero mixture component weights and the Gaussian kernels associated therewith are considered in the SOL computation. In accordance with an aspect of the invention, only a subset of the Gaussian kernels of significant values, regardless of the values of the associated mixture component weights, are considered in the SOL computation. In accordance with another aspect of the invention, at least some of the mixture component weights are quantized to reduce memory space needed to store them. As such, the computational complexity and memory access efficiency are further improved.
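The sparse-weight and quantization ideas in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`gaussian_kernel`, `to_sparse`, `quantize`, `dequantize`, `sol_dense`, `sol_sparse`), the diagonal-covariance Gaussians, the uniform 8-bit quantizer, and all toy numbers are assumptions made for illustration only. The sketch evaluates the shared pool of kernels once per observation, then forms each state's SOL from only its non-zero weights, each stored with the index of its kernel.

```python
import math

def gaussian_kernel(obs, mean, var):
    """Diagonal-covariance Gaussian density at obs (assumed kernel form)."""
    log_p = 0.0
    for x, m, v in zip(obs, mean, var):
        log_p -= 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return math.exp(log_p)

def to_sparse(weights):
    """Keep only non-zero weights, each paired with its kernel index."""
    return [(k, w) for k, w in enumerate(weights) if w != 0.0]

def quantize(weight, levels=256):
    """Map a weight in [0, 1] to an integer code (hypothetical uniform scheme)."""
    return round(weight * (levels - 1))

def dequantize(code, levels=256):
    """Recover an approximate weight from its integer code."""
    return code / (levels - 1)

def sol_dense(kernel_values, weights):
    """SOL as a weighted sum over every kernel in the shared pool."""
    return sum(w * g for w, g in zip(weights, kernel_values))

def sol_sparse(kernel_values, sparse_weights):
    """SOL over only the (index, weight) pairs with non-zero weight."""
    return sum(w * kernel_values[k] for k, w in sparse_weights)

# Toy shared pool of four 1-D kernels; all states are tied to this pool.
MEANS = [[-2.0], [0.0], [1.0], [3.0]]
VARIANCES = [[1.0], [0.5], [1.0], [2.0]]
STATE_WEIGHTS = {
    "s1": [0.7, 0.0, 0.3, 0.0],
    "s2": [0.0, 0.25, 0.0, 0.75],
}

obs = [0.4]
# Kernel values are computed once and reused by every state.
kernel_values = [gaussian_kernel(obs, m, v) for m, v in zip(MEANS, VARIANCES)]

for state, weights in STATE_WEIGHTS.items():
    sparse = to_sparse(weights)
    stored = [(k, quantize(w)) for k, w in sparse]  # compact storage form
    approx = sum(dequantize(c) * kernel_values[k] for k, c in stored)
    print(state, sol_dense(kernel_values, weights),
          sol_sparse(kernel_values, sparse), approx)
```

In this sketch the sparse sum matches the dense sum because the skipped terms are exactly zero, while the quantized variant trades a small approximation error for one byte of storage per weight.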
32 Claims
1. A speech recognizer comprising:
- a processor responsive to a representation of speech for deriving at least one state observation likelihood measure, each state observation likelihood measure being a function of at least a plurality of probability kernels and a plurality of weights associated therewith, one or more of the weights whose values are different from a selected constant value being set to the selected constant value in deriving the state observation likelihood measure; and
- an output for generating signals representative of recognized speech based on the at least one state observation likelihood measure.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A speech recognizer comprising:
- a processor responsive to a representation of speech for deriving at least one state observation likelihood measure, each state observation likelihood measure being a function of at least a plurality of probability kernels and a plurality of weights, each probability kernel being associated with a respective one of the weights, only the probability kernels associated with those weights whose values are non-zero being identified by respective indexes;
- a repository for providing each non-zero weight and an index identifying the probability kernel associated with the non-zero weight for deriving the at least one state observation likelihood measure; and
- an output for generating signals representative of recognized speech based on the at least one state observation likelihood measure.
Dependent claims: 12, 13
14. Apparatus for recognizing speech based on at least one state observation likelihood measure, each state observation likelihood measure being a function of at least a plurality of probability kernels and a plurality of weights associated therewith, comprising:
- a processor for deriving the at least one state observation likelihood measure based on a subset of the plurality of probability kernels, each probability kernel in the subset being a function of a representation of the speech to be recognized, the number of probability kernels in the subset being predetermined, the predetermined number being smaller than the number of the plurality of probability kernels, the probability kernels in the subset each having a larger value than any of the probability kernels outside the subset; and
- an output for generating signals representative of recognized speech based on the at least one state observation likelihood measure.
Dependent claims: 15, 16
17. A method for recognizing speech comprising:
- deriving at least one state observation likelihood measure in response to a representation of said speech, each state observation likelihood measure being a function of at least a plurality of probability kernels and a plurality of weights associated therewith, one or more of the weights whose values are different from a selected constant value being set to the selected constant value in deriving the state observation likelihood measure; and
- generating signals representative of recognized speech based on the at least one state observation likelihood measure.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25, 26
27. A method for recognizing speech comprising:
- deriving at least one state observation likelihood measure in response to a representation of said speech, each state observation likelihood measure being a function of at least a plurality of probability kernels and a plurality of weights, each probability kernel being associated with a respective one of the weights, only the probability kernels associated with those weights whose values are non-zero being identified by respective indexes;
- providing each non-zero weight and an index identifying the probability kernel associated with the non-zero weight for deriving the at least one state observation likelihood measure; and
- generating signals representative of recognized speech based on the at least one state observation likelihood measure.
Dependent claims: 28, 29
30. A method for recognizing speech based on at least one state observation likelihood measure, each state observation likelihood measure being a function of at least a plurality of probability kernels and a plurality of weights associated therewith, comprising:
- deriving the at least one state observation likelihood measure based on a subset of the plurality of probability kernels, each probability kernel in the subset being a function of a representation of the speech to be recognized, the number of probability kernels in the subset being predetermined, the predetermined number being smaller than the number of the plurality of probability kernels, the probability kernels in the subset each having a larger value than any of the probability kernels outside the subset; and
- an output for generating signals representative of recognized speech based on the at least one state observation likelihood measure.
Dependent claims: 31, 32
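Claims 14 and 30 recite deriving the SOL from a predetermined number of kernels, each larger in value than any kernel left out. One way to read that is sketched below in Python. It is a minimal sketch under stated assumptions, not the claimed implementation: the helper names `top_n_kernels` and `sol_top_n`, the use of `heapq.nlargest` for the selection, and the toy kernel values and weights are all hypothetical. When kernel values tie at the cut-off, this selection picks among them arbitrarily.

```python
import heapq

def top_n_kernels(kernel_values, n):
    """Return the indexes of the n largest kernel values, largest first."""
    return heapq.nlargest(n, range(len(kernel_values)),
                          key=kernel_values.__getitem__)

def sol_top_n(kernel_values, weights, n):
    """Approximate the SOL using only the n most significant kernels.

    Kernels are selected by their own values, regardless of the values
    of the associated mixture component weights.
    """
    selected = top_n_kernels(kernel_values, n)
    return sum(weights[k] * kernel_values[k] for k in selected)

# Toy values for one observation frame (assumed, not from the patent).
kernel_values = [0.02, 0.40, 0.001, 0.25, 0.0003]
weights = [0.1, 0.3, 0.2, 0.3, 0.1]

full = sum(w * g for w, g in zip(weights, kernel_values))
approx = sol_top_n(kernel_values, weights, n=2)
print(top_n_kernels(kernel_values, 2), full, approx)
```

Because the dropped kernels contribute only small non-negative terms, the truncated sum in this example stays close to the full sum while touching two of the five kernels.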
Specification