Using a discretized, higher order representation of hidden dynamic variables for speech recognition
Abstract
A hidden dynamics value in speech is represented by a higher order, discretized dynamic model, which predicts the discretized dynamic variable that changes over time. Parameters are trained for the model. A decoder algorithm is developed for estimating the underlying phonological speech units in sequence that correspond to the observed speech signal using the higher order, discretized dynamic model.
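
To make the idea of a higher order, discretized dynamic model concrete, the following is a minimal sketch and not the patent's implementation. It assumes a scalar hidden variable quantized onto an evenly spaced grid, a critically damped, target-directed second-order recursion, and a Gaussian spread controlled by a precision value. The recursion form, the function names, and all numeric values are illustrative assumptions that do not appear in the text above.

```python
import math

def make_grid(lo, hi, levels):
    """Evenly spaced discrete values; a list index stands in for an 'indexed' value."""
    step = (hi - lo) / (levels - 1)
    return [lo + i * step for i in range(levels)]

def second_order_mean(x_prev1, x_prev2, r, target):
    """Assumed recursion (not from the source): the next value depends on the
    two previous values and moves toward `target` at a rate set by `r`."""
    return 2 * r * x_prev1 - r * r * x_prev2 + (1 - r) ** 2 * target

def transition_probs(grid, i_prev1, i_prev2, r, target, precision):
    """Distribution over the next grid index given the two previous indices.
    Each grid value is weighted by a Gaussian around the predicted mean,
    then the weights are normalized to sum to one."""
    mean = second_order_mean(grid[i_prev1], grid[i_prev2], r, target)
    weights = [math.exp(-0.5 * precision * (x - mean) ** 2) for x in grid]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical setup: 11 levels on [0, 1], previous values 0.3 and 0.2.
grid = make_grid(0.0, 1.0, 11)
probs = transition_probs(grid, i_prev1=3, i_prev2=2, r=0.5, target=1.0, precision=200.0)
best = max(range(len(grid)), key=probs.__getitem__)
print(best, round(grid[best], 2))
```

Under these assumptions a larger precision concentrates the probability on the grid point nearest the predicted mean, while a smaller one spreads it across neighboring indices.
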
12 Claims
1. A method of recognizing speech, comprising:
training parameters of a generative model based on speech training data indicative of indexed articulatory dynamic values calculated from the speech in the training data having different types of articulatory dynamics, the articulatory dynamic values being of at least second order and being represented by a distribution and the parameters of the generative model including a precision parameter trained based on a precision of the distribution of the articulatory dynamic;
receiving an observable acoustic value that describes a portion of a speech signal for a current time period under consideration;
identifying a predicted acoustic value for a hypothesized phonological unit, using the generative model, based on the indexed articulatory dynamics values and depending on indexed articulatory dynamics values calculated for at least two previous time periods; and
comparing the observed value to the predicted value to determine a likelihood of the hypothesized phonological unit.
(Dependent claims: 2, 3, 4, 5)
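
The predict-and-compare steps of claim 1 can be illustrated with a short sketch. It is one possible reading under stated assumptions, not the claimed method itself: each hypothesized unit carries a hypothetical target and rate, the hidden value is predicted from the two previous (already de-indexed) values with an assumed second-order recursion, a linear mapping turns it into a predicted acoustic value, and a Gaussian log-likelihood compares that prediction with the observation. The unit labels, parameter names, and numbers are invented for illustration.

```python
import math

# Hypothetical per-unit parameters; labels and values are illustrative only.
UNITS = {
    "aa": {"target": 0.9, "r": 0.6, "slope": 2.0, "bias": 0.1},
    "iy": {"target": 0.2, "r": 0.6, "slope": 2.0, "bias": 0.1},
}

def predict_acoustic(unit, x_prev1, x_prev2):
    """Predicted acoustic value for one hypothesized unit.
    Assumed second-order recursion on the hidden value, followed by an
    assumed linear hidden-to-acoustic mapping (neither is from the source)."""
    p = UNITS[unit]
    x = 2 * p["r"] * x_prev1 - p["r"] ** 2 * x_prev2 + (1 - p["r"]) ** 2 * p["target"]
    return p["slope"] * x + p["bias"]

def log_likelihood(observed, predicted, precision):
    """Scalar Gaussian log-density parameterized by precision (inverse variance)."""
    return 0.5 * math.log(precision / (2 * math.pi)) - 0.5 * precision * (observed - predicted) ** 2

def score_units(observed, x_prev1, x_prev2, precision=50.0):
    """Log-likelihood of the observation under every hypothesized unit."""
    return {
        unit: log_likelihood(observed, predict_acoustic(unit, x_prev1, x_prev2), precision)
        for unit in UNITS
    }

scores = score_units(observed=1.65, x_prev1=0.7, x_prev2=0.6)
best_unit = max(scores, key=scores.get)
print(best_unit, scores)
```

In this toy setting the unit whose predicted acoustic value lies closest to the observation receives the highest score; a real decoder would combine such per-frame scores over a sequence of time periods.
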
6. A method of training a model for use in recognizing speech described by an observable input value, comprising:
receiving observable training data indicative of a plurality of different types of speech; and
training model parameters for an articulatory dynamics model that represents articulatory dynamics of speech that vary continuously over time and are represented by discrete values calculated from the observable training data for time periods, the model parameters being trained based on the discrete values of the articulatory dynamics calculated for at least two previous time periods;
wherein training model parameters comprises:
training the model parameters using expectation-maximization in which values of each parameter are first estimated using forward-backward recursion based on estimations of the articulatory dynamics from at least two previous time periods by re-estimating the model parameters based on a current estimation of the model parameters and estimates of the model parameters from at least two previous time periods; and
training a precision parameter indicative of a precision of the value of the articulatory dynamics calculated.
(Dependent claims: 7, 8, 9, 10)
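
As a rough illustration of training a precision parameter inside expectation-maximization, the sketch below shows only a simplified M-step update. It assumes the per-frame posterior weights have already been produced by some E-step (for example a forward-backward pass, which is not implemented here), and it reuses an assumed second-order recursion to form residuals. The function name, arguments, and data are hypothetical and are not taken from the claim.

```python
def reestimate_precision(values, posteriors, r, target):
    """Weighted maximum-likelihood estimate of a scalar Gaussian precision.

    values:     hidden-variable estimates, one per time period
    posteriors: E-step weights per time period (assumed given, not computed here)
    r, target:  parameters of the assumed second-order recursion
    """
    weight_sum = 0.0
    weighted_sq_error = 0.0
    # Start at t = 2 because each prediction needs two previous time periods.
    for t in range(2, len(values)):
        predicted = 2 * r * values[t - 1] - r * r * values[t - 2] + (1 - r) ** 2 * target
        residual = values[t] - predicted
        weight_sum += posteriors[t]
        weighted_sq_error += posteriors[t] * residual ** 2
    if weighted_sq_error == 0.0:
        raise ValueError("residuals are all zero; precision is unbounded")
    # Precision is the inverse of the weighted mean squared residual.
    return weight_sum / weighted_sq_error

# Hypothetical trajectory with uniform posterior weights.
values = [0.0, 0.1, 0.3, 0.45, 0.6]
posteriors = [1.0] * len(values)
precision = reestimate_precision(values, posteriors, r=0.5, target=1.0)
print(round(precision, 3))
```

In a full EM loop this update would alternate with an E-step that recomputes the posteriors from the current parameters, repeating until the estimates stop changing appreciably.
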
11. A speech recognition system comprising:
a generative model modeling articulatory dynamics hidden in an observed speech signal that extends over multiple time periods and mapping the articulatory dynamics to a measurable characteristic of the observed speech signal, the generative model modeling the articulatory dynamics based on discrete values of the articulatory dynamics estimated for at least two previous time periods;
a decoder, coupled to the generative model, receiving an observed value describing at least a portion of the observed speech signal and selecting one or more hypothesized phonological units based on the measurable characteristic output by the generative model, corresponding to the observed value, and based on the observed value; and
a training component training parameters of the generative model based on training data indicative of speech having different types of articulatory dynamics, wherein the training component trains the parameters of the generative model based on indexed articulatory dynamic values calculated from the training data and being of at least a second order, and the training component training one of the parameters of the generative model as a precision parameter indicative of a precision of the value of the articulatory dynamics calculated.
(Dependent claims: 12)
Specification