Using a discretized, higher order representation of hidden dynamic variables for speech recognition
Abstract
A hidden dynamics value in speech is represented by a higher order, discretized dynamic model, which predicts the discretized dynamic variable that changes over time. Parameters are trained for the model. A decoder algorithm is developed for estimating the underlying phonological speech units in sequence that correspond to the observed speech signal using the higher order, discretized dynamic model.
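To make "higher order, discretized" concrete, here is a minimal sketch of a second-order recursion over a quantized hidden variable. The abstract does not give an equation, so the critically damped, target-directed form used here, the parameter names `r` and `target`, and the uniform grid are all illustrative assumptions and not the patent's stated formulation.

```python
def make_grid(lo, hi, levels):
    """Evenly spaced quantization levels covering [lo, hi] (assumed uniform)."""
    step = (hi - lo) / (levels - 1)
    return [lo + k * step for k in range(levels)]


def quantize(value, grid):
    """Return the index of the grid level nearest to value."""
    return min(range(len(grid)), key=lambda k: abs(grid[k] - value))


def next_index(i_prev1, i_prev2, r, target, grid):
    """Predict the index at time t from the indices at t-1 and t-2.

    Assumed recursion: x_t = 2*r*x_{t-1} - r^2*x_{t-2} + (1-r)^2*target.
    """
    x1, x2 = grid[i_prev1], grid[i_prev2]
    x_pred = 2 * r * x1 - r * r * x2 + (1 - r) ** 2 * target
    return quantize(x_pred, grid)


grid = make_grid(0.0, 1.0, 101)
start = quantize(0.2, grid)
trajectory = [start, start]  # indices for the two initial time periods
for _ in range(50):
    trajectory.append(
        next_index(trajectory[-1], trajectory[-2], r=0.8, target=0.7, grid=grid)
    )
```

Because every value is snapped to the grid, the trajectory moves toward the target but may settle on a nearby level rather than exactly on it.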
17 Claims
1. A method of recognizing speech, comprising:
receiving an observable acoustic value that describes a portion of a speech signal for a current time period under consideration;
identifying a predicted acoustic value for a hypothesized phonological unit based on an indexed articulatory dynamics value depending on indexed articulatory dynamics values calculated for at least two previous time periods; and
comparing the observed value to the predicted value to determine a likelihood of the hypothesized phonological unit.
Dependent claims: 2, 3, 4, 5
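The comparison step of claim 1 can be illustrated with a short sketch. The claim does not specify how the predicted value is produced or how likelihood is measured, so the linear mapping `predict_acoustic` and the Gaussian scoring in `log_likelihood` are assumptions chosen for illustration; all names and numbers are hypothetical.

```python
import math


def predict_acoustic(x, slope, offset):
    """Hypothetical linear map from a hidden articulatory value to an acoustic value."""
    return slope * x + offset


def log_likelihood(observed, predicted, variance):
    """Log density of observed under a Gaussian centred on predicted (assumed noise model)."""
    resid = observed - predicted
    return -0.5 * (math.log(2 * math.pi * variance) + resid * resid / variance)


observed = 1.9
# Two hypothesized units imply different hidden values for the current time period.
score_a = log_likelihood(observed, predict_acoustic(0.45, 4.0, 0.1), variance=0.05)
score_b = log_likelihood(observed, predict_acoustic(0.20, 4.0, 0.1), variance=0.05)
more_likely = "a" if score_a > score_b else "b"
```

The hypothesis whose predicted value lies closer to the observation receives the higher score under this assumed noise model.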
6. A method of training a model for use in recognizing speech described by an observable input value, comprising:
receiving observable training data indicative of a plurality of different types of speech; and
training model parameters for an articulatory dynamics model that represents articulatory dynamics of speech that vary continuously over time and are represented by discrete values calculated from the observable training data for time periods, the model parameters being trained based on the discrete values of the articulatory dynamics calculated for at least two previous time periods.
Dependent claims: 7, 8, 9, 10, 11, 12, 13
14. A speech recognition system comprising:
a generative model modeling articulatory dynamics hidden in an observed speech signal that extends over multiple time periods and mapping the articulatory dynamics to a measurable characteristic of the observed speech signal, the generative model modeling the articulatory dynamics based on discrete values of the articulatory dynamics estimated for at least two previous time periods; and
a decoder, coupled to the generative model, configured to receive an observed value describing at least a portion of the observed speech signal and to select one or more hypothesized phonological units based on the measurable characteristic output by the generative model, corresponding to the observed value, and based on the observed value.
Dependent claims: 15, 16, 17
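As a rough illustration of how a decoder might be coupled to a generative model as in claim 14, the sketch below scores each hypothesized unit over a short run of frames and selects the best one. It is a deliberate simplification: it treats a single segment with no search over unit sequences, uses an identity mapping from hidden value to acoustic value, and assumes Gaussian scoring. The unit labels, their `r` and `target` parameters, the starting indices, and the observations are all made up for the example.

```python
import math

GRID = [k / 100 for k in range(101)]  # assumed quantization levels in [0, 1]


def nearest(value):
    """Index of the grid level closest to value (out-of-range values snap to an end)."""
    return min(range(len(GRID)), key=lambda k: abs(GRID[k] - value))


def score_unit(observations, unit, start=(20, 20), variance=0.01):
    """Sum of per-frame log-likelihoods for one hypothesized unit."""
    r, target = unit["r"], unit["target"]
    i2, i1 = start  # hidden indices for the two previous time periods
    total = 0.0
    for obs in observations:
        # Assumed second-order recursion on the discretized hidden value.
        x = 2 * r * GRID[i1] - r * r * GRID[i2] + (1 - r) ** 2 * target
        i = nearest(x)
        predicted = GRID[i]  # identity mapping to the acoustic value (assumption)
        total += -0.5 * (
            math.log(2 * math.pi * variance) + (obs - predicted) ** 2 / variance
        )
        i2, i1 = i1, i
    return total


def decode(observations, units):
    """Pick the hypothesized unit with the highest total score."""
    return max(units, key=lambda name: score_unit(observations, units[name]))


units = {"aa": {"r": 0.8, "target": 0.7}, "iy": {"r": 0.8, "target": 0.1}}
observations = [0.22, 0.25, 0.29, 0.33, 0.37]
best = decode(observations, units)
```

A practical decoder would also search over unit sequences and segment boundaries; that part is omitted here.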
Specification