Feneme-based Markov models for words
First Claim
1. An apparatus for modeling words, said apparatus comprising:
means for measuring the value of at least one feature of an utterance of a first word, said utterance occurring over a series of successive time intervals of equal duration Δt, said means measuring the feature value of the utterance during each time interval to produce a series of feature vector signals representing the feature values;
means for storing a finite set of probabilistic model signals, each probabilistic model signal representing a probabilistic model of a component sound, each probabilistic model comprising a Markov model having (a) only first and second states, (b) a first transition extending from the first state to the second state, (c) a second transition extending from the first state back to itself, (d) a null transition extending from the first state to the second state, (e) a transition probability for each transition, (f) an output probability for each output signal belonging to a finite set of output signals that the output signal will be produced at the first transition, (g) an output probability for each output signal that the output signal will be produced at the second transition, and (h) an output probability of zero for each output signal that the output signal will be produced at the null transition, each output signal of each Markov model representing the value of at least one feature of an utterance measured over a time interval having a duration substantially equal to Δt;
means for storing a finite set of training label vector signals, each training label vector signal having an associated probabilistic model signal, each training label vector signal having at least one parameter value;
means for comparing the feature value, of each feature vector signal in the series of feature vector signals produced by the measuring means as a result of the utterance of the first word, to the parameter values of the training label vector signals to determine, for each feature vector signal, the closest associated training label vector signal;
means for forming a baseform of the first word from the series of feature vector signals by substituting, for each feature vector signal, the closest associated training label vector signal to produce a baseform series of training label vector signals; and
means for forming a probabilistic model of the first word from the baseform series of training label vector signals by substituting, for each training label vector signal, the associated probabilistic model signal to produce a series of probabilistic model signals.
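The last three elements of the claim (comparing, forming a baseform, forming a word model) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the use of Euclidean distance as the measure of "closest", the toy prototype vectors, and the string identifiers standing in for probabilistic model signals are all assumptions made for illustration.

```python
import math

def closest_label(feature, prototypes):
    """Return the index of the training label vector nearest to `feature`.

    Euclidean distance is an assumption; the claim only requires "closest".
    """
    return min(range(len(prototypes)),
               key=lambda i: math.dist(feature, prototypes[i]))

def form_baseform(features, prototypes):
    """Substitute, for each feature vector, its closest training label."""
    return [closest_label(f, prototypes) for f in features]

def form_word_model(baseform, label_to_model):
    """Substitute, for each label in the baseform, its associated model."""
    return [label_to_model[label] for label in baseform]

# Hypothetical training label vectors (one prototype per label).
prototypes = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
# Hypothetical identifiers standing in for stored probabilistic model signals.
label_to_model = {0: "F0", 1: "F1", 2: "F2"}
# One feature vector per time interval of duration Δt (made-up values).
features = [(0.1, -0.2), (0.9, 1.2), (3.5, 0.3), (3.9, -0.1)]

baseform = form_baseform(features, prototypes)
word_model = form_word_model(baseform, label_to_model)
print(baseform)    # [0, 1, 2, 2]
print(word_model)  # ['F0', 'F1', 'F2', 'F2']
```

Because one label is produced per time interval, the word model in this sketch has exactly as many component models as the utterance has intervals.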
Abstract
An apparatus and method for modeling words with label-based Markov models in a speech recognition system are disclosed. The modeling includes: entering a first speech input, corresponding to words in a vocabulary, into an acoustic processor which converts each spoken word into a sequence of standard labels, where each standard label corresponds to a sound type assignable to an interval of time; representing each standard label as a probabilistic model which has a plurality of states, at least one transition from a state to a state, and at least one settable output probability at some transitions; entering selected acoustic inputs into an acoustic processor which converts the selected acoustic inputs into personalized labels, each personalized label corresponding to a sound type assigned to an interval of time; and setting each output probability as the probability of the standard label represented by a given model producing a particular personalized label at a given transition in the given model. The present invention addresses the problem of generating models of words simply and automatically in a speech recognition system.
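The final step of the abstract, setting each output probability as the probability that a standard label's model produces a particular personalized label, can be approximated with a simple counting sketch in Python. This is a deliberate simplification, not the procedure of the patent: it assumes the standard and personalized label sequences are already aligned interval by interval, it ignores the distinction between transitions, and the function name and the `floor` smoothing constant are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_output_probs(standard, personalized, n_personal, floor=1e-3):
    """Estimate P(personalized label | standard label) by relative frequency.

    Assumes `standard[i]` and `personalized[i]` describe the same time
    interval. `floor` is a small additive constant (an assumption) that keeps
    unseen personalized labels from receiving probability zero.
    """
    if len(standard) != len(personalized):
        raise ValueError("label sequences must be aligned one-to-one")
    counts = defaultdict(Counter)
    for s, p in zip(standard, personalized):
        counts[s][p] += 1
    probs = {}
    for s, c in counts.items():
        total = sum(c.values()) + floor * n_personal
        probs[s] = [(c[p] + floor) / total for p in range(n_personal)]
    return probs

# Made-up aligned label sequences over four time intervals.
standard = [0, 0, 0, 1]
personalized = [2, 2, 1, 0]
probs = estimate_output_probs(standard, personalized, n_personal=3)
print(probs[0])  # distribution over personalized labels for standard label 0
```

Each returned list is a distribution over the personalized label alphabet for one standard label, so its entries sum to one.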
2 Claims
2. A method of modeling words, said method comprising the steps of:
measuring the value of at least one feature of an utterance of a first word, said utterance occurring over a series of successive time intervals of equal duration Δt, said measuring step comprising measuring the feature value of the utterance during each time interval to produce a series of feature vector signals representing the feature values;
storing a finite set of probabilistic model signals, each probabilistic model signal representing a probabilistic model of a component sound, each probabilistic model comprising a Markov model having (a) only first and second states, (b) a first transition extending from the first state to the second state, (c) a second transition extending from the first state back to itself, (d) a null transition extending from the first state to the second state, (e) a transition probability for each transition, (f) an output probability for each output signal belonging to a finite set of output signals that the output signal will be produced at the first transition, (g) an output probability for each output signal that the output signal will be produced at the second transition, and (h) an output probability of zero for each output signal that the output signal will be produced at the null transition, each output signal of each Markov model representing the value of at least one feature of an utterance measured over a time interval having a duration substantially equal to Δt;
storing a finite set of training label vector signals, each training label vector signal having an associated probabilistic model signal, each training label vector signal having at least one parameter value;
comparing the feature value, of each feature vector signal in the series of feature vector signals produced by the measurement of the utterance of the first word, to the parameter values of the training label vector signals to determine, for each feature vector signal, the closest associated training label vector signal;
forming a baseform of the first word from the series of feature vector signals by substituting, for each feature vector signal, the closest associated training label vector signal to produce a baseform series of training label vector signals; and
forming a probabilistic model of the first word from the baseform series of training label vector signals by substituting, for each training label vector signal, the associated probabilistic model signal to produce a series of probabilistic model signals.
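The two-state Markov model recited in elements (a) through (h) can be pictured as a small data structure. The Python sketch below is one possible encoding, offered only as an illustration: the class name, field names, and example probabilities are hypothetical, and the checks that the three transition probabilities sum to one and that each output distribution sums to one are assumptions based on ordinary Markov model conventions rather than on the claim language.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class TwoStateModel:
    """Hypothetical container for one component-sound model."""
    p_first: float               # (b) first transition: state 1 -> state 2
    p_self: float                # (c) second transition: state 1 -> state 1
    p_null: float                # (d) null transition: state 1 -> state 2
    out_first: Sequence[float]   # (f) output probabilities at the first transition
    out_self: Sequence[float]    # (g) output probabilities at the second transition

    def validate(self, tol: float = 1e-9) -> None:
        """Check the assumed normalization constraints."""
        transitions = (self.p_first, self.p_self, self.p_null)
        if min(transitions) < 0 or abs(sum(transitions) - 1.0) > tol:
            raise ValueError("transition probabilities must be >= 0 and sum to 1")
        if len(self.out_first) != len(self.out_self):
            raise ValueError("both transitions must share one output alphabet")
        for dist in (self.out_first, self.out_self):
            if min(dist) < 0 or abs(sum(dist) - 1.0) > tol:
                raise ValueError("output probabilities must be >= 0 and sum to 1")

    def output_prob(self, transition: str, signal: int) -> float:
        """Probability that `signal` is produced at the named transition."""
        if transition == "null":
            return 0.0  # (h) the null transition produces no output
        table = {"first": self.out_first, "self": self.out_self}[transition]
        return table[signal]

# Made-up probabilities over a three-signal output alphabet.
model = TwoStateModel(0.6, 0.3, 0.1, [0.7, 0.2, 0.1], [0.5, 0.25, 0.25])
model.validate()
print(model.output_prob("first", 0), model.output_prob("null", 0))
```

In this encoding the self-loop lets a component sound span more than one time interval, while the null transition lets it be skipped without consuming any output signal.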
Specification