Automatic determination of labels and Markov word models in a speech recognition system
First Claim
1. Speech processing apparatus comprising:
an acoustic processor for producing as a first output, in response to speech input, one label after another at successive time intervals, each label being selected from an alphabet of labels, each label having parameter values;
dictionary means for storing statistical data for each of a plurality of vocabulary words as Markov model word baseforms, wherein each baseform is characterized by a sequence of Markov models, at least one word baseform containing at least one Markov model at different locations in the sequence, each Markov model having a plurality of arcs, wherein the dictionary means includes storage for (i) the respective probability of each arc in each Markov model, and (ii) a respective probability of producing each label in the alphabet at each of some arcs in each Markov model;
means, coupled to said acoustic processor, for re-specifying the parameter values of the labels in the alphabet which can be produced as outputs of the acoustic processor; and
baseform constructor means, coupled to said dictionary means, for up-dating the stored data for the Markov model word baseforms from labels generated by the acoustic processor based on the re-specified parameter values;
wherein said label re-specifying means re-specifies the parameter values of labels based on the up-dated stored data for the Markov model word baseforms;
wherein said acoustic processor produces as a second output one feature vector after another at the successive time intervals;
wherein each different Markov model corresponds to one respective label; and
wherein said label re-specifying means includes:
alignment processor means for aligning a string of labels generated by the acoustic processor against a word baseform stored in the dictionary means, said alignment processor means aligning successive substrings in the string with successive Markov models in the word baseform; and
estimator means for receiving as input from the acoustic processor the feature vectors corresponding to the labels aligned with a given Markov model and computing means and covariance values of the feature vectors received for the given Markov model; and
label specifier means, coupled to the estimator means, for storing (i) the mean and covariance values of the feature vectors corresponding to the labels aligned with each Markov model, as (ii) the parameter values of the label corresponding to the Markov model.
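The estimator and label specifier elements of claim 1 can be illustrated with a minimal sketch. It assumes the alignment has already been computed and is given as one Markov model ID per frame. The function name `respecify_labels`, the model IDs "F1" and "F2", and the toy feature vectors are hypothetical. The divide-by-n covariance estimator is also an assumption, since the claim does not name one.

```python
from collections import defaultdict


def respecify_labels(aligned_models, feature_vectors):
    """Return {model_id: (mean, covariance)} from frames aligned to each model.

    aligned_models[t] is the (hypothetical) ID of the Markov model that frame t
    was aligned with; feature_vectors[t] is the feature vector for frame t.
    """
    groups = defaultdict(list)
    for model_id, vec in zip(aligned_models, feature_vectors):
        groups[model_id].append(vec)

    params = {}
    for model_id, vecs in groups.items():
        n, dim = len(vecs), len(vecs[0])
        mean = [sum(v[i] for v in vecs) / n for i in range(dim)]
        # Divide-by-n covariance: an assumption, the claim names no estimator.
        cov = [
            [sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vecs) / n
             for j in range(dim)]
            for i in range(dim)
        ]
        # Stored as the parameter values of the label tied to this model.
        params[model_id] = (mean, cov)
    return params


# Toy data: the first two frames align with model "F1", the last two with "F2".
alignment = ["F1", "F1", "F2", "F2"]
features = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [2.0, 0.0]]
label_params = respecify_labels(alignment, features)
```

Because each Markov model corresponds to one label, keying the result by model ID is equivalent to keying it by label.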
Abstract
In a Markov model speech recognition system, an acoustic processor generates one label after another selected from an alphabet of labels. Each vocabulary word is represented as a baseform constructed of a sequence of Markov models. Each Markov model is stored in a computer memory as (a) a plurality of states; (b) a plurality of arcs, each extending from a state to a state with a respective stored probability; and (c) stored label output probabilities, each indicating the likelihood of a given label being produced at a certain arc. Word likelihood based on acoustic characteristics is determined by matching a string of labels generated by the acoustic processor against the probabilities stored for each word baseform. Improved models of words are obtained by specifying label parameters and constructing word baseforms interdependently and iteratively.
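The stored model described in the abstract (states, arcs with probabilities, and label output probabilities per arc) can be sketched as a small data structure plus a forward pass that scores a label string. This is an illustrative sketch, not the patent's implementation. The `Arc` class, `string_likelihood`, and the two-state toy model are hypothetical, and arcs that produce no label are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    src: int            # state the arc leaves
    dst: int            # state the arc enters
    prob: float         # stored arc (transition) probability
    label_probs: dict   # label -> probability of producing it on this arc


def string_likelihood(arcs, labels, start, final):
    """Forward-pass probability of emitting `labels` while going start -> final."""
    alpha = {start: 1.0}  # probability mass per state before any label
    for label in labels:
        nxt = {}
        for arc in arcs:
            p = alpha.get(arc.src, 0.0) * arc.prob * arc.label_probs.get(label, 0.0)
            if p:
                nxt[arc.dst] = nxt.get(arc.dst, 0.0) + p
        alpha = nxt
    return alpha.get(final, 0.0)


# Toy model: state 0 has a self-loop (0.4) and an arc to state 1 (0.6).
emit = {"A": 0.9, "B": 0.1}
toy_model = [Arc(0, 0, 0.4, emit), Arc(0, 1, 0.6, emit)]
likelihood = string_likelihood(toy_model, ["A", "A"], start=0, final=1)
```

A word baseform would chain several such models end to end. The word whose baseform gives the highest likelihood for the observed label string would be the best acoustic match.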
21 Claims
6. A computerized method of processing speech for speech recognition comprising the steps of:
(a) generating, in an acoustic processor, one feature vector after another for one time interval after another in response to uttered speech, each feature vector having a feature value;
(b) for each time interval, assigning one of an alphabet of stored labels thereto which corresponds to one prototype vector of an alphabet of prototype vectors, each prototype vector having parameter values, the parameter values of said one assigned prototype vector being the closest to the feature value of the feature vector generated for a given time interval;
(c) storing each word of a vocabulary in a computer memory as a sequence of Markov models, at least one word containing at least one Markov model at different locations in the sequence, which includes the steps of:
selecting a set of Markov models wherein each Markov model corresponds to a label; and
storing, for each Markov model, a plurality of arc probabilities and label probabilities, wherein each label probability corresponds to the likelihood of a respective label being produced at a given Markov model arc;
(d) for an uttered known word sequence, aligning labels which are generated according to step (a) with each successive Markov model included in the known word sequence; and
(e) for a subject Markov model, re-specifying the prototype vector based solely on the feature vectors corresponding to each label aligned with the subject Markov model and associating the re-specified prototype vector with the label corresponding to the subject Markov model.
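Step (b) of claim 6 amounts to a nearest-prototype lookup, which can be sketched as follows. The claim only says the assigned prototype is "the closest" to the feature vector, so squared Euclidean distance is an assumption here. The function name `assign_labels`, the label IDs "L1" and "L2", and the toy vectors are likewise hypothetical.

```python
def assign_labels(feature_vectors, prototypes):
    """Label each frame with the ID of the nearest prototype vector.

    prototypes maps a label ID to its prototype vector (the label's
    parameter values). Distance is squared Euclidean, an assumed measure.
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        min(prototypes, key=lambda label: sq_dist(vec, prototypes[label]))
        for vec in feature_vectors
    ]


# Toy alphabet of two prototype vectors and three frames of features.
prototypes = {"L1": [0.0, 0.0], "L2": [4.0, 4.0]}
frames = [[0.5, 0.2], [3.9, 4.2], [1.0, 1.5]]
labels = assign_labels(frames, prototypes)
```

After the prototype vectors are re-specified in step (e), running the same lookup again would produce a new label string for the same speech, which is what makes the procedure iterative.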
14. A speech processing apparatus comprising:
means for measuring the value of at least one feature of a speech input, said speech input occurring over a series of successive time intervals, said means measuring the feature value of the speech input during each time interval to produce a series of feature vector signals representing the feature values;
means for storing a plurality of label vector signals, each label vector signal having at least one parameter value and having a unique identification value;
means for storing a baseform of a first word, said baseform comprising a sequence of baseform segments, each baseform segment being assigned a label vector signal identification value, at least two separate baseform segments being assigned a first label vector signal identification value of a first label vector signal;
means for sorting a series of feature vector signals produced by the measuring means as a result of one or more utterances of the first word into groups of one or more feature vector signals, one group of feature vector signals corresponding to the two or more segments of the baseform of the first word which are assigned the first label vector signal identification value; and
means for modifying the stored parameter value of the first label vector signal as a function solely of the feature values of the feature vectors which correspond to the baseform segments which are assigned the identification value of the first label vector signal.
18. A speech processing apparatus comprising:
means for measuring the value of at least one feature of a speech input, said speech input occurring over a series of successive time intervals, said means measuring the feature value of the speech input during each time interval to produce a series of feature vector signals representing the feature values;
means for storing a plurality of label vector signals, each label vector signal having at least one parameter value and having a unique identification value;
means for storing baseforms of first and second words, each baseform comprising a sequence of baseform segments, each baseform segment being assigned a label vector signal identification value, at least one baseform segment from the baseform of the first word being assigned a first label vector signal identification value of a first label vector signal, at least one baseform segment from the baseform of the second word being assigned the first label vector signal identification value;
means for sorting a series of feature vector signals produced by the measuring means as a result of one or more utterances of the first and second words into groups of one or more feature vector signals, one group of feature vector signals corresponding to the two or more segments of the baseforms of the first and second words which are assigned the first label vector signal identification value; and
means for modifying the stored parameter value of the first label vector signal as a function solely of the feature values of the feature vectors which correspond to the baseform segments which are assigned the identification value of the first label vector signal.
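The sorting and modifying elements shared by claims 14 and 18 can be sketched as pooling feature vectors by the identification value of the baseform segment they fall in, whether the segments sit in one word or in different words. The sketch assumes each utterance has already been split into per-segment groups of feature vectors. It uses the mean as the "function solely of the feature values", which is one possible choice and not one stated in the claims. All names, words, and ID values are hypothetical.

```python
from collections import defaultdict


def pool_by_label_id(baseforms, segmented_utterances):
    """Group feature vectors by the label ID of their baseform segment.

    baseforms: {word: [label_id, ...]}, one ID per baseform segment.
    segmented_utterances: list of (word, [vectors_for_segment_0, ...]).
    """
    pools = defaultdict(list)
    for word, segments in segmented_utterances:
        for label_id, vectors in zip(baseforms[word], segments):
            pools[label_id].extend(vectors)
    return pools


def update_parameters(pools):
    """New parameter value per label ID: the mean of its pooled vectors only."""
    return {
        label_id: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label_id, vecs in pools.items()
    }


# ID 7 occurs twice in "one" (as in claim 14) and again in "two" (claim 18).
baseforms = {"one": [7, 3, 7], "two": [7, 5]}
utterances = [
    ("one", [[[1.0]], [[9.0]], [[2.0]]]),
    ("two", [[[3.0], [6.0]], [[8.0]]]),
]
new_params = update_parameters(pool_by_label_id(baseforms, utterances))
```

In this toy data the parameter for ID 7 is computed from all four vectors that fall in segments assigned that ID, and from nothing else.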
Specification